Embedded audio and video

I am going to make a plugin for markdown-it. The CommonMark reference implementation is not ready for plugins/extensions yet, right?


The reference implementation is only for CommonMark syntax I believe. But @jgm has not yet confirmed if an audio and video extension will be supported, or if the syntax we’ve discussed here will be used.

It would be nice if @jgm could comment on this.

I am implementing one using markdown-it @chrisalley @jgm

I think it makes a lot of sense to use the ![]() syntax not just for images but for embedded audio and video, and let the renderer create tags appropriate for the resource, based on the extension.

This might call for some changes in the spec, renaming the element from “image” to “media” or something more generic, and adding some words about the flexibility in rendering. I’m not sure. In any case, as an extension this idea is completely natural.


Thanks a lot @jgm.

Here is what I’ve done by monkey-patching markdown-it’s rule for images. It is not a proper plugin yet, but it is working very well for me.

  // Named md so the instance does not overwrite the global markdownit factory.
  var md = window.markdownit();
  var defaultRender = md.renderer.rules.image;

  md.renderer.rules.image = function (tokens, idx, options, env, self) {
    // No "g" flag: a global regex keeps lastIndex between exec() calls.
    var vimeoRE = /^https?:\/\/(www\.)?vimeo\.com\/(\d+)($|\/)/;
    var audioRE = /^.*\.(ogg|mp3)$/i;
    var videoRE = /^.*\.(mp4|webm)$/i;
    var token = tokens[idx];
    var src = token.attrs[token.attrIndex('src')][1];
    var safeSrc = md.utils.escapeHtml(src);
    var vimeoMatch = vimeoRE.exec(src);
    var audioMatch = audioRE.exec(src);
    var videoMatch = videoRE.exec(src);

    if (vimeoMatch !== null) {
      // Capture group 2 is the numeric Vimeo video id.
      return '<div class="embed-responsive embed-responsive-16by9">\n' +
        '  <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' + vimeoMatch[2] + '"></iframe>\n' +
        '</div>\n';
    } else if (audioMatch !== null) {
      // The type is built from the extension; note that the registered type for .mp3 is audio/mpeg.
      return ['<p><audio controls class="audioplayer">',
        '<source type="audio/' + audioMatch[1].toLowerCase() + '" src="' + safeSrc + '"></source>',
        '</audio></p>'].join('\n');
    } else if (videoMatch !== null) {
      return ['<p><video width="320" height="240" class="audioplayer" controls>',
        '<source type="video/' + videoMatch[1].toLowerCase() + '" src="' + safeSrc + '"></source>',
        '</video></p>'].join('\n');
    } else {
      // Anything else is rendered as an ordinary image.
      return defaultRender(tokens, idx, options, env, self);
    }
  };

Yes, that is very reasonable. This will make Markdown a lot richer.


@v3ss0n, are you going to shape your code as a plugin? We at diaspora need it to implement audio/video embedding. If not, I could probably make the plugin myself, based on your code.


I will make a repo; I am still learning how to write a proper markdown-it plugin.
We are also building something similar to diaspora, but aimed at group conversation instead of social networking.
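
For anyone wondering what the plugin shape might look like, here is a minimal sketch rather than a finished plugin. It assumes the usual markdown-it convention that md.use() calls the plugin function with the instance as its first argument. The name mediaEmbedPlugin and the audioRE/videoRE options are hypothetical, and the Vimeo case is left out for brevity.

```javascript
// Hypothetical plugin; md.use(mediaEmbedPlugin, pluginOptions) would call it
// as mediaEmbedPlugin(md, pluginOptions).
function mediaEmbedPlugin(md, pluginOptions) {
  var opts = pluginOptions || {};
  // Assumed option names, not part of markdown-it itself.
  var audioRE = opts.audioRE || /\.(ogg|mp3)$/i;
  var videoRE = opts.videoRE || /\.(mp4|webm)$/i;
  var defaultRender = md.renderer.rules.image;

  md.renderer.rules.image = function (tokens, idx, options, env, self) {
    var token = tokens[idx];
    var src = token.attrs[token.attrIndex('src')][1];
    var safeSrc = md.utils.escapeHtml(src);

    if (audioRE.test(src)) {
      return '<audio controls src="' + safeSrc + '"></audio>';
    }
    if (videoRE.test(src)) {
      return '<video controls src="' + safeSrc + '"></video>';
    }
    // Fall back to the renderer's original image rule.
    return defaultRender(tokens, idx, options, env, self);
  };
}
```

With markdown-it loaded, this would be registered as md.use(mediaEmbedPlugin), which keeps the patch out of global scope and lets each instance opt in.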


I think it would be best if the core spec continues to describe the element as “image”. The HTML5 spec refers to the category of elements as embedded content. If the element is called “embedded content” or “media” in the core spec, people might be led to believe that other embedded content is available as part of CommonMark-core. But this is intended as an extension, correct?

I was actually not thinking of it as an extension. More of a terminology change.

Currently the spec attempts to define parsing (transformation from source to AST) without being too detailed about exactly how each element should be rendered into HTML (or other formats). Loosening up the terminology would invite renderers to do something useful with movie and sound URLs in ![]() contexts, without requiring anything specific.

A further step would be to say explicitly that renderers must be sensitive to the URL or file extension (if there is one) and render the media in a way that makes sense. But I’m thinking this might be a bit much to require. And maybe the terminology change doesn’t make sense without this, I’m not sure.


If file extensions are used to determine the media type, I think the mapping of file extensions to particular media types should be strongly specified.

Consider the following Markdown:

![](chihiro-onitsuko-everyhome.mp4)

Generally, this would refer to the music video of “everyhome” because of the file extension .mp4. However, the .mp4 extension can also be used for audio - the file extension can be used for AAC encoded music in iTunes, for example (even though .m4a is the norm for audio). So the file could just contain the audio of “everyhome” or both audio and video. If the implementation developer chooses to render it using the HTML <audio> tag (and the file is indeed video), the video will fail to display. This is a problem for cross-compatibility between implementations. Enforcing the .mp4 = video rule via a whitelist of file extensions for each media type would resolve this.

@chrisalley, you raise a good point. I’m reluctant to enforce .mp4 = video, though, if it might just be audio. I suppose we could abuse the title field to resolve ambiguities?

![](chihiro-onitsuko-everyhome.mp4 "audio: real title here")

This would degrade fairly well. Renderers that support video and audio could strip off the “audio:” from the title.
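
A renderer could strip that hint with something like the sketch below. The helper name parseMediaTitle is made up for illustration, and accepting a "video:" prefix alongside "audio:" is my own assumption; only "audio:" was proposed above.

```javascript
// Hypothetical helper: splits an optional "audio:" or "video:" hint off the
// front of an image title and returns the remaining, real title.
function parseMediaTitle(title) {
  var text = title || '';
  var match = /^(audio|video):\s*([\s\S]*)$/i.exec(text);
  if (match === null) {
    // No hint: the renderer would treat the element as a normal image.
    return { kind: null, title: text };
  }
  return { kind: match[1].toLowerCase(), title: match[2] };
}
```

A renderer that does not know about the prefix would simply show the full string as the image title, which is the graceful degradation mentioned above.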

Or maybe we should leave ![]() for pictures, and let people use raw HTML for audio and video. After all, these are not going to be supported in many output formats besides HTML.


I can see why you’re not keen on enforcing .mp4 = video.

As a modification of my original proposal, the whitelist of file extensions could consist of only unambiguous file extensions. That would rule out .mp4 being used to represent either audio or video, but .m4a and .m4v would be valid extensions (representing <audio> and <video> respectively). Similarly, .ogg is ambiguous, but .oga and .ogv are not.

In cases where there is a .mp4 or .ogg file, the file could either be renamed or if this is not possible, the writer could fall back to using HTML.

.mp3, .wav and .flac are only used for audio as far as I am aware, and .webm is only used for video.
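
As a rough sketch of that whitelist, the lookup could be as small as the following. The table contains only the extensions named in this post; the names MEDIA_BY_EXTENSION and mediaKindFor are hypothetical, and returning null is just one way to signal "render as an image, or fall back to HTML".

```javascript
// Only unambiguous extensions are listed; .mp4 and .ogg are deliberately absent.
var MEDIA_BY_EXTENSION = {
  m4a: 'audio', oga: 'audio', mp3: 'audio', wav: 'audio', flac: 'audio',
  m4v: 'video', ogv: 'video', webm: 'video'
};

// Returns 'audio', 'video', or null when the extension is not whitelisted.
function mediaKindFor(url) {
  // Ignore any query string or fragment when looking for the extension.
  var path = url.split(/[?#]/)[0];
  var match = /\.([a-z0-9]+)$/i.exec(path);
  if (match === null) {
    return null;
  }
  var ext = match[1].toLowerCase();
  return Object.prototype.hasOwnProperty.call(MEDIA_BY_EXTENSION, ext)
    ? MEDIA_BY_EXTENSION[ext]
    : null;
}
```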

The use of English doesn’t seem very Markdownish.

Some kind of lightweight markup for audio and video is preferable. Video, especially, is becoming increasingly common on the web, but the HTML to mark up a video is not easy to write (or remember).

True, but to do it correctly you may need the HTML tags, which allow you to specify width, height, controls, autoplay, poster, subtitles, looping, muting, and alternative sources.

From a writer’s perspective, I just want a simple and quick way to express the intent to “embed my video at this point in the article”.

Many of the extra attributes for videos are application features. I’m not sure that writers should (or would want to) control these settings. The autoplay setting, for example, could be set at an application level and apply to all videos used by the application, removing the need for writer-controlled configuration.
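
To illustrate what I mean by application-level settings, here is a small sketch. The settings object and the function names are invented for this example; nothing here comes from a spec or from an existing implementation.

```javascript
// Hypothetical application-wide defaults, set once by the application
// rather than by the writer of each document.
var appVideoSettings = { controls: true, autoplay: false, loop: false, muted: false };

// Minimal attribute escaping for the URL.
function escapeAttr(value) {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;')
    .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Builds a <video> tag from a URL plus the application's settings.
function renderVideo(src, settings) {
  var flags = ['controls', 'autoplay', 'loop', 'muted'].filter(function (name) {
    return settings[name] === true;
  });
  var attrs = flags.length > 0 ? ' ' + flags.join(' ') : '';
  return '<video src="' + escapeAttr(src) + '"' + attrs + '></video>';
}
```

The writer only supplies the URL; changing appVideoSettings changes every video the application renders.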

Images also have height and width attributes, but these aren’t part of Markdown. There are often cleaner ways to set these than explicitly declaring them, such as the application checking the file’s dimensions and updating the generated HTML automatically. I’m inclined to think the same approach should apply to most of the <video> tag’s attributes.


Another thing to consider here is the upcoming <picture> tag that will allow multiple sources for images to be specified. This is planned for HTML 5.1. What <video>, <audio>, and <picture> all have in common is the ability to specify multiple sources via the <source> tag.

<picture> builds upon <img>, providing further reason to use ![]() for this family of elements; a lightweight syntax for specifying the media type and multiple sources, building upon the ![]() syntax already established for images, would be intuitive.

Do you have any data on how common this is in the real world, though? I’ve never seen an mp4 file that was audio only…

I have no real world examples or data. It’s possible to use .mp4 for audio, but this appears to be an edge case. .mp4 for video and .ogg for audio are the norm in my experience. Perhaps this informal convention should be adopted for CommonMark if the alternative is not used in practice? Users can always fall back to HTML if for some reason they cannot (or will not) follow the convention.

@v3ss0n, any progress so far?

@v3ss0n, ok, I’ve taken your code and made the plugin. Thanks for the code.
