Embedded audio and video


Markdown includes syntax for embedding images, but not audio and video files. Embedded video in particular has become a lot more commonplace since John Gruber wrote the original Markdown spec. It would be good to have a standard (er, common) way to embed audio and video using Markdown.

I currently transform image tags based on particular file extensions, producing the following markup for audio:



<audio controls="controls">
  <source type="audio/mpeg" src="filename.mp3">
  <source type="audio/ogg" src="filename.ogg">
  <p>Your browser does not support the audio element.</p>
</audio>

And for video:



<video controls="controls">
  <source type="video/mp4" src="filename.mp4">
  <source type="video/webm" src="filename.webm">
  <p>Your browser does not support the video element.</p>
</video>


I think this would fall under the “Generic directives/plugins syntax” topic, which is about general directives.

Personally, I think we should explicitly say whether it’s a video or not. It makes parsing a heck of a lot easier, e.g.:

!video[ title ]( url ){ size=10 }
!audio[ title ]( url ){ size=10 duration=10 cycle=forever }
!youtube[ title ]( url ){ size=10 cycle=forever }

This is because I think ![]() is considered an image by default.
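A minimal sketch of how the explicitly typed form could be recognised, assuming the hypothetical shape `!kind[title](url){key=value ...}` with the brace block optional; `parseMediaDirective` and its return shape are made up for illustration and are not part of any existing parser:

```javascript
// Hypothetical parser for the proposed !kind[title](url){key=value ...} syntax.
// Returns null when the input is not a typed media directive.
const DIRECTIVE_RE = /^!(\w+)\[\s*([^\]]*?)\s*\]\(\s*([^)\s]+)\s*\)(?:\{([^}]*)\})?$/;

function parseMediaDirective(line) {
  const m = DIRECTIVE_RE.exec(line.trim());
  if (m === null) return null;
  const attrs = {};
  // The brace block holds whitespace-separated key=value pairs, e.g. "size=10 cycle=forever"
  for (const pair of (m[4] || '').trim().split(/\s+/)) {
    if (pair === '') continue;
    const eq = pair.indexOf('=');
    if (eq > 0) attrs[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return { kind: m[1].toLowerCase(), title: m[2], url: m[3], attrs };
}
```

Because the regex requires at least one word character between `!` and `[`, plain `![alt](src)` never matches, so ordinary image syntax is left alone.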


Thanks, I’ll check out that other topic.

I’m not sure if ease of parsing should be the main aim here though. With Markdown, the goal was for readability to be emphasised above all else.


I see, well, that could work. But what if the URL doesn’t end in .mp3 yet should still be treated as audio? I guess we could do both.
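One way to “do both”, sketched under the assumption that an explicit type (from a prefix such as `!audio`) always wins and the URL’s file extension is only a fallback; `resolveMediaKind` and the extension lists are illustrative:

```javascript
// Hypothetical resolution rule: an explicit kind always wins; otherwise infer
// from the file extension of the URL; otherwise treat it as a plain image.
const AUDIO_EXTS = new Set(['mp3', 'ogg', 'wav']);
const VIDEO_EXTS = new Set(['mp4', 'webm']);

function resolveMediaKind(explicitKind, url) {
  if (explicitKind) return explicitKind;
  // Drop any query string or fragment, then look only at the last path segment
  const file = url.split(/[?#]/)[0].split('/').pop();
  const dot = file.lastIndexOf('.');
  const ext = dot > 0 ? file.slice(dot + 1).toLowerCase() : '';
  if (AUDIO_EXTS.has(ext)) return 'audio';
  if (VIDEO_EXTS.has(ext)) return 'video';
  return 'image';
}
```

A URL with no usable extension, such as a streaming endpoint, then needs the explicit prefix to be treated as audio.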


The solution I posted works because the system deals with very particular file formats and extensions. I can see it creating problems with more flexible systems.

Something like the solution you posted seems good. I would use sensible defaults for the options, rather than getting the user to explicitly define them.
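A sketch of what “sensible defaults” could mean in practice, assuming per-kind default attributes that any user-supplied option overrides. The names `MEDIA_DEFAULTS` and `mediaAttributes` are hypothetical; `controls` and `preload` are real HTML media attributes, but the chosen default values are only a suggestion:

```javascript
// Hypothetical per-kind defaults; user options override them key by key.
const MEDIA_DEFAULTS = {
  audio: { controls: true, preload: 'metadata' },
  video: { controls: true, preload: 'metadata' },
};

// Escape the characters that matter inside a double-quoted attribute value.
function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;').replace(/</g, '&lt;');
}

// Merge defaults with user options and serialise them as HTML attributes:
// `true` becomes a bare boolean attribute, `false` omits the attribute.
function mediaAttributes(kind, userOptions = {}) {
  const merged = { ...(MEDIA_DEFAULTS[kind] || {}), ...userOptions };
  return Object.entries(merged)
    .filter(([, value]) => value !== false)
    .map(([name, value]) => (value === true ? name : `${name}="${escapeAttr(value)}"`))
    .join(' ');
}
```

For example, `mediaAttributes('video')` returns `controls preload="metadata"`, and `mediaAttributes('audio', { loop: true, controls: false })` returns `preload="metadata" loop`, so users only write options when they want something different.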


Need to necro this topic, because it’s worth it.
As @chrisalley did, how about we agree to let the implementation decide what to display, and just use a common syntax for everything?

![](whatever it is.png/jpg/gif/mp4/webm/mp3/audio)

That way we do not need a different syntax.


I like the idea of generating markup like this:



<audio controls="controls">
  <source type="audio/mpeg" src="filename.mp3">
  <source type="audio/ogg" src="filename.ogg">
  <p>Your browser does not support the audio element.</p>
</audio>

I would like to suggest that we use

![](filename.mp3 filename.ogg)

using white space to separate the files and having the first file dictate the HTML element.
For example,

![](filename.mp4 filename.mp3)

will create

<video controls="controls">
  <source type="video/mp4" src="filename.mp4">
  <source type="audio/mpeg" src="filename.mp3">
  <p>Your browser does not support the video element.</p>
</video>
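A sketch of the whitespace-separated proposal, assuming the renderer is handed the raw destination string. Note that a standard CommonMark parser would not accept `![](filename.mp3 filename.ogg)` as an image at all (the second word is not a valid link title), so this also needs parser support. The MIME table and function names are hypothetical:

```javascript
// Hypothetical extension -> MIME table; only these extensions are recognised.
const SOURCE_TYPES = new Map([
  ['mp3', 'audio/mpeg'], ['ogg', 'audio/ogg'], ['wav', 'audio/wav'],
  ['mp4', 'video/mp4'], ['webm', 'video/webm'],
]);

function extensionOf(file) {
  const dot = file.lastIndexOf('.');
  return dot > 0 ? file.slice(dot + 1).toLowerCase() : '';
}

const escapeAttr = (s) => s.replace(/&/g, '&amp;').replace(/"/g, '&quot;');

// Render the raw destination "a.mp4 b.mp3" as <audio> or <video> with one
// <source> per recognised file; the first file decides the element.
function renderMultiSource(destination) {
  const files = destination.trim().split(/\s+/);
  const firstType = SOURCE_TYPES.get(extensionOf(files[0]));
  if (firstType === undefined) return null;
  const tag = firstType.startsWith('video/') ? 'video' : 'audio';
  const sources = files
    .filter((file) => SOURCE_TYPES.has(extensionOf(file)))
    .map((file) => `  <source type="${SOURCE_TYPES.get(extensionOf(file))}" src="${escapeAttr(file)}">`);
  return [
    `<${tag} controls>`,
    ...sources,
    `  <p>Your browser does not support the ${tag} element.</p>`,
    `</${tag}>`,
  ].join('\n');
}
```

Files with unrecognised extensions after the first are silently dropped here; an implementation might instead keep them without a `type` attribute and let the browser probe them.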


I agree, hence ![foo](bar) should not be seen and described as “image syntax”, but as “embedded resource syntax”. Embedding is a special kind of hyperlinking.

Back in the day, when <img> was new to HTML and terminals were more common than GUIs and bandwidth was scarce, images – most likely figures then – would often be shown on demand only and not embedded within the text, although the name of the src attribute already suggests that it should be treated differently than those with href. Anyhow, an embedded resource is a special kind of link, whether it be a static image, video, audio, text (<iframe>) or an interactive component (<embed>).

A third type of link is the transclusion. This kind is not necessary to support in a front-end format such as HTML, but should be a feature of a back-end format like CommonMark. Otherwise one needs another tool layer, i.e. a preprocessor. A well-known example is {{templates}} in MediaWiki. Transclusions in CM/MD should use a syntax similar to embeds, probably just swapping the exclamation mark for a different punctuation character, and maybe requiring them to be on a line or in a block by themselves, e.g.:


This does not solve external code listings well, though, because you would have to put markup inside them since fences and indents won’t work:

<[Bubble Sort Class](bubble.c)

    <[Bubble Sort Class](bubble.c)
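The “another tool layer” route can be sketched as a line-based preprocessor. This assumes the hypothetical `<[label](path)` form must stand alone on an unindented line and looks documents up in an in-memory `Map` (a real tool would read files); `transclude` and the depth limit are illustrative:

```javascript
// Hypothetical transclusion syntax: a line consisting only of <[label](path)
// is replaced by the contents of `path`, looked up in a Map of documents.
const TRANSCLUDE_RE = /^<\[([^\]]*)\]\(([^)\s]+)\)\s*$/;

function transclude(text, documents, depth = 0) {
  if (depth > 10) throw new Error('transclusion nested too deeply (cycle?)');
  return text
    .split('\n')
    .map((line) => {
      const m = TRANSCLUDE_RE.exec(line);
      if (m === null) return line;
      const included = documents.get(m[2]);
      if (included === undefined) return line; // leave unresolved references untouched
      // Expand recursively so included documents may transclude others
      return transclude(included, documents, depth + 1);
    })
    .join('\n');
}
```

Because this works on raw lines, it would also expand a matching line inside a fenced block (though not the indented form, since the pattern requires the line to start with `<`), whereas a transclusion built into the CommonMark parser would leave fenced and indented content literal, which is the listing problem described above.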


Agreed on a common syntax. There should be a sensible default rendering as well so that CommonMark documents are compatible across different websites and apps.

Explicitly listing the file types like this is probably the safest route. Otherwise the parser would have to know in advance (or check at the time of rendering) which files to include. If another file format is added in the future (say, a FLAC version), the document could be updated either manually or, perhaps, programmatically if a FLAC version is added for every audio track in the larger file set.
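The programmatic route could be a plain text rewrite. A sketch, assuming the whitespace-separated multi-source form `![](song.mp3 song.ogg)` proposed earlier and that a `.flac` file sits next to every `.mp3` (both assumptions; `addFlacSources` is a hypothetical name):

```javascript
// Hypothetical bulk update: for every ![...](...) whose sources include an
// .mp3, append a sibling .flac source unless one is already listed.
const EMBED_RE = /!\[([^\]]*)\]\(([^)]*)\)/g;

function addFlacSources(markdown) {
  return markdown.replace(EMBED_RE, (whole, alt, destination) => {
    const files = destination.trim().split(/\s+/);
    const mp3 = files.find((f) => /\.mp3$/i.test(f));
    if (!mp3 || files.some((f) => /\.flac$/i.test(f))) return whole;
    files.push(mp3.replace(/\.mp3$/i, '.flac'));
    return `![${alt}](${files.join(' ')})`;
  });
}
```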

I’m in agreement (as a CommonMark extension, not as part of the core spec).

If the extension is enabled, the ![]() syntax should render the content based on the specified file extension, e.g. ![](file.mp4) would render the HTML <video> tag. If the extension is not enabled, the syntax will attempt to render the HTML <img> tag regardless of the specified file extension.

For the extension to be viable we would need a white list of file extensions that would be used to render as the particular content types - image, audio, video, and perhaps other content types.
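How the enable flag and a whitelist could fit together, as a minimal sketch: the whitelist contents and the `mediaEnabled` option are assumptions, and `.ogg` is mapped to audio only for simplicity even though Ogg can also carry video. With the flag off, or for an extension not on the list, everything renders as `<img>`, matching core behaviour:

```javascript
// Hypothetical whitelist: extension -> element to render (ogg treated as audio only).
const WHITELIST = new Map([
  ['png', 'img'], ['jpg', 'img'], ['jpeg', 'img'], ['gif', 'img'],
  ['mp3', 'audio'], ['ogg', 'audio'], ['wav', 'audio'],
  ['mp4', 'video'], ['webm', 'video'],
]);

function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Render ![alt](src). With the extension disabled, or for an extension that is
// not whitelisted, fall back to the core behaviour: an <img> tag.
function renderEmbed(src, alt, { mediaEnabled = false } = {}) {
  const file = src.split(/[?#]/)[0].split('/').pop();
  const dot = file.lastIndexOf('.');
  const ext = dot > 0 ? file.slice(dot + 1).toLowerCase() : '';
  const tag = mediaEnabled ? (WHITELIST.get(ext) || 'img') : 'img';
  const safeSrc = escapeHtml(src);
  if (tag === 'img') return `<img src="${safeSrc}" alt="${escapeHtml(alt)}">`;
  // The alt text becomes fallback content for browsers without media support
  return `<${tag} src="${safeSrc}" controls>${escapeHtml(alt)}</${tag}>`;
}
```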


Good ideas! For the whitelist, I suggest whitelisting HTML5-compatible formats.


I am going to make a plugin for markdown-it. The CommonMark reference implementation isn’t ready for plugins/extensions yet, right?


The reference implementation is only for CommonMark syntax I believe. But @jgm has not yet confirmed if an audio and video extension will be supported, or if the syntax we’ve discussed here will be used.


It would be nice if @jgm could comment on this.


I am implementing one using markdown-it @chrisalley @jgm


I think it makes a lot of sense to use the ![]() syntax not just for images but for embedded audio and video, and let the renderer create tags appropriate for the resource, based on the extension.

This might call for some changes in the spec, renaming the element from “image” to “media” or something more generic, and adding some words about the flexibility in rendering. I’m not sure. In any case, as an extension this idea is completely natural.


Thanks a lot @jgm.

Here is what I’ve done by monkey-patching markdown-it’s image rule. It’s not a proper plugin yet, but it’s working very well for me.

    var markdownit = window.markdownit();
    var defaultRender = markdownit.renderer.rules.image;
    var escapeHtml = markdownit.utils.escapeHtml;

    var vimeoRE = /^https?:\/\/(www\.)?vimeo\.com\/(\d+)($|\/)/;
    var audioRE = /\.(ogg|mp3)$/i;
    var videoRE = /\.(mp4|webm)$/i;
    // MIME type per extension (MP3's registered type is audio/mpeg, not audio/mp3)
    var mimeTypes = { ogg: 'audio/ogg', mp3: 'audio/mpeg', mp4: 'video/mp4', webm: 'video/webm' };

    markdownit.renderer.rules.image = function (tokens, idx, options, env, self) {
      var token = tokens[idx];
      var src = token.attrs[token.attrIndex('src')][1];
      var vimeo = src.match(vimeoRE);
      var audio = src.match(audioRE);
      var video = src.match(videoRE);

      if (vimeo) {
        // Vimeo page URL: embed the player in a responsive (Bootstrap-style) wrapper
        return '<div class="embed-responsive embed-responsive-16by9">\n' +
          '  <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' + vimeo[2] + '"></iframe>\n' +
          '</div>\n';
      } else if (audio) {
        // No <p> wrapper: markdown-it already wraps inline content in a paragraph
        return '<audio class="audioplayer" controls>' +
          '<source type="' + mimeTypes[audio[1].toLowerCase()] + '" src="' + escapeHtml(src) + '">' +
          '</audio>';
      } else if (video) {
        return '<video width="320" height="240" class="audioplayer" controls>' +
          '<source type="' + mimeTypes[video[1].toLowerCase()] + '" src="' + escapeHtml(src) + '">' +
          '</video>';
      }
      // Everything else is rendered as a normal <img> by markdown-it's default rule
      return defaultRender(tokens, idx, options, env, self);
    };

Yes, that is very reasonable. This will make Markdown a lot richer.


@v3ss0n, are you going to shape your code as a plugin? We at diaspora need it to implement audio/video embedding. If not, I could probably make the plugin myself, based on your code.


I will make a repo; I’m still learning how to write a proper markdown-it plugin.
We are also building something similar to diaspora, but aimed at group conversations instead of a social network.
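For reference, a markdown-it plugin is just a function that markdown-it calls with the parser instance and any options when you write `md.use(plugin, options)`, so the monkey-patch above can be wrapped with little change. A minimal sketch; the plugin name and the `audioExtensions`/`videoExtensions` options are hypothetical, while `md.renderer.rules.image`, `Token#attrGet` and `md.utils.escapeHtml` are existing markdown-it APIs:

```javascript
// Sketch of the monkey-patch as a markdown-it plugin. markdown-it invokes it
// as plugin(md, options) when you call md.use(mediaEmbedPlugin, options).
function mediaEmbedPlugin(md, options = {}) {
  const audioExts = options.audioExtensions || ['mp3', 'ogg'];
  const videoExts = options.videoExtensions || ['mp4', 'webm'];
  const defaultRender = md.renderer.rules.image;

  md.renderer.rules.image = function (tokens, idx, opts, env, self) {
    const src = tokens[idx].attrGet('src') || '';
    // Extension of the path part only (query string and fragment ignored)
    const ext = ((src.split(/[?#]/)[0].match(/\.(\w+)$/) || [])[1] || '').toLowerCase();
    let kind = null;
    if (audioExts.includes(ext)) kind = 'audio';
    else if (videoExts.includes(ext)) kind = 'video';
    if (kind === null) return defaultRender(tokens, idx, opts, env, self);
    return `<${kind} src="${md.utils.escapeHtml(src)}" controls></${kind}>`;
  };
}
```

It would then be enabled with something like `const md = window.markdownit().use(mediaEmbedPlugin, { audioExtensions: ['mp3', 'ogg', 'wav'] });`, since `use` returns the instance.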


I think it would be best if the core spec continues to describe the element as “image”. The HTML5 spec refers to the category of elements as embedded content. If the element is called “embedded content” or “media” in the core spec, people might be led to believe that other embedded content is available as part of CommonMark-core. But this is intended as an extension, correct?


I was actually not thinking of it as an extension. More of a terminology change.

Currently the spec attempts to define parsing (transformation from source to AST) without being too detailed about exactly how each element should be rendered into HTML (or other formats). Loosening up the terminology would invite renderers to do something useful with movie and sound URLs in ![]() contexts, without requiring anything specific.

A further step would be to say explicitly that renderers must be sensitive to the URL or file extension (if there is one) and render the media in a way that makes sense. But I’m thinking this might be a bit much to require. And maybe the terminology change doesn’t make sense without this, I’m not sure.