Embedded audio and video

This feels like a hypothetical problem only. In practice, the use of .mp4 for audio is rare. Users can rename the audio file’s extension to the established convention of .m4a in these cases. In rare cases where this is not possible, HTML can still be used to embed the file. But this objection was already discussed at length earlier in the topic, so I’ll stop. If we introduce new syntax for audio and video we’re putting an additional burden on users to learn the new syntax, which is an actual problem as it increases complexity; I don’t think this edge case outweighs that.

Internally displayed content (such as images) that is embedded in the document is the opposite of external content that remains outside of the document (links). The exclamation mark before the link syntax []() means “not”, so ![]() represents content that is the opposite of externally linked content. With this in mind, I think it’s in keeping with the meaning expressed by Markdown’s syntax to extend the ![]() syntax to other kinds of embedded content besides images.

GitLab Flavored Markdown (GFM) reuses the image syntax for video:

> Image tags with a video extension are automatically converted to a video player. The valid video extensions are .mp4, .m4v, .mov, .webm, and .ogv.

> This feels like a hypothetical problem only. In practice, the use of .mp4 for audio is rare.

Rare is not enough. I don’t think the language should guess the user’s purpose; it should only describe what the user means.
Another problem is that implementations must then determine which extensions are video and which are audio, and keep everything constantly up to date with browser support. For instance, I’m confident that the FLAC format will soon be supported by all major browsers. The same goes for the MKV video format. It means each implementation now has to update its code not only when a new Markdown standard is released, but also whenever browser media support changes. This introduces coupling between the language itself and browser support. It definitely feels like a hack to me.

> GitLab Flavored Markdown (GFM) reuses the image syntax for video

I’m not sure this is relevant. I’m pretty confident this choice was made to prevent unrecoverable divergence from a future CommonMark standard.

It does create some extra complexity on the developer’s side, a price that has to be paid for writers having less syntax to learn. I think this trade-off is worth it, since the coupling is hidden from the end user. We have coupling with HTML block elements as well; these are whitelisted in the spec, saving the writer from having to manually specify that an element is a block element with additional markup. We’ll run into a similar issue if future versions of HTML introduce new block elements, but it’s nothing that can’t be resolved with an add-only, never-remove strategy.

The syntax I am proposing here follows the principle of least surprise. The widely established convention is to use .mp4 for video, with .m4a being used for audio. So it’s likely to be unsurprising to a user with an existing knowledge of this convention for the proposed markup to render as video and audio respectively.

I just mentioned this because CommonMark aims to be highly compatible, so its inclusion in another flavour acts as a contributing reason to use the proposed syntax.

This is not relevant. In most cases, the difficulty of a new syntax is absorbed by an editor. Most of the users who don’t have the capacity to learn the new syntax (mostly non-technical users) will use buttons. The others won’t have any problem learning a new syntax, especially if it’s carefully chosen. Syntax is a false problem.

On the other hand, guessing the embedding based on the extension is a real one. There may be valid reasons not to have a supported extension, or not to have an extension at all. You cannot always control how someone will format a URL. If the URL is something stupid like /kYdDFK5x4xLwL with no extension, or even has a .php extension, as was mentioned earlier in the discussion, how does it work?
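To make the objection concrete, here is a minimal Python sketch of what extension-based guessing has to work with. The helper name `url_extension` and the example.org URLs are made up for illustration; only the standard library is used.

```python
from posixpath import splitext
from urllib.parse import urlsplit


def url_extension(url: str) -> str:
    """Return the lowercase extension of the URL's path, or '' if it has none."""
    # Only the path component is inspected, so query strings are ignored.
    path = urlsplit(url).path
    return splitext(path)[1].lower()


# Hypothetical URLs: neither result says anything about the media type.
print(repr(url_extension("https://example.org/kYdDFK5x4xLwL")))
print(repr(url_extension("https://example.org/media.php")))
```

For URLs like these, the extension is either empty or unrelated to the content, so a renderer that relies on it has nothing to decide with.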

I think we should follow the keep-it-simple principle: not add any useless complexity, and not try to guess what the user means. This is not how, for instance, LaTeX works. In LaTeX, you write what you mean, not what you think the compiler should understand of what you mean. One syntax should do only one thing, and do it properly. I understand your point about it working in most cases. My problem is the “most cases”. Good enough is not the right way.

Now, nothing prevents the ![]() syntax from also being used to embed audio and video, for compatibility reasons. But this should be optional, and there should be a dedicated syntax for audio and video.


Assistance from an editor may not be present. This is common, even amongst dedicated Markdown editors, many of which have a minimalist interface with no buttons. A plain text editor may be used that does not know about Markdown’s syntax. This reminds me a bit of the old argument about auto-generated code in programming languages being okay because “the IDE will take care of that” - it’s all well and good until you stop using an IDE.

Technical users do have the capacity to learn a new syntax, but they may not want to look up a new syntax, especially if they only use audio and video infrequently. The proposed syntax is a natural extension of the image syntax so it can be guessed fairly easily.

It would require some kind of override to explicitly override the default media type.

To use an analogy, I would compare this to an operating system where by default the file extension .txt opens the file in Notepad, but a file without an extension can optionally be opened in Notepad as well - it just requires some configuration on the user’s part (e.g. right click, open in Notepad).

I think an override would function as a dedicated syntax, as it’s a way of explicitly telling the parser the intended media type. For example:

![](video.mp4){type: video/mp4}
![](audio.mp4){type: audio/mp4}
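As a rough illustration only, a parser could pick up such an override with a regular expression. This Python sketch assumes the override is written exactly as `{type: major/minor}` directly after the closing parenthesis; the pattern and the `parse_embed` name are hypothetical and not part of any spec or existing implementation.

```python
import re

# ![desc](url) optionally followed by {type: major/minor} (assumed form).
EMBED = re.compile(
    r"!\[(?P<desc>[^\]]*)\]"
    r"\((?P<url>[^)\s]+)\)"
    r"(?:\{type:\s*(?P<type>[\w.+-]+/[\w.+-]+)\})?"
)


def parse_embed(text):
    """Return (description, url, media_type) or None if text is not an embed.

    media_type is None when no explicit override is present.
    """
    match = EMBED.fullmatch(text.strip())
    if match is None:
        return None
    return match.group("desc"), match.group("url"), match.group("type")


print(parse_embed("![](video.mp4){type: video/mp4}"))
print(parse_embed("![](audio.mp4){type: audio/mp4}"))
```

When the third element is `None`, the renderer would fall back to whatever default it uses, such as the file extension.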

The main objection to this was the use of English words, which is un-Markdown-like. So what I am proposing here is an override, but not necessarily this override. As stated before, ![]() represents embedded content, so it would make sense to keep the opposite of the link syntax as the basis for all embedded content.

Of course, if new symbols are used instead of English words for the override syntax, it may still be just as incomprehensible to non-English speakers (without referencing a manual). So there’s some reason to use English words for advanced properties/consistent attribute syntax when we consider the total amount of overrides and embedded media types that might need to be supported, because it means that at least English speakers will understand their meaning.

So you’re not OK with adding a new syntax except to cover the corner cases. You’re right: ![](audio.mp4){type: audio/mp4} is not Markdown-ish at all. CommonMark should by all means avoid introducing that kind of exception into the language because, as I’ve stated earlier, “audio” and “type” are words that make sense in French and English, but that’s pretty much it. And I’m totally convinced that ![](audio.mp4){type: audio/mp4} will not be easier for a Russian speaker to remember than $[](). Especially if you explain that $[]() was chosen because $ recalls a violin, hence music, hence audio.

![](){} is an OK syntax for extensions, because it leaves users free to choose which extensions to support, and an extension’s parameters may even be translated.

But English words should definitely not pass into the standard.

Now that this requirement is established, you still have to cover the corner cases, because ![](https://example.org/media.php) is a valid usage of the syntax and must not be excluded from the standard.

And the only way to cover the corner cases is a new syntax.

> To use an analogy, I would compare this to an operating system where by default the file extension .txt opens the file in Notepad, but a file without an extension can optionally be opened in Notepad as well - it just requires some configuration on the user’s part (e.g. right click, open in Notepad).

Your example is invalid. Windows is the only OS that uses the extension to open a file. Unix-like OSes do not. And this way of doing things, of course, causes problems in Windows. Exploits rely on this flaw of the system. This is definitely not the way to go.

My position is that the same base syntax should be used for embedded content (images, audio, video, embedded CSV tables, etc.), since embedded content is the opposite of linked content.

I understand your position, but I am pointing out that it has unrecoverable flaws and should not be privileged, or at least not alone. Now, I don’t know about the use of embedded CSV tables, but I know that embedding audio and video is a very important use case in social networks. If CommonMark takes this path, Socialhome will be forced to diverge from CommonMark. If CommonMark aims to be a standard unifying every Markdown syntax, you have to consider every use case of the syntax. ![]() is not the way to go for embedding audio and video in social networks, because a lot of users may post audio and video on their personal websites or blogs under a .php extension and want to embed them in their social network posts.

Our solutions are not totally incompatible. I’m not against using ![]() to embed media based on extension. Actually, I don’t care, as long as embedding audio and video also has its own syntax.

We should be able to resolve the different extension case with a decent override that extends the existing ruleset, but doesn’t use English words. Perhaps ![](something.mp3) as a default, with ![](something.php){$} for an override (as a quick thought). The idea is to build general syntax rules that we can build upon, in this case embedded content syntax ![]() mixed with consistent attribute syntax {}, the latter of which is also under heavy discussion.

It looks like this discussion is stalled on corner cases. However, it’s worth noting that it would be possible to move forward with the image syntax in a manner that causes no backwards compatibility problems:

  • For common video extensions (including mildly ambiguous ones like MP4), interpret image syntax as a <video> embed.
  • For common audio extensions, interpret image syntax as an <audio> embed.
  • In all other cases, continue to interpret image syntax as an image.
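The three rules above can be sketched in a few lines of Python. The video list is the GitLab one quoted earlier in the thread; the audio list and the `embed_kind` name are assumptions made for illustration, not a proposed standard.

```python
from posixpath import splitext
from urllib.parse import urlsplit

# Video extensions as listed in the GitLab documentation quoted earlier.
VIDEO_EXTENSIONS = {".mp4", ".m4v", ".mov", ".webm", ".ogv"}
# Assumed audio list, for illustration only.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".ogg", ".wav", ".flac"}


def embed_kind(url: str) -> str:
    """Classify an image-syntax URL as 'video', 'audio' or 'image'."""
    extension = splitext(urlsplit(url).path)[1].lower()
    if extension in VIDEO_EXTENSIONS:
        return "video"
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    # Unknown or missing extensions keep today's behaviour.
    return "image"


print(embed_kind("https://example.org/clip.mp4"))
print(embed_kind("https://example.org/media.php"))
```

Because every unrecognised URL still comes out as an image, existing documents render exactly as before.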

For the content of the <video>/<audio> element, I would suggest:

  • Recommend that implementers render a standard “cannot be played” warning with a download link, as in the examples here.
  • Recommend that the contents of square brackets ![desc](url) be shown in addition to this text, as a content description.

It’s important to note that the fallback text is different from the ALT text for an image. It is only shown if the browser cannot handle video or audio tags at all. Replacing the text entirely and thereby not showing an informative message or download link would likely have confusing effects for markdown authors, and for impacted users.
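A minimal sketch of that fallback behaviour in Python is shown below. It is not the plugin's code: the wording of the warning, the `render_media` name and the exact HTML are assumptions chosen to illustrate the idea.

```python
from html import escape


def render_media(kind: str, url: str, description: str = "") -> str:
    """Render an embed; kind is 'image', 'video' or 'audio'."""
    safe_url = escape(url)
    safe_description = escape(description)
    if kind == "image":
        return f'<img src="{safe_url}" alt="{safe_description}">'
    # Fallback content is only shown by browsers without <video>/<audio> support.
    fallback = (
        f"Your browser cannot play this {kind}. "
        f'<a href="{safe_url}">Download the file</a>.'
    )
    if description:
        # The bracket text is appended to the warning, not used instead of it.
        fallback += f" {safe_description}"
    return f'<{kind} src="{safe_url}" controls>{fallback}</{kind}>'


print(render_media("video", "clip.mp4", "A demo"))
```

Keeping the download link in the fallback means a reader on an unsupported browser still has a way to get at the file.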

This is the logic I’ve implemented in https://www.npmjs.com/package/markdown-it-html5-media, a compact alternative to cmrd-senya’s plugin.

Settling on the image syntax for the basic use case now would allow us to have a more focused conversation. The above approach does not seem to conflict with current real-world usage of the image syntax, given that it would always be the fallback, e.g., if a PHP URL was specified.

This basic approach could be added to the CommonMark specification, giving some much needed clarity to implementers (hopefully making both my and cmrd-senya’s plugin obsolete soon – this should really be core functionality for a CommonMark parser). The following questions could be discussed separately, and addressed in subsequent spec updates (the answer is highly unlikely to conflict with the basic principles expressed above, so I see no problem moving iteratively):

  • how to handle more than one source URL (seems relatively uncontroversial, but the behavior when not all source URLs are of the same apparent type should be specified)

  • how to force a particular source URL to be interpreted as being of a certain MIME type or subtype (seems most in need of some kind of poll since it will likely involve some new syntax, and/or English language strings in the markup language)


Please do NOT put anything in the alt but alt text. This is becoming more important than ever with conversational user interfaces giving spoken renditions of CommonMark and its rendered forms.


I think generic directives are the way to go here. Their syntax is general enough to use for pretty much anything, including embedded and/or external content. And since they specify the media type, there’d be no guessing games or corner cases.

As for generic directives’ dependence on English words, let’s fix that with some upside-down syntactic sugar…

| Media | Directive    | Shorthand |
| ----- | ------------ | --------- |
| Image | :image[](){} | ![](){}   |
| Video | :video[](){} | ^[](){}   |
| Audio | :audio[](){} | @[](){}   |

Both the “directive” and “shorthand” syntaxes should be available, so we’d have the best of both worlds. My guess is most people would use the shorthand form.
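To show how thin the sugar layer could be, here is a Python sketch that rewrites the shorthand into the directive form. It assumes the embed stands alone on its line and that the pieces are written as `[...]( ...)` with an optional `{...}`; the `expand_shorthand` name and the pattern are hypothetical.

```python
import re

# Shorthand character -> directive name, as in the table above.
SHORTHAND = {"!": "image", "^": "video", "@": "audio"}

# One sigil, then [label](target) and an optional {attributes} block.
PATTERN = re.compile(r"^([!^@])(\[[^\]]*\]\([^)]*\)(?:\{[^}]*\})?)$")


def expand_shorthand(line: str) -> str:
    """Rewrite e.g. '^[x](y)' to ':video[x](y)'; leave other lines untouched."""
    match = PATTERN.match(line)
    if match is None:
        return line
    return f":{SHORTHAND[match.group(1)]}{match.group(2)}"


print(expand_shorthand("^[demo](clip.mp4)"))
print(expand_shorthand("@[](song.mp3){loop}"))
```

A parser could run this as a pre-pass and then only needs to implement the directive form.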

One issue is, some extensions might already be using these characters for other things – like ^ for superscripts, footnotes, etc. A parser that implements the “shorthand” syntax above would need to give it precedence over those other features.

Personally, I would say that ![]() is the best solution, as it already is a media-embedding syntax. On the other hand, for backend rendering of Markdown this will hurt rendering time a lot, because the renderer has to check the source type (video/mp4 or audio/mp3) of each embed, which in my test was 900ms slower than a normal embed. So 3 simple images add 2700ms, while the standard ![]() only uses 300ms (100ms/image).

Tested in PHP 7.3

I don’t like the idea of adding new syntax, but I have to admit that $[]() and @[]() aren’t a bad idea.

What do you think of this? Markua by Leanpub.com

I read through this thread a few weeks ago and couldn’t come up with a case in which mp4 was used for audio only.

Today, after downloading an audio note from WhatsApp, I saw that even though it’s an audio note, it’s using the mp4 format. Definitely not a common practice, but it’s out there.

A standard way to generate video tags in Markdown would be awesome. At the moment I’m using regular expressions to expand videos (and to turn images into figures, using the alt text as the caption).
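For readers wondering what such a regex-based expansion might look like, here is a small Python sketch. It is not the poster's code: the pattern, the `.mp4`-only check and the `expand_media` name are assumptions for illustration.

```python
import re
from html import escape

# Matches inline image syntax: ![alt](url)
IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def _expand(match: re.Match) -> str:
    alt, url = escape(match.group(1)), escape(match.group(2))
    if match.group(2).lower().endswith(".mp4"):
        return f'<video src="{url}" controls></video>'
    # Images become figures, with the alt text reused as the caption.
    return (
        f'<figure><img src="{url}" alt="{alt}">'
        f"<figcaption>{alt}</figcaption></figure>"
    )


def expand_media(markdown: str) -> str:
    """Replace every image-syntax occurrence with video or figure HTML."""
    return IMAGE.sub(_expand, markdown)


print(expand_media("![Demo](clip.mp4)"))
print(expand_media("![A cat](cat.png)"))
```

This kind of pre-processing works outside the parser, which is why a standard syntax would be preferable.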

After coming back to this topic which I created years ago, I’ve been able to look at this from a fresh perspective.

There are a few corner cases, like the one @nonoesp posted, which might create confusion for a syntax that was originally file-extension-independent. If an image extension ever clashes with an audio or video extension in the future (e.g. if someone introduces .ogg, .mp4, or .webm for images), we’d have a problem.

Perhaps these issues are not enough to be blockers, but there’s also a nicer syntax that we could use for file extension based content blocks:

Because this syntax is intended to be file-extension-based from the start, there’s no confusion about the existing behaviour of the Markdown image syntax: it’s a clean break from existing syntax that is also simpler for writers. It also improves how images are embedded by placing them inside figures. I’d be in favour of uniting around this syntax (as an extension to the main spec) for embedded content instead.

Using content blocks would really simplify things.

Here’s a link to the initial announcement of that specification, which was introduced in iA Writer 4 some time ago.


The major distinction is between “online” and “local” embeddings (and I’d guess that what makes more sense here are “online” embeddings).

For local embedded content, the iA Writer content block syntax would work well:


For online embedded content (from another domain), the spec could support just pasting in a URL ending with a video file extension:


The latter syntax was suggested by John Gruber for images, instead of the current image syntax, which he describes as his “biggest mistake” with Markdown.

That makes a lot of sense, @chrisalley.