Embedded audio and video

OK, I admit that using plain words like video and audio as part of the syntax isn’t ideal. But relying only on lists of file extensions is IMHO much worse. Especially since it is very common for (auto-generated) URLs to have no extension at all.

I’ve read through the whole thread again and I think the most reasonable solution would be a combination of the two ideas as suggested in this post. This would allow the nice and simple syntax with file-extension-detection like ![](my_file.mp3), but it would also allow the (IMHO very important) disambiguation in cases where the file extension is ambiguous, not recognized or just missing, like in


This is a fine solution from my perspective. The complaint that

but .mp4 could mean audio only!

does not hold much weight. It could… but almost never will. And if so, use some other filename.


There are a lot of parallels between the comments in this topic and the discussions about adding support for image dimensions and alignments. While it may be useful to override the rules and have more control over specific elements in some scenarios, these scenarios often seem like special cases.

In scenarios where the filename cannot be changed, we have two solutions:

  1. Use raw HTML. The HTML5 spec is already very well defined and provides the user with a lot of power for customisation.
  2. Use the consistent attribute syntax extension to explicitly define the source type, e.g. ![](audio.mp4){type: audio/mp4}.

Using either of these approaches requires no additional syntax in the specification for embedded audio and video in CommonMark; both (1) and (2) are separate specifications. Defining audio and video can remain lightweight, the majority of documents can remain uncluttered by additional syntax, and (perhaps most importantly) we get syntactic continuity with images.

I also understand the appeal of having some means to explicitly state the media file’s source type, so that media that does not follow the file extension convention can be embedded. But just because this is desirable in some (edge) cases, does it mean that writers should be forced to explicitly state the source type every time they want to embed a video? That isn’t appealing.


Happy New Year!
Any update on this?
Have any considerations been finalized?


I’ve read through all the replies in this thread and I’m not completely satisfied with the proposed solutions. As a foreword, I must say that we should all keep in mind that not all Markdown users are accomplished developers or technicians. So we should do our best to keep the language as simple and concise as possible. One example use is social networks, the federated ones in particular (diaspora, Ganggo, SocialHome, Friendica).

I don’t think using ![]() for all media content, with the media type guessed, is a good solution. One corner case already presented is MP4, which can be audio-only or video. This means the user has to specify the media type anyway. Plus, it is an established fact that ![]() is for images. Extending it to support every media type could confuse users (it’s a different use case).

Using generic attributes to disambiguate does not seem like a good idea either. While it is a nice feature, I don’t think it should be mandatory for embedding media. It’s yet another syntax to learn, and not all users know what an attribute or a MIME type is.

Generic directives are the worst in my eyes. We should keep in mind that audio and video are words that make sense in French and English but not in Swahili. While it works for the :youtube[title]{vid=09jf3ow9jfw} example, because YouTube is a brand, :audio[title]{vid=09jf3ow9jfw} makes no sense in Russian.

What I propose is a new syntax for audio and a new one for video, close to the one for images: #[]() for video and ~[]() for audio. It is elegant and semantically right: # symbolises a screen while ~ symbolises a sound wave.


~ is often used as a delimiter for <s>, <sup>, or <sub>, so I expect that would be problematic.

# is already in use for headings, so I’d say # is out because there are many headings that are also links.

^[]() might work (I’ve seen ^ used for <sup> so same problem as ~)
@[]() might work as most usages of @ are mentions or email addresses, neither of which would conflict with this syntax.
%[]() would probably work
|[]() might work (| is used in tables and I’ve proposed using it for <aside>, so maybe not).
{[]() and }[]() might work but I think they’d look weird and conflict with the proposal for an attribute syntax
+[]() might conflict with links in lists, or at least cause confusion
=[]() might work
&[]() looks silly to me.

If I had to pick, I’d probably go with ^[]() for video, and @[]() for audio. To me they seem to match the pattern of incorporating the first letter into the symbol (upside-down i = !, upside down v = ^, a = @).


~ is often used as a delimiter for <s>, <sup>, or <sub>, so I expect that would be problematic.

# is already in use for headings, so I’d say # is out because there are many headings that are also links.

I would say it’s not a problem to have multiple uses for the same symbol. For instance, * is used for lists, emphasis and bold. I think the spec is clear enough not to introduce corner cases. # requires a space to produce a heading, and ~[audio](audio.mp3), ~sub text~, ~~striked text~~ is not that different from *emphasized text*, **bold text**, * list item.
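To make that claim a little more concrete, here is a minimal sketch of how the three tilde forms could be told apart when each one is looked at as a whole token. The patterns and the `classify_tilde` helper are assumptions made up for this illustration, not anything taken from the spec or from an existing parser, and a real inline parser would have to deal with nesting and precedence as well.

```python
import re

# Hypothetical patterns, tried in order from most to least specific.
TILDE_PATTERNS = [
    ("audio", re.compile(r"~\[([^\]]*)\]\(([^)\s]+)\)")),  # ~[text](url)
    ("strike", re.compile(r"~~([^~]+)~~")),                 # ~~text~~
    ("sub", re.compile(r"~([^~]+)~")),                      # ~text~
]


def classify_tilde(token: str) -> str:
    """Name the construct that a whole token matches, or "text" if none does."""
    for name, pattern in TILDE_PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "text"


for sample in ("~[audio](audio.mp3)", "~sub text~", "~~striked text~~"):
    print(sample, "->", classify_tilde(sample))
```

Note that with these toy patterns a subscript wrapping a link, such as ~[a](b)~, is still classified as a subscript, which is the kind of overlap the earlier objection was worried about.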

@[]() should be excluded and reserved for a potential future mention syntax, IMHO.

As embedding audio and video would be a very common use, I propose that only easily typed symbols be used. I think + and = do not require complex key combinations on any keyboard (QWERTY, QWERTZ, German QWERTY and AZERTY)? So I’m all for +[]() and =[]() if #[]() and ~[]() are not suitable. I would reserve =[]() for films (= can symbolize a video tape). I’m not completely satisfied with +[]() for audio. Maybe %[]()?

This feels like a hypothetical problem only. In practice, the use of .mp4 for audio is rare. Users can rename the audio file’s extension to the established convention of .m4a in these cases. In rare cases where this is not possible, HTML can still be used to embed the file. But this objection was already discussed at length earlier in the topic, so I’ll stop. If we introduce new syntax for audio and video we’re putting an additional burden on users to learn the new syntax, which is an actual problem as it increases complexity; I don’t think this edge case outweighs that.

Internally displayed content (such as images) that is embedded in the document is the opposite of external content that remains outside of the document (links). The exclamation mark before the link syntax []() means “not”, so ![]() represents content that is the opposite of externally linked content. With this in mind, I think it’s in keeping with the meaning expressed by Markdown’s syntax to extend the ![]() syntax to other kinds of embedded content besides images.


GitLab Flavored Markdown (GFM) reuses the image syntax for video:

Image tags with a video extension are automatically converted to a video player. The valid video extensions are .mp4, .m4v, .mov, .webm, and .ogv.


This feels like a hypothetical problem only. In practice, the use of .mp4 for audio is rare.

Rare is not enough. I don’t feel the language should guess the user’s purpose; it should only describe what the user means.
Another problem is that implementations must then determine which extensions are video and which are audio, and keep everything constantly up to date with browser support. For instance, I’m confident that the FLAC format will soon be supported by all major browsers. Same for the MKV video format. It means each implementation now has to update its code not only when a new Markdown standard is released, but also whenever browser media support changes. This introduces coupling between the language itself and browser support. It definitely feels like a hack to me.

GitLab Flavored Markdown (GFM) reuses the image syntax for video

I’m not sure this is relevant. I’m pretty confident this choice was made to prevent unrecoverable divergence from a future CommonMark standard.

It does create some extra complexity on the developer’s side, a price that has to be paid for writers having less syntax to learn. I think this trade-off is worth it, since the coupling is hidden from the end user. We have coupling with HTML block elements as well; these are whitelisted in the spec, saving the writer from having to manually specify that an element is a block element with additional markup. We’ll run into a similar issue if future versions of HTML introduce new block elements, but it’s nothing that can’t be resolved with an add-only, never-remove strategy.

The syntax I am proposing here follows the principle of least surprise. The widely established convention is to use .mp4 for video, with .m4a being used for audio. So it’s likely to be unsurprising to a user with an existing knowledge of this convention for the proposed markup to render as video and audio respectively.

I just mentioned this because CommonMark aims to be highly compatible, so its inclusion in another flavour acts as a contributing reason to use the proposed syntax.


This is not relevant. In most cases, the difficulty of a new syntax is absorbed by an editor. Most of the users who don’t have the capacity to learn the new syntax (mostly the non-technical ones) will use buttons. The others won’t have any problem learning a new syntax, especially if it’s carefully chosen. Syntax is a false problem.

On the other hand, guessing the embed type from the extension is a real one. There may be valid reasons not to have a supported extension, or not to have one at all. You cannot always control how someone will format a URL. If the URL is something stupid like /kYdDFK5x4xLwL with no extension, or even with a .php extension, as has been mentioned earlier in the discussion, how does it work?

I think we should follow the keep-it-simple principle: not add any useless complexity, not try to guess what the user means. This is not how, for instance, LaTeX works. In LaTeX, you write what you mean, not what you think the compiler should understand of what you mean. One syntax should do only one thing, and do it properly. I understand your point about it working in most cases. My problem is the “most cases”. Good enough is not the right way.

Now, nothing prevents, for compatibility reasons, also using the ![]() syntax to embed audio and video. But this should be optional, and there should be a dedicated syntax for audio and video.


Assistance from an editor may not be present. This is common, even amongst dedicated Markdown editors, many of which have a minimalist interface with no buttons. A plain text editor that does not know about Markdown’s syntax may be used. This reminds me a bit of the old argument about auto-generated code in programming languages being okay because “the IDE will take care of that” - it’s all well and good until you stop using an IDE.

Technical users do have the capacity to learn a new syntax, but they may not want to look up a new syntax, especially if they only use audio and video infrequently. The proposed syntax is a natural extension of the image syntax so it can be guessed fairly easily.

It would require some kind of mechanism to explicitly override the default media type.

To use an analogy, I would compare this to an operating system where by default the file extension .txt opens the file in Notepad, but a file without an extension can optionally be opened in Notepad as well - it just requires some configuration on the user’s part (e.g. right click, open in Notepad).

I think an override would function as a dedicated syntax, as it’s a way of explicitly telling the parser the intended media type. For example:

![](video.mp4){type: video/mp4}
![](audio.mp4){type: audio/mp4}

The main objection to this was the use of English words, which is un-Markdown-like. So what I am proposing here is an override, but not necessarily this override. As stated before, ![]() represents embedded content, so it would make sense to keep the opposite of the link syntax as the basis for all embedded content.

Of course, if new symbols are used instead of English words for the override syntax, it may still be just as incomprehensible to non-English speakers (without referencing a manual). So there’s some reason to use English words for advanced properties/consistent attribute syntax when we consider the total amount of overrides and embedded media types that might need to be supported, because it means that at least English speakers will understand their meaning.
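As a rough illustration of how the {type: ...} override shown above might be read by a parser, here is a small sketch. The regular expression and the `explicit_kind` helper are hypothetical and written only for this post; the attribute syntax itself is still under discussion, so the exact grammar assumed here (a single `type:` key holding a MIME type) is an assumption.

```python
import re

# Hypothetical grammar: image syntax, optionally followed by {type: major/minor}.
EMBED = re.compile(
    r"!\[(?P<alt>[^\]]*)\]\((?P<url>[^)\s]+)\)"
    r"(?:\{\s*type:\s*(?P<type>[\w.+-]+/[\w.+-]+)\s*\})?"
)


def explicit_kind(markup: str):
    """Return (url, kind); kind is the top-level MIME type or None if no override."""
    match = EMBED.fullmatch(markup.strip())
    if match is None:
        raise ValueError(f"not an embed: {markup!r}")
    mime = match.group("type")
    # "audio/mp4" -> "audio"; without an override the caller falls back to guessing.
    kind = mime.split("/", 1)[0] if mime else None
    return match.group("url"), kind


print(explicit_kind("![](video.mp4){type: video/mp4}"))
print(explicit_kind("![](audio.mp4){type: audio/mp4}"))
print(explicit_kind("![](my_file.mp3)"))
```

When no override is present the helper returns None for the kind, which leaves room for extension-based detection as the default.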


So you’re not OK with adding a new syntax except to cover the corner cases. You’re right: ![](audio.mp4){type: audio/mp4} is not Markdown-like at all. CommonMark should avoid by all means introducing that kind of exception into the language because, as I’ve stated earlier, “audio” and “type” are words that make sense in French and English, but that’s pretty much it. And I’m totally convinced that ![](audio.mp4){type: audio/mp4} will not be easier for a Russian speaker to remember than $[](). Especially if you explain that $[]() was chosen because $ recalls a violin, hence music, hence audio.

![](){} is an OK syntax for extensions, because it lets the user choose which extensions to support, and an extension’s parameters may even be translated.

But English words should definitely not pass into the standard.

Now that this requirement is established, you still have to cover the corner cases, because ![](https://example.org/media.php) is a valid use of the syntax and must not be excluded from the standard.

And the only way to cover the corner cases is a new syntax.

To use an analogy, I would compare this to an operating system where by default the file extension .txt opens the file in Notepad, but a file without an extension can optionally be opened in Notepad as well - it just requires some configuration on the user’s part (e.g. right click, open in Notepad).

Your example is invalid. Windows is the only OS that uses the extension to open a file. Unix-like OSes do not. And this way of doing things of course causes problems in Windows: exploits rely on this flaw of the system. This is definitely not the way to go.

My position is that the same base syntax should be used for embedded content (images, audio, video, embedded CSV tables, etc.), since embedded content is the opposite of linked content.


I understand your position, but I am pointing out that it has unrecoverable flaws and should not be favoured, or at least not on its own. Now, I don’t know about the use of embedding CSV tables, but I know that embedding audio and video is a very important one in social networks. If CommonMark takes this path, Socialhome will be forced to diverge from CommonMark. If CommonMark aims to be a standard unifying every Markdown syntax, you have to consider every use case of the syntax. ![]() is not the way to go for embedding audio and video in social networks, because a lot of users may post audio and video on their personal websites or blogs under a .php extension and want to embed them in their social network posts.

Our solutions are not totally incompatible. I’m not against using ![]() to embed media based on extension. Actually, I don’t care as long as embedding audio and video also has its own syntax.


We should be able to resolve the different extension case with a decent override that extends the existing ruleset, but doesn’t use English words. Perhaps ![](something.mp3) as a default, with ![](something.php){$} for an override (as a quick thought). The idea is to build general syntax rules that we can build upon, in this case embedded content syntax ![]() mixed with consistent attribute syntax {}, the latter of which is also under heavy discussion.

It looks like this discussion is stalled on corner cases. However, it’s worth noting that it would be possible to move forward with the image syntax in a manner that causes no backwards compatibility problems:

  • For common video extensions (including mildly ambiguous ones like MP4), interpret image syntax as <video> embed
  • For common audio extensions, interpret image syntax as <audio> embed
  • In all other cases, continue to interpret image syntax as image.
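The three rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the extension sets below are examples I picked (the video list mirrors the GitLab one quoted earlier; the audio list is a guess), and `classify_media` is a hypothetical helper name, not part of any existing parser.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed extension lists, for illustration only; a spec would have to fix them.
VIDEO_EXTENSIONS = {".mp4", ".m4v", ".mov", ".webm", ".ogv"}
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".ogg", ".oga", ".wav", ".flac"}


def classify_media(url: str) -> str:
    """Return "video", "audio" or "image" based only on the URL path's extension."""
    # Only the path is inspected, so query strings and fragments are ignored.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    # Unknown or missing extension: keep the existing behaviour (image).
    return "image"


for url in ("my_file.mp3", "clip.MP4?t=10", "https://example.org/media.php"):
    print(url, "->", classify_media(url))
```

Because everything unrecognised stays an image, existing documents render exactly as before, which is the backwards-compatibility point being made.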

For the content of the <video>/<audio> element, I would suggest:

  • Recommend that implementers render a standard “cannot be played” warning with a download link, as in the examples here.
  • Recommend that the contents of square brackets ![desc](url) be shown in addition to this text, as a content description.

It’s important to note that the fallback text is different from the ALT text for an image. It is only shown if the browser cannot handle video or audio tags at all. Replacing the text entirely and thereby not showing an informative message or download link would likely have confusing effects for markdown authors, and for impacted users.

This is the logic I’ve implemented in https://www.npmjs.com/package/markdown-it-html5-media, a compact alternative to cmrd-senya’s plugin.
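To be clear, the following is not the plugin’s code, just a minimal sketch of the fallback recommendation described above. The `render_media` name and the exact wording of the warning are assumptions chosen for the example.

```python
from html import escape


def render_media(kind: str, url: str, description: str = "") -> str:
    """Build a <video> or <audio> element whose inner content is the fallback."""
    if kind not in ("video", "audio"):
        raise ValueError("kind must be 'video' or 'audio'")
    safe_url = escape(url, quote=True)
    # Hypothetical wording; the thread does not prescribe the exact text.
    fallback = (
        f"Your browser cannot play this {kind}. "
        f'<a href="{safe_url}">Download the file</a>.'
    )
    if description:
        # The bracket text is shown in addition to the warning, not instead of it.
        fallback += f" Description: {escape(description)}"
    return f'<{kind} src="{safe_url}" controls>{fallback}</{kind}>'


print(render_media("video", "demo.mp4", "Product demo"))
```

The inner content only appears in browsers that do not understand the element at all, which is why it carries a download link rather than acting as ALT text.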

Settling on the image syntax for the basic use case now would allow us to have a more focused conversation. The above approach does not seem to conflict with current real-world usage of the image syntax, given that it would always be the fallback, e.g., if a PHP URL was specified.

This basic approach could be added to the CommonMark specification, giving some much needed clarity to implementers (hopefully making both my and cmrd-senya’s plugin obsolete soon – this should really be core functionality for a CommonMark parser). The following questions could be discussed separately, and addressed in subsequent spec updates (the answer is highly unlikely to conflict with the basic principles expressed above, so I see no problem moving iteratively):

  • how to handle more than one source URL (seems relatively uncontroversial, but the behavior when not all source URLs are of the same apparent type should be specified)

  • how to force a particular source URL to be interpreted as being of a certain MIME type or subtype (seems most in need of some kind of poll since it will likely involve some new syntax, and/or English language strings in the markup language)


Please do NOT put anything in the alt but alt text. This is becoming more important than ever, now that conversational user interfaces produce spoken renditions of CommonMark and its rendered forms.


I think generic directives are the way to go here. Their syntax is general enough to use for pretty much anything, including embedded and/or external content. And since they specify the media type, there’d be no guessing games or corner cases.

As for generic directives’ dependence on English words, let’s fix that with some upside-down syntactic sugar…

| Media | Directive    | Shorthand |
|-------|--------------|-----------|
| Image | :image[](){} | ![](){}   |
| Video | :video[](){} | ^[](){}   |
| Audio | :audio[](){} | @[](){}   |

Both the “directive” and “shorthand” syntaxes should be available, so we’d have the best of both worlds. My guess is most people would use the shorthand form.

One issue is, some extensions might already be using these characters for other things – like ^ for superscripts, footnotes, etc. A parser that implements the “shorthand” syntax above would need to give it precedence over those other features.
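One way to picture the “syntactic sugar” idea is as a pre-pass that rewrites each shorthand into its directive form, so the parser only has to understand directives. The sketch below assumes exactly the three mappings from the table; the regular expression and the `expand_shorthand` helper are hypothetical, and it does not attempt to handle code spans, escapes, or the precedence conflicts just mentioned.

```python
import re

# Mapping taken from the table above.
SHORTHAND_TO_DIRECTIVE = {"!": "image", "^": "video", "@": "audio"}

# A prefix character, then [text](target) and an optional {attributes} block.
SHORTHAND = re.compile(r"([!^@])(\[[^\]]*\]\([^)]*\)(?:\{[^}]*\})?)")


def expand_shorthand(text: str) -> str:
    """Rewrite ![](), ^[]() and @[]() into :image, :video and :audio directives."""
    return SHORTHAND.sub(
        lambda m: f":{SHORTHAND_TO_DIRECTIVE[m.group(1)]}{m.group(2)}",
        text,
    )


print(expand_shorthand("Intro ^[clip](a.mp4) then @[](a.mp3){loop}"))
```

Treating the shorthand as a pure rewrite keeps a single code path for rendering, though it also means a bare ^ or @ followed by a link would be captured, which is the precedence question raised above.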
