Embedded audio and video

eloquence · August 6, 2016, 2:11am

@vitaly, that seems equally true for images – can we really “know” that a given URL represents an image without inspecting it? Right now when I use a CommonMark compliant parser and embed an MP4 URL, I get an <img> tag, even though the URL clearly provides useful information to the parser suggesting that this is not in fact what the user intended.

I also think it’s important to distinguish between YouTube/Vimeo style embeds (which rely on third party code) and browser-supported HTML5 video/audio playback (which is an open standard, just like <img> tags).

As a markdown user, I would find a system that inserts standards-compliant <video> or <audio> tags for appropriate embed URLs on the basis of extensions more than sufficient for most practical purposes. Having special syntax to force the embed type might be useful for special cases. But if I want to embed a locally uploaded MP4 video in my markdown blog post, that should “just work” on the basis of the URL.

I do agree that any advanced embedding of third party libraries doesn’t belong in the spec, but embedding a video or audio file in a standards compliant manner is, as far as I can tell, not meaningfully different from the already supported image use case. Am I missing something?

vitaly · August 6, 2016, 6:34am

@vitaly, that seems equally true for images – can we really “know” that a given URL represents an image without inspecting it?

You missed the end goal: the output markup. If the output is always <img>, there is no need to analyze src, because it does not affect the output. But if the output can be img/audio/video, some additional criteria are required to choose correctly. Assumptions about file extensions in URLs are too weak for a spec, IMHO.

eloquence · August 6, 2016, 7:44am

I think a case could be made not to maintain a list of file extensions mapped against <img>, <video> and <audio> in the spec, just because what file types frequently updated browsers like Chrome do support natively can change pretty often.

The spec should IMO, however, make the recommendation to implementers to vary the output of what’s currently image-only markup based on the file extension of commonly natively supported file formats (with <img> being the default if the extension is not recognized, or if there is none), and provide implementers with sample output for <img>, <video> and <audio>. All three are first-class citizens in HTML5 and it makes no sense to me that markdown would not support them as such.

The behavior that markup like ![some text](https://someurl.com/some.mp3) produces an <img> embed of a URL that’s obviously not an image is counter-intuitive, and the whole point of using a language like markdown is IMO to provide the most intuitive result for the common cases, using markup that people can keep in their head. To the extent that the spec produces obviously counterintuitive results when implemented, I would argue that it needs to be revised not to do so.

I agree with previous commenters who have suggested extending the syntax to override that behavior on an as-needed basis, but if there’s no consensus what that should look like, I would still recommend changing the default behavior of ![x](y) type invocations in the way described above, i.e. make intelligent guesses based on commonly used extensions, and fall back to image markup if no intelligent guess can be made. I don’t think that would qualify as AI.

Although I disagree with you here, let me thank you for all your hard work on markdown-it; it’s a pleasure to use! And again - I may be missing something obvious and am mostly speaking from the perspective of someone using other people’s markdown parsers in a few different codebases.

vitaly · August 6, 2016, 9:02am

Recommendations like that, i.e. definitions with partial or incomplete coverage, do not work well in programming. They quickly become “one more standard”, as has already happened with markdown.

I understand people who need audio/video with the same markup, but I don’t see a way to implement it correctly.

BTW, we use embedza for post-processing to beautify links (not images), and that works very well in practice. It is also more convenient on forums than using image tags.

eloquence · August 6, 2016, 8:44pm

Well, I would argue that leaving handling of even basic HTML5 features like <video> and <audio> tags entirely up to implementers is far worse from the perspective of implementation proliferation (and also anachronistic). After thinking about it a bit more, I do think it would be best to include a non-exhaustive list of extensions in the spec, with a note that implementers are free to add formats natively supported by widely used browsers to that list.

The list, AFAICT, would be:

<audio>: default for URLs that end with .wav, .mp3, .ogg (see below)
<video>: default for URLs that end with .mp4, .ogv, .webm
<img>: default for all other URLs

All matching would be case-insensitive.

.ogg is the most challenging since it is a container format associated with both video and audio in the wild, which has led to proliferation of the .ogv extension. This and similar edge cases would be a reason to offer an override in the markup, but in the most common cases, the above matching should work just fine, IMO.
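
As a rough illustration of the matching rule described above, here is a minimal Python sketch. The function name `guess_embed_tag` and the two extension sets are assumptions made for this example (the sets simply copy the list in this post, with `.ogg` treated as audio); nothing here is defined by the spec. The sketch matches on the URL path rather than the raw string, so a query string does not hide the extension.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical tables copied from the list in this post; not part of any spec.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg"}
VIDEO_EXTENSIONS = {".mp4", ".ogv", ".webm"}


def guess_embed_tag(url: str) -> str:
    """Return 'audio', 'video' or 'img' based on the URL's file extension."""
    path = urlsplit(url).path
    # Lower-casing the extension makes the match case-insensitive.
    ext = posixpath.splitext(path)[1].lower()
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    # Unknown or missing extension: fall back to image markup.
    return "img"
```

With these tables, a URL such as https://someurl.com/some.mp3 maps to audio, while a URL without a recognizable extension keeps the current <img> behavior.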

chrisalley · August 24, 2016, 12:07pm

I think all formats natively supported by widely used browsers should be included in the list to avoid ambiguity. As mentioned earlier, this is a moving target, but we can attempt to keep the list up to date. As browsers add support for new formats, file extensions could be added to the list using a never remove, only add strategy.

Flaburgan · September 10, 2016, 3:49pm

We could discuss this forever. What’s the final decision? We really want to merge @cmrd_senya’s PR into diaspora* core, but we are waiting for the standard to be decided…

mgeier · September 13, 2016, 2:15pm

I think relying on the file name extension is much too brittle. Also, any list of file types/extensions will be quickly out-of-date.

I think it would be best to keep ![]() for images and introduce new syntax for “video” and “audio” (and possibly more in the future).

I think that images are used far more frequently, and it wouldn’t hurt if the others were a bit more verbose, e.g. !video[...](...) and !audio[...](...) (as was suggested before in this thread).

Crissov · September 13, 2016, 7:40pm

Literal keywords like video and audio are absolutely not the CM/MD way.

mgeier · September 14, 2016, 7:45am

OK, I admit that using plain words like video and audio as part of the syntax isn’t ideal. But relying only on lists of file extensions is IMHO much worse. Especially since it is very common for (auto-generated) URLs to have no extension at all.

I’ve read through the whole thread again and I think the most reasonable solution would be a combination of the two ideas as suggested in this post. This would allow the nice and simple syntax with file-extension-detection like ![](my_file.mp3), but it would also allow the (IMHO very important) disambiguation in cases where the file extension is ambiguous, not recognized or just missing, like in

!audio[](some_url?id=superhit)
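
To make the combined idea concrete, here is a minimal Python sketch, assuming a hypothetical grammar in which an optional `audio` or `video` keyword sits between `!` and `[`. The names `EMBED_RE`, `EXTENSION_TAGS` and `parse_embed` are invented for this example, and the extension table is deliberately tiny. It only classifies a single embed and is not a full inline parser.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Hypothetical pattern: an optional keyword after "!" forces the embed type.
EMBED_RE = re.compile(r"!(audio|video)?\[([^\]]*)\]\(([^)\s]+)\)")

# Illustrative extension table; a real list would need maintenance.
EXTENSION_TAGS = {".mp3": "audio", ".wav": "audio", ".mp4": "video", ".webm": "video"}


def parse_embed(markup: str):
    """Return (tag, alt, url) for an embed, or None if markup is not one."""
    match = EMBED_RE.fullmatch(markup.strip())
    if match is None:
        return None
    keyword, alt, url = match.groups()
    if keyword:
        # An explicit keyword wins over any guess from the extension.
        return keyword, alt, url
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    return EXTENSION_TAGS.get(ext, "img"), alt, url
```

Under these assumptions, ![](my_file.mp3) is detected as audio from its extension, and !audio[](some_url?id=superhit) is audio because the keyword says so, even though the URL has no extension.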

codinghorror · September 16, 2016, 10:29pm

This is a fine solution from my perspective. The complaint that

but .mp4 could mean audio only!

does not hold much weight. It could… but almost never will. And if so, use some other filename.

chrisalley · September 17, 2016, 12:04am

There are a lot of parallels between the comments in this topic and the discussions about adding support for image dimensions and alignments. While it may be useful to override the rules and have more control over specific elements in some scenarios, these scenarios often seem like special cases.

In scenarios where the filename cannot be changed, we have two solutions:

1. Use raw HTML. The HTML5 spec is already very well defined and provides the user with a lot of power for customisation.
2. Use the consistent attribute syntax extension to explicitly define the source type, e.g. ![](audio.mp4){type: audio/mp4}.

Using either of these approaches requires no additional syntax in the specification for embedded audio and video in CommonMark; both (1) and (2) are separate specifications. Defining audio and video can remain lightweight, a majority of documents can remain uncluttered with additional syntax, and (perhaps most importantly) we get syntactic continuity with images.
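
For illustration, the attribute-based override could be read roughly as follows. This is a minimal Python sketch under the assumption that the attribute looks exactly like the {type: audio/mp4} example above; the attribute syntax itself is only a proposal, and `ATTR_EMBED_RE` and `tag_from_type_attribute` are names invented for this example.

```python
import re

# Hypothetical grammar: image markup optionally followed by {type: <mime>}.
ATTR_EMBED_RE = re.compile(
    r"!\[([^\]]*)\]\(([^)\s]+)\)(?:\{type:\s*([\w.+-]+/[\w.+-]+)\})?"
)


def tag_from_type_attribute(markup: str):
    """Return the tag selected by the type attribute, or None when absent."""
    match = ATTR_EMBED_RE.fullmatch(markup.strip())
    if match is None:
        raise ValueError("not an embed: " + markup)
    mime = match.group(3)
    if mime is None:
        # No attribute given; the caller falls back to its default rule.
        return None
    # The top-level MIME type (the part before "/") selects the element.
    top_level = mime.split("/", 1)[0].lower()
    return top_level if top_level in ("audio", "video") else "img"
```

Here ![](audio.mp4){type: audio/mp4} selects audio despite the .mp4 extension, and markup without the attribute returns None so that the default handling applies.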

I also understand the appeal of having some means to explicitly state the media file’s source type so that media can be embedded that does not follow the file extension convention. But because this is desirable in some (edge) cases, does this mean that writers should be forced to explicitly state the source type every time they want to embed a video? That isn’t appealing.

v3ss0n · January 8, 2017, 1:01pm

Happy New Year!
Any update on this?
Have any considerations been finalized?

christophehenry · August 4, 2017, 9:57pm

I’ve read through all the replies in this topic and I’m not completely satisfied with the proposed solutions. As a foreword, I must say that we should all keep in mind that not all Markdown users are accomplished developers or technicians. So, we should do our best to keep the language as simple and concise as possible. Example use cases are social networks, the federated ones in particular (diaspora, Ganggo, SocialHome, Friendica).

I don’t think using ![]() for all media content with media type guessing is a good solution. One corner case that was presented is MP4, which can be audio only or video. This means the user would have to specify the media type anyway. Plus, it is an established fact that ![]() is for images. Extending it to support all media types could confuse the user (it’s a different use case).

Using generic attributes to disambiguate does not seem like a good idea either. While it is a nice feature, I don’t think it should be mandatory for embedding media. It’s yet another syntax logic to learn and not all users know what an attribute or a MIME type is.

Generic directives are the worst in my eyes. We should keep in mind that audio and video are words that make sense in French and English but not in Swahili. While it works for the :youtube[title]{vid=09jf3ow9jfw} example, because YouTube is a brand, :audio[title]{vid=09jf3ow9jfw} makes no sense in Russian.

What I propose is a new syntax for audio and a new one for video, close to the one for images: #[]() for videos and ~[]() for audio. It is elegant and semantically right. # symbolises a screen while ~ symbolises a sound wave.
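
A naive recognizer for this sigil idea might look like the Python sketch below. The table `SIGIL_TAGS` and the function `find_embeds` are hypothetical names for this example, and the regular expression ignores everything else in the Markdown grammar, so it is only meant to show the mapping, not a workable parser.

```python
import re

# Hypothetical sigil table based on the proposal above.
SIGIL_TAGS = {"!": "img", "#": "video", "~": "audio"}
SIGIL_EMBED_RE = re.compile(r"([!#~])\[([^\]]*)\]\(([^)\s]+)\)")


def find_embeds(text: str):
    """Return a list of (tag, label, url) for each sigil-prefixed embed."""
    return [
        (SIGIL_TAGS[sigil], label, url)
        for sigil, label, url in SIGIL_EMBED_RE.findall(text)
    ]
```

Because the sigil must touch the opening bracket, a heading such as "# [link](x)" is not matched. A struck-through link written as ~~[x](y)~~ would be misread as audio by this pattern, which is the sort of conflict a real parser would need to rule out.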

zzzzBov · August 5, 2017, 5:07am

~ is often used as a delimiter for <s>, , or , so I expect that would be problematic.

# is already in use for headings, so I’d say # is out because there are many headings that are also links.

^[]() might work (I’ve seen ^ used for <sup> so same problem as ~)
@[]() might work as most usages of @ are mentions or email addresses, neither of which would conflict with this syntax.
%[]() would probably work
|[]() might work (| is used in tables and I’ve proposed using it for <aside>, so maybe not).
{[]() and }[]() might work but I think they’d look weird and conflict with the proposal for an attribute syntax
+[]() might conflict with links in lists, or at least cause confusion
=[]() might work
&[]() looks silly to me.

If I had to pick, I’d probably go with ^[]() for video, and @[]() for audio. To me they seem to match the pattern of incorporating the first letter into the symbol (upside-down i = !, upside down v = ^, a = @).

christophehenry · August 5, 2017, 8:58am

~ is often used as a delimiter for <s>, , or , so I expect that would be problematic.

# is already in use for headings, so I’d say # is out because there are many headings that are also links.

I would say it’s not a problem to have multiple uses for the same symbol. For instance, * is used for lists, emphasis and bold. I think the spec is clear enough not to introduce corner cases. # requires a space to produce a heading and ~[audio](audio.mp3), ~sub text~, ~~striked text~~ is not that different from *emphasized text*, **bold text**, * list item.

@[]() should be excluded and reserved for a potential future mention syntax, IMHO.

As embedding audio and video would be very common, I propose that only easily typed symbols be used. I think + and = do not require complex key combinations on any keyboard (QWERTY, QWERTZ, German QWERTY and AZERTY)? So, I’m all for +[]() and =[]() if #[]() and ~[]() are not suitable. I would reserve =[]() for films (= can symbolize a video tape). I’m not completely satisfied with +[]() for audio. Maybe %[]()?

chrisalley · August 5, 2017, 1:29pm

This feels like a hypothetical problem only. In practice, the use of .mp4 for audio is rare. Users can rename the audio file’s extension to the established convention of .m4a in these cases. In rare cases where this is not possible, HTML can still be used to embed the file. But this objection was already discussed at length earlier in the topic, so I’ll stop. If we introduce new syntax for audio and video we’re putting an additional burden on users to learn the new syntax, which is an actual problem as it increases complexity; I don’t think this edge case outweighs that.

Internally displayed content (such as images) that is embedded in the document is the opposite of external content that remains outside of the document (links). The exclamation mark before the link syntax []() means “not”, so ![]() represents content that is the opposite of externally linked content. With this in mind, I think it’s in keeping with the meaning expressed by Markdown’s syntax to extend the ![]() syntax to other kinds of embedded content besides images.

chrisalley · August 5, 2017, 1:52pm

GitLab Flavored Markdown (GFM) reuses the image syntax for video:

Image tags with a video extension are automatically converted to a video player. The valid video extensions are .mp4, .m4v, .mov, .webm, and .ogv.

christophehenry · August 5, 2017, 2:39pm

This feels like a hypothetical problem only. In practice, the use of .mp4 for audio is rare.

Rare is not enough. I don’t feel like the language should guess the user’s intent; it should only describe what the user means.
Another problem is that implementations must then determine which extensions are video and which are audio, and keep everything constantly up to date with browser support. For instance, I’m confident that the FLAC format will soon be supported by all major browsers. Same for the MKV video format. It means each implementation now has to update its code not only when a new MD standard is released, but also whenever browsers’ media support changes. This introduces coupling between the language itself and browser support. It definitely feels like a hack to me.

GitLab Flavored Markdown (GFM) reuses the image syntax for video

I’m not sure this is relevant. I’m pretty confident this choice has been made to prevent non-recoverable divergences from future CommonMark standard.

chrisalley · August 8, 2017, 11:23am

It does create some extra complexity on the developer’s side, a price that has to be paid for writers having less syntax to learn. I think this trade off is worth it, since the coupling is hidden from the end user. We have coupling with HTML block elements as well; these are whitelisted in the spec, saving the writer from having to manually specify that an element is a block element with additional markup. We’ll run into a similar issue if future versions of HTML introduce new block elements, but it’s nothing that can’t be resolved with an add only, never remove strategy.

The syntax I am proposing here follows the principle of least surprise. The widely established convention is to use .mp4 for video, with .m4a being used for audio. So it’s likely to be unsurprising to a user with an existing knowledge of this convention for the proposed markup to render as video and audio respectively.

I just mentioned this because CommonMark aims to be highly compatible, so its inclusion in another flavour acts as a contributing reason to use the proposed syntax.