Embedded audio and video

While I understand that assuming .mp4 = video/mp4 may occasionally produce unexpected results, the cases where this assumption is inaccurate appear to be edge cases. So perhaps consistent attribute syntax could be used to cover just those edge cases. The problem with requiring consistent attribute syntax everywhere is that it is an advanced feature and harder for general writers to remember. By following the convention .mp4 = video/mp4, we allow users to simply write ![](video.mp4) and get the result that, in the majority of cases, they would expect. For edge cases they could use consistent attribute syntax to specify the content type, or raw HTML if the consistent attribute syntax extension is not enabled.

As an addendum to the proposed syntax, I suggest using the full source type (e.g. “video/mp4”) for consistency with HTML. Perhaps the order could be significant if multiple video sources are defined. So the following Markdown:

![](video.mp4 video.webm){type=video/mp4,video/webm}

Would render as:

<video>
  <source type="video/mp4" src="video.mp4">
  <source type="video/webm" src="video.webm">
</video>
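
To make the pairing concrete, here is a minimal Python sketch of that rendering step. The function name `render_video` and the rule of matching sources to types by position are my own assumptions for illustration; nothing here comes from the spec or from an existing parser, and the Markdown is assumed to be already parsed into two lists.

```python
from html import escape


def render_video(sources, types):
    """Render a <video> element, pairing each source with the type at the same position."""
    # Assumption: the order of types is significant and must match the sources.
    if len(sources) != len(types):
        raise ValueError("each source needs exactly one type")
    lines = ["<video>"]
    for src, mime in zip(sources, types):
        lines.append(f'  <source type="{escape(mime)}" src="{escape(src)}">')
    lines.append("</video>")
    return "\n".join(lines)


# Hypothetical pre-parsed form of:
#   ![](video.mp4 video.webm){type=video/mp4,video/webm}
html_out = render_video(["video.mp4", "video.webm"], ["video/mp4", "video/webm"])
print(html_out)
```

Rejecting mismatched list lengths is one possible choice; a real implementation might instead fall back to guessing the missing types from the file extension.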

There was also mention of using the contents of the square brackets to represent the media type. I disagree with this because it goes against the principle of uniformity (or at least the philosophy behind it). I suggest that the square brackets also be used for alternative text for audio and video. E.g.

![Your browser does not support the video element.](video.mp4 video.webm)

Renders as:

<video>
  <source type="video/mp4" src="video.mp4">
  <source type="video/webm" src="video.webm">
  <p>Your browser does not support the video element.</p>
</video>

btw I thought that the [] in ![]() is for alt description for blind users etc…?

Correct, it corresponds to an image’s alt attribute (alternative text). My reasoning is that we should keep the usage of [] in ![]() relatively consistent for other embedded media and use it for an alternative text if the media itself cannot be rendered. Another option might be to use this to specify subtitles which are a form of alternative text. E.g. this Markdown:

![subtitles.vtt](video.mp4 video.webm)

Generates an HTML track element:

<video>
  <source src="video.mp4" type="video/mp4">
  <source src="video.webm" type="video/webm">
  <track src="subtitles.vtt">
</video>

Another reason to use the ![](video.mp4) syntax for embedded resources is that Markua will also use this syntax:

https://leanpub.com/markua/read#leanpub-auto-video

Cross compatibility is a good thing, even if Markua is not aiming for full compatibility with Markdown or CommonMark. The Markua spec linked here includes a whitelist of file extensions similar to what I suggested in earlier posts. .mp4 = video, .ogg = audio, as is the convention.

We are not in disagreement. Yes, in most cases you would use ![](video.mp4), but in edge cases (e.g. a new video format, like webm), ![](video.webm){type=video/webm} would be useful when the whitelist doesn’t catch it. And if you care about it being recognized as a missing video when nothing is found, the full form !video[](video.webm){type=video/webm} can kick in (which is the form that a WYSIWYG CommonMark editor might write).

I don’t understand why audio/video should be covered by the spec. This task can be completely automated by an external preprocessor. For example, in nodeca:

  1. Inline links to youtube/vimeo/… are replaced with the video title
  2. Links to youtube/vimeo/… in a separate paragraph are replaced with a video player

Rules can be extended to ANY URL pattern, without a spec change. So I don’t know why audio/video files should be a special case.

For Node.js we wrote https://github.com/nodeca/embedza to automate things.

Discourse has a similar Ruby package, “onebox”, but AFAIK it does not support inline links or flexible fallback (try the block format if available; if not, try inline).

http://example.com/song.mp3

Does this link to the song or embed an audio player in the page? It is not clear what the author’s intention was. If we use an explicit syntax, ![](http://example.com/song.mp3) in the case of embeds and <http://example.com/song.mp3> in the case of links, both can be described unambiguously.
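
As a rough illustration of the point, the Python sketch below classifies a one-line paragraph by its explicit syntax. The regular expressions are simplified assumptions of mine (they ignore titles, escapes and nested brackets) and are not the CommonMark grammar; `classify` is a hypothetical helper name.

```python
import re

# Simplified patterns, for illustration only; real CommonMark parsing is more involved.
EMBED = re.compile(r"^!\[(?P<alt>[^\]]*)\]\((?P<url>[^)\s]+)\)$")
AUTOLINK = re.compile(r"^<(?P<url>[a-zA-Z][a-zA-Z0-9+.-]*:[^<>\s]*)>$")


def classify(line):
    """Return ("embed", url), ("link", url) or ("ambiguous", text) for a one-line paragraph."""
    text = line.strip()
    match = EMBED.match(text)
    if match:
        return ("embed", match.group("url"))
    match = AUTOLINK.match(text)
    if match:
        return ("link", match.group("url"))
    # A bare URL carries no signal about the author's intention.
    return ("ambiguous", text)


for sample in (
    "![](http://example.com/song.mp3)",
    "<http://example.com/song.mp3>",
    "http://example.com/song.mp3",
):
    print(classify(sample))
```

Only the bare URL ends up in the "ambiguous" bucket, which is the case a preprocessor has to guess at.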

@vitaly, how do I distinguish between these cases within a markdown-it plugin? This is the code I wrote some time ago. It overrides the “md.renderer.rules.link_open” renderer. How can I tell whether the link I am parsing is inline or in a separate paragraph?

In nodeca we do such processing in a separate stage, after markdown. We just load the resulting HTML into an AST (with jquery/cheerio) and check whether the link/image is the only content of its paragraph or not.

I see no reason to do everything at once in the markdown parser, unless you have very special speed requirements.
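
For readers who want to see the shape of such a post-processing pass, here is a small sketch in Python using only the standard library's `html.parser`. It is not nodeca's code (which uses jquery/cheerio); the class name `LoneLinkFinder` and its bookkeeping are assumptions made for this example.

```python
from html.parser import HTMLParser


class LoneLinkFinder(HTMLParser):
    """Collect hrefs of links that are the only content of their <p>."""

    def __init__(self):
        super().__init__()
        self.lone_links = []
        self._in_p = False
        self._in_a = False
        self._hrefs = []     # links seen in the current paragraph
        self._other = False  # True once the paragraph has non-link content

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p, self._in_a, self._hrefs, self._other = True, False, [], False
        elif self._in_p and tag == "a":
            self._in_a = True
            self._hrefs.append(dict(attrs).get("href"))
        elif self._in_p and not self._in_a:
            self._other = True  # some other element next to the link

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_a = False
        elif tag == "p" and self._in_p:
            if len(self._hrefs) == 1 and not self._other:
                self.lone_links.append(self._hrefs[0])
            self._in_p = False

    def handle_data(self, data):
        # Text outside the link (ignoring whitespace) means the link is inline.
        if self._in_p and not self._in_a and data.strip():
            self._other = True


finder = LoneLinkFinder()
finder.feed(
    '<p>See <a href="https://example.com/a">this</a> video.</p>\n'
    '<p><a href="https://example.com/b">https://example.com/b</a></p>'
)
print(finder.lone_links)
```

Only the second link qualifies for a block-level player here; the first one is surrounded by text and would be handled as an inline link.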

Well, I already have working code, and it would be simpler to modify what I have rather than rewrite everything from scratch, if that’s possible. Especially if it is faster :)

So is it possible within the renderer?

No. The renderer operates on tokens and is not expected to analyze or modify combinations of them.

You can write a plugin for the core chain that scans for paragraphs, analyzes whether the content is link_open + text + link_close, and does the replacement. Or manually call parse + scan/replace + render.
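
The scan/replace idea can be sketched independently of any parser. The Python below models a token stream with a simplified `Token` class of my own; it is a stand-in and not the markdown-it API, although the token type names (`paragraph_open`, `inline`, `link_open`, and so on) follow the ones mentioned in this thread. The `embed` token type is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    # Simplified stand-in for a parser token; not the markdown-it Token class.
    type: str
    content: str = ""
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def is_lone_link(inline):
    """True if the inline token's children are exactly link_open + text + link_close."""
    return [child.type for child in inline.children] == ["link_open", "text", "link_close"]


def replace_lone_links(tokens):
    """Replace each paragraph that holds only a link with one hypothetical 'embed' token."""
    out = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        kinds = [t.type for t in window]
        if kinds == ["paragraph_open", "inline", "paragraph_close"] and is_lone_link(window[1]):
            href = window[1].children[0].attrs.get("href", "")
            out.append(Token("embed", attrs={"src": href}))
            i += 3  # skip the whole paragraph
        else:
            out.append(tokens[i])
            i += 1
    return out


def para(children):
    """Helper for the demo: wrap inline children in a paragraph."""
    return [Token("paragraph_open"), Token("inline", children=children), Token("paragraph_close")]


link = [Token("link_open", attrs={"href": "https://example.com/v"}),
        Token("text", "https://example.com/v"),
        Token("link_close")]
stream = para(link) + para([Token("text", "see ")] + link)
result = replace_lone_links(stream)
print([t.type for t in result])
```

The second paragraph is left untouched because its link is preceded by text, so the renderer would still see an ordinary inline link there.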

Anything new on this? Is there a standard way to do it now?

Yes I agree.

For spec makers: they don’t need to create a new syntax for inserting audio/video URLs, which could turn into a complex debate because it’s hard to find a simple and elegant way to do that.

For end users: they don’t need to think more or learn more; they just copy and paste the code given by audio/video websites, as we do now.

For developers: they can write extensions to do that work if they want. Once a better way has been found, we can decide whether or not to put it into the spec.

Regarding this comment in the Issues we SHOULD resolve before 1.0 release topic:

Should we change the spec so that instead of “images” it talks of media more generally, and allow the image syntax to be used for audio and video? (Renderers could render appropriately for the media.)

My thoughts: for 1.0, rename this to a generic “embedded media” type, with image being the default media type that is rendered and used in the spec examples. The spec could mention that the embedded media syntax may be used to render other types of media based on file extension, the same way that the parser may render soft breaks as hard breaks (with soft breaks being the default). E.g. the spec could state:

A renderer may also provide an option to render the embedded media as a media type other than image, based on the file extension of the media file.

The whitelist of valid URL schemes for autolinks was removed in CommonMark 0.24. Omitting mention of particular file extensions/media types would be consistent with this. A formal list of file extensions is a moving target; keeping this list up to date falls quite far outside the scope of formalising Gruber’s Markdown, but perhaps a later extension could formalise how the media (based on file extension) should be rendered.

You will not be able to select the proper wrapper without information about the media type. And you cannot trust the extension in the URL. The data behind the URL would have to be loaded and analyzed. There are 2 problems:

  • You cannot define a clear method to resolve the media type
  • Downloading from a remote host is async by nature. That would be a real pain for the parser architecture.

My suggestion is to avoid adding AI to the spec.

@vitaly, that seems equally true for images – can we really “know” that a given URL represents an image without inspecting it? Right now when I use a CommonMark compliant parser and embed an MP4 URL, I get an <img> tag, even though the URL clearly provides useful information to the parser suggesting that this is not in fact what the user intended.

I also think it’s important to distinguish between YouTube/Vimeo style embeds (which rely on third party code) and browser-supported HTML5 video/audio playback (which is an open standard, just like <img> tags).

As a markdown user, I would find a system that inserts standards-compliant <video> or <audio> tags for appropriate embed URLs on the basis of extensions more than sufficient for most practical purposes. Having special syntax to force the embed type might be useful for special cases. But if I want to embed a locally uploaded MP4 video in my markdown blog post, that should “just work” on the basis of the URL.

I do agree that any advanced embedding of third party libraries doesn’t belong in the spec, but embedding a video or audio file in a standards compliant manner is, as far as I can tell, not meaningfully different from the already supported image use case. Am I missing something?

@vitaly, that seems equally true for images – can we really “know” that a given URL represents an image without inspecting it?

You missed the end goal: the output markup. If it is always <img>, there is no need to analyze src, because it does not affect the output. But if the output can be img/audio/video, some additional criteria are required to choose correctly. Assumptions about file extensions in URLs are too weak for a spec, IMHO.

I think a case could be made not to maintain a list of file extensions mapped against <img>, <video> and <audio> in the spec, simply because the file types that frequently updated browsers like Chrome support natively can change pretty often.

The spec should IMO, however, make the recommendation to implementers to vary the output of what’s currently image-only markup based on the file extension of commonly natively supported file formats (with <img> being the default if the extension is not recognized, or if there is none), and provide implementers with sample output for <img>, <video> and <audio>. All three are first-class citizens in HTML5 and it makes no sense to me that markdown would not support them as such.

The behavior that markup like ![some text](https://someurl.com/some.mp3) produces an <img> embed of a URL that’s obviously not an image is counter-intuitive, and the whole point of using a language like markdown is IMO to provide the most intuitive result for the common cases, using markup that people can keep in their head. To the extent that the spec produces obviously counterintuitive results when implemented, I would argue that it needs to be revised not to do so.

I agree with previous commenters who have suggested extending the syntax to override that behavior on an as-needed basis, but if there’s no consensus what that should look like, I would still recommend changing the default behavior of ![x](y) type invocations in the way described above, i.e. make intelligent guesses based on commonly used extensions, and fall back to image markup if no intelligent guess can be made. I don’t think that would qualify as AI.
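
To show how small such a guess could be, here is a sketch in Python. The extension table is illustrative and deliberately incomplete, the name `render_embed` is hypothetical, and using the bracket text as fallback content inside `<video>`/`<audio>` follows the suggestion made earlier in this thread rather than any spec.

```python
from html import escape
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative, deliberately incomplete mapping; not taken from any spec.
MEDIA_TYPES = {
    ".mp4": ("video", "video/mp4"),
    ".webm": ("video", "video/webm"),
    ".mp3": ("audio", "audio/mpeg"),
    ".ogg": ("audio", "audio/ogg"),
}


def render_embed(alt, url):
    """Render ![alt](url) as <video>, <audio> or, by default, <img>."""
    # Look only at the path, so query strings do not hide or fake an extension.
    ext = PurePosixPath(urlparse(url).path).suffix.lower()
    safe_url, safe_alt = escape(url), escape(alt)
    if ext not in MEDIA_TYPES:
        # Unknown or missing extension: keep today's behavior.
        return f'<img src="{safe_url}" alt="{safe_alt}">'
    tag, mime = MEDIA_TYPES[ext]
    fallback = f"\n  <p>{safe_alt}</p>" if alt else ""
    return f'<{tag}>\n  <source type="{mime}" src="{safe_url}">{fallback}\n</{tag}>'


print(render_embed("some text", "https://someurl.com/some.mp3"))
print(render_embed("", "https://example.com/download?id=42"))
```

The second call shows the limitation raised above: a URL without a recognizable extension still falls back to `<img>`, whatever the server actually returns.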

Although I disagree with you here, let me thank you for all your hard work on markdown-it; it’s a pleasure to use! And again - I may be missing something obvious and am mostly speaking from the perspective of someone using other people’s markdown parsers in a few different codebases.

Such recommendations, as definitions with partial/incomplete coverage, do not work well in programming. They quickly become “one more standard”, as has already happened with markdown.

I understand people who need audio/video with the same markup, but I don’t see a way to implement it correctly.

BTW, we use embedza for post-processing to beautify links (not images), and that works very well in practice. It is also more convenient on forums than using image tags.