Creating figures with a caption depending on context (no markup, natural convention)

Hello everyone. I’m a new user, although not a new reader of this forum. I would like to propose a new extension addressing figure-like constructs in markdown.

First, my stance on markdown: what is neat about it is that it reuses well-ingrained writing conventions to convey semantics, and does a good job of interpreting them as HTML. In several discussions about figurification (sorry, I made up a word) and adding captions, I was struck by the unnatural constructs proposed to do so, that is, new kinds of markup. I think this should be avoided when possible.

And what is a figure? In my opinion it is a standalone construct with a text written below it. If I wanted to represent a figure in plain text I would therefore present it as such.

A paragraph.

![my alt text](https://www.example.com/image.png "my title")
A convenient caption below my image standing alone in a paragraph.

If I wrote the code for an image inside a paragraph, my intention would be for the image to integrate with the flow of text. But if I write it this way, my intention is that the image is a thing in itself, and we know it is a figure because I took the time to put a text below it (aka a caption).

In core markdown it would render gracefully as such (fixed example):

<p>A paragraph.</p>
<p><img src="https://www.example.com/image.png" alt="my alt text" title="my title">
A convenient caption below my image standing alone in a paragraph.</p>

Which is neat: it neither breaks the flow for the one who reads the markdown nor adds weird stuff for the one who reads the rendered HTML. But with an extension it could be rendered as this:

<p>A paragraph.</p>
<figure>
<img src="https://www.example.com/image.png" alt="my alt text" title="my title">
<figcaption>A convenient caption below my image standing alone in a paragraph.</figcaption>
</figure>
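
As a rough illustration of how small such an extension could be, here is a minimal Python sketch that post-processes the default HTML output. The name `figurify` and the regular expression are hypothetical and not part of any existing Markdown library. The sketch handles images only, and assumes the renderer emits the image and its caption line inside one `<p>`, as in the default rendering shown earlier.

```python
import re

# Hypothetical pattern: a paragraph that starts with a lone <img>,
# followed by exactly one more line of text (the would-be caption).
_FIGURE_PARAGRAPH = re.compile(r"<p>(<img\b[^>]*>)\n([^\n]+?)</p>")


def figurify(html: str) -> str:
    """Rewrite '<p><img ...>\\ncaption</p>' into <figure> plus <figcaption>.

    Paragraphs where the image sits inside running text, or where the
    image has no following line, are left untouched.
    """
    return _FIGURE_PARAGRAPH.sub(
        r"<figure>\n\1\n<figcaption>\2</figcaption>\n</figure>", html
    )
```

A real implementation would more likely hook into the parser than patch HTML with a regular expression, but the rule itself stays this simple: an image that opens a paragraph, followed by exactly one line of text.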

And it would generalize well to code blocks, tables (for extensions that know about tables), embedded content (for extensions that generalize image constructs), and even blockquotes, although in this case the text could serve as a footer of the blockquote rather than make a figure of it, because blockquotes are not the kind of thing that goes in a figure in the first place. For example (fixed example and rendering to correspond to what I intended):

A paragraph.

> May the force be with you, always.
>
Master Obi-wan Kenobi.

By default renders as this:

<p>A paragraph.</p>
<blockquote>
<p>May the force be with you, always.</p>
</blockquote>
<p>Master Obi-wan Kenobi.</p>

Which is readable and understandable in the context. But with this extension it would render as follows:

<p>A paragraph.</p>
<blockquote>
<p>May the force be with you, always.</p>
<footer>Master Obi-wan Kenobi.</footer>
</blockquote>
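
The blockquote case can be sketched the same way, as a pass over the default HTML. The name `add_blockquote_footer` is hypothetical. One caveat, stated as an assumption: at the HTML level a line written directly under the quote and a separate paragraph written after a blank line look identical, so this sketch treats any single-line paragraph that follows a blockquote as its footer. An extension working on the markdown source could be stricter.

```python
import re

# Hypothetical pattern: a closing </blockquote> immediately followed by a
# paragraph that holds exactly one line of text.
_QUOTE_THEN_LINE = re.compile(r"</blockquote>\n<p>([^\n]+?)</p>")


def add_blockquote_footer(html: str) -> str:
    """Move a single-line paragraph that follows a blockquote into the
    blockquote as a <footer>. Multi-line paragraphs are left alone."""
    return _QUOTE_THEN_LINE.sub(r"<footer>\1</footer>\n</blockquote>", html)
```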

I think this proposal respects the philosophy of markdown by being lightweight: it does not impose new markup on people but better interprets the way we already write in plain text. The context does the job.

What do you think?

Following the “principle of least surprise,” what might people expect when they see non-marked-up text beneath blockquoted text? The author’s intent is unclear. It could be:

  • <footer>
  • <cite>
  • Just plain text

I’d favor adding a little syntax in order to distinguish between different elements.

For example, figure with figcaption:

![my alt text](https://www.example.com/image.png "my title")
[A convenient caption below my image standing alone in a paragraph.]

Blockquote with footer:

> May the force be with you, always.
>
> [Master Obi-wan Kenobi.]

And an idea for blockquote with cite:

> May the force be with you, always.
>
> -- Master Obi-wan Kenobi.
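
To make the distinction concrete, here is a minimal Python sketch of how the last line of a blockquote could be classified under these two ideas. The function name `classify_trailing_line` and the returned labels are hypothetical, and the sketch assumes the leading `> ` has already been stripped from the line.

```python
import re
from typing import Tuple


def classify_trailing_line(line: str) -> Tuple[str, str]:
    """Return (kind, content) for the last line of a blockquote.

    kind is 'footer' for '[text]', 'cite' for '-- text',
    and 'text' for anything else.
    """
    text = line.strip()
    bracketed = re.fullmatch(r"\[(.+)\]", text)
    if bracketed:
        return "footer", bracketed.group(1)
    if text.startswith("-- "):
        return "cite", text[3:].strip()
    return "text", text
```

Note that a bare `[text]` can also be read as a shortcut reference link when a matching link definition exists, so an actual extension would have to decide which reading wins.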

I like your suggestion because even without the extension, the caption is visible.

But is it always good enough to have a caption as a paragraph after the captioned element? How does it fare regarding accessibility, if we consider HTML rendering, for example?

Another way to extend (or “abuse”) standard syntax, at least for images, would be to consider the title to be a kind of caption. I personally do that in some projects. People can write title attributes in images, and they are extracted as <figcaption>.

I would render HTML from your first image example:
![my alt text](https://www.example.com/image.png "my title")

As this:

<figure>
  <img src="https://www.example.com/image.png" alt="my alt text">
  <figcaption>my title</figcaption>
</figure>
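
For readers who want to try this, a minimal Python sketch of the idea follows. It is not the code used in those projects: the name `title_to_figcaption` and the regular expression are hypothetical, and it assumes the image stands alone in its paragraph and that `title` is written with double quotes, as in the default rendering.

```python
import re

# Hypothetical pattern: a paragraph holding only an <img> that carries a
# title attribute. Group 1 is the tag up to the title, group 2 the title
# text, group 3 whatever closes the tag ('>' or ' />').
_LONE_TITLED_IMAGE = re.compile(
    r'<p>(<img\b[^>]*?)\s+title="([^"]*)"([^>]*>)</p>'
)


def title_to_figcaption(html: str) -> str:
    """Turn a lone titled image into a <figure>, moving the title text
    out of the attribute and into a <figcaption>."""
    return _LONE_TITLED_IMAGE.sub(
        r"<figure>\n  \1\3\n  <figcaption>\2</figcaption>\n</figure>", html
    )
```

Images without a title, or images inside running text, are left as they are.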

I know it’s an abuse of the syntax meant for title attributes, but I feel like it’s better for accessibility when the extension is not there.

So, I don’t want to say it’s better than your suggestion in any way, but I wanted to show what solution I use on my side, if it can help discuss the topic.

Another option would be to use a caption attribute if this other extension is accepted some day:

Actually, <cite> is reserved for citation of a work, not for author attribution. So it would either mean <footer> or plain text.

I advocate interpreting a single line of plain text (and nothing else) that follows a figure-worthy element (image, table, code block, blockquote, you name it) starting a paragraph as meaning that we intend to present a figure with a caption, or a blockquote with a footer.

I think context beats ad hoc syntax.

Predefined named (English) attributes are incompatible with the basic design goals of Markdown and CommonMark. The OP is absolutely right in wanting to avoid that, although the markup still might need to be a little more explicit. I could come up with a lot of formats for figures in markdown that would be quite obvious to a human reader in many or most cases. The problem is to describe them precisely enough that edge cases also make sense and the format does not become impossible to implement.

A paragraph

![text](example.png "title")
caption

B paragraph

A paragraph

![text](example.png "title")
-- caption

B paragraph

A paragraph

-- caption?
![text](example.png "title")

B paragraph

A paragraph

![text](example.png "title")

-- caption?

B paragraph

A paragraph

![text](example.png "title") -- caption?

B paragraph

A paragraph

![text](example.png "title")
[caption?]

B paragraph

A paragraph

[caption?]
![text](example.png "title")

B paragraph

A paragraph

![text](example.png "title")
: caption

B paragraph

A paragraph

![text](example.png "title")
! caption

B paragraph

A paragraph

caption:
![text](example.png "title")

B paragraph

A paragraph

::: caption? :::
![text](example.png "title")

B paragraph

A paragraph

**caption**
![text](example.png "title")

B paragraph

A paragraph

![text](example.png "title")
**caption**

B paragraph

A paragraph

####### caption
![text](example.png "title")

B paragraph

A paragraph

![caption](example.png "title")

B paragraph

A paragraph

![text](example.png "caption")

B paragraph

A paragraph

________________________
![text](example.png "title")
caption
________________________

B paragraph

A paragraph

![text](example.png "title")
    caption

B paragraph

In many ways, I think HTML considerations pollute the debate around lightweight markup languages such as markdown. They are separate languages, each adapted to its own use cases, and you cannot shoehorn one completely into the other. Accessibility is a good example: HTML5 does wonders in this field, so use it when accessibility is your concern.

That is why I advocate for a (hopefully) better interpretation of existing writing conventions, with the least amount of markup helpers possible.

Still, on the debate around accessibility, there is no way to produce captions today. If anything, my proposed extension makes things better in this regard.

Indeed, there are multiple sensible ways we would tend to represent a caption in plain text. But the two that reuse the alt text or the title as the caption feel wrong, for at least two reasons:

  • they abuse otherwise perfectly sound semantics (alt and title respectively),
  • they are image-centric (or embed-centric, for those who extend the concept), and therefore ignore all other figure-worthy elements.

The other ones make more sense to me.

There are a lot of complexities in the handling of figures. There’s a long relevant discussion on the pandoc bug tracker, which shows some of the features people have wanted. Among other things:

  • the ability to have block-level content in captions
  • the ability to specify a short caption for inclusion in generated lists of figures
  • the ability to specify a reference so the figure can be referred to
  • the ability to include multiple images in a single figure, or even subfigures

Of course these have to be balanced against the design goal of having something that looks readable in plain text. But it’s a tricky problem.

See? That’s the problem when we try to express in a language everything that’s possible/easy to express in another. We lose focus. That said…

the ability to specify a short caption for inclusion in generated lists of figures

I would say that’s out of scope for markdown… Better to use a system dedicated to publishing…

the ability to specify a reference so the figure can be referred to

Another extension could manage that, maybe.

the ability to include multiple images in a single figure, or even subfigures

We could extend my proposed extension to detect a sequence of figure-worthy elements starting a paragraph and followed by a single line of text, understood as a caption, so that each figure-worthy element is a subfigure. As a consequence…

the ability to have block-level content in captions

Following the principles behind my first idea, I think it works well. Because:

  • What would be the implicit meaning of a sequence of figure-worthy elements followed by a single line of text?
    • Probably that each is a subfigure and the text is a caption.
  • What would be the implicit meaning of a sequence of figure-worthy elements followed by a single block element?
    • Probably that each is a subfigure and the block is a caption.

With these models, no block element can be a caption, but this would not be the obvious intent so that’s cool.
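
To show that the subfigure rule stays easy to state, here is a minimal Python sketch that works on the lines of one markdown paragraph. The names `IMAGE_LINE` and `split_figure_block` are hypothetical, and the sketch only recognizes images as figure-worthy elements; tables, code blocks and the like would need a real parser.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: a line consisting of a single markdown image.
IMAGE_LINE = re.compile(r"^!\[[^\]]*\]\([^)]*\)\s*$")


def split_figure_block(lines: List[str]) -> Optional[Tuple[List[str], str]]:
    """Return (image_lines, caption) when the block is one or more image
    lines followed by exactly one line of text, otherwise None."""
    images: List[str] = []
    for line in lines:
        if not IMAGE_LINE.match(line):
            break
        images.append(line)
    rest = lines[len(images):]
    # Need at least one image and exactly one non-empty trailing line.
    if not images or len(rest) != 1 or not rest[0].strip():
        return None
    return images, rest[0].strip()
```

With one image the result describes a plain figure; with several, each image line would become a subfigure sharing the single caption.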