Horizontal rules as (hierarchical) section breaks etc

When Markdown was devised in 2004, there was no HTML5 and hence no <section> element type, just <hr>. Should a modern implementation (via a user-selectable option or an extension) treat one or all kinds of “horizontal line” markup as section breaks? Or should it introduce new repeated-character lines instead, e.g. ====, ####, .... or ''''?

# First chapter

----

# Second chapter

<section>
<h1>First chapter</h1>
</section><section>
<h1>Second chapter</h1>
</section>
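
The mapping above could be prototyped roughly as follows. This is only a minimal sketch under simplifying assumptions: the function name `sections_from_breaks` is made up for illustration, every non-blank line is treated as its own block, and setext headings (where a line of hyphens underlines a paragraph) are ignored.

```python
import re
from html import escape

# Thematic break as in CommonMark: three or more -, * or _, optionally spaced.
THEMATIC_BREAK = re.compile(r"^ {0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$")
ATX_HEADING = re.compile(r"^(#{1,6})[ \t]+(.*?)[ \t]*$")

def sections_from_breaks(markdown: str) -> str:
    """Wrap the content between thematic breaks in <section> elements."""
    sections, current = [], []
    for line in markdown.splitlines():
        if THEMATIC_BREAK.match(line):
            sections.append(current)  # the break closes the current section
            current = []
        elif line.strip():
            current.append(line.strip())
    sections.append(current)

    html = []
    for block in sections:
        html.append("<section>")
        for line in block:
            heading = ATX_HEADING.match(line)
            if heading:
                level = len(heading.group(1))
                html.append(f"<h{level}>{escape(heading.group(2))}</h{level}>")
            else:
                # Simplification: one <p> per line, no inline parsing.
                html.append(f"<p>{escape(line)}</p>")
        html.append("</section>")
    return "\n".join(html)

print(sections_from_breaks("# First chapter\n\n----\n\n# Second chapter"))
```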

Paged media

Slides

For slide-based presentations made with Markdown, horizontal lines (mostly dashed ones, ---) have already been used to imply slide breaks by Remark, Slidify, IO Slides, Swipe, Landslide, Deckset, Pandoc and Biggie. MarkdownPresenter, however, uses a line with only an exclamation mark in it, and Keydown even adds an English keyword, !SLIDE. Slides are a good example of paged media; print-outs are another.
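
To make the three separator styles concrete, here is a small sketch of a slide splitter. It does not reproduce how any of the tools named above actually work; the `SEPARATORS` table, its regular expressions and the function `split_slides` are assumptions for illustration only.

```python
import re

# Illustrative patterns for the separator styles described above (assumed,
# not taken from any tool's source code).
SEPARATORS = {
    "dashes": re.compile(r"^-{3,}\s*$"),     # a line of three or more hyphens
    "bang": re.compile(r"^!\s*$"),           # a line with only an exclamation mark
    "keyword": re.compile(r"^!SLIDE\b.*$"),  # a line starting with !SLIDE
}

def split_slides(markdown: str, style: str = "dashes") -> list[str]:
    """Split a document into slides at separator lines of the given style."""
    separator = SEPARATORS[style]
    slides, current = [], []
    for line in markdown.splitlines():
        if separator.match(line):
            slides.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    slides.append("\n".join(current).strip())
    return [slide for slide in slides if slide]  # drop empty slides

print(split_slides("# One\n\n---\n\n# Two"))
```

Note that in plain CommonMark a --- line directly below paragraph text is a setext heading underline, so a real splitter would need a blank line before the separator or a proper parser.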

Print

Leanpub and its offspring Markua, both tailored for traditional page-based publishing, support page breaks and book sections ‘frontmatter’, ‘mainmatter’ and ‘backmatter’ (like LaTeX’s ‘book’ document class). They do so with English keywords enclosed in curly braces as sole line content. Since Markua is still in the making, should Commonmark recommend a different approach?

{frontmatter}

TOC, imprint, preface etc.

{mainmatter}

Chapters

{pagebreak}

Sections etc.

{backmatter} 

Appendices, glossary, index, etc.

TOC, imprint, preface etc.

====

Chapters

----

Sections etc.

====

Appendices, glossary, index, etc.

Alternative markup

I have always fancied the simple scissors emoticon / ASCII art which could be used for page breaks unambiguously.

----8<----

<hr class="page">

hr.page {page-break-after: always; height: 0;}
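
As a rough illustration, the scissors line could be handled in a pre-processing step before the actual Markdown parser runs, relying on raw HTML being passed through. The exact pattern (at least two hyphens on each side of `8<`) and the function name `mark_page_breaks` are assumptions, not an existing syntax.

```python
import re

# Assumed syntax: up to three spaces of indentation, then runs of at least
# two hyphens on both sides of the scissors "8<".
SCISSORS = re.compile(r"^ {0,3}-{2,}[ \t]*8<[ \t]*-{2,}[ \t]*$")

def mark_page_breaks(markdown: str) -> str:
    """Replace scissors lines with a raw HTML page-break rule."""
    return "\n".join(
        '<hr class="page">' if SCISSORS.match(line) else line
        for line in markdown.splitlines()
    )

print(mark_page_breaks("Page one\n\n----8<----\n\nPage two"))
```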

Meta data

YAML markup blocks, mostly at the top of the document, use --- as fences (and sometimes ... as an ending fence), e.g. in Pandoc.

---
Title: Document title
---
Content

<html><head>
<title>Document title</title>
</head><body>
<p>Content</p>
</body></html>
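
A metadata block like this can be separated from the content with very little code. The following is a sketch only: `split_front_matter` is a hypothetical helper, and it reads flat `key: value` lines instead of using a real YAML parser.

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Return (metadata, content) for a document with an optional --- block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no opening fence on the first line
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() in ("---", "..."):  # either closing fence
            break
    else:
        return {}, text  # no closing fence: treat everything as content
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # assumption: flat "key: value" pairs only
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])

meta, content = split_front_matter("---\nTitle: Document title\n---\nContent")
print(meta, content)
```

This also shows why --- at the very top of a document is ambiguous: a parser has to decide whether it opens a metadata block or is a thematic break.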

Hierarchy

If horizontal lines were interpreted as section breaks, should different kinds of line characters imply different hierarchic levels of sections, e.g. *** ends a section at the current heading level, --- ends a top-level section?

# h1

top section

## h2

lower section

 * * *

top section again

<section><h1>h1</h1>
<p>top section</p>
  <section><h2>h2</h2>
  <p>lower section</p>
  </section>
<p>top section again</p>
</section>

PS: There is related Spec discussion at Horizontal Rule or Thematic Break?.

Related discussion: Explicit section not possible?

While documenting existing extensions and rethinking how their syntax could be harmonized, I had an idea for a systematic approach to the problem raised in the Hierarchy section of my initial post from three years ago. So far, that problem has only been addressed in the initial post of the thread @chrisalley linked to.

Consideration

Markdown and Commonmark do not use the concept of empty paragraphs, empty headings or empty emphasized spans, but as part of a mental model they make some sense.

An underlined heading consists of the underline and one or more lines before it that would otherwise be parsed as a paragraph. If this paragraph is empty, the underline is considered a thematic break instead. Right now, this line of thinking works only for --- and longer uninterrupted lines of hyphens. Such a line then becomes either a relative, generic empty heading or an absolute second-level empty heading.

The empty emphases in foo ____ bar and foo **** bar are rendered as their verbatim markers, but if they are the sole content of a paragraph they become indistinguishable from thematic breaks. In vanilla Commonmark, the only difference between underscore _ and asterisk * emphasis is in intra-word markup, but if we consider a sensible extension that aligns asterisks with more semantic (HTML) markup <em> and <strong>, and underscores with more presentational markup <i>/<u> and <b>, then a line of asterisks associates more with an explicit section break </section><section> and a line of underscores sounds more like a traditional horizontal rule <hr>.

Conclusion (= TL;DR)

  • ----: empty heading section break – continue in new anonymous sibling section
  • ****: structural section break – continue in parent section afterwards
  • ____: visual hr break

# Foo            # Foo            # Foo
Bar              Bar              Bar
## Baz           ## Baz           ## Baz
----             ****             ____
Quuz             Quuz             Quuz
.
<section>        <section>        <section>
 <h1>Foo</h1>     <h1>Foo</h1>     <h1>Foo</h1>
 <p>Bar</p>       <p>Bar</p>       <p>Bar</p>
 <section>        <section>        <section>
  <h2>Baz</h2>     <h2>Baz</h2>     <h2>Baz</h2>
 </section>       </section>
 <section>                          <hr/>
  <p>Quuz</p>      <p>Quuz</p>      <p>Quuz</p>
 </section>                        </section>
</section>       </section>       </section>
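
The three rules can be checked against the example above with a small outline builder. This is a sketch under several assumptions: the names `build_outline`, `render` and `to_html` are invented, only ATX headings are recognized (setext headings are ignored), every other line becomes its own paragraph, and a - or * break with no open section falls back to a plain `<hr/>`.

```python
import re
from html import escape

HEADING = re.compile(r"^(#{1,6})[ \t]+(.*?)[ \t]*$")
BREAK = re.compile(r"^ {0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$")

def build_outline(markdown):
    """Build a section tree; sections are dicts, other blocks (tag, text)."""
    root = {"level": 0, "children": []}
    stack = [root]  # currently open sections, innermost last
    for raw in markdown.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        heading, brk = HEADING.match(line), BREAK.match(line)
        if heading:
            level = len(heading.group(1))
            while stack[-1]["level"] >= level:
                stack.pop()  # a heading closes sections of equal or deeper level
            section = {"level": level,
                       "children": [(f"h{level}", heading.group(2))]}
            stack[-1]["children"].append(section)
            stack.append(section)
        elif brk and brk.group(1) != "_" and len(stack) > 1:
            closed = stack.pop()  # both - and * close the current section
            if brk.group(1) == "-":  # - continues in an anonymous sibling
                sibling = {"level": closed["level"], "children": []}
                stack[-1]["children"].append(sibling)
                stack.append(sibling)
        elif brk:
            stack[-1]["children"].append(("hr", None))  # _ or nothing to close
        else:
            stack[-1]["children"].append(("p", line))
    return root

def render(node, depth=0):
    """Render a section tree as indented HTML lines."""
    pad, out = " " * depth, []
    for child in node["children"]:
        if isinstance(child, dict):
            out.append(pad + "<section>")
            out.extend(render(child, depth + 1))
            out.append(pad + "</section>")
        elif child[0] == "hr":
            out.append(pad + "<hr/>")
        else:
            tag, text = child
            out.append(f"{pad}<{tag}>{escape(text)}</{tag}>")
    return out

def to_html(markdown):
    return "\n".join(render(build_outline(markdown)))

for marker in ("----", "****", "____"):
    print(to_html(f"# Foo\nBar\n## Baz\n{marker}\nQuuz"), end="\n\n")
```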

Related Topic

Fenced Flowerbox headings, especially at the second level, can reuse thematic breaks.

# Parent      ## Sibling    ### Nephew
---------     ---------     ---------
Flowerbox     Flowerbox     Flowerbox
---------     ---------     ---------

This falls back gracefully if the parser discards all empty sections after it has finished parsing the document.

Explicit Section Breaks

First pass:

                                               <section>
<section>              <section>                <section>
<h1>Parent</h1>        <h2>Sibling</h2>         <h3>Nephew</h3>
</section><section>    </section><section>      </section><section>
 <section>             </section><section>      </section>
                                               </section><section>
 <h2>Flowerbox</h2>    <h2>Flowerbox</h2>      <h2>Flowerbox</h2>
 </section>            </section>              </section>
</section>

Final pass:

                                               <section>
<section>              <section>                <section>
<h1>Parent</h1>        <h2>Sibling</h2>         <h3>Nephew</h3>
</section><section>    </section>               </section>
 <section>             <section>               </section><section>
 <h2>Flowerbox</h2>    <h2>Flowerbox</h2>      <h2>Flowerbox</h2>
 </section>            </section>              </section>
</section>
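
The step from the first pass to the final pass amounts to a recursive prune of empty sections. A minimal sketch, assuming a tree in which nested Python lists stand for sections and strings for any other element (this representation and the name `prune_empty_sections` are made up for illustration):

```python
def prune_empty_sections(children):
    """Drop sections (lists) that are empty or contain only empty sections."""
    kept = []
    for child in children:
        if isinstance(child, list):
            child = prune_empty_sections(child)  # prune bottom-up
            if not child:
                continue  # nothing left inside: discard the section
        kept.append(child)
    return kept

# Middle column of the first pass: the anonymous section between the
# "Sibling" and "Flowerbox" sections is empty and disappears.
first_pass = [["<h2>Sibling</h2>"], [], ["<h2>Flowerbox</h2>"]]
print(prune_empty_sections(first_pass))
```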

Implicit Section Breaks

This is the result if only headings are used to generate sections:

                                               <section>
<section>              <section>                <section>
<h1>Parent</h1>        <h2>Sibling</h2>         <h3>Nephew</h3>
<hr/>                  <hr/>                    <hr/>
                       </section>               </section>
 <section>             <section>               </section><section>
 <h2>Flowerbox</h2>    <h2>Flowerbox</h2>      <h2>Flowerbox</h2>
 </section>            </section>              </section>
</section>

Vanilla Commonmark

<h1>Parent</h1>        <h2>Sibling</h2>        <h3>Nephew</h3>
<hr/>                  <hr/>                   <hr/>
<h2>Flowerbox</h2>     <h2>Flowerbox</h2>      <h2>Flowerbox</h2>

Thanks for this. I have a need for this too. I’ve been writing prose and memoirs in Markdown for the last decade and must have something like 200,000 words under my belt. Will I publish some day… Nothing fancy in the formatting; I do separate sections with HRs (e.g. right before H2s), but sometimes I need them as thematic breaks between paragraphs as well.

@vas commented elsewhere:

But the syntax you describe for headingless sections goes back to treating Markdown as a source for HTML. I say that because the syntax you propose has the notion of hierarchy, but that won’t be evident to the human plain text reader, not unless they internalize your rules.

Yes, this is all about the section hierarchy in a document outline. If a reader encounters an <hr> in HTML or the respective line of characters (inspired by printed books) in Markdown, they have to infer its structural meaning from context. For the latter, this would not change at all if CM authors had the ability to opt into a more nuanced semantic treatment of their source text. For readers, it would only noticeably affect the output in a different format (or perhaps not even there if the styling ignores the hidden semantic information).

The only natural ways for plain text to represent hierarchy are indentation, borders (e.g. block quote’s left border), or headings with heading levels.

Horizontal rules are kind of like borders. :wink:

We’re all aware that, due to legacy indented code blocks, we can hardly use indentation for anything else in CM.

In other words, I don’t think your semantics for **** works, and the distinction between ---- and ____ doesn’t make sense for Markdown, which should only deal with semantic breaks, not presentation (e.g. horizontal rules).

Currently, all CM parsers only output <hr> in HTML for all of those. I’m just proposing to keep that possible while adding the option to describe the document structure more explicitly with <section>.

CM authors may also choose the number of characters in such a separator line and the spaces in between. They could use this to assign a private meaning, i.e. use the same number as the respective heading would have as a mnemonic.
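
A tool that wanted to honor such a private convention would only need to read the marker and its count from the line. A minimal sketch, with `break_weight` being a hypothetical helper name:

```python
import re

# Thematic break as in CommonMark: three or more -, * or _, optionally spaced.
THEMATIC_BREAK = re.compile(r"^ {0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$")

def break_weight(line: str):
    """Return (marker, count) for a thematic break line, otherwise None."""
    match = THEMATIC_BREAK.match(line)
    if not match:
        return None
    marker = match.group(1)
    return marker, line.count(marker)  # spaces between markers are not counted

print(break_weight("----"))
print(break_weight("* * *"))
```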

Note that the outline algorithm for HTML was changed substantially in 2022, because browsers did not respect the previous recommendation. Heading levels are now the decisive factor again, and sections mostly just augment them.
