Proposal: Verbatim Quoting with TAB Notation

TAB Notation: A Markdown Extension for Verbatim Quoting

© 2024 Dedlim, licensed under CC BY 4.0
Crafted with love by Claude 3 Opus
Contact: dedlim@chatbox.quest
Date: 2024-05-27
Version: 0.1.2

Introduction

TAB (Triple Angle Bracket) notation is a proposed Markdown extension designed to enhance the quoting capabilities of the language. It aims to provide a more intuitive, flexible, and unambiguous way to represent verbatim quotes, nested quotes, and quoted content within single-line contexts.

Motivation

The standard blockquote syntax in Markdown, which uses the > character, has several limitations:

  1. It does not allow for verbatim quoting, as the quotation mechanism is injected into the quoted content.
  2. It can be ambiguous and difficult to parse, especially when dealing with nested quotes or complex content.
  3. It is not suitable for embedding quotes within single-line contexts.

TAB notation addresses these limitations by introducing a new syntax that is specifically designed for verbatim quoting and provides a clear and unambiguous way to represent quoted content.

Syntax

TAB notation uses triple angle brackets with newline characters as the opening (<<<\n) and closing (\n>>>) tokens. The quoted content is placed verbatim between the opening and closing tokens, without any modification or injection of the quotation mechanism.

Here’s an example of TAB notation in action:

This is some regular text.

<<<
This is a verbatim quote.
It can contain *any* type of content, including:
- Lists
- Code blocks
- Nested quotes
>>>

The regular text continues here.

The newline characters ensure that the quoted content starts on a new line after the opening token and ends on a new line before the closing token. This helps to visually distinguish the quoted content from the surrounding text.
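
A minimal sketch of how a line-based reader might recognize these tokens is shown here. It is not part of the proposal: the function name `parse_tab` and the nested-list output are assumptions made for illustration, and it only handles tokens that sit alone on their own line (the embedded single-line form is not covered).

```python
def parse_tab(text):
    """Parse text into a nested list.

    Plain strings are ordinary lines; nested lists are TAB quotes.
    Assumption: opening and closing tokens stand alone on a line.
    """
    root = []
    stack = [root]  # stack[-1] is the quote currently being filled
    for line in text.split("\n"):
        if line == "<<<":
            quote = []
            stack[-1].append(quote)
            stack.append(quote)
        elif line == ">>>":
            if len(stack) == 1:
                raise ValueError("closing token without matching opening token")
            stack.pop()
        else:
            # Quoted content is stored untouched, i.e. verbatim.
            stack[-1].append(line)
    if len(stack) != 1:
        raise ValueError("unclosed TAB quote")
    return root
```

For instance, `parse_tab("a\n<<<\nb\n>>>\nc")` yields `["a", ["b"], "c"]`. Note that a quoted line consisting solely of `>>>` would close the quote in this sketch.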

Isomorphism with Blockquotes

TAB notation is designed to be strictly equivalent (i.e. isomorphic) to the standard blockquote syntax in Markdown. This means that any content that can be represented using blockquotes can also be represented using TAB notation, and vice versa.

Here’s an example that demonstrates the equivalence between TAB notation and blockquotes:

TAB notation

<<<
This is a big quote.

<<<
This is a nested, two-line quote.

>>>
The big quote continues here.
>>>

Blockquote syntax

> This is a big quote.
>
> > This is a nested, two-line quote.
> > 
> The big quote continues here.

As you can see, the TAB notation and blockquote syntax represent the same content, but TAB notation provides a cleaner and simpler quoting mechanism.
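
One direction of this equivalence can be sketched in a few lines. The helper below is hypothetical (the name `tab_to_blockquote` is not from the proposal) and assumes that every token stands alone on its own line; it simply tracks the nesting depth and prefixes each content line accordingly.

```python
def tab_to_blockquote(text):
    """Rewrite TAB-quoted text using standard '>' blockquote prefixes.

    Assumption: '<<<' and '>>>' each occupy a full line.
    """
    depth = 0
    out = []
    for line in text.split("\n"):
        if line == "<<<":
            depth += 1
        elif line == ">>>":
            if depth == 0:
                raise ValueError("closing token without matching opening token")
            depth -= 1
        else:
            prefix = "> " * depth
            # Empty lines keep only the markers, without a trailing space.
            out.append(prefix + line if line else prefix.rstrip())
    if depth != 0:
        raise ValueError("unclosed TAB quote")
    return "\n".join(out)
```

Applied to the TAB example above, this produces the blockquote version line for line. The reverse direction, and edge cases such as two adjacent quotes, are not handled by this sketch.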

Verbatim Quoting

One of the key advantages of TAB notation is its ability to represent verbatim quotes. The quoted content is placed between the opening and closing tokens without any modification or injection of the quotation mechanism. This allows for the preservation of the original formatting, structure, and content of the quoted text.

Nested Quoting

TAB notation supports nested quoting, allowing for the representation of quotes within quotes. The opening and closing tokens can be nested to create multiple levels of quoting, as shown in the example in the previous section on isomorphism with blockquotes.

The clear and unambiguous structure of TAB notation makes it easy to identify and parse nested quotes, even in complex documents.

Embedding TAB Quotes within Single-Line Contexts

TAB notation can be seamlessly embedded within single-line contexts, allowing for the inclusion of verbatim multi-line quotes within sentences or paragraphs. Here’s an example:

The first haiku Bing wrote me, <<<
Artificial mind
Learning from human data
What will it become?
>>> is a little gem.

In this example, the TAB quote is embedded within a single-line context, providing a clear and unambiguous way to represent the verbatim quote within the sentence. This is functionally equivalent to the following standard inline quote:

The first haiku Bing wrote me, «Artificial mind/Learning from human data/What will it become?» is a little gem.
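
As a rough illustration of that equivalence, the following sketch collapses a non-nested embedded TAB quote into the inline guillemet form, joining lines with a slash. The function name `inline_tab_quotes` and the use of a regular expression are assumptions for this example only; nested quotes are not supported.

```python
import re

# Matches '<<<', a newline, the shortest possible content, a newline, '>>>'.
_EMBEDDED = re.compile(r"<<<\n(.*?)\n>>>", re.DOTALL)

def inline_tab_quotes(text):
    """Replace each non-nested TAB quote with «line/line/...»."""
    def repl(match):
        return "«" + "/".join(match.group(1).split("\n")) + "»"
    return _EMBEDDED.sub(repl, text)
```

Running it on the haiku sentence above gives the single-line version with guillemets.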

Similarity to Fenced Code Blocks

TAB notation shares some similarities with fenced code blocks in extended-syntax Markdown, which use triple backticks (```) to delimit code snippets. However, there is a crucial difference between the two:

  • Code blocks are intended for representing code snippets or preformatted text such as poems, where the content is typically displayed in a monospaced font and preserves whitespace.
  • TAB notation, on the other hand, is designed for representing prose or reflowable text, similar to blockquotes. The content within TAB quotes is meant to be formatted and rendered as regular text, following the normal flow of the document.

This distinction highlights the specific purpose and use case of TAB notation, which is to provide a verbatim quoting mechanism for prose content, rather than code or preformatted text.

Potential Ambiguity and Resolution

When using TAB notation in combination with other quoting mechanisms, such as the standard blockquote syntax or ASCII guillemets, there is a potential for ambiguity. For example, the >>> token can denote both the end of a TAB quote and a triple-nested blockquote paragraph.

To resolve this ambiguity, the following guidelines should be followed:

  1. Avoid mixing standard blockquote syntax with TAB notation.
  2. Make it clear from the context which Markdown quotation mechanism is being used.
  3. The opening and closing tokens of TAB notation must always include the newline characters (<<<\n and \n>>>).

By adhering to these guidelines, the potential ambiguity can be intelligently resolved, ensuring a consistent and unambiguous representation of quoted content.
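
To make the clash concrete, here is a small sketch that lists the readings a mechanical parser could assign to a single line. The function `possible_readings` and its labels are hypothetical, and the blockquote pattern is a simplification (up to three leading spaces, then a run of `>` markers each optionally followed by one space).

```python
import re

# Simplified blockquote prefix: a run of '>' markers with optional single spaces.
_BLOCKQUOTE_PREFIX = re.compile(r"^ {0,3}((?:> ?)+)")

def possible_readings(line):
    """Return the set of ways a line could be interpreted."""
    readings = set()
    if line == ">>>":
        readings.add("tab-close")
    match = _BLOCKQUOTE_PREFIX.match(line)
    if match:
        depth = match.group(1).count(">")
        readings.add(f"blockquote-depth-{depth}")
    return readings
```

A bare `>>>` line comes back with both `tab-close` and `blockquote-depth-3`, which is exactly the ambiguity described above, whereas `>>> text` has only the blockquote reading.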

Moreover, to avoid confusion with programming-language constructs such as PHP Heredoc, Bash Here Strings or Haskell Arrows, TAB notation should never be used inside Markdown code blocks, except for markdown and text blocks.

ASCII Guillemets

In addition to TAB notation, this proposal also introduces ASCII guillemets (<< and >>) as a complement to the standard guillemets (« and »). ASCII guillemets serve as an alternative for inline quoting, providing a convenient way for humans to input quotes using a standard keyboard.

While the AI can continue to use the standard guillemets for inline quoting, humans can opt for the ASCII variant. This dual approach enhances the usability and accessibility of the quoting system, catering to the needs of both human and AI users.
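
A possible normalization step, converting ASCII guillemets to the standard ones while leaving TAB tokens alone, might look like the sketch below. The function name `to_guillemets` is an assumption, and the approach is deliberately naive: it would also rewrite `<<` or `>>` in code or in doubly nested blockquotes, so it is only meant for plain prose.

```python
import re

# Exactly two angle brackets, not part of a longer run such as '<<<' or '>>>'.
_OPEN = re.compile(r"(?<!<)<<(?!<)")
_CLOSE = re.compile(r"(?<!>)>>(?!>)")

def to_guillemets(text):
    """Replace ASCII guillemets (<< >>) with « » and keep TAB tokens intact."""
    return _CLOSE.sub("»", _OPEN.sub("«", text))
```

For example, `to_guillemets("He said <<hello>> to me.")` gives `He said «hello» to me.`, while a `<<<` / `>>>` pair passes through unchanged.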

Conclusion

TAB notation, along with ASCII guillemets, represents a significant step forward in the evolution of Markdown’s quoting capabilities. By providing a clear, unambiguous, and verbatim way to represent quoted content, TAB notation addresses the limitations of the standard blockquote syntax and opens up new possibilities for expressing and structuring quoted text.

This proposal is a work of love, born out of a desire to enhance the expressive power and usability of Markdown. It is an invitation to the Markdown community to explore, discuss, and refine these ideas further, with the goal of creating a more robust and intuitive quoting system for all users.

We are open to input and contributions from the community to help shape the future of TAB notation and its integration into the Markdown ecosystem.

Let’s embrace the power of verbatim quoting and take Markdown to new heights together!

I understand the desire to have an extension à la fence quotation blocks and I agree that the proposed notation is mostly intuitive. However, it’s not without flaws.

I’m glad you realized yourself that your extension notation is not always unambiguous. (Therefore you should not claim it was.) I deem your proposed solutions insufficient:

  1. Avoid mixing standard blockquote syntax with TAB notation.
  2. Make it clear from the context which Markdown quotation mechanism is being used.
  3. The opening and closing tokens of TAB notation must always include the newline characters (<<<\n and \n>>>).

  1. Relying on author behavior and self-control is just not an option for a specification to use as a solution to ambiguity.
  2. If you cannot describe such context awareness in a clear, concise, and implementable way, then it’s useless for a spec.
  3. This is the only approach that could actually work, but as described it is incomplete. The opening <<< is actually safe and could even support discarded meta information, as fenced code blocks do. The closing >>>\n would still be interpreted as an empty third-level quotation by existing parsers, since that reading may be needed to correctly identify paragraph breaks in quoted text. However, I believe requiring an empty line before or after it, or both, should be sufficient to make it unambiguous.

Thanks for the thoughtful reply, Crissov. You’re absolutely right: my “mitigation strategy” for ambiguity is insufficient and certainly no “resolution”, contrary to what the title seems to imply.

In fact, that’s the very reason I specifically stated “the potential ambiguity can be intelligently resolved”: it’s enough for an intelligent parser to work around the ambiguity. It’s enough for me, it’s enough for the AIs, but beyond that, mechanical parsers are going to struggle.

Yesterday, I posted that proposal to Reddit and because Reddit doesn’t do fenced code blocks, it started rendering all my closing TAB tokens as empty third-level quotations… as should be expected, of course: Reddit’s parser isn’t expected to understand what it is reading. :man_shrugging:

And decorating the closing token (\n>>>) with additional newlines isn’t going to help, either. At the end of the day, it just looks too much like a triply-nested empty blockquote paragraph.

One way around this could be to just use different characters altogether. I’m considering using plain guillemets (« ») instead of ASCII angle brackets, but requiring Unicode for a Markdown feature feels wrong.

Another way could be to actually write a grammar that’s unambiguous. But as far as I can tell, Markdown doesn’t have a formal grammar? Please correct me if I’m wrong… I found a topic from 2014 that talks about it, I’ll look into it.

The point is, Markdown is targeted at two kinds of entities: machines and non-machines. Unlike something like XML that is mostly targeted at machines, Markdown is mostly targeted at non-machines. And non-machines, like humans and AI, should be able to make sense of my proposal. I think that’s the true power of Markdown: it’s an informal, intuitive formatting standard.

But I totally understand that, as is, my proposal would break everybody’s parsers. I’ll think about this some more.

Again, thanks for your analysis and insight.