reStructuredText Block Structure
The specification document for reStructuredText was published by David Goodger in 2003 as part of the docutils project. That document details the components of a reStructuredText file. The specification was extended somewhat with the release of Python Sphinx. The changes were not major, and mostly took the form of new directives that allow a set of documents to be processed as a complete “document”.
PyLiT starts off looking at any reStructuredText document as a linear set of basic blocks. Most often, these blocks are separated from each other by blank lines, but the structure gets more complex as different types of elements are seen.
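As a rough illustration of the "linear set of basic blocks" view, the sketch below groups the lines of a document into blocks separated by blank lines. The function name `split_blocks` is hypothetical; this is not how PyLiT or docutils is implemented, only a minimal first approximation.

```python
def split_blocks(text):
    """Group lines into blocks separated by one or more blank lines.

    Hypothetical helper for illustration only; not part of PyLiT or docutils.
    """
    blocks = []
    current = []
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: it belongs to the block being collected.
            current.append(line)
        elif current:
            # Blank line: close the current block, if there is one.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


sample = "Title\n=====\n\nFirst paragraph,\nsecond line.\n\n\nSecond paragraph.\n"
blocks = split_blocks(sample)
for block in blocks:
    print(block)
```

This naive split breaks down as soon as an element may contain blank lines itself, such as a literal block, which is where the structure starts to get more complex.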
In this note, we will look at the specification, and the form of the top-most elements of a document file.
Text Lines
docutils processes a reStructuredText file line by line. Each line consists of zero or more whitespace characters, followed by text, and ends with a newline marker.
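The three parts of a line can be pulled apart with a few string operations. The sketch below uses a hypothetical helper, `split_line`, and assumes tabs are expanded to 8-column tab stops before indentation is measured; treat that tab width as an assumption rather than a statement about any particular tool.

```python
def split_line(line, tab_width=8):
    """Return (indent_width, text) for one line of input.

    Hypothetical helper for illustration. The trailing newline marker is
    dropped, and tabs are expanded first (tab_width=8 is an assumption).
    """
    body = line.rstrip("\r\n").expandtabs(tab_width)
    text = body.lstrip(" ")
    return len(body) - len(text), text


print(split_line("Plain paragraph text\n"))
print(split_line("    indented text\n"))
print(split_line("\n"))
```

A blank line comes back as an indent of zero with empty text, which is enough to tell it apart from an indented line that carries content.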
Several elements depend on indentation, similar to that used in Python code, to identify additional blocks that need to be processed. A simple example is a literal block, where the line structure must be preserved, but each line is indented.
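To make the literal-block case concrete, here is a minimal sketch that collects the indented lines following a paragraph and strips their common indentation while preserving the line structure. The name `indented_block` and the sample input are made up for illustration; the real parsers handle many more cases (tabs, quoted literal blocks, nested elements).

```python
def indented_block(lines, start):
    """Collect the indented block beginning at index `start`.

    Hypothetical helper for illustration. Blank lines inside the block are
    kept; the block ends at the first non-blank line with no indentation.
    Returns (block_lines, index_of_next_unindented_line).
    """
    block = []
    i = start
    while i < len(lines):
        line = lines[i]
        if line.strip() and not line.startswith(" "):
            break  # back at the left margin: the block is over
        block.append(line)
        i += 1
    # Trim blank lines that only separate the block from its neighbours.
    while block and not block[0].strip():
        block.pop(0)
    while block and not block[-1].strip():
        block.pop()
    if not block:
        return [], i
    # Remove the indentation shared by every non-blank line.
    indent = min(len(l) - len(l.lstrip(" ")) for l in block if l.strip())
    return [l[indent:] for l in block], i


lines = [
    "Example::",
    "",
    "    for x in y:",
    "        print(x)",
    "",
    "    done",
    "",
    "Next paragraph.",
]
block, next_index = indented_block(lines, 1)
print(block)
print(lines[next_index])
```

The relative indentation inside the block survives (the `print(x)` line stays four columns deeper than its neighbours), which is the property a literal block needs.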
Using Antlr for Grammar Tests
At one point, I was thinking about building a formal grammar for reStructuredText and building a new parser for the language using Python code generated by Antlr. However, that may be overkill for what I want to start with. Still, using Antlr lets us think through the top-level structure.