A little about my interests and what I hope to learn from this group.
I previously helped design the PreTeXt markup language, which is
a human-readable, human-writable XML vocabulary for writing scholarly
documents, with an emphasis on textbooks. PreTeXt converts to any
output format (in theory, anyway). We have a focus on accessibility,
and the HTML output has many accessibility features (it is better than
a PDF, for example). PreTeXt also converts to Braille.
The math in PreTeXt uses the same syntax as LaTeX. This is an issue
because LaTeX (like any other reasonable option) cannot distinguish
between the possible meanings of common ambiguous expressions. For example,
does g(x + h) mean “the function g evaluated at x plus h” or does
it mean “the quantity g times the quantity x + h”? The answer is that
it is impossible to tell without context. My motivating use case is
a screen reader (assistive technology which pronounces the contents
of a web page, for people who cannot see the screen). How can
a screen reader correctly pronounce g(x + h)? Currently the only
option is to guess. But guessing is not a good solution, so what we
need is a way to encode the meaning of the expression.
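To make the ambiguity concrete, here is a minimal Python sketch of the two readings of g(x + h) as two different trees, each with its own spoken form. The class names (Sym, Add, Apply, Times) and the speak function are hypothetical, made up for this illustration; they are not part of PreTeXt or of any screen reader, and the spoken wording is only one plausible choice.

```python
from dataclasses import dataclass


@dataclass
class Sym:
    name: str


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Apply:
    """Function application: func evaluated at arg."""
    func: object
    arg: object


@dataclass
class Times:
    """Multiplication: left times right."""
    left: object
    right: object


def speak(node):
    """Return one possible spoken rendering of a tree (illustrative wording)."""
    if isinstance(node, Sym):
        return node.name
    if isinstance(node, Add):
        return f"{speak(node.left)} plus {speak(node.right)}"
    if isinstance(node, Apply):
        return f"{speak(node.func)} of {speak(node.arg)}"
    if isinstance(node, Times):
        return f"{speak(node.left)} times the quantity {speak(node.right)}"
    raise TypeError(f"unknown node: {node!r}")


# The same written form, g(x + h), under its two meanings.
x_plus_h = Add(Sym("x"), Sym("h"))
as_function = Apply(Sym("g"), x_plus_h)
as_product = Times(Sym("g"), x_plus_h)

print(speak(as_function))  # g of x plus h
print(speak(as_product))   # g times the quantity x plus h
```

The written source is identical in both cases; only the tree differs, and the tree is exactly the information that the source does not carry.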
The author who wrote g(x + h) knew exactly what they meant. So what
I want to do is develop a markup language for mathematics which makes
it easy and natural for the author to indicate the intent of what they
are writing. Emphasis on “easy and natural”.
I have made good progress in designing the syntax, so now it is time
for me to think about parsing the expressions. This is scary to me,
because it seems complicated, and I have not done this before!
I have read some introductory material (including one written by the
people who organize this forum). I do not think it is plausible that
I can use a general-purpose parsing program. One reason is that
the grammar is not context-free. (Probably not: I have not written
it out in detail. But every expression is unambiguous, or else it
cannot be used for its intended purpose.)
Another reason I think I need to write a custom parser is that I need
to retain information from the original markup, which goes beyond
the meaning of the symbols.
Here is an example. (I am writing this in ASCII, so pretend that “x” is
the “times” sign and “.” is a dot raised above the baseline.) All of
these expressions have the same meaning:
3 x 5 = 15
3 . 5 = 15
3 5 = 15
All of them say “3 times 5 equals 15”. The written material as
presented to the reader has to preserve whether the multiplication
is indicated by a “times”, or a “dot”, or the implied multiplication
indicated by a space, because there may be good pedagogical
reasons the author chose to write it that way. So (it seems to me)
a parse tree does not capture all the information I need.
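As a rough sketch of what preserving the notation could look like, the Python below parses only the three toy expressions above and records, on each multiplication node, how the multiplication was written. Everything here is an assumption for illustration: the node classes, the notation field, and the tiny grammar (numbers, “x”, “.”, a space, and “=”) are hypothetical and are not the syntax of the new markup language. Error handling is kept to a minimum.

```python
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Times:
    left: object
    right: object
    notation: str  # "cross", "dot", or "juxtaposition", kept from the source


@dataclass
class Equals:
    left: object
    right: object


TOKEN = re.compile(r"\s*(\d+|[x.=])")


def tokenize(text):
    """Split the source into number and operator tokens, skipping spaces."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_product(tokens, i):
    """Parse number ((x | .)? number)* and return (tree, next index)."""
    node = Num(int(tokens[i]))
    i += 1
    while i < len(tokens) and tokens[i] != "=":
        if tokens[i] == "x":
            notation, i = "cross", i + 1
        elif tokens[i] == ".":
            notation, i = "dot", i + 1
        else:
            notation = "juxtaposition"  # two numbers separated only by a space
        node = Times(node, Num(int(tokens[i])), notation)
        i += 1
    return node, i


def parse_equation(text):
    tokens = tokenize(text)
    left, i = parse_product(tokens, 0)
    if i >= len(tokens) or tokens[i] != "=":
        raise ValueError("expected '='")
    right, i = parse_product(tokens, i + 1)
    if i != len(tokens):
        raise ValueError("unexpected trailing input")
    return Equals(left, right)


def evaluate(node):
    """Compute the meaning of a tree; the notation field is ignored here."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Times):
        return evaluate(node.left) * evaluate(node.right)
    return evaluate(node.left) == evaluate(node.right)


for source in ["3 x 5 = 15", "3 . 5 = 15", "3 5 = 15"]:
    tree = parse_equation(source)
    print(tree.left.notation, evaluate(tree))
```

In this sketch all three sources evaluate the same way, yet each tree still remembers whether the author wrote a cross, a dot, or a space, so a renderer could reproduce the original choice. Whether an annotated tree like this is enough for the real language is an open question.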
Any pointers on writing a parser from scratch, and preserving information
in the original source, would be welcome!
Here is a description of this new math markup language (which I call