Hi, I'm Rob Pulsipher

Hello. My name is Rob Pulsipher. I’m a software developer located in New Mexico, United States. I am working on a new reporting engine that uses an ANTLR parser, which has been a lot of fun.

I look forward to participating in the forum and learning more from everyone here.


Welcome, Rob! Do elaborate on the parser if you can.

Sure, no problem. The goal of my reporting engine is to generate PDF documents. The language is kind of a cross between HTML/CSS (and various templating languages), LaTeX, and Markdown. It uses Python-style indentation (mainly so that verbatim code can be included naturally). I’m still working on the parser, which is pretty complicated. Since I’m using ANTLR4, I’m essentially writing a “pre-parser” to handle the indentation and some other features.
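
For readers unfamiliar with the technique: an indentation pre-pass usually turns leading whitespace into explicit INDENT/DEDENT tokens, so that a context-free grammar can treat indented blocks like bracketed groups. The following is a minimal Python sketch of that idea, not the pre-parser described in this post; the token names and the `indent_tokens` function are hypothetical, and it assumes indentation uses spaces only.

```python
from typing import Iterator, List, Tuple

# Hypothetical token representation: (kind, text). A real ANTLR4 setup would
# use the lexer's own token types instead.
Token = Tuple[str, str]


def indent_tokens(source: str) -> Iterator[Token]:
    """Yield INDENT/DEDENT/LINE tokens for space-indented source.

    Blank lines are skipped and never open or close a block.
    """
    stack: List[int] = [0]  # indentation widths of the currently open blocks
    for raw in source.splitlines():
        if not raw.strip():
            continue
        width = len(raw) - len(raw.lstrip(" "))  # count leading spaces only
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", "")
        else:
            # close every block that is indented deeper than this line
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", "")
            if width != stack[-1]:
                raise ValueError(f"inconsistent dedent: {raw!r}")
        yield ("LINE", raw.strip())
    # close any blocks still open at the end of the input
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")
```

A grammar can then match a block as `INDENT line+ DEDENT`. With ANTLR4, the same bookkeeping is also commonly done inside the lexer by overriding `nextToken()` and queueing the extra tokens, which avoids a separate pass.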

Here is a sample of the syntax:

\# Here is an example with implicit markup:

! Here is a header

!! Here is a subheader

This is a paragraph.

This paragraph has \b{bold}; and \u{underlined}; and \i{italicized}; text.

And here is a bulleted list:

* item
* another item
** and a sub item

And here is a numbered list:

. item
. another item
.. and a sub item

\# Here is equivalent with explicit markup:

\h:\t:Here is a header
\h(level=2):\t:Here is a subheader
\p:\t:This is a paragraph.
\p:
  \t:This paragraph has
  \b:\t:bold
  \t:and
  \u:\t:underlined
  \t:and
  \i:\t:italicized
  \t:text
\p:\t:And here is a bulleted list:
\ul:
  \li:\t:item
  \li:\t:another item
  \ul:
    \li:\t:and a sub item

\p:\t:And here is a numbered list:

\ol:
  \li:\t:item
  \li:\t:another item
  \ol:
    \li:\t:and a sub item
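
To make the equivalence between the two forms concrete, here is a minimal Python sketch of a desugaring pass that rewrites the implicit headers, paragraphs, and lists into the explicit markup. It is a hypothetical illustration rather than the engine's actual implementation: `desugar` is an invented name, and the comments list the assumptions it makes about the shorthand.

```python
import re
from typing import List

# Hypothetical desugaring pass (not the author's implementation). Assumptions:
# a marker is followed by exactly one space, a blank or non-list line ends a
# list, lines starting with "\#" are comments, and inline markup such as
# \b{...} is left untouched.
LIST_TAGS = {"*": "ul", ".": "ol"}
SHORTHAND = re.compile(r"^(!+|\*+|\.+) (.*)$")


def desugar(lines: List[str]) -> List[str]:
    out: List[str] = []
    open_lists: List[str] = []  # tags of the lists currently open, outermost first
    for line in lines:
        if line.startswith("\\#"):
            continue  # comment line
        m = SHORTHAND.match(line)
        marker, text = (m.group(1), m.group(2)) if m else ("", line)
        if marker[:1] in LIST_TAGS:
            depth, tag = len(marker), LIST_TAGS[marker[0]]
            del open_lists[depth:]  # close lists nested deeper than this item
            if len(open_lists) == depth and open_lists[-1] != tag:
                open_lists.pop()  # same depth, other list type: start a new list
            while len(open_lists) < depth:  # open lists down to this depth
                out.append("  " * len(open_lists) + "\\" + tag + ":")
                open_lists.append(tag)
            out.append("  " * depth + "\\li:\\t:" + text)
            continue
        open_lists.clear()  # any other line ends the current list(s)
        if not line.strip():
            continue
        if marker.startswith("!"):
            level = len(marker)
            attrs = "" if level == 1 else f"(level={level})"
            out.append("\\h" + attrs + ":\\t:" + text)
        else:
            out.append("\\p:\\t:" + line)
    return out
```

Running it over the implicit lines of the sample reproduces the explicit header, paragraph, and nested `\ul:`/`\ol:` structure line for line (inline `\b{...}` paragraphs are out of scope for this sketch).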

I think that, from a design point of view, using the backslash character this way is a mistake. Firstly, it has a very long tradition of being used to enter non-typeable characters, and if you take the backslash for markup, you will eventually be forced to use some other punctuation mark for that job. So you haven’t gained anything by doing this.

Also, for comments, why not just use a single # character, and any time you want an actual #, force the user to type ##? You could treat # as a comment only when it is the first character on a line, but that introduces problems with indented code. The pound sign is an extremely low-frequency character, and comments will make up a significant percentage of the lines in an actual program, so it is best to make commenting easy. \# is an extremely ugly sequence.
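
A minimal Python sketch of the suggested convention, assuming `#` is recognized anywhere on a line (the column-one-only variant is not shown); `strip_comment` is a hypothetical helper:

```python
def strip_comment(line: str) -> str:
    """Apply the suggested rule: a lone '#' starts a comment, '##' is a literal '#'.

    This variant recognizes '#' anywhere on the line, not just in column one.
    """
    out = []
    i = 0
    while i < len(line):
        if line.startswith("##", i):
            out.append("#")  # escaped pound sign
            i += 2
        elif line[i] == "#":
            break  # a lone '#' starts a comment that runs to the end of the line
        else:
            out.append(line[i])
            i += 1
    return "".join(out).rstrip()  # drop trailing whitespace, e.g. before a comment
```

For example, `strip_comment("Issue ##42 is fixed # TODO: link it")` returns `"Issue #42 is fixed"`. Note that a run like `###` is ambiguous to a human reader even though the scanner resolves it left to right.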

Since there are so few punctuation characters that are fast to type, it is always a tricky thing to manage. You are already using the colon extensively, and that is a 1.5-character key. I don’t blame you for wanting to avoid , \t: is faster to type, of course.

Thanks for the feedback. It’s really valuable to get different perspectives as I’m developing the grammar, and I really appreciate it.

My goal with putting everything under the backslash was to avoid having to escape any other character. I kind of deviated from that with allowing Markdownesque formatting for headers and lists, but it’s only necessary to escape these if you want to place a literal ! or . at the beginning of a line. So, generally, as is, the backslash is the only thing you have to worry about escaping.
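
Under that design, a tool that generates documents programmatically only needs a tiny escaping step. The sketch below is a hypothetical Python helper, `escape_prose`; it assumes that a doubled backslash denotes a literal backslash and that a backslash in front of a line-leading `!`, `*`, or `.` suppresses the header/list shorthand. The post does not spell out either rule.

```python
import re

# Assumptions (not confirmed by the author): "\\" is a literal backslash, and a
# backslash before a line-leading "!", "*" or "." disables the shorthand.
LEADING_SHORTHAND = re.compile(r"^([ \t]*)([!*.])", re.MULTILINE)


def escape_prose(text: str) -> str:
    # Double existing backslashes first, so the escapes added below are not doubled.
    text = text.replace("\\", "\\\\")
    # Prefix a backslash to any "!", "*" or "." that starts a line (after indentation).
    return LEADING_SHORTHAND.sub(r"\1\\\2", text)
```

Characters in the middle of a line, such as the `!` in "Wow!" or the `*` in "3.5 * 2", are left alone, which is the point of the design: only the backslash and a few line-leading markers ever need attention.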

In the context of the prose text, I don’t think there is any need for standard escape sequences (e.g. for whitespace), since formatting is provided by other means (styles/stylesheets). (In the context of string literals, I’m planning to have the escape sequences work as usual.)

I didn’t mention in my post above, but using triple-colon instead of colon starts a preformatted indented block, so it’s pretty easy to insert code samples in a document.
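
A sketch of how a parser might collect such a block, in Python. The opener `\code:::` is a hypothetical example of a tag followed by a triple colon, and `read_verbatim` is an invented name; the function assumes the block consists of the following lines indented deeper than the opener, and it strips their common indentation with `textwrap.dedent`.

```python
import textwrap
from typing import List, Tuple


def read_verbatim(lines: List[str], start: int) -> Tuple[str, int]:
    """Collect the preformatted block opened by lines[start] (ending in ':::').

    Returns the dedented block text and the index of the first line after it.
    """
    opener = lines[start]
    if not opener.rstrip().endswith(":::"):
        raise ValueError("not a preformatted-block opener")
    base = len(opener) - len(opener.lstrip(" "))
    body: List[str] = []
    i = start + 1
    while i < len(lines):
        line = lines[i]
        indent = len(line) - len(line.lstrip(" "))
        if line.strip() and indent <= base:
            break  # first non-blank line at or left of the opener ends the block
        body.append(line)
        i += 1
    # trailing blank lines belong to the surrounding document, not the block
    while body and not body[-1].strip():
        body.pop()
        i -= 1
    return textwrap.dedent("\n".join(body)), i
```

Because the block ends only when indentation returns to the opener's level, the code inside it can contain colons, backslashes, or `#` without any escaping.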

LaTeX, for example, uses backslash like this prolifically. I’m curious, have you run into a problem with that kind of feature?

I debated the # for comments. I definitely waffled on that one. I agree it’s kind of ugly. And I agree commenting is very common, much more so than using a literal pound sign. But HTML drives me crazy with having to escape <, >, and &. I don’t use them very often (unless talking about HTML), but for me, forgetting to escape them is almost always an oversight that requires a correction cycle. The low frequency of use actually causes the problem for me. So I guess I’m trying to avoid syntactic gotchas at the expense of having to type an extra character. But I’m still debating this one. There are definitely some issues with it. For example, what if there is a comment inside of the markup? Should that have a backslash in front of the pound too, or omit it since it’s already markup?

As an aside, I did debate using pound for headers and backslash-pound for comments, decided that THAT was REALLY ugly, and opted for exclamation marks for headings to avoid confusion.

I didn’t understand the “1.5 character key” terminology you used. Do you have a reference for that terminology?

This information comes from working on the WordStar 2000 word-processing project ages ago. The designer of the product was a very senior engineer from the Navy who was knowledgeable in ergonomics, and they counted any keystroke that required the Shift or Control key as 1.5 characters, and characters that typists do not normally memorize (like &) as 2. You could consider function keys to cost almost 3 characters, because of the stupid DEC (now ISO) standard keyboard layout, which puts them way at the top instead of in the superior two columns of 5 that the IBM AT keyboard used. A huge step backwards in ergonomics, IMHO. Note that double characters only count as 1.5 because people can type those very fast.
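
The cost model described above can be written as a small Python function, which makes it easy to compare candidate syntaxes. This is an approximation under stated assumptions: it assumes a US QWERTY layout, treats only `&` as an unmemorized character (the only example given), and reads "double characters count as 1.5" as the repeated keystroke adding just 0.5. The names `key_cost` and `typing_cost` are hypothetical.

```python
# Assumed US QWERTY layout: these characters need the Shift key.
SHIFTED = set('~!@#$%^&*()_+{}|:"<>?') | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
# Characters typists do not normally memorize; only "&" is named in the post,
# so this set is an assumption.
HARD = {"&"}


def key_cost(ch: str) -> float:
    """Cost of one character typed on its own."""
    if ch in HARD:
        return 2.0
    if ch in SHIFTED:
        return 1.5
    return 1.0


def typing_cost(s: str) -> float:
    """Total cost of a string; an immediately repeated character adds only 0.5."""
    total = 0.0
    prev = None
    for ch in s:
        total += 0.5 if ch == prev else key_cost(ch)
        prev = ch
    return total
```

Under these assumptions, `typing_cost("#")` is 1.5 and `typing_cost("\\#")` is 2.5, `typing_cost("\\t:")` is 3.5, and HTML's `typing_cost("&lt;")` is 5.0, which puts numbers on the comment-marker and escaping trade-offs discussed in this thread.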

As for LaTeX, it is built on Knuth’s TeX, so you can assume it has terrible ergonomics, but due to the lack of a free competitor, people still torture themselves with it. I wouldn’t copy anything from LaTeX; it is infamous for spaces dramatically changing the meaning.