Virtual Meetup - "Parsing macros in legacy languages – the example of SAS"

Hi Community,

I am happy to announce that this Thursday (the 5th of May), Alessio Stalla will give a talk on “Parsing macros in legacy languages – the example of SAS”.

Also, remember that we changed the link to join the Meetup!

Parsing legacy languages can be challenging because most of them were designed without following good practices for parsers, compilers and supporting tools. As an example, we can find macro systems that make it very hard to implement a high-quality parser. The SAS language is such a case.

Alessio Stalla is a Language Engineer who designs and builds languages and supporting tools. Most of the time, you can find him crafting textual or projectional editors/IDEs, static analysis tools or model-to-model transformations. Between projects, he writes articles on Strumenta’s blog about what he’s learned. He is also the lead developer of the open-source web application framework Portofino. He contributed to ABCL, the Common Lisp implementation on the JVM, and marginally to the Groovy language. He knows the JVM ecosystem quite deeply, actively participates in the local Java User Group, and follows several JVM development mailing lists. In the past, he has designed and implemented Java enterprise applications using a variety of technologies, including Spring and JEE applications, Alfresco customizations, Liferay portlets and extensions, Camel/ServiceMix, BPMN engines, and more.

Registration for the Virtual Meetup

After registering, you will receive a confirmation email containing information about joining the meeting. The email also lets you add the event to your calendar.

Time

The Meetup is hosted on Zoom at 6 PM CEST (GMT+2). You can use this link to find out what time that is in your timezone: https://www.thetimezoneconverter.com/?t=6%3A00%20PM&tz=Roma&

Cheers,
Elisa

P.S. We get a recurring question: “Are presentations recorded?” The answer is no, and the reasons are explained here: On recording Virtual Meetups - #7 by voelter

Here is the transcript of the chat:

17:51:03 From Rene to Everyone:
	Hello!
17:51:20 From Alessio Stalla to Everyone:
	Hello Rene!
17:51:21 From Federico Tomassetti to Everyone:
	(Sorry for speaking Italian Rene!)
17:53:15 From Rene to Everyone:
	Hey Mike! these slides are very nice
17:59:02 From Peter Wasilko to Everyone:
	I made it back in time!
17:59:10 From Federico Tomassetti to Everyone:
	Welcome Peter!
18:01:42 From Peter Wasilko to Everyone:
	I have been writing Crystal macros to simplify working with XML.
18:06:03 From Federico Tomassetti to Everyone:
	I never tried Crystal, is it the statically typed version of Ruby or is something else?
18:07:52 From Peter Wasilko to Everyone:
	Just as you said. It relies on macros where ruby would use runtime reflection.
18:10:25 From Peter Wasilko to Everyone:
	So it has a static typeof() and .as() method to narrow an instance of a union type so its methods can be invoked
18:10:45 From Federico Tomassetti to Everyone:
	I should try it, I used to like Ruby and JRuby a lot
18:13:08 From Peter Wasilko to Everyone:
	Kick its tires on a side project. It generates blissfully small binaries a little over 2 meg in size and has a growing ecosystem of “shards” based on the most popular Ruby gems.
18:14:39 From Robin Way to Everyone:
	legacy ~= declining market adoption
18:14:42 From Peter Wasilko to Everyone:
	It has standard library support for JSON, YAML, and XML and there are a couple of parser generators on tap
18:15:56 From ALAN CHURCHILL to Everyone:
	SAS is approx. 50 years old and is based on PL1. The Base language, which macros are part of, has been mostly stale for 20 years. However, it is used heavily in large corporations and govts. It also has to be licensed yearly so companies have to continue using it due to their existing investment. Without paying for it, you can't run the code. And...it costs a lot.
18:16:40 From ALAN CHURCHILL to Everyone:
	At the time, SAS was ground-breaking but it is hard to change due to its basis being C
18:17:14 From Peter Wasilko to Everyone:
	For me it would be a language I inherited an inadequately documented code base in that would take too long to reverse engineer to rewrite in something I like better.
18:17:46 From David Benn to Everyone:
	From the semantics/best practice viewpoint, type safety is perhaps an example, e.g. thinking about Rust vs C. On the tooling front, e.g. cargo vs make, dependency management (e.g. in Go)  being “part of the language system” or not.
18:18:33 From Federico Tomassetti to Everyone:
	I think one aspect is also the availability of developers: if you cannot find anyone younger than X years old that can program in the language, that is a sign that the language is legacy
18:18:39 From Paul Spencer to Everyone:
	is that ordering not the IDE "tail" wagging the Language "Dog"
18:19:31 From Robin Way to Everyone:
	@Federico: to your point, I routinely share with clients that most SAS programmers look like me (eg, greybeards), and I have yet to get any reaction to the contrary
18:19:47 From ALAN CHURCHILL to Everyone:
	SQL is upside down. LINQ fixes it in C#. Not just an IDE issue
18:20:04 From Paul Spencer to Everyone:
	^ which indicates that ide's and languages are separate, as the text based language engineers keep trying to tell us :)
18:20:19 From David Benn to Everyone:
	yes, for example, the occasional job advert for IBM mainframe OS assembly programmers
18:23:35 From Federico Tomassetti to Everyone:
	If SAS is older than C, maybe we should call them SAS-style macros and not C-style macros
18:23:44 From Herman Peeren to Everyone:
	Robin Way: ctrl-c ctrl-v?
18:25:06 From Peter Wasilko to Everyone:
	On the down side, Crystal macros give you less syntactic flexibility than Scheme and they have to return complete AST nodes so you can't just generate a bare chained method invocation or when clause to inject into a case statement.
18:26:44 From Federico Tomassetti to Everyone:
	The talk about parsing C++ was given by Botond Ballo https://strumenta.community/meetup/virtual-meetup-botond-ballo/
18:27:33 From Peter Wasilko to Everyone:
	But they can nest and parameters support a virtual AST class with body and id methods that greatly simplify things.
18:28:16 From Robin Way to Everyone:
	The SAS platform probably contains at least 5+ languages; viz, Ted Lasso: "How many countries are in this country?"
18:29:16 From ALAN CHURCHILL to Everyone:
	Robin: I think I found 6 meta-languages on top of Base SAS. I need to list them out (macro, DS2, dataset modifiers, etc.)
18:29:58 From Robin Way to Everyone:
	AF/SCL, ODS, proc model, IML, Graph/Annotate...
18:30:15 From ALAN CHURCHILL to Everyone:
	We need to get together and get a list ;-)
18:30:34 From Gregg Irwin to Everyone:
	Red is the same way. We have a number of standard dialects (DSLs) in the core language.
18:32:45 From Meinte Boersma to Everyone:
	We could call the grammar ".* EOF" the Pangea-parser. (cf. “island grammar/parser”)
18:33:24 From Federico Tomassetti to Everyone:
	This is an idea for a very commercially successful parser 🙂
18:33:26 From Gregg Irwin to Everyone:
	"Universal Parser" to parallel Chomsky's Universal Grammar?
18:39:16 From Robin Way to Everyone:
	Many instances of real world SAS macro code written by naïve, inexperienced programmers, needs to be refactored before applying a grammar and parser to it; there are many, many examples of anti-patterns, for which grammars shouldn't have to tackle
18:39:23 From Glen Braun to Everyone:
	Sounds like the halting problem to me
18:39:34 From ALAN CHURCHILL to Everyone:
	Thank you for saying no formal grammar. SAS handles macros first then the base language it generates.
18:39:53 From ALAN CHURCHILL to Everyone:
	The way I handle macro code parsing is to let SAS do it. You can tell SAS to decipher the macros and remove macros. It writes out to a standalone file and converts the macros to regular SAS code. Not perfect but removes that macro parsing complexity (which is enormous).
18:40:23 From Gregg Irwin to Everyone:
	Robin, same for us, and probably any tool that lets you be very creative. With great power comes great refactoring?
18:41:27 From Patrick Viry to Everyone:
	What's the behavior when you encounter an unsupported case ? Rely on error recovery ?
18:42:07 From Federico Tomassetti to Everyone:
	The behavior I think is: the developer extends the parser to cover also that case
18:44:39 From Peter Wasilko to Everyone:
	I wonder if anyone ever used M4 in SAS rather than SAS's macros.
18:45:39 From Peter Wasilko to Everyone:
	I particularly love M4's Divert and Undivert.
18:46:25 From Federico Tomassetti to Everyone:
	I cannot think of a language where comments can be nested
18:46:38 From Gregg Irwin to Everyone:
	C
18:46:40 From ALAN CHURCHILL to Everyone:
	Anything that can generate SAS code is fair game. I use SAS constructs to generate SAS code and then execute it vs macro reliance. I have also used outside tools to generate SAS (ex. C#). The macros just generate SAS
18:47:02 From Peter Wasilko to Everyone:
	https://www.gnu.org/software/m4/manual/m4.html#Intro
18:48:42 From Glen Braun to Everyone:
	I've seen lexers that need to understand strings so you can have a string with a comment in it.
18:50:10 From Peter Wasilko to Everyone:
	I have seen the topic of nested comments come up in the wild with respect to their possible implementation in PEG parsers and the Literate Programming context where one would want to code comments that could embed LaTeX Math Mode.
18:51:10 From ALAN CHURCHILL to Everyone:
	How would you handle macro functions that affect the way the SAS elements are parsed?
18:53:21 From Federico Tomassetti to Everyone:
	Alan, I will ask this question to Alessio when he finishes the slides
18:55:06 From Gregg Irwin to Everyone:
	Q. What have been some of the end goals for the projects where macro parsing was used?
18:56:09 From Robin Way to Everyone:
	the sas macro capability was published after the core or "base" sas capability was published. this is because SAS "base" is tightly bound to a dataset (like how sql or Spark dataframe or Pandas is also bound tightly to a dataframe); the sas macro capability frees the sas programmer from worrying about being bound to data at all.
18:56:41 From Robin Way to Everyone:
	and in doing so, sas macro is much like the role of Python for declaring globals outside of the context of pandas or PySpark dataframe operations
18:57:38 From Gregg Irwin to Everyone:
	Thanks Robin, but I mean in Alessio's projects. What are they doing with the parse macros or code they're in?
18:58:36 From Robin Way to Everyone:
	sorry @Greg, my post and your question appeared at the same time. I wasn't trying to address your question. I was merely providing some clarity on something we covered earlier.
18:58:58 From Gregg Irwin to Everyone:
	Ah, got it. Async is hard. :^)
19:00:12 From Peter Wasilko to Everyone:
	The worst thing about Crystal is that it makes Google assume you are interested in Chemistry and Materials Science, giving you tons of noise in search results.
19:00:32 From ALAN CHURCHILL to Everyone:
	I will try and answer Greg. The end goal is normally to figure out legacy code so it can be converted into something else. Either a logical structure or a different language (or tool like PowerBI). That is what I see in consulting
19:01:47 From Gregg Irwin to Everyone:
	Thanks @Alan, I've done a lot of legacy migration code generation as well, but haven't had to address macros. Since Red supports macros, I'm looking ahead.
19:04:38 From ALAN CHURCHILL to Everyone:
	Look at the function SUPERQ as an example
19:04:39 From Mike Cargal to Everyone:
	Now I know enough to stay FAR away from SAS!  (Wow, just wow)
19:05:12 From Glen Braun to Everyone:
	makes me think of the song "I am my own grandpa"
19:05:47 From Meinte Boersma to Everyone:
	Is SAS worse than MUMPS?
19:08:05 From Gregg Irwin to Everyone:
	Thanks for the answer Alessio!
19:10:13 From ALAN CHURCHILL to Everyone:
	Every proc is its own language. There are more than 400 procs
19:11:33 From ALAN CHURCHILL to Everyone:
	It is used extensively in businesses and governments for doing analytics, ETL, reporting.
19:11:53 From Óscar Fernandez Sierra to Everyone:
	Thanks, Alan
19:12:14 From Paul Spencer to Everyone:
	when landin wrote "The next 700 languages" paper SAS said, we've got you covered!
19:12:40 From ALAN CHURCHILL to Everyone:
	Due to its expense, it is primarily used in large companies. Most government statistics come from SAS systems (that I am aware of). It is used a lot in pharma for drug trials, etc
19:13:36 From Paul Spencer to Everyone:
	gotta leave. Thanks Alessio!
19:13:37 From Sérgio Ribeiro to Everyone:
	From Wikipedia:
	«The SAS language is a computer programming language used for statistical analysis, created by Anthony James Barr at North Carolina State University. It can read in data from common spreadsheets and databases and output the results of statistical analyses in tables, graphs, and as RTF, HTML and PDF documents.»
19:13:39 From ALAN CHURCHILL to Everyone:
	SAS is not modern and is very expensive. It is also hard to find new coders. Hence, large companies are hitting walls and paying a lot to keep their investment running
19:14:16 From Herman Peeren to Everyone:
	BTW: nice "legacy style" slides, Alessio! Thank you for the nice presentation. Nice to see the more general conclusions about language design and being pragmatic.
19:14:23 From Peter Wasilko to Everyone:
	Thanks for the superb presentation!
19:14:42 From ALAN CHURCHILL to Everyone:
	Thanks Alessio.
19:14:48 From Glen Braun to Everyone:
	Very interesting, thank you