Reinventing the wheel

Lotes · April 4, 2020, 8:14am

A little story you may know…

I have an idea. I have a favorite language. All problems are basically clear and solvable. So let’s write the project down…

Some hours later…

Damn there is no NPM package for computational geometry… port it? Write it yourself? As binding? How to link a C library… and so on. So normally I end up with a depth-first search along missing functionality, because it is only implemented in C, Python, Javascript…

Why? In compiler construction I learned that there are M frontends (parsers) and N backends (optimizers & generators). To overcome M × N transformations, we use intermediate code for only M+N transformations. But in general that does not happen…
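
To make the counting argument concrete, here is a minimal sketch in Python. The source and target names are placeholders chosen for illustration and do not describe any particular compiler; "IR" is just a label for the shared intermediate code.

```python
from itertools import product

# Placeholder names, assumed for illustration only.
SOURCES = ["C", "C++", "Rust"]                      # M = 3 frontends
TARGETS = ["x86", "ARM", "WebAssembly", "RISC-V"]   # N = 4 backends


def direct_pairs(sources, targets):
    """One dedicated translator per (source, target) pair: M * N of them."""
    return list(product(sources, targets))


def via_ir(sources, targets, ir="IR"):
    """One frontend per source plus one backend per target: M + N of them."""
    frontends = [(source, ir) for source in sources]
    backends = [(ir, target) for target in targets]
    return frontends + backends


direct = direct_pairs(SOURCES, TARGETS)
shared = via_ir(SOURCES, TARGETS)
print(len(direct))  # 12 translators without intermediate code
print(len(shared))  # 7 translators with intermediate code
```

The gap widens quickly: adding one more source language costs N new translators in the direct scheme, but only one new frontend when everything goes through the intermediate code.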

I think we as language engineers are able to clean up that mess. I have some ideas… but before presenting anything I would like to hear your opinions.

Do you feel the same?
How do you overcome this situation?
Do you see other problems that I did not realize, yet?

Maybe I am a bit naive. I know there are some little solutions like the language Haxe and, I guess, LLVM. Actually every language covers a certain purpose. There are also different paradigms like functional, imperative, logical… but somehow they influence each other and become more similar to each other.

digital-ember · April 4, 2020, 9:07am

In order to really give you a meaningful answer, can you define a problem statement? I’m not certain I understand the actual problem you are addressing.

Are you saying there are too many languages/technology stacks/ecosystem? A general lack of their interoperability? Or is the problem that people only have a favorite language and are not willing to learn something new even if it would enable them to create their project quicker?

In short: what problem do you want language engineers to solve?

Lotes · April 4, 2020, 9:35am

For example I want to write a web application that does some geometry visualization. Let’s assume that the library which solves parts of my problem, like computing a Delaunay triangulation, only exists in C++. On the web I am limited to JavaScript. C++ is only an option on the backend.

I mean why can’t we solve a problem once and use it everywhere in every language? Maybe I am a little “Language Nazi” who wants to do everything in one language.

I hope I could clarify my problem.

pjmolina · April 4, 2020, 9:49am

Every language was designed with a set of goals or requirements. Or at least, in the absence of design, languages grow organically to solve some kinds of problems.

That’s why there is no “Silver Bullet” and why good architects tend to respond “It depends” when asked what the best tech to do X is. It always depends on the requirements to fulfill.

Having said that in general, let’s get concrete: you can always port code to other environments using emulation or by running code inside containers or virtual machines. Of course, you have to weigh the costs against the benefits to judge whether this is worthwhile.

For example, you can port C or C++ code to run on the web with WebAssembly (a virtual machine) and interop with JavaScript. See for example https://medium.com/@tdeniffel/pragmatic-compiling-from-c-to-webassembly-a-guide-a496cc5954b8

digital-ember · April 4, 2020, 9:50am

You’re still not describing a problem, but a very specific situation.
C++ was developed at a time when the web wasn’t a thing, so of course it is designed with a different use case in mind. It’s like complaining that you cannot use a rotary phone to send an SMS.
Also, there is cross compilation from C++ to Js, or C++ Web toolkit. I don’t know these technologies in any detail, but again, I’m not sure what problem you are trying to solve, or how you would imagine how this should work.

meinte.boersma · April 4, 2020, 11:46am

I think the challenge with this problem (insofar as it’s well-defined) is to find a good common (level of) abstraction for the N+M things. Frameworks like LLVM solve this, but not necessarily in a way that fits all of those N+M things well simultaneously. After all, you have to take the “minimum” (insofar as that’s well-defined!) of the levels of abstraction across N+M things. With an approach like this, you might end up with fewer combinations (N+M), but the sum (!, another not-so-well-defined thing) of the complexities of those might end up being larger than if you were to do M x N.

At the same time, I too am frustrated by the state of things where every utility domain (like computational geometry) has to be re-implemented for all languages, and sometimes even versions of those languages.

Lotes · April 4, 2020, 11:55am

I will not fight against your arguments. You are all right. But problems that I solve in C, I solve almost the same way in:

Javascript… I can even remove cleanup of runtime objects
Haskell… I “just” have to remove the notion of mutable state.
C++… I can even group stuff to classes

Somehow I wish there was a metamodel or language for every problem that is generic or reusable enough. Write once and convert it to every other language or platform.

Of course, porting from language A to B is not that easy. At least not for a machine. Think about deciding which data structures to use.

cristian.vasile · April 4, 2020, 1:31pm

I understand what you are trying to communicate.
I’ll draw an analogy with the financial system: if you take a loan with a small interest rate, at the maturity of the contract you pay back the amount you borrowed plus the interest.
Seems fine, but there are brilliant minds who tell us that the interest multiplies across millions of loans and that this money has to be printed by central banks, so every credit triggers an invisible action to create fiat money.

Now, in the software business, following the story above, every line of code triggers a small debt called technical debt. Multiply each line of code by N years and one can see the invisible mountains of technical debt we create.

Maybe that WASM stuff could be an answer, looks promising, I mean, transpile packages/libraries to wasm, then transpile wasm to machine code and use code on that server/computer. (https://wavm.github.io/)

The main problem I see is that a lot of code is redundant and more or less full of bugs. This is maybe the reason big companies spend millions and millions of USD/EUR/GBP to have their own army of coders.

One path down the road might be to have 2,3,4 modern languages, with focus on distinctive domains of business or engineering (see Chapel created by Cray engineers), able to transpile to C (this will be a huge advantage) or Clang IR / MLIR and from this compile the code to low level assembly.

digital-ember · April 4, 2020, 1:56pm

Ever heard of what my former boss, Charles Simonyi, envisioned with “Intentional Programming”? Sounds like you are looking for a solution along these lines, maybe:

Wikipedia article (not really very insightful, though)
1995 tech report
2006 paper
New York Times article
TechRadar piece

rafael · April 4, 2020, 5:26pm

Do you feel the same?

Yes! Or at least I used to.

How do you overcome this situation?

My personal choice (back in 1996, when I was in my 6th language in 6 years) was to adopt a general purpose bytecode-based language (Java/JVM) with a comprehensive class library and a diverse community (so, effectively, vendor-independent), which I have used happily for all sorts of applications (from business applications to middleware and development tools, from desktop to mobile and web applications), including using other JVM-supported languages such as Python, Groovy, Kotlin and Xtend, and running surprisingly consistently on many different OSes and architectures. CLR/CLI users have enjoyed similar benefits as well.

Initiatives such as GraalVM seem to take the idea even further by not only providing a common runtime, but tools for building compilers targeting it (and relying on LLVM for producing native executables for multiple architectures).

Finally, I figured the main challenge is that programming languages in general provide an insufficient level of abstraction. So, instead of getting all those low-abstraction languages to inter-operate, I moved on to try to build languages (or encounter existing ones) that provide a more appropriate level of abstraction. In order to do that, you invariably need to constrain your target problem space, vertically and/or horizontally.

Do you see other problems that I did not realize, yet?

Isn’t this a problem more or less solved by now? What shortcomings do you see in things such as the JVM, CLR/CLI, LLVM and GraalVM?

Also, it seems the technical problem may be just as challenging as the “marketing” problem: not only do you need to get people to want to use your language/runtime, but also tool developers to write frontends and backends for it.

Maybe I am a bit naive.

I believe some naiveté is actually required for people to achieve true breakthroughs, so that would not be a problem per se.

But what do you have in mind?

Lotes · April 4, 2020, 7:45pm

Thanks for participating in this discussion… WebAssembly, LLVM and others are a good step. When they enable me to use C code in my JavaScript I am a bit happier :)… to be honest I only used Java and C# so far, but good to know.

I also liked the technical debt argument.

Intentional software reminds me of JetBrains MPS. But that was not the thing that I had in mind.

It is just an idea and perhaps full of mistakes…
Basically it is about porting code automatically from one language A to language B…
You will need an intermediate language, like XML. Each language may have its own XML namespaces for keeping lexing and parsing separated. Then comes a high-level normalization… like classes, closures,… concepts that will be modelled differently in every target language.
You will need a translation of system libraries… like System.out.println("hello"); is translated to printf("hello")…
Duplicated code created by this translation could be detected and moved out to methods automatically or even be removed.
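
The library-translation step could be sketched roughly like this in Python. Everything here is hypothetical: the neutral operation name print_line, the LIBRARY_MAP table and the emit_call helper are made up for illustration. One detail worth noting is that println appends a newline while printf does not, so the C template in this sketch adds it explicitly.

```python
import json

# Hypothetical table: one neutral operation, one template per target language.
LIBRARY_MAP = {
    "print_line": {
        "java": "System.out.println({arg});",
        "c": 'printf("%s\\n", {arg});',
        "javascript": "console.log({arg});",
    },
}


def emit_call(op, text, target):
    """Render a neutral library operation that takes one string literal."""
    template = LIBRARY_MAP[op][target]
    # json.dumps gives a double-quoted, escaped literal. Assumption: for
    # plain ASCII text this is a valid literal in all three targets.
    return template.format(arg=json.dumps(text))


for target in ("java", "c", "javascript"):
    print(emit_call("print_line", "hello", target))
```

Even this tiny table shows where the effort goes: every operation needs one entry per target, and the entries have to agree on details such as the trailing newline.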

It is maybe not very smart and would need a lot of effort… but I am somehow desperate… it was a general observation I wanted to share.

In German I would say I am starting to milk mice… a German idiom to express that you are full of anger… es ist zum Mäusemelken…

rafael · April 5, 2020, 1:11am

Does this fit the bill, Markus?

https://www.graalvm.org/docs/reference-manual/languages/llvm/#interoperability

rafael · April 5, 2020, 1:57am

Right, correct me if I am wrong, but you are basically transpiling twice, first to a common intermediate language (note: even if it was XML-based, it would still be a language, just having a neutral concrete syntax; concrete syntaxes are the least important part of languages, abstract syntax and semantics are what make them what they are), and then from that intermediate language to the final target language.

That basically requires that we had a universal (abstract) syntax and semantics that form a superset of all your input languages. Your intermediate language may or may not have some sort of standard library, but remember that what is a library feature in one language is a language feature in another, so I guess you would need to represent in the intermediate language/standard library a superset of all languages and the standard libraries they support. You could consider mapping the standard library of a source language into your intermediate language so that it would be translatable to the final target language as well. However, that means you would be rewriting the source language’s library in the target language.

I believe something like what I describe would not work in a practical way for what I hope now are obvious reasons. What I think could work is using an intermediate language that is one level of abstraction above the source and target languages. The translation of the source language would need to figure out the actual intent behind the code (why it does what it does - for instance, it is not a for loop based on a counter, but a collection mapping), and map that intent to the corresponding higher-level concept in the intermediate language (“collection traversals”). Then your tool would map that higher-level representation of the code to the target (lower-level) 3GL.
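
As a rough illustration of the higher-level intermediate form, here is a minimal Python sketch. The MapCollection node and the two generator functions are hypothetical names invented for this example, and the sketch assumes the element expression is valid as written in both targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapCollection:
    """Hypothetical intent-level node: apply expr to every item of source."""
    source: str  # name of the input collection
    result: str  # name of the output collection
    item: str    # name bound to each element
    expr: str    # expression over item, assumed valid in every target


def to_javascript(node):
    """Target with a built-in collection mapping."""
    return f"const {node.result} = {node.source}.map(({node.item}) => {node.expr});"


def to_c(node, length, elem_type="double"):
    """Lower-level target: the same intent becomes a counter-based loop."""
    return "\n".join([
        f"for (size_t i = 0; i < {length}; i++) {{",
        f"    {elem_type} {node.item} = {node.source}[i];",
        f"    {node.result}[i] = {node.expr};",
        "}",
    ])


square_all = MapCollection(source="xs", result="ys", item="x", expr="x * x")
print(to_javascript(square_all))
print(to_c(square_all, length="n"))
```

The counter variable only appears in the C output. At the intent level there is no counter at all, which is exactly the information a translator would have to recover when starting from existing loop-based code.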

The second part is the easy part. That is what model-driven generators do. The trick is pulling off the first part. Not even programmers sometimes can fathom the intent of code written by a fellow developer, or by themselves a couple months before (guilty!). This is the same obstacle that prevents effective reverse engineering of code into models, which is an essential part of round-trip engineering. Personally, I think RTE between code and models is a lost cause, so I focus on model-to-code generation only.

Sorry if I rain on your parade. If you feel like you may have found a loophole in this dark picture I painted, I am all ears!

cristian.vasile · April 5, 2020, 8:21am

Take a look at deepcode / https://www.deepcode.ai/
I asked them to create a service able to transpile from C to GO or Java … X lang -> Y lang.

That basically requires that we had a universal (abstract) syntax and semantics that form a superset of all your input languages.
I like this idea.
As you underline, what is a language feature in language A is implemented in a package in language B.
Another barrier is multithreading, which in 90% of cases is a nightmare.

cristian.vasile · April 5, 2020, 8:22am

Keep an eye on the V language. It’s still in alpha stage but incorporates good ideas.
https://vlang.io/

Lotes · April 5, 2020, 10:12am

I have some other questions that are not in the scope of this topic… maybe I create some “forks” xD

Your arguments are valid, @rafael … for the second part there already exists a solution named Haxe. It has its own standard library which is mapped to structures of the target language.

The first part is hard to solve. Things like for loops can be detected and translated to a map operation. But I see that this has its limits.
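
A minimal sketch of such a detection, using Python's standard ast module (ast.unparse needs Python 3.9 or later). The LoopToExtend class and the rewrite helper are names made up for this example, and the sketch recognizes exactly one shape of loop: a body that consists of a single append call.

```python
import ast


class LoopToExtend(ast.NodeTransformer):
    """Rewrite `for x in xs: ys.append(e)` into `ys.extend([e for x in xs])`."""

    def visit_For(self, node):
        self.generic_visit(node)
        # Only loops with exactly one statement and no else branch qualify.
        if len(node.body) != 1 or node.orelse:
            return node
        stmt = node.body[0]
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
            return node
        call = stmt.value
        func = call.func
        if not (isinstance(func, ast.Attribute) and func.attr == "append"
                and isinstance(func.value, ast.Name)
                and len(call.args) == 1 and not call.keywords):
            return node
        comprehension = ast.ListComp(
            elt=call.args[0],
            generators=[ast.comprehension(target=node.target, iter=node.iter,
                                          ifs=[], is_async=0)],
        )
        extend = ast.Call(
            func=ast.Attribute(value=func.value, attr="extend", ctx=ast.Load()),
            args=[comprehension],
            keywords=[],
        )
        return ast.Expr(value=extend)


def rewrite(source):
    """Parse source, apply the transformation, and return new source text."""
    return ast.unparse(LoopToExtend().visit(ast.parse(source)))


SOURCE = """
ys = []
for x in xs:
    ys.append(x * x)
"""

print(rewrite(SOURCE))
```

The limits show up immediately. The rewrite is only safe if the appended expression does not read the list being built and if nothing later relies on the loop variable, and the sketch checks neither. A loop with an extra statement, a condition, or an index-based counter is simply left untouched.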

Actually I am not sure what I want… I would like to use libraries without porting them. It is not important to have everything in one language. I am trying to accept that.

digital-ember · April 5, 2020, 10:25am

There are similarities, true, but especially conceptually they are very different.
If you were to implement all your languages (C, Java, JavaScript, etc.) using the Intentional Platform, or capture their “intent” there, you’d have, by definition, a common meta model. As @rafael rightfully pointed out, capturing intent of an existing code base is hard, or impossible.
All this has to do with me asking you at the beginning to make the effort and lay out a problem statement. Naturally, if you want existing code, written in different languages, to interoperate, you’d need some sort of automatic reverse engineering, which is a hard problem, again as has been pointed out by Rafael.

If you think more along the lines of reinventing the wheel, i.e. doing something new, from scratch, the conceptual core of Intentional Programming is the closest I have seen to achieving what you are asking. Their logo, mixing the existential quantifier (∃) and the universal quantifier (∀), was a nod to the fact that their solution is an E+A instead of an E×A solution, where E stands for Entity and A stands for Aspect, or “behavior”. So instead of implementing the same behavior for an Entity over and over again in different front-ends, you only do it once. That, at least, was my understanding after two years working there.

Lotes · April 5, 2020, 11:17am

Keeping track of the intention while writing down the program sounds like a good idea. Normally we have some knowledge and project it, for example, onto instructions of some machine language. The idea can get lost.

I think I did not understand intentional software completely. What I took with me was

to write down the program in domain code
Programmers create generators to close the gap between domain experts and the final program.
We do not need unambiguous grammars when we use intentional trees. So the editor defines the type of language element, not the syntax.

At the beginning of the paper I was expecting some kind of notation that helps to understand the code as how it was meant.
You worked with it. Can you show some concrete situation? Maybe drawing a circle with a for loop is a good example.

digital-ember · April 5, 2020, 11:30am

Due to a pretty strict NDA I cannot.

Regarding your example, this is already where you have to make a split:

“drawing a circle” could be considered the intent (declaratively, “what do you want to do”)
“with a for-loop” is actually an implementation detail, the “how you describe it”

If I understand your posts so far correctly, you want these things separated. So, inside the platform, the intent of drawing a circle would be captured. There is a “thing” called circle, it has certain properties, like a radius and a position, and it can be drawn via instructions. The instruction could be anything. A for-loop, or mouse input like in Paint, where you move the mouse from top left to bottom right to indicate a diameter, or somebody drawing an approximation of a circle with a stylus on a touch screen. The intent is always the same: “draw a circle”. What is left to do is define data transformations.
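
Since the actual platform cannot be shown, here is only a generic Python sketch of the separation described above. It says nothing about how Intentional Software works internally; the Circle class and all function names are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """The captured intent: a circle with a position and a radius."""
    cx: float
    cy: float
    radius: float


def from_bounding_box(left, top, right, bottom):
    """Paint-style input: a drag from top left to bottom right spans the diameter."""
    radius = min(right - left, bottom - top) / 2
    return Circle((left + right) / 2, (top + bottom) / 2, radius)


def as_polyline(circle, segments=16):
    """One possible 'how': points on the circle computed in a for-loop."""
    points = []
    for i in range(segments):
        angle = 2 * math.pi * i / segments
        points.append((circle.cx + circle.radius * math.cos(angle),
                       circle.cy + circle.radius * math.sin(angle)))
    return points


def as_svg(circle):
    """Another 'how': hand the intent to a target that knows circles natively."""
    return f'<circle cx="{circle.cx}" cy="{circle.cy}" r="{circle.radius}" />'


circle = from_bounding_box(0, 0, 10, 10)
print(as_svg(circle))
print(len(as_polyline(circle, segments=8)))
```

The for-loop lives in only one of the output transformations. The Circle value itself never mentions it, so other inputs and other outputs can be added without touching the captured intent.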

Lotes · April 5, 2020, 1:01pm

Ok. That reminds me of clean coding. Like: choose proper names and stay at only one abstraction layer per method. That is an effort to help humans. And only a human will understand it.

In university we once had a talk about the language Idris… see
https://www.idris-lang.org/
The creator showed us how to sort a list. By providing an implementation and a type in a special notation he was able to prove correctness. It was fascinating to watch… and a bit hard to understand… but in the end he did not need any tests or execution. It was correct.

I am telling this for inspiration. If we can prove correctness (not always, but hey…), why can we not complete the source with intentions or the needed knowledge and derive implementations for each programming language?

I do not know your math experience from school, but in Berlin, Germany, we were advised or even forced to justify each computation step. Without explanation: no points for the task. This is a good starting point, I guess.

Are there any other resources in this direction? Is the domain “theorem provers”?