Two-way translations: English to Graql, Graql to English

(I hope I have put this message under the right category; please advise and I can change it.)

Hi all,

A month or so ago I gave a talk at a conference called Grakn Cosmos 2020, in London, UK. I spent over a week preparing for it, but the ideas I expressed in the talk had been on my mind for several months, ever since I got to know the graph database product called Grakn. Graql is the query language used to talk to the Grakn server, much as we use SQL (or any other database query language) with other databases.

Here are the resources to find out more about the talk (the video of the talk is still awaiting processing and review; I will share it when available):

Slides: https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/examples/data/databases/graph/grakn/presentations/GraknCosmos2020/Naturally%2C-getting-productive%2C-my-journey-with-Grakn-and-Graql.pdf

GitHub (Grakn scripts): https://bit.ly/grakn-graql-graalvm

In the second half of the presentation (starting from the middle of the slides, say slides 44/45 onwards) you will find the two ideas mentioned in the title.

It’s still very early days, and so far I have only shown via PoC code that it is possible to do such a thing.

What I’m curious about is whether someone in the community (including @ftomassetti) has done something like this. If yes, can I see some examples, and can I get some constructive feedback on what to keep in mind when attempting such translations? I’m also interested in seeing and learning from code related to such translation, or in hearing about libraries and frameworks out there that help with this.

Most would jump to AI/ML or NLP to try to solve the above, but I think it can be a bit too early in many cases - as I mentioned earlier it’s early days. But I’m also open to exploring and learning about angles from the AI/ML perspectives.

Thanks, and looking forward to your constructive and useful feedback.

Regards
Mani

Hi, if I understand correctly, you have a mechanism to translate natural language into queries, with a bot presenting possible alternatives to the user based on previous queries.

In general I have mixed feelings about this because natural language seems not good at expressing ideas clearly. In a way DSLs forse users to think more formally and this is something necessary to some extent. On the other hand, the allure of natural language is clear: it would make these systems much more accessible.

Some connected things I am seeing are:

  1. The methodology of @jennek.geels: if I understand correctly, he starts from natural language and transforms it into a normalized form until it has a clearly recognizable meaning, so that it can be used as precise input. I understand this process is partially manual at this stage.
  2. This makes me think about @openVALIDATION; however, they aim to use natural language without checking with the user. This is great when it works, but it seems to me that sometimes the system could misinterpret what the user intended, while your system in a way asks for confirmation.

The problem of going from Graql to English is interesting but probably easier to some extent, while the great challenge is the opposite: going from English to Graql. In a way we have generated documentation from DSL code, and this seems similar. I like the incrementality of the approach a lot, and the fact that there are several layers.

Do you think your ideal users would have to know Graql or not? If yes, then what advantages would they get from this system? Speed and easiness of use? If they do not know Graql how could they check the queries are correct? By using them on an example?

Indeed I start from natural language and work towards a normalized form, and this process is not automated by design. To me, a model of system behavior is always the outcome of a negotiation among stakeholders. Moreover, it is a discovery process. Did a stakeholder already think through all the details before coming to the table – so we just have to find a way to put that thinking in a model? I don’t think so. Many details emerge during discovery. That’s why I work with domino as a discovery tool. That’s why I take the role of moderator to stimulate the negotiation.

Thanks @ftomassetti for your detailed response. It looks like you have gone through the slides and some of the code (if not all of it). I’ll try to answer your questions one by one.

In general I have mixed feelings about this because natural language seems not good at expressing ideas clearly

Depending on many things, we may or may not have an option here. If the end-user is a business stakeholder, a non-technical person, or anyone else who does not know the query language (Graql in this case), spoken or written communication may be the only way they have to communicate (even though your point about natural language not expressing ideas clearly is valid).

In a way DSLs forse users to think more formally and this is something necessary to some extent.
This can work better when the interface is provided to a developer or a technically apt person. PS: you mean force here, not forse, right?

My ideas are kinda similar to those of both @jennek.geels and @openVALIDATION, although it is very early days. The purpose of my presentation and the demo in it was to present my ideas to an audience, showing them how it would work in a program or via an interface. I also wanted to hint to both the maintainer/owner of the graph database and the audience (community) that there is room (and food for thought) for both parties to step forward and make amendments to the technical system (as it is an open-source product), which means that if both sides create the necessary building blocks we can get closer to our goal.

Graql is quite succinct but, as with many such languages (Clojure, for example), it can get hard and complex quickly, just as SQL has become compared to its early days (all my examples are subject to discussion, but for a given context they should work fine). The ideas developed over time after I saw the product, and I raised them as issues on their GitHub to enhance their system, see Enhancement request 1 and Enhancement request 2. Then I took it a bit further, modified their products a bit via a PoC (code is available at the GitHub link above), and demonstrated the ideas.

The problem of going from Graql to English is interesting but probably easier to some extent, while the great challenge is the opposite: going from English to Graql.

So very true, but when working on both directions I found that, even though it’s hard on a practical level, it’s quite plausible to achieve the end solutions incrementally, as you said. It’s like bootstrapping: building on top of the wins of the previous block (or layer). So yes, the layers I explained are the highest level, and then we go down or into them. Or, if we start from the inside, we create a layer at a time and move upwards and outwards.

Do you think your ideal users would have to know Graql or not? If yes, then what advantages would they get from this system? Speed and easiness of use? If they do not know Graql how could they check the queries are correct? By using them on an example?

Users know Graql to various degrees, and hence an interface like this is super useful for switching between what they have in mind, what’s in the database schema/data, and the skills or experience they need to achieve their end goal(s). The advantages are plenty; in short, speed and ease of use, and more if we can improve the language-understanding part. (I used a very naive layer, i.e. fuzzy matching, but only for PoC purposes. I also wanted to show that in general we don’t need to build a BERT-like NLP model to achieve many simple apps. Wait for the video, you will enjoy it more.)

Once the system understands what the query written in English is, hand-held by the end-user through a selection process in some cases, we can data mine the actual Graql queries (which still read like English, as they contain many English terms) as well as previous English queries, Graql queries and their results, and fine-tune the possible questions users have in mind or can ask the system based on previous usage. Think about how a human expert would answer you. Often, due to ambiguity, you could be asked “Do you mean this? Because I don’t understand what you mean.” Or “It depends, could be this, is this what you mean?” Or you are given multiple answers and you go back and figure out which one works for you. Ambiguity is normal, and even the most expert people on this planet have experienced it.

We can never know whether an expert system is correct unless we are experts ourselves; the rest of the time we assume the system is correct. That’s the premise of using such a system.
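To make the fuzzy-matching-plus-selection idea concrete, here is a minimal sketch using Python’s standard `difflib` module. It is not the PoC code from the repository: the catalogue of known questions, the schema names (`person`, `company`, `employment`) and the helper names `suggest` and `choose` are all assumptions made up for illustration, and the Graql strings are meant to follow Graql 1.x syntax but have not been run against a Grakn server.

```python
import difflib

# Hypothetical catalogue of previously seen English questions and the Graql
# queries they were mapped to. The schema is invented for this sketch.
KNOWN_QUERIES = {
    "list all people": "match $p isa person; get;",
    "list all companies": "match $c isa company; get;",
    "who works at which company":
        "match $e (employee: $p, employer: $c) isa employment; get;",
}


def suggest(question, limit=3, cutoff=0.4):
    """Return up to `limit` (english, graql) pairs closest to `question`."""
    # Light normalisation: lower-case and drop surrounding punctuation.
    normalised = question.lower().strip(" ?!.")
    matches = difflib.get_close_matches(
        normalised, list(KNOWN_QUERIES), n=limit, cutoff=cutoff
    )
    return [(english, KNOWN_QUERIES[english]) for english in matches]


def choose(candidates, index):
    """Simulate the user confirming one of the presented alternatives."""
    if 0 <= index < len(candidates):
        return candidates[index][1]
    return None


if __name__ == "__main__":
    options = suggest("List all peple?")  # note the typo in the question
    for number, (english, _) in enumerate(options):
        print(f"{number}: did you mean '{english}'?")
    print(choose(options, 0))
```

The point of the `choose` step is the confirmation loop: the system proposes candidates and the user picks one, so a weak matcher can still be useful. Each confirmed pair could then be added to the catalogue, which is one simple way to learn from previous usage.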

Such a system can also help you learn Graql. Say that, in an English-to-Graql interface, you are shown potential queries in English, and when you select one of them you see the equivalent Graql query. You then learn two things:

  • how others ask such questions or similar or related questions in English
  • how an English query translates to Graql (not hard and fast but with time one learns)

With Graql-to-English, the second benefit above is amplified: you learn to read a slightly succinct/terse language, things begin to make more sense, and you can turn the knob of granularity to choose the level of English detail you want from that query (see the layers and examples in my slides; you will hear about it in my video soon). I find both these translation journeys super useful. One lesson comes out of it: there are no one-hop solutions (refer to the slide). Assuming there are is a major mistake we make; even I have fallen for it many times, and it’s hard to unlearn.
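As a rough illustration of the granularity knob, the sketch below describes one very small shape of Graql query at three levels of detail. It is an assumption-laden toy, not the approach from the slides: the function name `graql_to_english`, the `detail` parameter and the wording of the English output are all invented here, and the regular expression only understands queries of the form `match $x isa <type>[, has <attr> $y]...; get ...;`.

```python
import re

# Only this tiny subset of Graql is recognised by the sketch.
PATTERN = re.compile(
    r"^\s*match\s+\$(\w+)\s+isa\s+([\w-]+)"   # match $p isa person
    r"((?:\s*,\s*has\s+[\w-]+\s+\$\w+)*)"     # , has name $n (repeatable)
    r"\s*;\s*get\b[^;]*;\s*$"                 # ; get ...;
)


def graql_to_english(query, detail=1):
    """Describe a simple match/get query; higher `detail` means more words."""
    found = PATTERN.match(query)
    if found is None:
        return None  # outside the subset this sketch understands
    var, type_name, has_part = found.groups()
    attrs = re.findall(r"has\s+([\w-]+)\s+\$(\w+)", has_part)
    if detail <= 0 or not attrs:
        # Coarsest level: only the type being asked about.
        return f"Find every {type_name}."
    if detail == 1:
        # Middle level: mention the attributes, hide the variables.
        names = " and ".join(name for name, _ in attrs)
        return f"Find every {type_name} together with its {names}."
    # Finest level: keep the variables so the reader can map back to Graql.
    parts = ", ".join(f"has a {name} (call it ${v})" for name, v in attrs)
    return f"Find every ${var} that is a {type_name} and {parts}."


if __name__ == "__main__":
    q = "match $p isa person, has name $n; get $n;"
    for level in (0, 1, 2):
        print(level, graql_to_english(q, detail=level))
```

Keeping the variable names at the finest level is a deliberate choice in this sketch: it lets a learner line up each English phrase with the Graql fragment it came from.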

I hope I didn’t leave out any queries above or misunderstand or incorrectly answer any of these - pls do let me know if I did.

The wait is over, the video is out!
https://www.youtube.com/watch?v=Cef2nPEmybs

(see above message for links to the source-code and slides)

I am thinking about a similar problem. My end users need to build a report, but they do not know the data model, keys, or other concepts of structured data or programming.

Could you please point me to introductory documentation on Natural Language Processing?

Good luck

We’ve also been playing with the idea of using chatbots to help users query data sources.

The original idea (and still ongoing work) was using chatbots for open data where the bot translates the NL text to REST API queries.

More recently, we have applied this idea to querying models. From the DSL definition, you generate the specification of a chatbot that lets users query models conforming to that DSL.
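For readers who want a feel for the NL-to-REST step, here is a deliberately simple sketch. It is not how the chatbots mentioned above are built: the endpoint `https://example.org/api/records`, the parameter names and the single supported sentence pattern are all hypothetical, and a real bot would use proper intent recognition rather than one regular expression.

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://example.org/api/records"  # hypothetical endpoint

# One supported sentence shape: "show <dataset> in <place> [since <year>]"
QUESTION = re.compile(
    r"show (?P<dataset>[\w ]+?) in (?P<place>[\w ]+?)"
    r"(?: since (?P<year>\d{4}))?\??$"
)


def question_to_url(question):
    """Translate a supported question into a REST query URL, else None."""
    found = QUESTION.match(question.strip().lower())
    if found is None:
        return None  # the bot would ask the user to rephrase
    params = {"dataset": found["dataset"], "place": found["place"]}
    if found["year"]:
        params["from_year"] = found["year"]  # hypothetical parameter name
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(question_to_url("Show bike stations in Barcelona since 2019?"))
```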
