I was recently contacted about what turns out to be a rather non-trivial NLP project. Unfortunately, this is beyond my experience. Please let me know if this is of interest to you, and I will provide you with the contact information for the company. (email@example.com)
Problem description as I understand it:
They have hundreds of human-readable legal document templates of various types, layouts, and languages, all in Word format. Some are flowing text, e.g.:
This contract is by and between [insert name here] and [insert name here]
Others contain tables, with a description in one column and a blank space for the value in the opposite column.
There are probably other layouts besides.
They also have a template expansion system that takes as input a marked-up Word document containing placeholders for variables (3000 options) or conditional logic (8000 options).
The goal is to automate, at least partially, the process of marking up the human-readable templates, inferring the appropriate variable and logic tags from the text of the document and/or the bracketed instructions.
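To make the task a little more concrete, here is a rough Python sketch of the simplest possible baseline: find the bracketed instructions in flowing text and rank candidate variable tags by plain string similarity. Everything in it is an assumption on my part. The tag names and descriptions in CATALOGUE are made up, the function names are mine, and I do not know what the company's real variable list or markup syntax looks like. It also works on plain text only, so getting the text out of the Word files is a separate step.

```python
import re
from difflib import SequenceMatcher

# Hypothetical variable catalogue (tag name -> short description).
# The real system reportedly has about 3000 variables; these are invented.
CATALOGUE = {
    "party_name": "insert name here",
    "effective_date": "insert date here",
    "governing_law": "insert jurisdiction here",
}

# Matches one bracketed instruction with no nested brackets.
BRACKET = re.compile(r"\[([^\[\]]+)\]")


def find_placeholders(text):
    """Return (start, end, instruction) for each bracketed instruction."""
    return [(m.start(), m.end(), m.group(1).strip())
            for m in BRACKET.finditer(text)]


def rank_candidates(instruction, catalogue, top_n=3):
    """Rank catalogue tags by similarity of their description to the instruction."""
    scored = [
        (SequenceMatcher(None, instruction.lower(), desc.lower()).ratio(), tag)
        for tag, desc in catalogue.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_n]


def suggest_tags(text, catalogue):
    """Pair each bracketed instruction with its best-scoring tag."""
    return [(instr, rank_candidates(instr, catalogue)[0][1])
            for _, _, instr in find_placeholders(text)]


if __name__ == "__main__":
    sample = ("This contract is by and between [insert name here] "
              "and [insert name here]")
    for instruction, tag in suggest_tags(sample, CATALOGUE):
        print(instruction, "->", tag)
```

On the sample sentence both placeholders get the same suggested tag, which shows why this is non-trivial: telling the first party from the second requires the surrounding text, not just the bracketed instruction. A realistic approach would probably present ranked suggestions to a human reviewer rather than tag automatically.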