Lemma matching
Posted by jtakeda on 20 Jan 2018 in Activity log
The lemma matching code has now been rewritten and rationalized; we no longer create a list of documents and apparatus. Instead, the transforms use a document collection (like MoEML's static build) and use doc categories to determine whether or not the text needs to be tokenized. It's fairly fast and works quite well.
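For anyone who wants the gist of the category check without reading the transforms, here is a rough sketch in Python. It is not the project code (the transforms themselves are not shown in this post), and the category names, the `Doc` class, and the function names are all made up for illustration. The idea is simply to walk the whole collection and tokenize only the documents whose categories call for it.

```python
import re
from dataclasses import dataclass

# Hypothetical category names: the real doc categories are not listed in this post.
TOKENIZE_CATEGORIES = {"primary_text", "apparatus"}


@dataclass
class Doc:
    """Illustrative stand-in for one document in the collection."""
    doc_id: str
    categories: set
    text: str


def needs_tokenizing(doc):
    """True if the document carries at least one category that requires tokenization."""
    return bool(doc.categories & TOKENIZE_CATEGORIES)


def tokenize(text):
    """Very naive tokenizer (lowercased word characters), used only for this sketch."""
    return re.findall(r"\w+", text.lower())


def process_collection(docs):
    """Walk the collection once and tokenize only the documents that need it."""
    return {doc.doc_id: tokenize(doc.text) for doc in docs if needs_tokenizing(doc)}
```

With a collection containing one document in a tokenized category and one that is not, only the first ends up in the result, so there is no separate list of documents to build or maintain by hand.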