Lemma matching

Posted on 20 Jan 2018 in Activity log

The lemma matching code has now been rewritten and rationalized; we no longer create a list of documents and apparatus. Instead, the transforms work from a document collection (like MoEML's static build) and use document categories to determine whether or not a text needs to be tokenized. It's fairly fast and works quite well.
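As a rough illustration of the category check, here is a minimal Python sketch. It is not the project's actual transform code: the `Doc` class, the category names in `TOKENIZE_CATEGORIES`, and the helper functions are all hypothetical, and it assumes single-word lemmas for simplicity.

```python
import re
from dataclasses import dataclass

# Hypothetical category names; the project's real taxonomy is not given here.
TOKENIZE_CATEGORIES = {"primary", "apparatus"}


@dataclass
class Doc:
    """A document in the collection (hypothetical shape)."""
    doc_id: str
    categories: frozenset
    text: str


def needs_tokenizing(doc):
    """Return True if any of the document's categories calls for tokenizing."""
    return bool(doc.categories & TOKENIZE_CATEGORIES)


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def match_lemmas(docs, lemmas):
    """Map each tokenized document's id to the sorted lemmas found in it.

    Documents whose categories do not require tokenizing are skipped,
    so they never appear in the result.
    """
    wanted = {lemma.lower() for lemma in lemmas}
    hits = {}
    for doc in docs:
        if not needs_tokenizing(doc):
            continue
        hits[doc.doc_id] = sorted(wanted.intersection(tokenize(doc.text)))
    return hits
```

The point of the design is that the category test is cheap, so documents that do not need tokenizing are skipped before any per-word work is done.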
