Names-only dictionary created
Faced with the task of creating a print dictionary consisting only of the name entries, I was initially stumped by the duplicate xml:ids across the collection. Until now, my print dictionary processing has depended on manually selecting only those files whose status is complete, and across those files there are no duplicate ids. However, the names are sprinkled throughout the whole collection, including lots of files which have not yet been edited and therefore still contain duplicate ids.
After some thought, I set up this process, running in dictionary_test:
- First, the auto-orthography transformation is run against all the files in the dictionary directory (the live files) to create auto-orthographized versions in dictionary_test.
- A new file called master_all.xml XIncludes all the entry files in dictionary_test. This file is obviously invalid because of the duplicate ids, but it can be processed with XSLT.
- Next, a transformation called generate_names-only_dictionary.xsl pulls out all the name entries, along with all the completed root, stem and affix entries to which the name entries link in their morpheme elements, and creates from them a file called master_names-only.xml.
- Finally, the moses_master_to_pdf_LINGUIST transformation scenario is run on the master_names-only.xml file to generate the PDF dictionary.
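The selection step can be sketched roughly as follows. This is a minimal Python illustration of the logic only, not the actual generate_names-only_dictionary.xsl stylesheet. The markup details are assumptions made for the example: entries are taken to be <entry> elements with a type attribute (name, root, stem, affix) and a status attribute, morpheme elements are taken to be <m> elements that point at their target with a corresp="#id" attribute, and the TEI namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Clark-notation key under which ElementTree stores the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def select_names_only(root):
    """Return the name entries plus the completed entries they link to.

    Assumed (hypothetical) markup: <entry type="..." status="...">,
    with <m corresp="#target-id"/> morpheme links inside name entries.
    """
    entries = list(root.iter("entry"))
    names = [e for e in entries if e.get("type") == "name"]

    # Collect the ids that name entries point at from their morphemes.
    targets = set()
    for entry in names:
        for m in entry.iter("m"):
            ref = m.get("corresp", "")
            if ref.startswith("#"):
                targets.add(ref[1:])

    # Keep only completed root, stem and affix entries that are linked to.
    # Filtering on status also skips unedited entries with duplicate ids.
    linked = [
        e for e in entries
        if e.get("type") in {"root", "stem", "affix"}
        and e.get("status") == "complete"
        and e.get(XML_ID) in targets
    ]
    return names + linked
```

Because the parser used here does not enforce id uniqueness, the duplicate ids in the combined file do not prevent processing, which parallels the reason master_all.xml can still be handled with XSLT even though it is invalid.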
In the process, I found and fixed a couple of errors, including a duplicate id between two name entries, and also noticed a new problem we'll have to work on: the TEI Schematron embedded in the RelaxNG schema for the <gloss> element disallows the presence of @subtype when there is no @type (quite reasonably, perhaps), but we're using @subtype by analogy with what we use on <seg>, while having removed @type from the schema. I guess we should probably handle this by creating a new datatype and using @type on <gloss> instead of @subtype.
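To see how many <gloss> elements would need changing, something like the following could list the ones that carry @subtype without @type. It is only a sketch in Python, not the Schematron rule itself, and it assumes un-namespaced element names for brevity.

```python
import xml.etree.ElementTree as ET


def glosses_with_bare_subtype(root):
    """Return <gloss> elements that have @subtype but no @type."""
    return [
        g for g in root.iter("gloss")
        if g.get("subtype") is not None and g.get("type") is None
    ]
```

Running this over each entry file would give a rough count of the attributes to migrate before the schema is changed.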