Missing entries: the transformation sequence
Posted by mholmes on 23 Sep 2010 in Activity log
This is what we've done to get the missing entry items into a state where they can be merged back into the main collection whenever SK finds that one is missing:
- Generated a rescued.xml file from the complete set of unmerged_xml documents, collated using xincluded-complete-list-expanded.xml, by running a new transform called retrieve_lost_from_unmerged.xsl.
- Manually fixed nested <ENTRY> tags (there were only six instances).
- Ran rescued.xml through remove_empty_tags_from_rescued.xsl, which removes the initial empty tag that caused all our problems; it also turns any child::infl tag into a comment (SK thinks this will be the simplest way to preserve that information). The output is called rescued_empties_removed.xml.
- Ran rescued_empties_removed.xml through expand_separated_xml.xsl to produce rescued_empties_removed_expanded.xml. For this run, I tweaked the XSLT file to preserve the comments added in the preceding steps, which record each entry's file of origin and any <infl> tag value.
- Ran rescued_empties_removed_expanded.xml through glottal_conversion.xsl to produce rescued_empties_removed_expanded_fixed.xml. This corrects a number of Unicode character representations.
- Ran rescued_empties_removed_expanded_fixed.xml through collapse_forms_etc.xsl to produce rescued_empties_removed_expanded_fixed_forms_collapsed.xml. This applies a set of changes that SK identified as uniform and helpful, reducing the amount of repetitive work she has to do.
- Renamed rescued_empties_removed_expanded_fixed_forms_collapsed.xml to rescued_final.xml.
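The nested <ENTRY> tags were fixed by hand. The function below is a sketch of how those instances could be located first, using only Python's standard library ElementTree. It assumes rescued.xml still parses as well-formed XML. That holds only if the nesting is real element nesting, not the result of an unclosed tag. The helper name find_nested_entries is hypothetical and not part of the project's code.

```python
import xml.etree.ElementTree as ET


def find_nested_entries(root: ET.Element, tag: str = "ENTRY") -> list:
    """Return every <ENTRY> element that sits somewhere inside another <ENTRY>."""
    nested = []

    def walk(elem: ET.Element, inside_entry: bool) -> None:
        is_entry = elem.tag == tag
        if is_entry and inside_entry:
            nested.append(elem)
        for child in elem:
            # A child is "inside" if this element or any ancestor is an entry.
            walk(child, inside_entry or is_entry)

    walk(root, False)
    return nested


sample = ET.fromstring("<root><ENTRY><ENTRY/></ENTRY><ENTRY/></root>")
print(len(find_nested_entries(sample)))  # prints 1
```

Against the real file, find_nested_entries(ET.parse("rescued.xml").getroot()) would return the inner elements that need fixing by hand.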
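The child::infl conversion is done in XSLT, and the stylesheet itself is not shown here. The Python sketch below illustrates the same idea with ElementTree. It assumes the comment only needs to keep the text content of each <infl> element. The function name and the comment wording (" infl: ... ") are hypothetical.

```python
import xml.etree.ElementTree as ET


def infl_to_comments(root: ET.Element) -> int:
    """Replace each <infl> element with a comment holding its text.

    Returns the number of <infl> elements converted. Any text that followed
    the <infl> element (its tail) is kept after the new comment.
    """
    converted = 0
    # Snapshot the tree first so we never mutate a list we are iterating over.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "infl":
                continue
            text = "".join(child.itertext())
            comment = ET.Comment(f" infl: {text} ")
            comment.tail = child.tail
            parent.remove(child)
            parent.insert(index, comment)
            converted += 1
    return converted


entry = ET.fromstring("<ENTRY><form>a</form><infl>pl</infl>tail</ENTRY>")
infl_to_comments(entry)
print(ET.tostring(entry, encoding="unicode"))
# prints <ENTRY><form>a</form><!-- infl: pl -->tail</ENTRY>
```

ElementTree does not escape comment text, so this sketch assumes no <infl> value contains "--", which is not allowed inside an XML comment.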
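It is not recorded here which character representations glottal_conversion.xsl corrects. The sketch below shows the general technique: one translation table applied to all element text, tails, and attribute values. It uses a placeholder mapping of U+0027 APOSTROPHE to U+02BC MODIFIER LETTER APOSTROPHE, chosen purely for illustration. The earlier steps rely on comments to carry the file of origin and <infl> values, so the sketch parses with insert_comments=True, because ElementTree drops comments by default. It also leaves comment text unchanged.

```python
import xml.etree.ElementTree as ET

# PLACEHOLDER mapping for illustration only: the real table used by
# glottal_conversion.xsl is not recorded in the post.
CHAR_MAP = str.maketrans({"\u0027": "\u02bc"})  # APOSTROPHE -> MODIFIER LETTER APOSTROPHE


def parse_keeping_comments(xml_text: str) -> ET.Element:
    """Parse XML text, keeping comments (ElementTree drops them by default)."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.fromstring(xml_text, parser=parser)


def convert_chars(root: ET.Element, table: dict = CHAR_MAP) -> None:
    """Apply a str.translate table to element text, tails and attribute values.

    Comment text is left alone so recorded provenance is not altered, but the
    text that follows a comment (its tail) is converted like any other content.
    """
    for elem in root.iter():
        if elem.text and elem.tag is not ET.Comment:
            elem.text = elem.text.translate(table)
        if elem.tail:
            elem.tail = elem.tail.translate(table)
        for name, value in list(elem.attrib.items()):
            elem.set(name, value.translate(table))
```

To apply it to a file, read the file into a string, call parse_keeping_comments, run convert_chars on the result, then write it back out with ET.tostring(root, encoding="unicode").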
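In the spirit of the note to self at the end of this post, one way to keep the sequence reproducible is to record the automated steps as a script. The sketch below assumes the stylesheets are run with Saxon HE from the command line. Which processor was actually used isn't stated, and the jar name is a placeholder. Each output name is derived by appending the suffix used for that step, and every intermediate file gets a .xml extension. The first transform is left out because the manual fix of nested <ENTRY> tags has to happen between it and the rest.

```python
import subprocess
from pathlib import Path

# Stylesheets applied after the manual fix, in order, with the suffix each
# step adds to the file name.
STEPS = [
    ("remove_empty_tags_from_rescued.xsl", "_empties_removed"),
    ("expand_separated_xml.xsl", "_expanded"),
    ("glottal_conversion.xsl", "_fixed"),
    ("collapse_forms_etc.xsl", "_forms_collapsed"),
]

# HYPOTHETICAL: path to a Saxon HE jar; the post does not name the processor.
SAXON_JAR = "saxon-he.jar"


def plan(source: str, steps=STEPS, jar: str = SAXON_JAR):
    """Return (commands, final_output) for running the steps in sequence."""
    commands = []
    current = Path(source)
    for stylesheet, suffix in steps:
        output = current.with_name(current.stem + suffix + ".xml")
        commands.append([
            "java", "-jar", jar,
            f"-s:{current}", f"-xsl:{stylesheet}", f"-o:{output}",
        ])
        current = output
    return commands, current


def run(source: str) -> Path:
    """Execute the planned commands, then rename the result to rescued_final.xml."""
    commands, final = plan(source)
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing step
    return final.rename(final.with_name("rescued_final.xml"))
```

Calling plan("rescued.xml") only builds the command lines, so the sequence can be checked before anything is run. Calling run("rescued.xml") after the manual fix would execute the steps and do the final rename.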
Where entries are missing from the other files, SK will now go to rescued_final.xml and retrieve them from there; they should be in the same condition as the other entries they're being merged with.
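Here is a sketch of how entries absent from the main collection might be pulled out of rescued_final.xml. It assumes each <ENTRY> carries an identifying attribute. The attribute name id is a hypothetical placeholder, since how entries are matched isn't specified. The returned entries are copies, so the rescued tree is left untouched.

```python
import copy
import xml.etree.ElementTree as ET


def index_entries(root: ET.Element, key: str = "id") -> dict:
    """Map each <ENTRY>'s key attribute to the element; entries without it are skipped."""
    return {e.get(key): e for e in root.iter("ENTRY") if e.get(key) is not None}


def missing_entries(main_root: ET.Element, rescued_root: ET.Element, key: str = "id") -> list:
    """Return copies of rescued entries whose key is absent from the main collection."""
    present = index_entries(main_root, key)
    return [copy.deepcopy(entry)
            for k, entry in index_entries(rescued_root, key).items()
            if k not in present]


main = ET.fromstring('<dict><ENTRY id="a"/><ENTRY id="b"/></dict>')
rescued = ET.fromstring('<dict><ENTRY id="a"/><ENTRY id="c"><form>x</form></ENTRY></dict>')
print([e.get("id") for e in missing_entries(main, rescued)])  # prints ['c']
```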
Note to self: it was difficult to reconstruct this conversion sequence because half of it took place before the blog was in place. In future, make sure anything like this is blogged in great detail.