TCCD: OCR and schema development
Posted by mholmes on 01 Feb 2016 in Activity log
More OCR work running all day. We're also, slowly but surely, nailing down some rules for the schema and the encoding practices, based on looking at more and more of the original data. Many reports of proceedings are not in direct speech, but after some consideration we've decided to use the <sp> element anyway, and use <said direct="false"> to wrap indirect speech. Using the same <sp> and <speaker> for both indirect and direct speech will make it easier to process both types of data in the same way.
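
To illustrate why a shared structure helps, here is a minimal sketch of how such encoding might be processed. Everything in it is an assumption for illustration only: the sample markup, the speaker names, the wording of the speeches, and the helper function speeches() are all hypothetical and are not taken from the project's data or schema. It assumes the TEI namespace and that each <sp> holds a <speaker> followed by one or more <p> elements, with indirect speech wrapped in <said direct="false"> inside the <p>.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the sample document below is invented for illustration.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <sp>
    <speaker>Mr. Smith</speaker>
    <p>I move that the report be adopted.</p>
  </sp>
  <sp>
    <speaker>Mr. Jones</speaker>
    <p><said direct="false">said that he could not support the motion.</said></p>
  </sp>
</div>"""


def speeches(xml_text):
    """Return (speaker, mode, text) for every <sp>, direct or indirect alike."""
    root = ET.fromstring(xml_text)
    result = []
    for sp in root.iterfind(".//tei:sp", NS):
        speaker = sp.findtext("tei:speaker", default="", namespaces=NS).strip()
        # A speech counts as indirect if it contains <said direct="false">.
        indirect = sp.find(".//tei:said[@direct='false']", NS) is not None
        # Collect the text of each paragraph and normalise whitespace.
        text = " ".join(
            " ".join("".join(p.itertext()).split())
            for p in sp.iterfind("tei:p", NS)
        )
        result.append((speaker, "indirect" if indirect else "direct", text))
    return result


if __name__ == "__main__":
    for speaker, mode, text in speeches(SAMPLE):
        print(f"{speaker} [{mode}]: {text}")
```

Because both kinds of speech sit inside <sp> with a <speaker>, a single loop can handle them, and the direct/indirect distinction becomes one extra check rather than a separate code path.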