I've made a good start on mapping out how we might do the TEI encoding of the modern dictionary, based on ML's sample data. I've written to ML for clarification on a couple of points: the overall structure of an entry, and the possibility of duplicate instances of content from the same source, which might suggest we should keep such content separate and simply link to it rather than repeating it. A nested entry structure is going to be the basic idea (see the sketch below), but lots of details still remain to be figured out. 180 minutes.
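As a first approximation, here is a minimal sketch of the kind of nested structure I have in mind; the element choices, ids and targets are placeholders, not settled decisions:

    <entry xml:id="e001">
      <form type="lemma"><orth><!-- headword --></orth></form>
      <sense><!-- definition --></sense>
      <!-- A derived or related form nested as a child entry. -->
      <entry xml:id="e001.1">
        <form><orth><!-- derived form --></orth></form>
        <!-- Rather than duplicating content shared with another entry,
             keep a single copy elsewhere and point to it. -->
        <ptr target="sources.xml#src017"/>
      </entry>
    </entry>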
I've done five MSS now. They all need more metadata, but the facsimile elements are complete.
This is where the HCMC component of the project stands, as of July 2019:
- A "wendat" svn repo has been established at svn/wendat.
- A large collection of source documents (page-images and PDFs) has been provided by ML, and is stored for the moment on my hard drive and on Squash.
- One of those documents has been processed into page-image JPEGs, stored at people/martin/wendat/ms. The others will find their way there once they've been processed.
- An ODD file has been created, and a schema-build process coded, so that RNG and Schematron files, along with project documentation, are generated automatically (see the ODD sketch after this list). The schema is rudimentary for now, but will be refined as we do our initial encoding.
- A single source document has been encoded as an msDesc + facsimile pointing at all the page-images on the server (see the facsimile sketch after this list), and some basic XSLT renders the msDesc into rudimentary HTML. Eventually this will be the pilot for a facsimile viewer providing access to the page images, and the document will then be ready for transcription/encoding.
- A MySQL database pair (live and dev) has been set up, along with admin and read-only users, in anticipation of the possible need for an RDB. However, this is not working yet (an error with MySQL's password encoding, which sysadmin is looking at), and in any case we may not want or need it.
- A basic plan has been worked out between ML and me: the primary source documents will be encoded as transcriptions, probably using entryFree, and the modern dictionary being compiled from the sources will be encoded using the regular entry element (see the entry sketch after this list). The latter will be more complicated than the former: each entry or form will link to its sources in the various MSS, and will also point to cognates in other Iroquoian languages.
- It is anticipated that I will do initial encoding of sample data of all the key types, while developing the schema and documentation, and then in the fall we will hire RAs to do some transcription; ML will probably do the direct editing of the modern dictionary.
- The objective is a SSHRC Insight Development grant application in the spring.
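The ODD sketch mentioned above: a minimal schemaSpec, on the assumption that we pull in the stock TEI modules for dictionaries, manuscript description and transcription; the actual module selection in the project ODD will evolve as the encoding does:

    <schemaSpec ident="wendat" start="TEI">
      <!-- Infrastructure and document-structure modules. -->
      <moduleRef key="tei"/>
      <moduleRef key="header"/>
      <moduleRef key="core"/>
      <moduleRef key="textstructure"/>
      <!-- entry, entryFree, form, sense, etc. -->
      <moduleRef key="dictionaries"/>
      <!-- msDesc and its children. -->
      <moduleRef key="msdescription"/>
      <!-- facsimile, surface, graphic linking. -->
      <moduleRef key="transcr"/>
    </schemaSpec>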
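The facsimile sketch: roughly how the encoded source document is structured, abbreviated here, with placeholder ids and image paths:

    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <fileDesc>
          <!-- titleStmt and publicationStmt omitted from this sketch. -->
          <sourceDesc>
            <msDesc>
              <msIdentifier><idno><!-- shelfmark --></idno></msIdentifier>
              <!-- More metadata still to be added. -->
            </msDesc>
          </sourceDesc>
        </fileDesc>
      </teiHeader>
      <facsimile>
        <!-- One surface per page-image on the server. -->
        <surface xml:id="pg001">
          <graphic url="ms/pg001.jpg"/>
        </surface>
        <surface xml:id="pg002">
          <graphic url="ms/pg002.jpg"/>
        </surface>
        <!-- No transcription yet; the text element comes later. -->
      </facsimile>
    </TEI>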
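And the entry sketch: a guess at how a transcribed manuscript entry and a modern dictionary entry might relate. The ids, targets and the choice of Mohawk as the example cognate language are illustrative only:

    <!-- In a manuscript transcription file: entryFree tolerates the
         loose structure of the originals; @facs ties it to its page. -->
    <entryFree xml:id="ms01_e042" facs="#pg001">
      <form><orth><!-- form as written in the MS --></orth></form>
      <!-- Glosses, notes, etc. as they appear in the source. -->
    </entryFree>

    <!-- In the modern dictionary: a regular entry pointing back at its
         sources in the MSS and across to cognates. -->
    <entry xml:id="mod_e007">
      <form type="lemma"><orth><!-- modern headword --></orth></form>
      <sense><!-- definition --></sense>
      <xr type="source"><ptr target="ms01.xml#ms01_e042"/></xr>
      <xr type="cognate"><ref xml:lang="moh"><!-- Mohawk cognate --></ref></xr>
    </entry>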
Hours so far: somewhere around 8.