Began planning teiJournal project
The first phase of planning is to get a good idea of the range of tags, attributes and attribute values used in the two projects we've already done which will form the basis for teiJournal: the ACH Abstracts, and the ScanCan project.
The problem is that both projects use a relatively tightly-constrained subset of the TEI P4 DTD, but the project DTDs are large and inclusive because they were generated at the beginning of the process when it wasn't yet clear what elements and attributes would be used. In addition, many attribute values which are simply open string values in the DTD are used in a more constrained and methodical way (with a fixed set of values) in the actual projects. Therefore we need a way to get a clear overview of what elements and attributes are used, and particularly what values the latter actually have. Once we have this, we'll be able to map these directly to P5, and create a restricted P5 schema using ROMA to recreate the same basic document format. Following that, we can convert the ACH docs to P5 (which we need to do anyway), and we can write some detailed instructions for marking up articles, as the basis for all the transformations and other dependent code.
I spent this morning trying a variety of ways to get a useful report out of the old eXist db on Mustard, showing all the data we need. Because of the requirement to retrieve all and only distinct values for element and attribute names, and attribute values (very expensive in processing time), and because of the need to search across two different collections, this is a slow job.
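To make the shape of the report concrete, here is a minimal Python sketch of the same idea. It is not the XQuery that was run against eXist. It walks a set of documents and collects the distinct element names, plus the distinct values seen for each element/attribute pair. The function name `inventory` and the two sample strings are hypothetical stand-ins for the real collections.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def inventory(documents):
    """Collect distinct element names and, for each (element, attribute)
    pair, the distinct attribute values found across all documents."""
    elements = set()
    attr_values = defaultdict(set)
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        # iter() visits the root element and every descendant element
        for el in root.iter():
            elements.add(el.tag)
            for name, value in el.attrib.items():
                attr_values[(el.tag, name)].add(value)
    return elements, attr_values


# Two tiny invented documents standing in for the two collections
ach_sample = (
    '<TEI.2><text><div type="abstract" id="a1">'
    '<p rend="italic">x</p></div></text></TEI.2>'
)
scancan_sample = (
    '<TEI.2><text><div type="article" id="s1"><p>y</p>'
    '<div type="abstract" id="s2"/></div></text></TEI.2>'
)

elements, attr_values = inventory([ach_sample, scancan_sample])
print(sorted(elements))
for (element, attribute), values in sorted(attr_values.items()):
    print(element, attribute, sorted(values))
```

Using sets means each name and value is recorded once however often it occurs, which is the "all and only distinct values" requirement. Processing both collections in a single pass gives one merged inventory rather than two reports to reconcile.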
I ended up with nearly 200 lines of XQuery and a query which takes several minutes to execute, but the results look promising. I now need to write some XSLT to produce useful output, filtering out what's not necessary (id attributes, for instance) and presenting the results in a table.
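The filtering and tabulating step is planned in XSLT. The Python sketch below only illustrates the intended logic, assuming the raw results can be treated as a mapping from (element, attribute) pairs to sets of values. The names `report_rows` and `format_table` and the sample data are hypothetical.

```python
def report_rows(attr_values, ignore=("id",)):
    """Turn the raw inventory into sorted rows, skipping attributes
    (such as id) whose values are not useful in the report."""
    rows = []
    for (element, attribute), values in sorted(attr_values.items()):
        if attribute in ignore:
            continue
        rows.append((element, attribute, ", ".join(sorted(values))))
    return rows


def format_table(rows, headers=("element", "attribute", "values")):
    """Render rows as a plain-text table with aligned columns."""
    table = [headers] + rows
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    ]
    return "\n".join(lines)


# Invented sample data in the assumed shape
sample = {
    ("div", "type"): {"abstract", "article"},
    ("div", "id"): {"a1", "s1"},
    ("p", "rend"): {"italic"},
}
rows = report_rows(sample)
print(format_table(rows))
```

Keeping the ignore list as a parameter makes it easy to drop other noisy attributes later without touching the table code.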