In preparation for CB joining the project this afternoon, I've written a beginner's-intro-to-Mariage document, which we'll probably expand as time goes on. I've also brought the SVN instructions into the repository, and added SVN Properties to the head of all the XML files, to give us a bit more instant trackability.
Met with CC to go over the grant application, and then to assess the current state of the site and the plan for this semester's work. Made a variety of fixes to site information, menus etc., simplified the text classification system, and made a lot of fixes to metadata and links in several of the more recent long texts.
Over 300 errors in CSS now corrected.
Generated a single-file corpus from the collection, and ran css.xsl on it to generate a "stylesheet" which could be validated. There were over 300 errors, so I've been working through them, fixing typos and other problems with CSS in @rend attributes. I've got about half of them done so far.
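The idea can be sketched in a few lines of Python (illustrative only: the real pipeline is css.xsl, and the function and sample names here are my own). Each distinct @rend value is wrapped in a dummy rule so the whole thing can be fed to a CSS validator:

```python
# Illustrative sketch, not the project's css.xsl: collect the distinct @rend
# values from a TEI document and emit them as a pseudo-stylesheet whose
# declaration blocks can be run through a CSS validator.
import xml.etree.ElementTree as ET
from io import StringIO

def rend_stylesheet(xml_source):
    """Return CSS-like text with one dummy rule per distinct @rend value."""
    tree = ET.parse(xml_source)
    rends = sorted({el.get('rend') for el in tree.iter() if el.get('rend')})
    # Each @rend value is assumed to hold CSS declarations,
    # e.g. rend="font-style: italic;"
    return '\n'.join(f'.rule{i} {{ {r} }}' for i, r in enumerate(rends))

sample = '<TEI><p rend="font-style: italic;">x</p><p rend="color: red;">y</p></TEI>'
print(rend_stylesheet(StringIO(sample)))
```

Running a validator over the output surfaces exactly the kind of typos I've been fixing in the @rend attributes.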
Fixed the search bug that was returning multiple copies of the same search hit in the results; it was caused by failing to take account of cases where there were multiple search hits with the same parent. Also found a bunch of bad CSS values in @rend attributes and fixed them. I need to do a formal search through the whole corpus for these.
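The shape of the fix, in a hedged Python sketch (the data structures below are hypothetical stand-ins for our actual search code): when several hits share one parent element, the old logic returned a result row once per hit-parent pairing, so the same hit could appear multiple times. Deduplicating on a key that identifies the hit itself solves it:

```python
# Minimal sketch of the deduplication fix; field names are hypothetical,
# not those of the project's real search code.
def dedupe_hits(hits):
    """Keep the first occurrence of each (doc, hit_id) pair, preserving order."""
    seen = set()
    unique = []
    for hit in hits:
        key = (hit['doc'], hit['hit_id'])
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique

hits = [
    {'doc': 'leblanc.xml', 'parent': 'p12', 'hit_id': 'h1'},
    {'doc': 'leblanc.xml', 'parent': 'p12', 'hit_id': 'h2'},
    {'doc': 'leblanc.xml', 'parent': 'p12', 'hit_id': 'h1'},  # same hit again
]
print(len(dedupe_hits(hits)))  # → 2
```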
I noticed the other day that when you clicked on a search hit in the search results, the link took you to the relevant document, but not to the specific hit you clicked on. I've now fixed that, but another bug persists: for some hits in some documents, the same hit is being returned multiple times in the results. Working on that now...
We're currently using a rather messy textual classification method based on <classCode> elements pointing at a non-existent scheme; what's more, our classification codes seem to overlap a bit, and fall into two distinct classes. I think it's time to revisit this aspect of our encoding and put it on a sound formal basis. To that end, I have:
- Created a new file in /mariage/ called global_metadata.xml, in which we can centralize a variety of metadata and link to it (this should eventually include things such as availability/licensing).
- Modified the ODD file and generated a new schema to allow for the creation of taxonomies. In the process, I also fixed the oddity whereby <revisionDesc>/@status could only be set to "proofing". We now have a set of document status values which I think will be more useful.
- Created an initial taxonomy of textual types which matches what we currently have.
- Summarized the issue for CC and asked for guidance on how to continue.
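For the schema change, an ODD customization along these lines is what I mean (a sketch only: the value list here is illustrative, not necessarily our final set of status values):

```xml
<elementSpec ident="revisionDesc" mode="change" module="header">
  <attList>
    <attDef ident="status" mode="change">
      <valList type="closed" mode="replace">
        <valItem ident="draft"/>
        <valItem ident="transcribed"/>
        <valItem ident="proofing"/>
        <valItem ident="corrected"/>
        <valItem ident="published"/>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```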
I think we need two separate taxonomies, one for text types and one for content types (e.g. prose vs religion). Then we can add any number of <textClass> elements to any given document, pointing at the specific scheme and code, and use these to filter documents in specialist TOCs and in the search interface.
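Something like the following TEI fragment is what I have in mind (all category identifiers here are hypothetical placeholders, not our actual codes):

```xml
<!-- In global_metadata.xml: two separate taxonomies -->
<classDecl>
  <taxonomy xml:id="textTypes">
    <category xml:id="tt_prose"><catDesc>Prose</catDesc></category>
    <category xml:id="tt_verse"><catDesc>Verse</catDesc></category>
  </taxonomy>
  <taxonomy xml:id="contentTypes">
    <category xml:id="ct_religion"><catDesc>Religion</catDesc></category>
    <category xml:id="ct_marriage"><catDesc>Marriage</catDesc></category>
  </taxonomy>
</classDecl>

<!-- In a document's header: pointing at the scheme and code -->
<textClass>
  <catRef scheme="#textTypes" target="global_metadata.xml#tt_prose"/>
  <catRef scheme="#contentTypes" target="global_metadata.xml#ct_religion"/>
</textClass>
```

The specialist TOCs and the search interface could then filter on these <catRef> targets rather than on ad-hoc classCode strings.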
We should also presumably look for any existing applicable taxonomies that we could adopt.
This arises out of my preparation of the documents for submission to the TAPAS project, which required some standardization of data in the headers. I also removed the pointless "An Electronic Edition" subtitle from all our documents, and tweaked a couple of other things.
Met with CC to discuss the grant application and the TRUTH presentation in September, and also fixed a couple of things in the db (publishing Le Blanc).
Met with CC to go over plans for the application, and tweak the French translation of the technical description we wrote the other week.
Met with CC to write a preliminary draft of a section of the grant application dealing with the proposed normalization and search functionality. This was a useful exercise, forcing me to make all the details explicit, and explain them in clearer terms than I have been doing to myself. The plan still looks good, and I'm looking forward to making more detailed plans based on this (especially plans for the creation of normalization rules, and an automated system for testing them and evaluating the results).