Progress on redesign and ancillary improvements 2013-04-29 to 2013-05-01
Posted by mholmes on 30 Apr 2013 in Activity log
I've been using the opportunity of the redesign (which gives me a completely new incarnation of the web application working alongside the current one) to fix a whole raft of problems and annoyances going back a long time. Among those completed so far:
- When you ask for a page which doesn't exist, you now see a customized "missing" page (db/data/info/missing.xml), but I also set the HTTP status code, like this (for future reference):
    declare variable $dataDoc :=
      if (collection('/db/data')//TEI[@xml:id=$fileId])
      then collection('/db/data')//TEI[@xml:id=$fileId]
      else
        let $dummy := response:set-status-code(404)
        return collection('/db/data')//TEI[@xml:id='missing'];
- Menu item <li> elements now have a class="active" attribute where their target URL matches the current URL.
- Schemas (ODD, RNG and SCH) are available through their filenames.
- When the XML view of a document is presented, the teiHeader is automatically expanded to include links to the schemas and a bit more information, to mitigate the current (temporary, I hope) paucity of header information.
- Page contents menus are now generated, not by parsing the XML source document, but by parsing the XHTML rendering of it after expansion and transformation. This is because the content menu has to be generated in a separate process from the original document expansion and conversion, and since @ids on <div>s are often auto-generated with generate-id() during the XSLT transformation, they cannot be matched for linking any other way.
- I've begun writing a new module for retrieving information about placenames programmatically. This is largely to support the planned processing of ISE source code through named entity recognition. We will need to be able to do a sort of fuzzy lookup of placenames found in the ISE texts, to identify exact and candidate matches. Right now, the module is producing a gazetteer in the text file format used by e.g. NLTK, as well as a simple lookup text file for ids and matching names; it's also eventually going to be able to take input in the form of a candidate name and produce one or more matches in the form of MoEML ids along with all distinct values of names in MoEML for those ids, with a confidence measure. However, my early tests suggest that the Lucene fuzzy matching (using ft:query with a tilde operator) is actually broken in the build we're using; that's going to be a bit of a problem for us. I might write an XQuery implementation of the USM in order to have something better than Levenshtein Distance, but I don't know how that could be implemented as part of a search. More work to do here.
- We now have the following stylesheets (instead of a single global one):
- global.css (currently empty: may be removed).
- highlights.css (contains rules for search matching and highlighting).
- popups.css (styles for popup boxes).
- primary_source.css (styles specific to the rendering of primary source documents, as opposed to born-digital articles).
- site_page.css (the site chrome, and the main focus of PS's work right now).
- xml_code.css (styling exclusively for sample code in XML format, which we use in our born-digital documentation files, through <egXML> elements).
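
For future reference on the placename lookup item above, here is a minimal sketch of the kind of fuzzy matching with a confidence measure that the module will eventually need. It is written in Python rather than XQuery, purely as an illustration outside the database. It uses plain Levenshtein distance (not the USM, and not Lucene), and the gazetteer contents, the ids PLACE1 and PLACE2, the function names and the 0.7 threshold are all hypothetical, not taken from the MoEML data or code.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def confidence(candidate: str, name: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means an exact match."""
    a, b = candidate.lower(), name.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def lookup(candidate: str,
           gazetteer: Dict[str, List[str]],
           threshold: float = 0.7) -> List[Tuple[str, float, List[str]]]:
    """Return (id, confidence, distinct names) for ids whose best-matching
    name scores at or above the threshold, best match first."""
    matches = []
    for place_id, names in gazetteer.items():
        best = max((confidence(candidate, n) for n in names), default=0.0)
        if best >= threshold:
            matches.append((place_id, best, sorted(set(names))))
    matches.sort(key=lambda m: (-m[1], m[0]))
    return matches


# Hypothetical gazetteer: id -> all name variants recorded for that id.
GAZETTEER = {
    "PLACE1": ["Cheapside", "Cheap"],
    "PLACE2": ["Eastcheap", "East Cheap"],
}

if __name__ == "__main__":
    for place_id, score, names in lookup("Cheapeside", GAZETTEER):
        print(place_id, round(score, 2), names)
```

With this sample data, looking up the variant spelling "Cheapeside" yields the single candidate PLACE1 with a confidence of 0.9, along with its distinct names. Scoring each id by its best-matching name, rather than by an average over all names, means that one close variant is enough to surface the id as a candidate.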