Category: Activity log

17/05/13

Permalink 03:25:22 pm, by mholmes, 43 words, 4 views   English (CA)
Categories: Activity log; Mins. worked: 180

Beginning of tutorial for primary source encoding

Started a tutorial based on SNOW1 (for the moment). While writing the first bit of it, I came up against many annoyances in the rendering of egXML blocks, and fixed those rendering issues in three places: site, redesign, and codesharing. Grrr.

Permalink 09:21:48 am, by mholmes, 26 words, 4 views   English (CA)
Categories: Activity log; Mins. worked: 30

Added handling for dramatic text tags

Added rendering handling for sp, speaker, and p within sp. The stage tag isn't handled yet. Rolled out changes both to site and to redesign codebases.

16/05/13

Permalink 05:31:49 pm, by mholmes, 135 words, 5 views   English (CA)
Categories: Activity log; Mins. worked: 120

Troubleshooting: encoded title page of SNOW1, found and fixed rendering bug

Since SNOW1 was a bit of a mess at the beginning, because of the encoders following obsolete examples, I've manually encoded the title page as an example.

Also found a problem with METR1 which was not really a bug, nor an encoding invalidity: a body element which goes straight to content (e.g. a head) with no intervening div is not invalid, but it triggered rendering problems because it was completely unexpected. As it happens, the encoding should not have been that way -- other divs appear later in the body -- but it wasn't technically wrong, so it would be good to figure out a way to prevent this through the schema or more likely through Schematron. We could change the content model of body so that it can only have divs, of course.
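The condition itself comes down to one XPath test: does any body have an element child that is not a div? Here is a minimal sketch of that test in Java. It is not the Schematron rule, which hasn't been written yet, and the class name, method name, and sample documents are all hypothetical. It uses local-name() to sidestep namespace setup, and it assumes we really do want body to contain nothing but divs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class BodyDivCheck {
    // Element children of any body that are not div elements.
    // local-name() makes the test independent of namespace prefixes.
    static final String STRAY_CHILDREN =
            "//*[local-name()='body']/*[local-name()!='div']";

    // Counts the element children of body that are not divs (hypothetical helper).
    static int countStrayChildren(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(STRAY_CHILDREN, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not check document", e);
        }
    }

    public static void main(String[] args) {
        String open = "<TEI xmlns='http://www.tei-c.org/ns/1.0'><text><body>";
        String close = "</body></text></TEI>";
        // A head directly inside body, followed by a div: the unexpected shape.
        String mixed = open + "<head>Title</head><div><p>Text</p></div>" + close;
        // Everything wrapped in a div: the expected shape.
        String wrapped = open + "<div><head>Title</head><p>Text</p></div>" + close;
        System.out.println("mixed: " + countStrayChildren(mixed));
        System.out.println("wrapped: " + countStrayChildren(wrapped));
    }
}
```

The same XPath expression could presumably serve as the test in a Schematron report once we decide which non-div children, if any, should still be allowed in body.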

15/05/13

Permalink 05:18:22 pm, by mholmes, 65 words, 6 views   English (CA)
Categories: Activity log; Mins. worked: 120

Fixes and updates

Did some tasks from yesterday and some new ones:

  • Files that used <group> have now been converted to <div>s. (The only exception is stow_1633, which probably does need <group>.)
  • XSLT rendering has been updated to handle this.
  • Extra stray copies of METR1 have been identified in the db and removed. These were causing errors in the redesign pipeline.

14/05/13

Permalink 03:56:07 pm, by mholmes, 168 words, 6 views   English (CA)
Categories: Activity log; Mins. worked: 120

Meeting and tasks

I have these tasks coming out of the team meeting today:

  • DONE: Fix rendering of org popups.
  • DONE: Add Schematron constraint for malformed Julian dates.
  • DONE: Fix rendering of persNames with genName and roleName in them.
  • DONE (for group elements): Make a list of files containing group elements, and other bad old code.
  • DONE: Transform files with group elements into nested divs.
  • Add an attribute value parameter to the CodeSharing interface (will have to be done after July, probably).
  • Add handling for @style on list, along with documentation for it, change existing usage of list/@type to @style, then remove list/@type from schema.
  • DONE: Look at forme works in SNOW1 and figure out why they're not rendering properly.
  • Collapse the myth and fict personography types to a single type "lit". This will involve both data and rendering and must be done simultaneously.
  • Add rendering for sp, speaker and stage for SNOW1.
  • In redesign (with Pat): make page credits work like page TOC (pop-out rather than long list).

13/05/13

Permalink 05:01:20 pm, by mholmes, 156 words, 9 views   English (CA)
Categories: Activity log; Mins. worked: 420

eXist build script

I've spent the whole day working on getting a more flexible and successful build system for eXist. This is what I've added to Greg's script:

  • It now checks for the presence of Saxon and warns if it's not available.
  • It checks for three XSLT files, and in each case, if the file is there, it transforms a target file in the build tree. These are for conf.xml.tmpl, mime-types.xml.tmpl, and controller-config.xml. This should allow us to set up build environments for each of our specific projects.
  • It excludes XML Calabash and includes FOP. The former was blocking the build because its download location is down.

Found a number of problems with eXist, which I've reported. One bad one shows up once the webapp is running: calling transform:transform with a relative path to the XSLT file now raises an error. A full path from /db seems to work.

09/05/13

Permalink 03:27:18 pm, by mholmes, 275 words, 12 views   English (CA)
Categories: Activity log; Mins. worked: 240

Wrote an eXist module for similarity metric comparisons

I've now figured out how to create an extension module for eXist, following the instructions here. These are some things I've learned:

  • The only practical way to do this is to work with your module code in the context of the eXist tree, in $EXIST_HOME/extensions/modules/src/org/exist/xquery/modules.
  • You can use a non-eXist namespace -- I'm using http://hcmc.uvic.ca/ns/usm -- but it seems safest to use the eXist package structure, so my package is in org.exist.xquery.modules.unisimmetric.
  • All the extension modules are built together into a single jar called exist-modules.jar. You can build this jar alone, using build.sh extension-modules, then drop that jar into an existing eXist instance (although if the new jar was built with a substantially different version from the rest of the code, there could well be problems).
  • To turn on your module, you add a line to the conf.xml file, alongside the other modules, like this:
    <module uri="http://hcmc.uvic.ca/ns/usm" class="org.exist.xquery.modules.unisimmetric.UniSimMetricModule" />

I'm not yet happy with my module, and I'm still working on it. In particular, I'm not happy with the scores it's generating, and I think this might be something to do with other bits that get included in the GZIP stream, such as a header; if I can figure out how big those are, I can remove them from the calculation. The highest difference I seem to get is around 0.53 with completely dissimilar strings, so it seems as though the results are being compressed into a range much smaller than 0-1.
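For reference, here is a minimal sketch of the calculation with the framing subtracted. It is not the module code: the class and method names are hypothetical, and it assumes the usual NCD formula, (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with gzip as the compressor. A gzip stream with no optional fields has a 10-byte header and an 8-byte trailer, so 18 bytes of every measurement are framing. That framing cancels in the numerator but not in the denominator, which for short strings pulls every score down and may be part of what I'm seeing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class NcdSketch {
    // Fixed gzip framing: 10-byte header plus 8-byte trailer (CRC-32 and length).
    static final int GZIP_OVERHEAD = 18;

    // Size in bytes of the complete gzip stream for the given input.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        // The stream is closed at this point, so the trailer has been written.
        return buffer.size();
    }

    // Compressed size of a string's UTF-8 bytes, minus the assumed overhead.
    static int compressedSize(String s, int overhead) {
        return gzipSize(s.getBytes(StandardCharsets.UTF_8)) - overhead;
    }

    // Normalized compression distance; pass 0 as overhead for the uncorrected score.
    static double ncd(String x, String y, int overhead) {
        int cx = compressedSize(x, overhead);
        int cy = compressedSize(y, overhead);
        int cxy = compressedSize(x + y, overhead);
        return (double) (cxy - Math.min(cx, cy)) / Math.max(cx, cy);
    }

    public static void main(String[] args) {
        String a = "london bridge";
        String b = "cheapside";
        System.out.println("uncorrected: " + ncd(a, b, 0));
        System.out.println("corrected:   " + ncd(a, b, GZIP_OVERHEAD));
    }
}
```

If the corrected scores for completely dissimilar strings come out much closer to 1, the framing is the culprit; if not, the problem lies somewhere else in the calculation.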

07/05/13

Permalink 04:41:34 pm, by mholmes, 93 words, 9 views   English (CA)
Categories: Activity log; Mins. worked: 240

Meeting and work on page-image-linking

Team meeting, at which we discussed the use of ISE's facsimile viewer in MoEML (which will be easy enough to do, although it's based on a traditional db, and we'll have to replace that with proper TEI facsimile encoding).

People also asked me to clarify how the EEBO linking works, so I've done that in the transcriptions documentation file, and I've also implemented the display of little page-images linking to the EEBO pages. Also today, <address> and <addrLine> were added to the schema, with some basic display rendering.

02/05/13

Permalink 01:28:13 pm, by mholmes, 149 words, 12 views   English (CA)
Categories: Activity log; Mins. worked: 180

Implemented a crude similarity metric in XQuery

Lucene-based fuzzy matching seems to be very broken in the build of eXist I'm using, and in any case it's based on Levenshtein distance, so I've implemented a crude version of the USM/NCD algorithm in XQuery. It's a long way from ideal, though, because it's using base64 versions of strings rather than compressing the actual strings (this is all I can do with eXist's exposed gzip access); using zip seems to be punitive because it would require creating a file on the filesystem or in the db and compressing that. I think a simpler approach would be to take my Java class and strip out all the command-line stuff it contains, then call that directly from XQuery (see the xqSearchUtils java project and the way it's called from the Despatches XQuery for an example). A jar file with a simple XQuery module interface might be very handy indeed.

30/04/13

Permalink 03:04:57 pm, by mholmes, 543 words, 15 views   English (CA)
Categories: Activity log; Mins. worked: 300

Progress on redesign and ancillary improvements 2013-04-29 to 2013-05-01

I've been using the opportunity of the redesign (which gives me a complete new incarnation of the web application working alongside the current one) to fix a whole raft of problems and annoyances going back a long time. Among those completed so far:

  • When you ask for a page which doesn't exist, you now see a customized "missing" page (db/data/info/missing.xml), but I also set the HTTP status code, like this (for future reference):
    (: Look the document up once; fall back to the "missing" page with a 404 status. :)
    declare variable $dataDoc :=
        let $match := collection('/db/data')//TEI[@xml:id=$fileId]
        return
            if ($match) then $match
            else
                let $dummy := response:set-status-code(404)
                return collection('/db/data')//TEI[@xml:id='missing'];
    
  • Menu item <li> elements now have a class="active" attribute where their target URL matches the current URL.
  • Schemas (ODD, RNG and SCH) are available through their filenames.
  • When the XML view of a document is presented, the teiHeader is automatically expanded to include links to the schemas and a bit more information, to mitigate the current (temporary, I hope) paucity of header information.
  • Page contents menus are now generated, not by parsing the XML source document, but by parsing the XHTML rendering of it after expansion and transformation. This is because the content menu has to be generated in a separate process from the original document expansion and conversion, and since @ids on <div>s are often auto-generated with generate-id() during the XSLT transformation, they cannot be matched for linking any other way.
  • I've begun writing a new module for retrieving information about placenames programmatically. This is largely to support the planned processing of ISE source code through named entity recognition. We will need to be able to do a sort of fuzzy lookup of placenames found in the ISE texts, to identify exact and candidate matches. Right now, the module is producing a gazetteer in the text file format used by e.g. NLTK, as well as a simple lookup text file for ids and matching names; it's also eventually going to be able to take input in the form of a candidate name and produce one or more matches in the form of MoEML ids along with all distinct values of names in MoEML for those ids, with a confidence measure. However, my early tests suggest that the Lucene fuzzy matching (using ft:query with a tilde operator) is actually broken in the build we're using; that's going to be a bit of a problem for us. I might write an XQuery implementation of the USM in order to have something better than Levenshtein Distance, but I don't know how that could be implemented as part of a search. More work to do here.
  • We now have the following stylesheets (instead of a single global one):
    • global.css (currently empty: may be removed).
    • highlights.css (contains rules for search matching and highlighting).
    • popups.css (styles for popup boxes).
    • primary_source.css (styles specific to the rendering of primary source documents, as opposed to born-digital articles).
    • site_page.css (the site chrome, and the main focus of PS's work right now).
    • xml_code.css (styling exclusively for sample code in XML format, which we use in our born-digital documentation files, through <egXML> elements).


Map Of London

This project allows literary and scholarly works (primary and secondary) to be associated with locations in London, providing the reader with a richer understanding of the works.
