I have a working QT class that implements the USM, and a little test GUI for it. I should now compare it with the results from my Java implementation, to see if they differ, and if so in what ways.
Category: "Activity log"
With the proposal for DH accepted, I'm now planning an implementation of the application that I could use to demonstrate and work with real data, and wondering what platform to use. I've already written one implementation in Delphi (not acceptable as a proper solution, because it's not cross-platform), and written a command-line implementation of the basic similarity metric in Java (easy because zip support is built into Java). But I'm now wondering if the benefits of compiled code, and the opportunity to build my QT skills up, merit doing this project in QT, assuming there's time before July. This is why I've decided to try this project in QT:
- QT includes qCompress, which takes a byte array (a QByteArray) and returns a compressed byte array, using zlib. This should be perfect for our purpose.
- C++ is going to be MUCH quicker for this sort of processing.
- I need more practice with QT and C++.
- A cross-platform native app will be as acceptable to the community as Java.
I've set up a new QT project for this, and I'm going to start coding the basic class for zipping, measuring, and calculating the similarity metric over the next couple of weeks.
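A minimal sketch of the zip, measure, and calculate steps, written in Java because zip support is built in there. It assumes the metric is the normalized compression distance (NCD), a common compression-based approximation to the USM; that choice is an assumption, the class and method names are hypothetical, and the real class may well use a different formula.

```java
import java.util.zip.Deflater;

/** Hypothetical helper: a compression-based distance between two byte sequences. */
final class CompressionDistance {

    /** Returns the size in bytes of the zlib-compressed form of data. */
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        // Only the length matters here, so the compressed bytes are discarded.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /**
     * Assumed formula (NCD):
     * (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
     * where C is the compressed size and xy is x followed by y.
     */
    static double distance(byte[] x, byte[] y) {
        byte[] xy = new byte[x.length + y.length];
        System.arraycopy(x, 0, xy, 0, x.length);
        System.arraycopy(y, 0, xy, x.length, y.length);
        int cx = compressedSize(x);
        int cy = compressedSize(y);
        int cxy = compressedSize(xy);
        return (double) (cxy - Math.min(cx, cy)) / Math.max(cx, cy);
    }
}
```

Under this formula, values near 0 suggest very similar inputs and values near 1 suggest unrelated ones. A QT version could presumably follow the same shape, with qCompress supplying the compressed size.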
I've been working slowly on the Laon style, which has now got a bit closer to the other styles in the interests of conformity; I've also made some tweaks to the main stylesheet to space some things out a little. I think progress on the appearance of the site will have to be incremental, because there's not enough time right now for the kind of work that a radical revision would require.
For text documents which have a thumbnail, the thumbnail is linked by including it in a <note> tag in the <biblStruct> in the <teiHeader>:
<biblStruct> [...] <note><graphic url="thumb_varin.jpg"/></note> </biblStruct>
The XQuery finds this because it's looking for the first <graphic> element in the document. The XSLT is able to distinguish between <graphic>s in image markup files (which have no preceding "thumb_" in the filename) and these items, which do; the "thumb_" is only supplied in the teaser where it's needed. In the case of image markup files, the <graphic> in the <facsimile> part of the document links to the main image for the file, and the thumbnail link is constructed from it, on the assumption (always true) that it exists; in the case of text documents, the thumbnail is the only image, so its filename is already complete, and the XSLT is sensitive to this.
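The filename rule described above can be sketched as follows. This is only an illustration of the logic in Java, not the XSLT itself; the class and method names are hypothetical, and it assumes the thumbnail sits alongside the main image and keeps the same extension.

```java
/** Hypothetical helper mirroring the thumbnail filename rule. */
final class ThumbnailLinks {

    static final String PREFIX = "thumb_";

    /** Returns the thumbnail URL for the first graphic/@url found in a document. */
    static String thumbnailUrl(String graphicUrl) {
        int slash = graphicUrl.lastIndexOf('/');
        String dir = graphicUrl.substring(0, slash + 1);  // empty when there is no path
        String file = graphicUrl.substring(slash + 1);
        // Text documents: the only image is the thumbnail, so the name is complete.
        if (file.startsWith(PREFIX)) {
            return graphicUrl;
        }
        // Image markup files: build the thumbnail name from the main image name.
        return dir + PREFIX + file;
    }
}
```

With this rule, `thumb_varin.jpg` is returned unchanged, while a main image with a hypothetical name such as `engraving.jpg` yields `thumb_engraving.jpg`.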
Talked with CC, and we decided that we want to show thumbnails of title pages in the teasers for text documents whenever we can. To that end, I've extracted and trimmed up teaser-sized images from all of the PDFs I have; CC has a couple more that I don't have, and she'll send them on. Once that's done, I have to determine how to make them work. Right now, the XQuery simply copies the first graphic/@url into an @n attribute on the <bibl[Struct]> tag which is generated as part of creating the TOC; this is then used to generate the thumbnail link, on the assumption that if there's an image, there's a thumbnail. I think we're going to have to be more explicit than this, though, so we can link in a thumbnail separately from a parent image (because for the texts, we won't be showing any larger image at all). This will require changes to both XQuery and XSLT, and presumably also to the XML of documents for which we now have thumbnails.
We noticed that the titles of engravings in our db sometimes didn't match the titles as shown in BNF catalogues; this led to some confusion when referencing the documents in articles. CC sent me a list of some of the BNF titles. This is what I've done:
- Where BNF titles are supplied, I've used them verbatim in the titleStmt/title, sourceDesc/biblStruct/title, and sourceDesc/biblStruct/title[@type="trunc"] elements. I have not created truncated versions of them, because few of them are all that long anyway.
- Where the BNF provided two titles, I have used them like this: Title one (Title two).
- Where the filename (and @xml:id) of the document did not match the title, I have created a new filename and @xml:id which does match the title (being a normalized, lower-case, truncated form of its first few words).
- For a few engravings for which CC didn't send me the BNF title, I have still normalized the filename and @xml:id. The aim here is to make it easier to find the XML document based on the title.
- Updated references to all documents which have changed throughout the collection.
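For illustration, a normalized, lower-case, truncated form of a title's first few words could be derived along these lines. This is a sketch under assumptions, not the procedure actually used: the underscore separator, the stripping of accents, the word limit, and the class and method names are all hypothetical.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper: derive a filename / xml:id stem from a title. */
final class TitleIds {

    static String idFromTitle(String title, int maxWords) {
        // Decompose accented letters, then drop the combining marks.
        String ascii = Normalizer.normalize(title, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        // Lower-case, and turn every run of non-alphanumerics into one space.
        String cleaned = ascii.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", " ")
                .trim();
        if (cleaned.isEmpty()) {
            return "";
        }
        // Keep only the first few words and join them (assumed separator: "_").
        String[] words = cleaned.split(" ");
        int n = Math.min(maxWords, words.length);
        return String.join("_", Arrays.copyOfRange(words, 0, n));
    }
}
```

A made-up title such as "L'École des filles", limited to three words, would give `l_ecole_des`. A title beginning with a digit would still need special handling, since an @xml:id cannot start with one.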
I've always been unhappy about the appearance of footnotes in the documents; the over-large "Note 1" headings seemed superfluous, but it's not easy to generate normal ordered lists from these because of the nature of the processing and the popping-up. I've now rewritten that code to produce something that looks much more like a normal ordered list, but still allows the number to be displayed when the note is copied into a popup box.
Finished marking up the second article, and also had to make some revisions to XSLT and CSS to handle some new issues thrown up by it:
- Templates for <ref> elements can now handle linking between articles in the editorial collection, and handling of <ref> elements generally is slightly more robust.
- Blockquotes are now correctly handled in the context of articles, including better margins when verse stanzas or lines are quoted.
- Popups (for notes, biblio refs etc.) now no longer have a min-height. I can't see why I gave them one originally, but it led to lots of empty space in the popup if the content was short.
On CC's instructions, fixed the dates of the Agremens texts, which were mostly wrong, and tweaked the Milan text. Also changed the display of document titles in the Teaser so that it's using the full title instead of the truncated title used in the TOC.
I've been marking up the second article today, and have done most of it. In the process, I'm standardizing the markup practices, and I'll be in a position to document them in detail when I've finished and tested the document.