Lingard corrections entered; new review encoded

Minor editorial corrections to the Lingard translation are done, and the new review by NVD is encoded. Because the site is small, my current publication method is simply to rebuild the entire webapp and push it up to eXist.
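Since the webapp is deployed as a XAR, the "rebuild" step mostly comes down to packaging a build directory. A XAR is an ordinary ZIP archive with the EXPath descriptor `expath-pkg.xml` at its root. The Python below is a minimal sketch under that assumption; `build_xar` is a hypothetical helper, and it covers only the packaging, not the upload to eXist.

```python
import zipfile
from pathlib import Path


def build_xar(build_dir: str, xar_path: str) -> list[str]:
    """Package every file under build_dir into a XAR (a ZIP archive).

    Member paths are relative to build_dir, so expath-pkg.xml ends up at
    the archive root, as EXPath packaging requires.
    Returns the member names in the order they were written.
    """
    root = Path(build_dir)
    if not (root / "expath-pkg.xml").is_file():
        raise FileNotFoundError("expath-pkg.xml missing from the build directory")
    target = Path(xar_path).resolve()
    names = []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as xar:
        for path in sorted(root.rglob("*")):
            # Skip directories, and never pack the archive into itself.
            if path.is_file() and path.resolve() != target:
                arcname = path.relative_to(root).as_posix()
                xar.write(path, arcname)
                names.append(arcname)
    return names
```

For example, `build_xar("build/app", "dist/app.xar")` (hypothetical paths) would produce a package ready to be installed through eXist's package manager.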


Old volumes are now indexed and searched

I used Tika to create XHTML versions of the OCRed volumes and integrated the results into the existing search; it now seems to work pretty well, so I pushed it to the live Jetty server.


Tweak to Lingard translation

The headers for the translation itself were appearing at the bottom of a page in the PDF. I set up a special case, div0[@type='NewPage'], which the XSLT now handles by forcing a page break. Republished the XAR.


Volume pages now being created; other updates to site

I've extracted the editorial content from the master volumes, and I'm now building a separate index page for each of the volumes we've created, including the editorial stuff. On the other listings pages, volume numbers are now links to the volume pages. This answers (I think) the last of HT's requirements for the new site, other than a redesign.

I've also tested Apache Tika with the old volume PDFs, and the results are very promising; I think we may be able to process them into ugly XHTML, which eXist could then index, giving people search capabilities and links out to the specific page on which a hit is found.
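Tika's PDF parser typically wraps each page of its XHTML output in `<div class="page">`, which is what makes page-level linking feasible. Assuming that structure survives into the indexed files, the hypothetical helpers below find the pages that mention a term and build links with the `#page=N` fragment that most PDF viewers honour. In eXist the lookup itself would be an XQuery/Lucene query; this Python sketch only illustrates the logic.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def pages_containing(xhtml: str, term: str) -> list[int]:
    """Return the 1-based numbers of Tika 'page' divs whose text contains term."""
    root = ET.fromstring(xhtml)
    pages = [div for div in root.iter(f"{XHTML}div") if div.get("class") == "page"]
    needle = term.casefold()
    hits = []
    for number, page in enumerate(pages, start=1):
        # Join all text nodes on the page; OCR hyphenation is not repaired here.
        text = " ".join(page.itertext()).casefold()
        if needle in text:
            hits.append(number)
    return hits


def page_links(pdf_url: str, pages: list[int]) -> list[str]:
    """Build viewer links that open the PDF at each hit page."""
    return [f"{pdf_url}#page={n}" for n in pages]
```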


New translation done; updates to rendering

The new translation is done, thanks to numerous hacky shortcuts for converting ODT to TEI, including search-and-replace on the content.xml file. The results highlighted a couple of minor layout annoyances, so I've fixed those too. The document comes in at about 100 pages, and it's now posted for proofing.
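An ODT file is a ZIP package whose body text lives in `content.xml`, so that kind of search-and-replace shortcut can be scripted. The sketch below assumes a small, hypothetical rule list; regexes like these are brittle (nested spans are not handled, for instance) and only yield rough TEI that still needs hand correction.

```python
import re
import zipfile

# Hypothetical rules, applied in order: each rewrites an ODF construct
# into a rough TEI equivalent. Real rules depend on the source document.
RULES = [
    # Text in LibreOffice's built-in "Emphasis" character style -> TEI <hi>.
    (re.compile(r'<text:span text:style-name="Emphasis">(.*?)</text:span>'),
     r"<hi rend='italic'>\1</hi>"),
    # Drop empty self-closing paragraphs before converting the rest.
    (re.compile(r"<text:p(?:\s[^>]*)?/>"), ""),
    (re.compile(r"<text:p(?:\s[^>]*)?>"), "<p>"),
    (re.compile(r"</text:p>"), "</p>"),
]


def odt_to_rough_tei(odt_path: str) -> str:
    """Read content.xml from an ODT package and apply RULES in order."""
    with zipfile.ZipFile(odt_path) as odt:
        xml = odt.read("content.xml").decode("utf-8")
    for pattern, replacement in RULES:
        xml = pattern.sub(replacement, xml)
    return xml
```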


New translation for vol 24

Received a new translation and started working on it. I've used the LibreOffice macro search tool to good effect: it let me add some tagging to the word-processor document semi-automatically, using styles, and I'm now transferring that content into the XML document. The metadata and the bibliography are done, and I'm working through the editorial intro.
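When tagging is expressed through character styles, the styled runs can be pulled out of `content.xml` and checked before they are moved into the TEI. The style names and the TEI mapping below are hypothetical. LibreOffice stores a style displayed as "Title Ref" under the internal name `Title_20_Ref`; spans carrying automatic styles (T1, T2 and so on) would need resolving through their parent styles, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical mapping from internal character-style names to TEI elements.
STYLE_TO_TEI = {"Title_20_Ref": "title", "Foreign": "foreign"}


def styled_runs(content_xml: bytes) -> list[tuple[str, str]]:
    """List (tei_element, text) for every span whose style is in the mapping."""
    root = ET.fromstring(content_xml)
    runs = []
    for span in root.iter(f"{{{TEXT_NS}}}span"):
        style = span.get(f"{{{TEXT_NS}}}style-name")
        if style in STYLE_TO_TEI:
            runs.append((STYLE_TO_TEI[style], "".join(span.itertext())))
    return runs
```

The bytes would come from the package itself, e.g. `zipfile.ZipFile("vol24.odt").read("content.xml")` with a hypothetical filename.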


301 redirects for old URLs now working

A simple redirect XQuery module that handles both the PDFs and the HTML pages is now implemented.
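The module itself is XQuery, but the mapping logic is simple enough to sketch in a few lines of Python. Both URL patterns here are hypothetical stand-ins for the old site's PDF and HTML paths; the idea is that each legacy pattern maps to exactly one new location, returned with a 301 status so that browsers and search engines treat the move as permanent.

```python
import re
from typing import Optional, Tuple

# Hypothetical legacy-to-new path patterns (not the site's real URLs).
REDIRECTS = [
    (re.compile(r"^/pdf/vol(\d+)_(\w+)\.pdf$"), r"/vol\1/\2.pdf"),
    (re.compile(r"^/html/vol(\d+)_(\w+)\.html?$"), r"/vol\1/\2.html"),
]


def redirect_for(old_path: str) -> Optional[Tuple[int, str]]:
    """Return (301, new_location) for a known legacy path, else None."""
    for pattern, target in REDIRECTS:
        if pattern.match(old_path):
            return 301, pattern.sub(target, old_path)
    return None
```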


Tweaking and refining the web app

Worked a lot on the search today, constraining it to published documents only and tweaking how it returns results. I've also added VNU validation of the HTML to the build process and fixed the problems it turned up; I've turned popup notes into <aside> elements so that their inline text can be ignored by the indexer, while the footnote rendering at the bottom of the document will be indexed; and I've refined the indexing after using the Monex profiler. I also tweaked the P5 output so that it validates with the correct schema links. I think we're more or less there now; what we have is already much better than what's on the current site, and I see no problems with it at all.
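The effect of the `<aside>` change is that popup-note text is skipped while the footnote list at the bottom of the page is kept. In eXist that exclusion belongs in the index configuration (collection.xconf) rather than in code; the Python sketch below, using the standard-library `html.parser`, only illustrates which text would reach the index.

```python
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Collect text for indexing, skipping anything inside <aside>."""

    def __init__(self) -> None:
        super().__init__()
        self.aside_depth = 0          # > 0 while inside one or more <aside>s
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "aside":
            self.aside_depth += 1

    def handle_endtag(self, tag):
        if tag == "aside" and self.aside_depth:
            self.aside_depth -= 1

    def handle_data(self, data):
        if not self.aside_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html: str) -> str:
    """Return the whitespace-normalised text an indexer would see."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```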


eXist app finished

I'm now happy with the way everything is working. There are tweaks I could make -- I need to put in place redirects for the old URLs, and I should revisit the collection.xconf and indexing, and there will be more pages that need to be created for the new site -- but it's all basically there, and what we have now could replace the current webapp immediately. Will start that process next week.


eXist app basically working

I've added an eXist app build to the process, and I'm now testing and bugfixing locally to bring this project thoroughly into the Endings fold. Everything is basically working, but I'm finding I now want to enhance some aspects of the rendering and display so that it's a cleaner and simpler setup than the previous app; I'm adding citation stuff in a footer, as well as getting the search working, based on Mariage. Should all be done soon.

Scandinavian-Canadian Studies

This is the blog for volumes 15 to 19 of the journal Scandinavian-Canadian Studies / Études scandinaves au Canada. Our aim is to provide Web-based access to the contents of the print journal in a range of different formats, including PDF, HTML, XML (TEI P5), and plain text (UTF-8).


