Pursuing my work on using Craig's Zeta to characterize participants in a correspondence, I've started trying to reproduce the work I did in the summer using the latest versions of the DH's spreadsheets, and also to extend it. The main thing I've done today is to extract individual sets of data for the correspondence between Douglas and Newcastle, Lytton's successor. Getting a single archive of the plain text out of the db is a relatively simple XQuery:
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Despatches from Douglas to Newcastle: gather the body text of each one,
   skipping headings, openers, closers, index entries and notes. :)
let $docs := collection('/db/coldesp/correspondence')//TEI
        [teiHeader/fileDesc/titleStmt/author/name = 'Douglas']
        [teiHeader/fileDesc/titleStmt/respStmt/name = 'Newcastle'],
    $output := concat('begin... ',
        string-join(
            for $d in $docs
            return $d//div[@type='despatch_to_london']//text()
                [not(ancestor::head or ancestor::opener or ancestor::closer
                  or ancestor::index or ancestor::note)],
            ' '),
        ' ...end')
return $output

and
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: The reverse direction: despatches from Newcastle to Douglas. :)
let $docs := collection('/db/coldesp/correspondence')//TEI
        [teiHeader/fileDesc/titleStmt/author/name = 'Newcastle']
        [teiHeader/fileDesc/titleStmt/respStmt/name = 'Douglas'],
    $output := concat('begin... ',
        string-join(
            for $d in $docs
            return $d//div[@type='despatch_from_london']//text()
                [not(ancestor::head or ancestor::opener or ancestor::closer
                  or ancestor::index or ancestor::note)],
            ' '),
        ' ...end')
return $output
The time-consuming bit was creating all the individual files (768 and 469 respectively) through XSLT, but that's now done too. I don't know whether I'll actually need those files, because I'll probably do what I did before and run the analyses on aggregated content segmented into uniform blocks, as in the sketch below. I've created a text set for them in the Intelligent Archiver program and generated a word list, and I'm now trying to figure out how to do a cluster analysis on this data in Minitab, and how to use the latest incarnation of the Craig's Zeta spreadsheet. I'll need to move more quickly on this next week to get my abstract done.
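For the record, the block-segmentation step itself is simple in XQuery. Here's a minimal sketch; the toy text and the block size of 4 are stand-ins for the real aggregated output and whatever uniform size I settle on:

(: Cut a text into uniform blocks of $block-size words each;
   any trailing partial block is dropped so all blocks are the same length.
   In practice $text would be the aggregated output of the queries above. :)
let $text := 'begin... one two three four five six seven eight nine ten ...end',
    $words := tokenize(normalize-space($text), ' '),
    $block-size := 4
for $i in 1 to (count($words) idiv $block-size)
return string-join(subsequence($words, ($i - 1) * $block-size + 1, $block-size), ' ')

Each returned string is one uniform block, ready to be dumped to a file or fed to the analysis tools.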
Last-minute request to do the CSS part of yesterday's workshop in JL's class.
Spruced up the CSS, added a couple of extra samples and examples, and spent about 90 minutes in the class.
The DH 2010 CfP is out, and my initial submission will use text analysis on some of the Coldesp texts. I've spent the morning reading through the text I originally wrote and filling in some gaps (not finished yet, but it's well over 2,000 words already, so it will need some condensing at the end). I've also started preparing to re-run the analyses I did in June, and discovered that the CraigZeta spreadsheet (the latest version, which is Excel 2007) actually seems to work in OpenOffice Calc, which is a relief (you have to make sure that VBA macros are turned on, under Tools > Options > Load/Save > VBA Properties > Executable code). I've also confirmed that the Intelligent Archiver Java app works OK, and I've already purchased a copy of Minitab (although for that I'll have to switch to Windows, unfortunately).
The next stage is to repeat the original analyses; then it would be a good idea to extract similar text for Douglas corresponding with Lytton's successor, Newcastle. There are 470 Newcastle-to-Douglas documents and 768 Douglas-to-Newcastle ones. These are far larger datasets, and might be even more interesting.
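Since Craig's Zeta is at the heart of all this, it's worth sketching the core calculation. As I understand the measure, once each author's aggregated text has been cut into uniform blocks, the zeta score for a candidate marker word is the proportion of author A's blocks containing the word plus the proportion of author B's blocks lacking it, giving a value between 0 and 2. Here's a toy XQuery version (local:zeta and the sample blocks are my own inventions, and I'd want to check the details against the spreadsheet's actual output):

(: Craig's Zeta for one candidate word: the proportion of A's blocks that
   contain the word, plus the proportion of B's blocks that lack it.
   Scores near 2 mark strong A words; scores near 0 mark strong B words. :)
declare function local:zeta($word as xs:string,
                            $aBlocks as xs:string*,
                            $bBlocks as xs:string*) as xs:double {
  let $inA := count($aBlocks[tokenize(lower-case(.), '\W+') = $word])
              div count($aBlocks),
      $absentFromB := count($bBlocks[not(tokenize(lower-case(.), '\W+') = $word)])
                      div count($bBlocks)
  return $inA + $absentFromB
};

let $douglasBlocks := ('the survey of the district is chiefly complete',
                       'the goldfields report is enclosed herewith'),
    $newcastleBlocks := ('her majesty''s government has approved the estimates',
                         'the despatch of the 14th instant has been received')
return local:zeta('chiefly', $douglasBlocks, $newcastleBlocks)

In this toy case 'chiefly' appears in one of Douglas's two blocks and in neither of Newcastle's, so it scores 0.5 + 1.0 = 1.5, i.e. a fairly strong Douglas marker.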
Posting the time spent over the last week or so on back-and-forth about a grant application related to the IMT. These things eat so much time...
BH has requested that HCMC do a version of our XML workshop for his HUMA150 course on October 2 and 6. We have agreed to the workshop, but need to finalize details, including timing and content.
More details to follow from BH.
Spent most of the day working on the DH presentation.
I also ported our presentation style over to the new alpha version of the S5 code. It's appealing insofar as it provides speaking-notes functionality. Try it out.
Resurrected my 2006 SSHRC CV and updated the attachments document to show 2003-2009 publications etc., then sent it off to CM for the RT project proposal.