At AB's request, I've made new versions of the word cloud from tweets posted during the conference. One includes all retweets, and its content is relatively uncurated; in the other two, I stripped out all the retweets, removed ampersands (&), URLs, "th" and "st" after numerals, hashtags, and handles, and expanded "St" to "Saint" (some of these changes were requested by AB). The sites I used were https://tagul.com/create and http://www.wordclouds.com/
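For the record, the clean-up steps listed above could be sketched roughly as follows in Python. This is only an illustration, not how the actual clean-up was done; the function name `clean_tweet` is hypothetical, and it assumes that retweets start with "RT @" and that "getting rid of" hashtags and handles means dropping the whole token rather than just the "#" or "@" sign.

```python
import re

def clean_tweet(text):
    """Return the cleaned tweet text, or None if the tweet is a retweet.

    Assumption: retweets are recognisable by a leading "RT @".
    """
    if text.startswith("RT @"):
        return None
    text = re.sub(r"https?://\S+", " ", text)        # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)             # drop handles and hashtags (whole token)
    text = text.replace("&amp;", " ").replace("&", " ")  # drop ampersands, escaped or not
    text = re.sub(r"(?<=\d)(?:st|th)\b", "", text)   # "5th" -> "5", "1st" -> "1"
    text = re.sub(r"\bSt\b\.?", "Saint", text)       # "St" or "St." -> "Saint"
    return " ".join(text.split())                    # collapse leftover whitespace
```

Tweets for which the function returns None would simply be left out of the text pasted into the word cloud sites.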
Turned these into basic TEI with a lot of regexps and crafty style-based search-and-replaces in LibreOffice.
Meeting with GL and DH; following that, I used a range of hacky approaches to convert Word doc content from the English and French Convention of 40 docs to basic TEI, for GL to split out into debate days.
Met with JP to discuss state of ACVI project and hoped-for products.
So far, a bunch of contributors have submitted material (text, images, audio, video). Most of the stuff is on a sync.com account, but the large A/V files are on a personal computer (which John assured me had been backed up). It sounds like the collection of source documents is a bit of a mixed bag, but at least there is a digital file of some sort for each document. I've since got an account with sync.com, and John is going to give my account access to the ACVI repository. One of their issues is storing and accessing the large A/V files during production of the products, and then long-term storage of their stuff.
The three products are:
1) a travelling museum exhibit with accompanying brochure. He has a contractor for the panels etc. for the exhibit, but HCMC might be able to help with the layout/production of the brochure.
2) Compendium of research articles: he's looking at a special edition of the BC Studies journal or making an arrangement with the UVic Library, which does small-scale publishing.
3) Digital History website
MN and PM came by to get started on a new set of work on the Latin site. Arranged times and booked a machine; installed Wine and HotPot on it; got PM set up and working through the HotPot tutorial.
The website now contains a page (not linked from anywhere else) that lists and links to all the instances of names tagged as UNSPECIFIED.
Two ISE meetings about planning the future structure of the project and its relationship with HCMC and the university.
Received site update request from BAK.
- removed old announcements and replaced with 2 new ones
- uploaded 2 new pdfs (workshop poster and Lansdowne poster)
- added 2 new announcements
- received new blog info from MF; added it to students page
- synchronized site
- sent confirmation email to BAK, MF advising task completed
We have a sample encoding of MND annotations, and I worked today with MT to figure out how best to map it onto the ISE XML that has to be generated from it. I've ended up with some conversion XSLT, along with a set of recommendations emailed to the team; if everyone agrees, I'll be able to a) fix the encoding done by DJ and SW, b) harden the TEI schema to help with this type of encoding, and c) convert it to what the ISE machinery needs.
We're now getting French content ready for encoding, and the line-break-detection algorithm was based only on an English dictionary. Going back to this post, I found a French hunspell dictionary and unmunched it, then added language detection to the hocr_to_tei.xsl and dictionary_module.xsl code, based on there being @xml:lang="fr" on the root TEI element. This seems to be working, although the unmunched French dictionary is nowhere near as good as the English one. It might be a good idea to look around for a better one.
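As a rough illustration of the idea (in Python rather than the XSLT actually used), the language switch and dictionary lookup might look something like the sketch below. The function names, the tiny word lists, and the join rule are all assumptions made for the example; the real logic lives in hocr_to_tei.xsl and dictionary_module.xsl and may differ.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:lang attribute as ElementTree exposes it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny stand-in word lists (hypothetical); the real lists would come from
# the unmunched hunspell dictionaries.
WORDLISTS = {
    "en": {"government", "convention"},
    "fr": {"gouvernement", "convention"},
}

def document_language(tei_xml, default="en"):
    """Read xml:lang from the root TEI element, falling back to English."""
    root = ET.fromstring(tei_xml)
    return root.get(XML_LANG, default)

def join_across_break(head, tail, words):
    """Decide how to rejoin a word split by a hyphen at a line break.

    If stem + tail is a known word, the hyphen was only a line-break
    hyphen and is dropped; otherwise it is kept as a real hyphen.
    """
    stem = head.rstrip("-")
    joined = stem + tail
    if joined.lower() in words:
        return joined
    return stem + "-" + tail
```

For example, with a root element carrying xml:lang="fr", the French list is selected, so "gouverne-" followed by "ment" is rejoined as "gouvernement", while "peut-" followed by "être" keeps its hyphen because "peutêtre" is not in the list.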
This blog is for work done for academic departments which does not fall under other categories.