Created an SVG map for use in the presentation on dating in July.
Created XSLT to add long s to transcriptions, based on previous work on Stow, and ran it on TRIU1. Note to self: it needs to exclude editorial notes. Also did a lot of semi-manual cleanup of encoding in the document, ready for KMF and ZV to start work on it. Noticed a lot of remaining @rend attributes; I've now added a Schematron warning for those, so people convert them to @style.
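The substitution itself is simple enough to sketch outside XSLT. Here's a minimal Python illustration of the core rule (long s everywhere except word-finally); this is a deliberate simplification of the real historical conventions, and the function name is mine, not from the actual transform:

```python
import re

def add_long_s(text: str) -> str:
    """Replace s with long s (ſ) except in word-final position.
    A simplification: real early-modern conventions have further
    exceptions, and the actual XSLT must also skip editorial notes."""
    return re.sub(r's(?=\w)', 'ſ', text)
```

So `add_long_s("songs")` gives "ſongs": the medial s is converted, the final one is left alone.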
Another mockup for the Guidelines TOC page rewrite.
Hiya, I spent most of my day on the phone, slowly going through my list. It seems most of the Anglican churches have office hours early in the week, so I didn't get hold of too many, but I have been invited to the Colwood Historical Association meeting on Monday as well as to Dick Emory's house to see his private collection of artifacts and newspaper clippings from when his father served and was wounded in France! That is probably the most exciting, although I located many churches from the 1910s and learned a bit about the Anglican mission in the 1870s from a very chatty Rector's Assistant. I also located the church where Pearkes is buried and another whose reverend served with the 88th Regiment, Victoria Fusiliers. All the churches seemed interested in circulating a poster in the coming weeks.
Another vital piece of information is that the Anglican archives close for July and August, so I'm hoping to arrange multiple visits but may want someone to join me for some of them as those archives hold a lot of info and are only open two half-days a week. I will keep you posted. I guess that takes away all my show and tell for tomorrow, but I'll see you all then!
PS: because I'm contacting so many groups, I've made a new email for myself just for CGTW. If you want to make this your primary contact info for me it will help me keep things all in one place. It is firstname.lastname@example.org
Used XSLT to add an @n attribute to all paragraphs holding the @xml:id of the preceding or first-child milestone element, to enable faster searches on p tags instead of ranges between milestones, while still returning the correct target milestone.
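The logic of that transform can be sketched in Python (the real work was done in XSLT; element names here are bare rather than TEI-namespaced, purely for illustration):

```python
import xml.etree.ElementTree as ET

XML_ID = '{http://www.w3.org/XML/1998/namespace}id'

def label_paragraphs(root):
    """Stamp each <p> with an @n holding the xml:id of its governing
    milestone: the first-child milestone if there is one, otherwise
    the most recent milestone in document order."""
    current = None
    for el in root.iter():
        if el.tag == 'milestone' and XML_ID in el.attrib:
            current = el.attrib[XML_ID]
        elif el.tag == 'p':
            kids = list(el)
            if kids and kids[0].tag == 'milestone' and XML_ID in kids[0].attrib:
                el.set('n', kids[0].attrib[XML_ID])
            elif current is not None:
                el.set('n', current)
    return root
```

A search can then target p[@n='m1'] directly instead of computing ranges between milestones.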
Created another mockup of proposed new Guidelines TOC page.
Tested Macs in A103 to make sure no memory problems etc. doing large transformation exercises.
Fixed some bugs and cleaned up some XQuery and XSLT.
It took several restarts of Tomcat and various apps, then intervention by the sysadmin to increase the number of files a process can open; now all the apps are running much faster.
You may notice that when you add images to a blog post it tries to display them at full size, sometimes cropping an edge. To make them easier to view, here's a trick.
After adding the image to a post, look at what the blog engine dropped in to your post editor. It looks like this:
<div class="image_block"><img src="http://hcmc.uvic.ca/blogs/media/blogs/cgtw/poster4.jpg" alt="" title="" width="987" height="1281" /></div>
The width and height attributes give the pixel dimensions of the image. We can make it fit a little better by fiddling with the numbers. If we reduce each number by, say, 50%, we end up with this:
<div class="image_block"><img src="http://hcmc.uvic.ca/blogs/media/blogs/cgtw/poster4.jpg" alt="" title="" width="494" height="640" /></div>
Notice that these are rounded to the nearest whole number. If you keep images no wider than about 400 or 500 pixels, they'll look better in the blog. Also, please note that this does NOT change the size of the original image. It ONLY changes the display size in the blog. Right-clicking and saving will still store a full-size version.
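If you'd rather not do the arithmetic by hand, the proportional scaling is a one-liner. A throwaway helper (names mine, not part of the blog engine):

```python
def scaled_dims(width: int, height: int, max_width: int = 500):
    """Scale width/height proportionally so that width <= max_width."""
    if width <= max_width:
        return width, height
    return max_width, round(height * max_width / width)
```

For the poster above, `scaled_dims(987, 1281)` gives (500, 649), which you can paste into the img tag.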
Sorry I did not blog at the end of the day. I started off searching BC Archives' collections for material on police/court activity and war resisters, and then I went down there to talk to the archivist in person. Unfortunately, they told me there would be numerous legal challenges involved in accessing some of the material. I will report what they said in more detail on Friday, but for now I have a basic list of what is open to us. I then looked at what BC Archives has for education, and found some useful school records, some oral histories with Victoria High alumni, and other material. While I was there, I also looked at the music scene in Victoria during our time period so that we might have some audio clips for the website. BC Archives has concert programs from various musical societies, and piano sheet music about Victoria, by Victoria composers and published in Victoria. If we can't find any recordings, I could always record myself playing it and put the clip on the website.
Then I checked out the legislative library, and talked to the lovely reference librarian about what records pertaining to education and the provincial government are there. I think I have an almost complete list of what public schools were in operation during the time period, based on a master's thesis on microfilm she showed me. Today, I am going to Victoria City Archives first. I was hoping to find school board minutes yesterday, although these may not give us the interesting stories we are hoping to feature, so I will keep looking for other material. I have an appointment with the archivist at St. Michael's. I spoke to her on the phone yesterday, and it sounds like they have great records on the school's veterans. I am going to call all the old private schools, and Victoria High, and talk to them about records.
I've spent the day looking for municipal and community archives. So far my list is:
Sooke Region Museum
Metchosin Museum Society
Esquimalt Municipal Archives
View Royal Community Archives
Oak Bay Archives
Sidney Museum and Archives
It seems that Langford does not have an archive, as much as I've looked for it. Let me know if there are any I've neglected! I have finding aids for some of these, and the rest I'll be calling tomorrow. I've also found some useful material in the BC Archives for medical history. Tomorrow I'll be following up with the Chinese Presbyterian Church and calling local First Nations bands to see if they would be interested in advertising in their newsletter or mailing list.
Wow, things are really kicking off! Love the posters and excited to hear what everyone else found.
As for me, I've started my contact list of churches and social organizations (did you know there are 162 places of worship in this town?) and will be cold-calling them tomorrow. I will be starting with the organizations I know existed 100 years ago, and then as the project moves forward I will start reaching out to newer groups to see if their members have other information - which will be greatly aided by those posters. The things I will look for first are whether the groups have lists of members who served, members from the time, any archival materials, and monuments; from there I will build a list of places worth visiting.
Wish I could join you at the air museum, have a great time. I'm signing off early today to enjoy a birthday dinner, see you all Friday!
My contention about the change to docUtils.java having caused a regression which broke relative paths for the doc() function was borne out after I changed the file and rebuilt. Reported the bug formally on the bugtracker, and it is now fixed, so I have a fresh trunk build of eXist ready to go for MoEML. I'll deploy this first thing tomorrow before anyone else gets to work.
I am posting this exchange about inferred glosses so that I don't have to think it through all over again in the future!
Regarding the search engine, I blogged on 12/12/12:
"ECH's goal for the search engine in the web database is that, if a user searches for "fat", s/he will get results including fat, fatten, fattening, fatty. Our current settings, and our policies for adding inferred glosses, seem to be accomplishing this nicely. An entry which has "fatty" in its def is found by a search for "fat", because it also has an inferred gloss "fat". Searching for "fat*" also returns defs including fat, fatten, fattening, fatty ... but also fatal, fathom, father."
However, we also noticed the converse on 16/04/13:
When I searched for the inflected form “fired”, I also got all the entries with “fire”.
BUT when I search for “fatty” or “fatten”, I don’t get all the entries with “fat”. What is the difference here?
I think you're just discovering that a stemming analyzer is not an educated human. It doesn't understand semantics; it just knows how to strip off (some) inflectional endings and index the resulting stems, and then how to stem the search input and search the stemmed index with it. You will never find an automated search engine that gives you perfect results.
Right now, the search is paying no attention to whether things are in gloss tags or not; as I understand it, the purpose of the gloss tags is to construct an English-Nxa’amxcin list, not to aid in searching.
The situation with "fatty" is definitely a bit odd; it appears that if you search for that word, you it doesn't get stemmed prior to the search, whereas if you search for "fired" it does. Perhaps the stemmer avoids stemming -tty inputs because there are many which shouldn't be stemmed? ("batty", "natty", "patty", for instance.)
OK, so when I search for fatten, fattened, or fattening, I get the same 5 hits – 3 for “fattening”, one for “fattened”, and one for “fatten” – i.e. everything with the stem “fatten”. It doesn't go all the way down to the root “fat”, and that's fine.
When I search for “fatty”, all I get is the one entry for “fatty”, as you explained above. That's fine too.
We had been adding inferred glosses for the uninflected English stems and roots of attested glosses, e.g.
<seg>I am <gloss>fattening</gloss> it up</seg><bibl corresp="psn:W">W10.138</bibl>
<seg><gloss subtype="i">fatten</gloss></seg><bibl corresp="psn:ECH">ECH</bibl>
<seg><gloss subtype="i">fat</gloss></seg><bibl corresp="psn:ECH">ECH</bibl>
Here, <gloss subtype="i">fatten</gloss> adds nothing to the search capabilities, because the stemmer can find “fatten” within “fattening”.
But does this entry with “fattening” get found when I search for “fat” because of the stemmer, or because of the <gloss subtype="i">fat</gloss>? It must be because of the inferred gloss, because the stemmer only stems as far as “fatten”.
In the case of “fatty”, where we know the stemmer doesn't operate on it, it still gets found when I search for “fat” because of the <gloss subtype="i">fat</gloss>.
(“fattening” and “fatty” do NOT get found when I search for “fat” just because they contain the string f-a-t, because “fatal” and “father” are NOT found by a search for “fat”. To find anything with the string f-a-t, I would need to search for “fat*”.)
So the inferred glosses do play a role in improving the search. That said, I don't think we should be going out of our way to add inferred glosses for this reason.
Much discussion over the last few weeks regarding the placing of gloss tags for generating the Eng-Nx wordlist. I attempt to summarize our conclusions here for future reference.
1) Why do we place inferred glosses (<gloss subtype="i">)?
At various times, we have placed inferred glosses for augmenting the search engine on the website, and for generating the English word list.
We concluded that from here on, we ONLY need to place gloss tags for generating the English word list. Inferred glosses do sometimes enhance the web search engine, but now that the stemming analyzer is in place, we don't need to do any further markup to help it out.
2) How should we tag inflected English words?
Until last week, we had been inferring the root word (or stem where relevant) when a def is an inflected or derived form of an English word, e.g.
<seg>he is <gloss>fattening</gloss> it up</seg>
<bibl corresp="psn:JM">JM 1.2.3</bibl>
This encoding means that this entry will show up three times in the English-Nxa’amxcin wordlist: under fat, under fatten, and under fattening. This seems like overkill, especially when these three words will sort one after the other in the English wordlist anyway.
ECH and SMK decided we would like to see the “fat” entries as follows in the print dictionary:
fatten: fatten, fattened, fattening
To accomplish this, we need to reduce the number of gloss tags we place in each entry. Inflected English forms (-ed, -ing) should not be gloss tagged; only their root or stem should be gloss tagged.
So “fattening” would now be gloss-tagged as:
<seg>he is <gloss>fatten</gloss>ing it up</seg>
MDH confirmed that the search engine is ignoring gloss tags, so the stemmer will operate on <gloss>fatten</gloss>ing the same as it would on <gloss>fattening</gloss>. (That is, it will continue to return all results with the stem “fatten” when someone searches for fatten, fattened, or fattening.)
MDH has created two sample Eng-Nx word lists based on the 6 files with “complete” status, one using all the gloss tags, and one omitting the inferred gloss tags. They are in moses/trunk/docs/glosses. We concluded that we don't want to programmatically ignore the inferred glosses, because many of them – especially the synonyms – are worth including. But we can refer to these lists to identify the inflected English words whose gloss tags need to be revised.
3) How should we tag English phrasal verbs?
Where appropriate, English phrasal verbs will be enclosed in a single gloss tag, e.g. <gloss>go after</gloss>. This will allow us to organize the headwords in the Eng-Nx word list as follows:
4) How can we distinguish English homophones in glosses?
English homophones in glosses will be distinguished with a secondary word (or phrase) in an @n attribute on the <gloss> tag, e.g. <gloss n="conflagration">fire</gloss>, <gloss n="back of boat">stern</gloss>. These will then be rendered as follows in the print dictionary:
stern (back of boat):
We decided not to use parts of speech for @n values. We will always use synonyms. We need to select synonyms that will be clear to readers in the community.
I have now disambiguated the English homophones listed here, and updated the Notes on Definitions and Gloss Tagging document accordingly. Where one homophone was far more common in the data than the other, I only added an @n value on the less common one - e.g. watch (wristwatch).
ES added transcripts for accf3, fraf8, cltf6
Trying to abstract the combined keyword/text search into a separate library yesterday was very problematic, but I took a simpler approach this morning and simply copied and adapted the code from search.xq into advanced_search.xq. The result seems to be working perfectly -- the keyword/text search is done first to retrieve a set of @xml:ids, then the search is done on those ids, with additional filters provided by the other form controls.
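The shape of that two-stage approach, stripped of the XQuery specifics, looks like this (an illustrative Python reduction, not the actual search code; record structure and names are mine):

```python
def two_stage_search(records, keyword_hits, filters):
    """Stage 1 has already produced keyword_hits, a set of @xml:id
    values from the keyword/text search; stage 2 restricts the data
    to those ids and applies the extra form-control filters."""
    return [r for r in records
            if r['xml_id'] in keyword_hits
            and all(f(r) for f in filters)]
```

The key point is that the keyword search runs once, up front, and everything downstream operates only on the resulting id set.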
Did this through XSL with some cunning language-detection code based on content and context, and it seems to have worked pretty well. The Names page now uses the @xml:lang attribute instead of its own cruder detection code to build output.
It was great meeting you all today and I'm looking forward to working with you all through the summer! I thought I would post one of my favorite newspaper articles from the project I mentioned today. Blayney was the oldest of the Scott brothers and the event that earned him the Distinguished Flying Cross is outlined in the article on the left. It's a pretty unbelievable story!
Happy hunting tomorrow.
Too much to do, not enough time to do it...
PAB wants to combine the simple search (which is actually very complicated behind the scenes, since it does keyword lookups and combines them with supplementary text-searching) with the advanced search filters. This is proving virtually impossible, partly because it's just too messy -- you'd need to retrieve a document set from the keyword search in a separate step, and then filter it -- and partly because I just don't have time to implement it properly before the launch. I'll have a couple more shots at it, but things aren't looking good so far.
Made a few other changes and fixes requested by PAB, and hid the text search box, since it's doing what it says on the box (a text search), and not what PAB wants (a complicated keyword search).
Following a meeting at which we discussed strategy, and decided to focus for now on the Mayoral Pageants, worked with KMF on a range of minor display and rendering issues for primary source documents, including bylines, marginal labels, and text indents.
...on instructions from JS-R.
As planned last week.
Meeting to review the presentation -- my task now is to collapse six slides which begin with the picture of the filecard box into a single stepped diagram illustrating the old encoding process and the horrible binary result.
Started a tutorial based on SNOW1 (for the moment), and in the process of writing the first bit of it, came up against many annoyances in the rendering of egXML blocks; fixed those rendering issues (in three places: site, redesign, and codesharing. Grrr).
Emailed DR with latest changes/additions required for site.
Site in progress.
In Progress: updating site with new course listings 2013-14.
In progress: updating site with new course listings for 2013-14.
Added rendering handling for sp, speaker, and p within sp. The stage tag isn't handled yet. Rolled out changes both to site and to redesign codebases.
The ISE was getting the error
java.lang.NoClassDefFoundError: Could not initialize class sun.awt.X11GraphicsEnvironment
when running xwiki.
It turns out that quotes in JAVA_OPTS directives cause the option to be ignored. So, for future reference, when launching Tomcat on a headless server, use
-Djava.awt.headless=true
instead of
-Djava.awt.headless="true"
...on RL's instructions.
Since SNOW1 was a bit of a mess at the beginning, because of the encoders following obsolete examples, I've manually encoded the title page as an example.
Also found a problem with METR1 which was not really a bug, nor an encoding invalidity: a body element which goes straight to content (e.g. a head) with no intervening div is not invalid, but it triggered rendering problems because it was completely unexpected. As it happens, the encoding should not have been that way -- other divs appear later in the body -- but it wasn't technically wrong, so it would be good to figure out a way to prevent this through the schema or more likely through Schematron. We could change the content model of body so that it can only have divs, of course.
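Until a schema or Schematron constraint is in place, the problem cases are easy to flag with a quick script. A hypothetical Python check (function name mine; the TEI namespace is required for real documents):

```python
import xml.etree.ElementTree as ET

TEI = '{http://www.tei-c.org/ns/1.0}'

def body_lacks_div(doc: str) -> bool:
    """True if any <body> goes straight to content (e.g. a <head>)
    with no <div> child: valid TEI, but our rendering assumes divs."""
    root = ET.fromstring(doc)
    return any(body.find(TEI + 'div') is None
               for body in root.iter(TEI + 'body'))
```

Running something like this over the collection would catch the METR1 situation before it hit the rendering pipeline.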
Section 2 is now down to 6 slides, with more detail and more extensive notes.
Following Sarah's post, I've done the following:
Here are a few requests for the Names page on the website:
DONE -exclude Lexical Suffix entries
DONE -fix the display of sic/corr, so that only “Wenatchi” displays, not “WenatcheeWenatchi” (See for example the entry for “Sam George”.)
DONE -put flora (plants) and fauna (animals) in the link text at the top of the page
-separate out the sorting into Nx-Eng and Eng-Nx pages. Ideally, users should be able to view the complete list, or any of the six lists by name type, sorted either by Nxa'amxcin name or by English name. The present setup with Nx and Eng names mixed together in the Name column is somewhat confusing. Continue to sort the Nx-Eng lists based on name tags in prons. For the present, exclude name tags in orths when generating these lists. Sort the Eng-Nx lists based on name tags in defs.
PENDING ECH'S FURTHER DISCUSSION WITH CCT:
Please also generate a printable version of the six lists of names by type. These only need to be sorted alphabetically by Nxa'amxcin name - i.e. only include the name tags within prons when generating these lists. Ideally they would be spreadsheets with the following columns:
Name (pron:seg type="p")
Source (following bibl ... if the pron:seg type="p" is NOT subtype="i")
Definition (all defs)
Pronunciation (pron:seg type="n")
Source (following bibl)
Word Parts (hyph)
Running very fast to stay in same place...
Did some tasks from yesterday and some new ones:
<group> elements have now been converted to <div>s. (The only exception is stow_1633, which probably does need
I've implemented the advanced search as a separate page, and got it basically working, although some missing bits in the encoding mean that it's not finding everything it should (e.g. dates are missing @whens sometimes).
1. ES corrected location coordinates for cltf6, aacf3, fraf8
2. ES added transcripts (non-annotated) for fraq7, fraq8, fraq9
1309 page images for CO 60 Vol 13 (in three different sizes) have been added to the collection. These cover the British Columbia 1862: Despatches to London. These will now be linked into the transcription documents.
Work arising from the Providence meeting.
I have these tasks coming out of the team meeting today:
On late duty.
I've spent the whole day working on getting a more flexible and successful build system for eXist. This is what I've added to Greg's script:
Found a number of problems with eXist, which I've reported, including a bad one once the webapp is running: you can no longer call transform:transform with a relative path to the XSLT file, otherwise you get an error. A full path from /db seems to work.
ES added about ten new videos and XML data files, so I had to create a thumbnail image for each. I ran each file in the player.xql file, stopped the video, captured a bit of the screen to a PNG file, edited that down to 88x66 px (the size they all seem to be), added the thumbnails to the SVN repository, and uploaded them to the production site and to the copy of the site on my Mac.
While doing that, I noticed extraneous thumbnail files in the images (as opposed to the images/thumbnails) folder, so deleted those from the servers and from the repository.
We've been running the live db with open access since the last time I rebuilt it, so in the process of doing other updates (such as rolling out the Java sorting collations) I've also added back the protection that we had before. In the process of doing this, I got bitten by the horrible eXist bug which enables you to lock yourself out of the admin account if you edit the admin user and forget to retype the password into the two password boxes (the effect is that you end up with a random admin password that you can never discover). As a result, I had to remove the server version of the app and replace it with a refreshed version of my local copy. This failed the first few times -- Tomcat tries to auto-deploy the app before it's completely uploaded the dbx files, so the uploaded .filepart files cannot be renamed to overwrite the ones created by the live startup. It took two or three shots to get this problem solved. The only way seems to be to let it deploy, but stop it immediately in the Tomcat manager; then delete all the dbx, lock and log files; then upload them again; then restart it in the manager.
1) For the linguists' dictionary, we would like to see:
first phonemic representation in bold <orthography in angle brackets> [narrow transcription(s) in square brackets], for both forms and cits - e.g.:
ʔáyx̣ʷt <ʔáyx̌ʷt> [ʔáyəx̣ʷt]
1. be tired
2. tired, worn out
• √ʔáyx̣ʷ-tl kɬʔámnc
he is tired of waiting (for you / me)
2) On the website, we would ultimately like things sorted by orthography.
ES noted that recent changes she'd made weren't appearing on the production site at francotoile.uvic.ca.
I had a connection in the eXist admin client that used pear.hcmc.uvic.ca as the domain. I thought that would be dead, but when the connection succeeded, I assumed the domain name was forwarding to the current instance. Wrong. Evidently there is another instance somewhere on "pear" that is still running.
Created a new connection in the admin client using tomcat-devel.hcmc.uvic.ca as the domain and that worked. Also, the webapp in the new instance is francotoile and not francotoile21 as it was in the old instance.
In poking through the files, also noticed a connection string using lettuce.uvic.ca, so changed that to hcmc.uvic.ca and it seems to be working.
Updated the lastpass records.
This morning we decided that a simple and quick way to distinguish between homographs with different meanings is required to make the English lookup part of the dictionary less confusing. This will be achieved by adding a clarificatory word or phrase in the @n attribute of a gloss. Glosses will then be presented in the E-to-M view with this clarification in parentheses. Processing on the website will need to be changed to take account of this, and the print dictionary rendering will also have to be written with this in mind.
Wrestling with similarity metric algorithm...
I've now figured out how to create an extension module for eXist, following the instructions here. These are some things I've learned:
- Run build.sh extension-modules, then drop the resulting jar into an existing eXist instance (although if the new jar was built from a substantially different version of the rest of the code, there could well be problems).
- Add <module uri="http://hcmc.uvic.ca/ns/usm" class="org.exist.xquery.modules.unisimmetric.UniSimMetricModule" /> along with the other modules.
I'm not yet happy with my module, and I'm still working on it. In particular, I'm not happy with the scores it's generating, and I think this might be something to do with other bits that get included in the GZIP stream, such as a header; if I can figure out how big those are, I can remove them from the calculation. The highest difference I seem to get is around 0.53 with completely dissimilar strings, so it seems as though the results are being compressed into a range much smaller than 0-1.
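The fixed framing overhead is easy to measure: compress an empty input, and whatever is left is pure header and trailer. A quick Python check of the same idea (eXist's gzip output should carry an equivalent constant, though the exact byte count may differ):

```python
import gzip

# Compressing nothing leaves only the gzip header, an empty deflate
# block, and the CRC/size trailer. That constant can be subtracted
# from each compressed length before computing the similarity score.
overhead = len(gzip.compress(b''))
print(overhead)
```

Subtracting that constant from each C(x) should stretch the scores back toward the expected 0-1 range.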
After an update to DSM 4.2 rutabaga no longer allowed rsync backups, failing with:
sh: rsync: not found
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: remote command not found (code 127) at io.c(605) [Receiver=3.0.9]
After much wailing and gnashing of teeth we discovered that non-interactive users do not have /usr/syno/bin in their path (it *is* in their path if they shell in to the NAS, so they can run rsync *from* the NAS when shell'd in).
So, that's an easy fix, says us: add a symlink to /usr/syno/bin/rsync in a logical spot that *is* in a non-interactive path, like /usr/bin.
Problem: admin user cannot su root (error message = su: must be suid to work properly), so cannot create symlink.
Answer: TURN ON TELNET AND LOG IN AS ROOT USING THE WORST POSSIBLE METHOD!!! Then, you make the symlink and turn off telnet - quick!
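The fix itself can be rehearsed safely in scratch directories before touching the NAS; here the two mktemp dirs stand in for /usr/syno/bin and /usr/bin, and the fake rsync script is just a placeholder:

```shell
# Sandbox rehearsal of the fix (real paths: /usr/syno/bin and /usr/bin).
SYNO_BIN=$(mktemp -d)   # stands in for /usr/syno/bin
PATH_BIN=$(mktemp -d)   # stands in for /usr/bin
printf '#!/bin/sh\necho rsync-ok\n' > "$SYNO_BIN/rsync"
chmod +x "$SYNO_BIN/rsync"
# The actual fix: expose the binary via a symlink in a directory
# that IS on the non-interactive PATH.
ln -s "$SYNO_BIN/rsync" "$PATH_BIN/rsync"
"$PATH_BIN/rsync"
```

On the real box the one line that matters is the ln -s, run as root.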
Late duty, then fighting with @W(*%&($^ Rutabaga, which has forgotten how to do rsync backups. Still not solved. GRRR.
I've written out the prose of the Oxford talk. Still remaining to do before July:
This morning we got nets to set up B047 on the switch. Some time after that 3 machines lost the ability to get a DHCP address - I have no idea if there is a causal relationship between these things.
After much mucking about, it *looks* like it might have been a communication problem between the DHCP server and the machines.
Even forcibly releasing the DHCP lease didn't make any difference.
In the end, I booted the machine with a LiveCD and fiddled with enabling/disabling the network (in the network manager). I got a proper IP and rebooted in the installed OS. That seemed to break it out of the loop.
A bit perplexing and aggravating because I don't actually know what the problem was...
1. ES added transcripts for fraq10, fraf6
2. ES has edited Liette's video, and given it to SA. Corresponding xml file has also been added.
3. ES asked SA to upload all the new additions to the production site in order to see if editing with Audacity works fine.
Sent in FMIS report May 2nd re furniture removal. (p/up by May 8, 2013)
May 8: followed up on sent FMIS request re furniture removal specific pick up time. Furniture removed May 8th, 11:00am.
Received computer usage requests from several projects for May-August 2013
New schedule updated and posted online
SA and I met with SA (Rel.St) to discuss migration of current Religious Studies site over to Cascade format.
- RELS current information and design to be replicated basically in Cascade
- discussed various Cascade requirements
- sent SA (RELS) a request for the information required for the outline
- HCMC: currently preparing RELS structure outline in readiness for submission for approval
On late duty.
Team meeting, at which we discussed the use of ISE's facsimile viewer in MoEML (which will be easy enough to do, although it's based on a traditional db, and we'll have to replace that with proper TEI facsimile encoding).
People also asked me to clarify how the EEBO linking works, so I've done that in the transcriptions documentation file, and I've also implemented the display of little page-images linking to the EEBO pages. Also, during today, <addrLine> elements were added to the schema, with some basic display rendering.
Met with PAB and made a number of fixes:
We also made a plan for an advanced search, which I'll document in more detail here before I try to implement it.
When making modifications a couple of weeks ago (see post), I changed only the list view and not the timetable view. I didn't realize that the dropdown for which courses to print affected only the list view (since in the code it is located in the active_area code and not the view-specific code in manage_calendars).
I added code to
- manage_calendars (at about line 12985 - that file is ridiculously big) to add the option to the select in the dropdown
- manage_calendars.php (at about line 118, in the else if(strcmp("display_table",$do_what)==0) branch) to check the setting of the dropdown and take appropriate action
Notice that in the timetable view, the effect of changing the setting takes place immediately in the view, and then that view is printed; in the list view, changing the setting does not change the display, but it does correctly filter what gets printed. I'm not sure whether that inconsistency is a bug or a feature reflecting how those two views are used.
The problem of duplicate @xml:id attributes on entries has now become a serious issue for the print dictionary build, because I'm unable to process the entire collection properly to produce the book; to build the dictionary I have to use XInclude to create a single XML source file, and when I do that there are over 1600 duplicate ids which prevent some of the processing steps from succeeding.
I've taken a quick look at where the duplicates tend to be concentrated, by adding the files in alphabetical order and looking to see how many duplicates occur with each addition. These files create no problems (i.e. they have no duplicates among themselves):
affix_glot-ix.xml affix_k-m.xml affix_n-t.xml affix_u-CAPS.xml c.xml c-glot.xml c-rtr.xml glottal.xml h.xml h-phar-part1.xml h-phar-part2.xml l-affric.xml lex-suff.xml new-data-2013.xml p-glot.xml phar-w.xml qw-glot.xml s-rtr.xml t-glot.xml xw.xml
When I add the remaining files, one by one (and only one at a time), these are the results:
k.xml: 100 duplicates; k-glot.xml: 18; kw.xml: 2; kw-glot.xml: 2; l.xml: 3; l-fric.xml: 6; m.xml: 3; n.xml: 97; p.xml: 7; particles.xml: 4; pron.xml: 2; q.xml: 4; q-glot.xml: 3; qw.xml: 1; rescued.xml: 54; s.xml: 2; t.xml: 20; ww-glot.xml: 4; x.xml: 3; x-uvul.xml: 4; yy-glot.xml: 4
What I'm going to do is develop the dictionary output using only the valid files, and then add the others in as they get fixed. In the meantime, it might be worth having a go at some of the low-hanging fruit (the ones with only two or three duplicates). More will show up as we add those in, of course -- there will be duplicates across the currently-excluded files as well as those that they share with the "good" files. So the dictionary PDFs will shrink in size, but I'll be able to start doing things like generating page-references that depend on xml:ids.
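The duplicate check itself doesn't need XInclude at all; a quick script can count ids across the files directly. A hypothetical Python sketch (file handling reduced to strings; names are mine):

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = '{http://www.w3.org/XML/1998/namespace}id'

def duplicate_ids(xml_docs):
    """Return the xml:id values that occur more than once across
    the given parsed-as-string XML documents."""
    seen = Counter()
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for el in root.iter():
            if XML_ID in el.attrib:
                seen[el.attrib[XML_ID]] += 1
    return sorted(i for i, n in seen.items() if n > 1)
```

Run over pairs or growing subsets of files, this reproduces the "add one file at a time and count" experiment above without any merging.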
Lucene-based fuzzy matching seems to be very broken in the build of eXist I'm using, and in any case it's based on Levenshtein distance, so I've implemented a crude version of the USM/NCD algorithm in XQuery. It's a long way from ideal, though, because it's using base64 versions of strings rather than compressing the actual strings (this is all I can do with eXist's exposed gzip access); using zip seems to be punitive because it would require creating a file on the filesystem or in the db and compressing that. I think a simpler approach would be to take my Java class and strip out all the command-line stuff it contains, then call that directly from XQuery (see the xqSearchUtils java project and the way it's called from the Despatches XQuery for an example). A jar file with a simple XQuery module interface might be very handy indeed.
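For reference, the NCD calculation is compact when the compressor is directly available. A Python sketch using gzip (not the XQuery or Java versions described above; short strings suffer from gzip's fixed header overhead, which squashes scores into a narrow band):

```python
import gzip

def C(b: bytes) -> int:
    """Compressed length of a byte string."""
    return len(gzip.compress(b))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: near 0 for identical inputs,
    approaching 1 for unrelated ones."""
    bx, by = x.encode(), y.encode()
    cx, cy = C(bx), C(by)
    return (C(bx + by) - min(cx, cy)) / max(cx, cy)
```

A string compared with itself should score well below a comparison with unrelated text, which is the property the fuzzy matching needs.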
1. SA found a solution for cutting the soundtrack at the millisecond: use Audacity! The program was installed on POMME.
2. ES entered & committed the transcripts for cltq3, fraq11, fraq12, fraq13
The call is out, and mine are done.
Working with PS on the MoEML redesign.
...for Rees and Urberg, and reconfigured the Rees structure to allow for abstracts (not available yet).
I've been using the opportunity of the redesign (which gives me a complete new incarnation of the web application working alongside the current one) to fix a whole raft of problems and annoyances going back a long time. Among those completed so far:
declare variable $dataDoc :=
    if (collection('/db/data')//TEI[@xml:id=$fileId])
    then collection('/db/data')//TEI[@xml:id=$fileId]
    else let $dummy := response:set-status-code(404)
         return collection('/db/data')//TEI[@xml:id='missing'];
<li> elements now have a class="active" attribute where their target URL matches the current URL.
<div>s are often auto-generated with generate-id() during the XSLT transformation, since they cannot be matched for linking any other way.
First pass through is done. We might have to do some XSLT hacking because these review articles, unlike previous ones, have the details of the books in their front matter, and in the PDF view they're showing up in title format, which is probably too big and shouldn't be centred.
Leaving early to burn off a couple of hours.
I've created a new MosesPhonemicCollation jar for sorting based on the phonemic representations. I've also forked the dictionary build process based on a parameter called "dictionaryType", which can be "learner" or "linguist". The former produces a dictionary based on the orthography, sorted with the MosesOrthographyCollation, and the latter produces one based on the phonemic transcriptions, with the new collation. The "alphabet" guides that run across the bottoms of pages are also appropriately different. I've abstracted the front matter into a separate file, and I'm auto-including the personography, although I'm not processing it yet.
Continuing work on tickets arising out of Providence meeting.
I've done about the first third.
Working with TG to pilot the use of new-style boxes from the 2012 template, faked up in the local site CSS, for some attention-grabbing stuff. Coming along nicely.
Report on the Text Directionality Working Group to Council and to the group, and some more work on automated content in the Guidelines.
The idea of having a single collation to sort everything in our db is now impractical, because the orthographical sorting rules clash with the transcriptional sorting rules, so I've created a new, simpler MosesOrthographyCollation class for sorting the orthography only. It's working well, but there are still some outstanding questions about it. In the meantime, we can't update the website because we don't have orthographies there yet, so this is only going to be used for the print dictionary generation.
This has had to be redone a couple of times due to changes in the list of glyphs, but it's working now and tested with the print dictionary system.
We concluded that we need a different alphabetical order for the community dictionary vs. the linguists' dictionary.
The community dictionary should indeed follow the order in the 2006 language program dictionary - that is:
a aa ə əə č c cʼ h ḥ ḥʷ i ii k kʼ kʷ kʼʷ l lʼ ll llʼ ɬ ƛʼ m mʼ n nʼ p pʼ q qʼ qʷ qʼʷ r rʼ š s t tʼ u uu w wʼ x xʷ x̌ x̌ʷ y yʼ ʕ ʕʼ ʕʷ ʕʼʷ ʔ
The linguists' dictionary should follow the order in MDK's 1981 dictionary:
ʔ a ạ c c̣ cʼ ə ə̣ h ḥ ḥʷ i ị k kʼ kʷ kʼʷ l ḷ lˀ ḷˀ ɬ ƛʼ m mˀ n nˀ p pʼ q qʼ qʷ qʼʷ r rˀ s ṣ t tʼ u ụ w wˀ x xʷ x̣ x̣ʷ y yˀ ʕ ʕˀ ʕʷ ʕˀʷ
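Just to illustrate the principle of sorting by a custom alphabet with multi-character letters (so that e.g. "aa" and "kʼʷ" behave as single units), here's a hypothetical Python sketch; the real implementation is the Java collator work described elsewhere in this log, and the grapheme subset and words below are invented for the example.

```python
# A small invented subset of the community alphabet, in sort order.
ALPHABET = ["a", "aa", "ə", "əə", "č", "c", "cʼ", "h",
            "k", "kʼ", "kʷ", "kʼʷ", "l", "m", "n"]
RANK = {g: i for i, g in enumerate(ALPHABET)}
# Try longest graphemes first, so "aa" wins over "a" and "kʼʷ" over "k".
GRAPHEMES = sorted(ALPHABET, key=len, reverse=True)

def sort_key(word):
    """Tokenize a word into alphabet units, longest match first,
    and return the sequence of their ranks."""
    key, i = [], 0
    while i < len(word):
        for g in GRAPHEMES:
            if word.startswith(g, i):
                key.append(RANK[g])
                i += len(g)
                break
        else:
            key.append(len(ALPHABET))  # unknown character sorts last
            i += 1
    return key

words = ["kʷac", "ka", "kʼa", "aam", "am"]
print(sorted(words, key=sort_key))
```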
Today I got the following bits working:
Next is the implementation of the English-Moses glossary.
Started the project by doing a trial manual markup of RIII to identify all placenames, and link the London ones to our DB. It took 110 minutes to get the markup done on a bare-bones TEI rendering of the play (from the search project). I'll do one more of these before I start working on NER approaches, so I can compare the two processes.
Back after a wee while!!
1. Links to video files were sent to Essen for his review before publishing them on the site.
2. SA and ES discussed the need to find editing software that allows cutting at the millisecond or hundredth of a second. ES suggested "Video Edit Master", which seems to be free software. Ongoing.
3. Files in "media folder" POMME have been updated with the latest information existing on the server.
4. New xml files entered for pscf5, accf3, cltf6, fraf8
5. Transcript for cltq3 is done in txt format. Needs to be entered into xml file.
One q re missing page numbers sent to HT.
Working on automated generation of some stats-related sections of the Guidelines.
Trying to get the Moses article finished in a timely fashion...
Proofing and rewriting. I've now finished section 4. The conclusion remains to be written, and the intro will have to be reworked at the end.
Team meeting to finish up the semester.
Some of my XSLT changes had undermined the presentation of primary sources, so I've re-inserted a lot of the template modes I was using before. I've also added a div wrapper for non-primary-sources (class="bornDigitalDoc") so PS is now able to work on the redesign without much input from me, without overriding styles in the primary source display.
Worked through sections 1 and 2, and into 3, merging previous changes and suggesting more.
The Geru department asked me if it is possible to choose to print the offerings that are not approved; currently they can print approved offerings and all offerings. I figured the easiest way would be to add a "not approved" item to the drop-down list in addition to the "all" and "approved" items. I:
- added the appropriate line to active_area.php for the GUI
<option selected value="notapproved">Use NOT Approved Course Offerings When Printing</option>
- added the appropriate sql in manage_calendars.php
$restrict_sql = " AND offerings.approved = 'No' ";
Works. Had a couple of copy-and-paste problems resulting in the wrong option being selected when the list page reloaded itself after invoking the print list view. Now seems to be working.
Set up headers, front, and basics. Text comes next, and biblio.
Regular Skype meeting.
Today I've basically finished setting up the code for the redesign. I've reworked a bunch of XQuery, XSLT and CSS to make the whole site actually functional (although ugly) in the redesigned setup, and got the menus working, including the page content menu and the dropdown. I've broken out the CSS into several distinct files, leaving one for PS to work on as he does the design. In the process, I've rationalized and centralized a lot of the XQuery, so there's less of it than there used to be.
Ordered on-line office supplies from G&T
Delivery date Monday, April 22
Pushing forward with MoEML redesign work.
The redesign pages are now showing page contents submenus and a list of people in credits (where they're encoded as respStmts). I still have to rework some of the basic document rendering to introduce sections and headers, but we're on the right track.
Work. Lots of it.
After checking it out with UComm, I've been trying to reproduce the rather nice two-column text boxes available as (I assume) a widget in the new template on the existing History site, for JW and LM, who want to put some on the home page. I have them working OK, but I'm not sure the department are going to want to use them. Took a bit of hacking to make them size correctly, and in the context of the old template they don't look as nice as in the new.
I now have some basic rendering for definitions and examples, so we're getting closer to something that looks like the final product will look. There are two chars missing from the fonts, but the author of the fonts may be able to add them for us (yay!). Other than that, things are looking good.
After an extensive period of work on this site, importing content from the old format to Cascade and then moving the site/content to another new Cascade format, the site has been submitted to Communications for review/approval.
Invoice, payment and deposit submitted re HB/HCMC/UVic contract. HCMC copies on file.
Added a video on RL's instructions.
Still catching up after last week...
The author of the Aboriginal fonts kindly fixed the r-with-caron rendering problem we'd identified, within an hour of our reporting it, so it looks like that will be our font of choice. Tested and working now.
Met with UComm and Humanities folks to plan the next phase of faculty website redesign.
Working on getting Specs files to validate to help prevent some broken commits.
We now have basic pages rendering in the new format so PS can start working on the images, logos and CSS.
There's one review article still waiting to be done. No articles yet.
Lots of catch-up to do after last week's TEI Council meeting, and SVN conflicts kept me from leaving for ages.
PS and I worked out a document structure on the board, so I can get a design area working on the server. I set this up in the controller-config, and then started moving stuff around in SVN so that the redesign materials on the server are in the same place as they are in SVN. This got me into a bunch of SVN conflicts which took hours to resolve. Grrr. Working now, hopefully.
Meeting with library folks re collaboration plans.
Encoding team will work on the three docs under way (le_blanc, le_bon, and ville_t) while I'll work on the port to eXist; following that, we'll start on the modern spelling plan.
Delivered in person timesheets to payroll.
The word "δαίμων" appears in Iliad 5.431 and is tagged as a term. Because of this, the sentence in the reader is missing a noun: "When he was coming on for the fourth time, equal to a."
I have added in "Daimon" as Butler wrote it in his translation. If this convention is to be followed, we'll probably need a tag for this demigod type of word.
We spoke at various points about creating a list of mythical half living potentially magical objects that seem to have agency or power in some odd way:
I think this post might refer to the same thing -- http://hcmc.uvic.ca/blogs/index.php?blog=45&p=9810&more=1&c=1&tb=1&pb=1
Well, I've found another one: Iliad 5.416 - "Ichor".
Received request from RS for position posting.
Updated site with information.
Leaving early to prepare for TEI Council meeting trip (leaving tonight).
When adding events to events.xml you'll need to browse the existing events to make sure you aren't adding a duplicate. Here are a couple of special searches you can use to look through the events file a bit more rapidly:
1) If you want to find all events that take place in, say, Crete, you can run an XPath search (there's a search field at the top-left of oXygen) to do it. That search looks like this:
//event[.//placeName[@corresp='places.xml#crete']]
Paste the above line in its entirety into the XPath search field and hit 'Enter'.
To search for another place, change crete to the xml:id of the place you want to find - leave the quotes alone!
2) Something similar can be done for characters:
//event[.//persName[@corresp='characters.xml#eurystheus']]
Again, paste the above line into the XPath search field and hit 'Enter'.
3) You can combine this functionality, too. To find all events which include Eurystheus and Mycenae, try this:
//event[(.//persName[@corresp='characters.xml#eurystheus']) and (.//placeName[@corresp='places.xml#mycenae'])]
That's all one line, and it goes in that XPath field.
Could someone take on the task of sorting out Heaven and Mount Olympus?
They are, as far as I can tell, interchangeable but this needs to be confirmed. If it is confirmed, we need to resolve all of the texts and events to reflect the change.
The problem is illustrated by going to Pausanias Desc. 1.29.11 and clicking on Heaven; look at the citations that come up: all Pausanias. However, there are events associated with it that take place in heaven, but there's no markup for, say, Apol. 1.6.3 that distinguishes heaven.
Spent all day fighting this, but I now have polygons working. To experience it for yourself, click through to the map, click on Peloponnese - BAM! Polygon.
While doing that I also ripped off an OpenLayers applet for creating polygons:
Now we can begin to create regional outlines for other areas, like Boeotia and Arcadia and so forth. There's a handy Wikipedia article on this precise topic here:
After looking with SMK at the way some characters are rendering in the PDF, I've created a function called hcmc:renderingFixups() which does some character substitution. Specifically, we're replacing i-with-dot-below + combining acute or grave accent with i-with-accent plus combining dot below; and a similar thing with i-with-short-stroke. This last may not be as pretty as we hope, and there are still outstanding problems with the dot below m (off-centre to the right) and l (off to the left). We may be able to fix the latter by flipping briefly to another font, but that's very ugly. Still, these are minor issues, and so far CharisSIL is working well for us.
Looks like CB and GB-S were both working on references.xml this morning at the same time. Had to manually merge some changes because CB couldn't commit her files.
Possibly a project coming to update HotPot to HTML5 and small-device support; and some blue-sky thinking for a bigger idea.
Leaving a little early.
I thought we had set these up before, but apparently not. Now tested and working. We also worked through some validity problems with the MC text, and worked on some CSS for dropcaps etc.
Added him to the coldesp group so he can upload maps while I'm away. Reminder to self: unsigned jars bug is caused by using OpenJDK 1.6; switch to 1.7 to avoid. This is a per-user setting on the machine.
...for the Council meeting.
SMK suggested CharisSIL, which I've downloaded and tested with XEP; it looks good. I'll now try integrating that into FOP. I've also updated the Collator after discovering that we had not included barred lambda without a following apostrophe, causing one (actually erroneous) example of that to be sorted to the beginning, a mystifying thing for a while. I've made a little more progress with the rendering of entries, but I don't have a model to work from yet so it's just exploratory.
Up to now, we've been using the excellent Gentium Plus fonts for our dictionary website, and they're working well; they're attractive and cover all the glyphs we need.
However, Gentium Plus comes only in regular and italic flavours; there's no bold version of the font. When we use "bold" on the website, the browser renderer automatically fattens up the regular font to give the impression of bold; it's not particularly pretty, but it works. Unfortunately, that can't be done when generating the PDF. Neither Apache FOP nor the commercial XEP generator we use has the capability of automatically generating a bold version of a font from a regular version (and they would presumably argue that it shouldn't be done, because it's ugly, and would be especially noticeable in print). So we're faced with three choices in the print dictionary:
We'll have to think carefully about this. I've written to the team for thoughts and suggestions.
...for jTEI. Interesting, and sent me looking at the ODD system in more detail than i have for a while.
Just reminding myself: I need the MosesCollation.jar file to provide the sort collation for the dictionary, and in eXist this is found automatically as long as it's in the WEB-INF/lib directory. However, locally, I have to add it manually to the transformation scenario -- on the first tab, click on Extensions, then Add, and find the jar file.
Arising out of the meeting last week is the idea that we could include auto-generated information on recent changes to the db. This should presumably be an XQuery library which digs out the following:
This should be written to be completely configurable so it can be used in any project, and should provide output in the form of an XHTML5 fragment or an RSS feed.
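As a rough sketch of the RSS half of that idea (the function name, record shape, and URLs below are invented; the real thing would be an XQuery module):

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def recent_changes_rss(title, link, changes):
    """Build a minimal RSS 2.0 feed from (date, title, url) change records."""
    rss = Element("rss", version="2.0")
    channel = SubElement(rss, "channel")
    SubElement(channel, "title").text = title
    SubElement(channel, "link").text = link
    for date, item_title, url in changes:
        item = SubElement(channel, "item")
        SubElement(item, "title").text = item_title
        SubElement(item, "link").text = url
        SubElement(item, "pubDate").text = date
    return tostring(rss, encoding="unicode")

feed = recent_changes_rss(
    "Recent changes", "http://example.org/db",
    [("Mon, 22 Apr 2013 09:00:00 GMT", "le_blanc updated",
      "http://example.org/db/le_blanc")],
)
print(feed)
```

The XHTML5-fragment variant would be the same loop emitting a ul of links instead.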
Working out transportation from the airport, booking bus tickets, printing maps...
This has turned out to be quite a long document, and I have two sections left to do. Once it's finished, I can start building the presentation for Oxford.
Collected stats for the first quarter of 2013.
Began preliminary work on the Dates presentation.
DHSI requires a coursepack, and for that we needed PDF renderings of the presentation materials. I've spent most of today writing XSLT-to-FO and testing it with FOP to generate this stuff. It's not perfect, but it'll do, and it will be useful in future.
Talking in KSW's class.
Did a brief introduction to DH in KSW's professional writing class. The presentation will be handy to have, and can be expanded in future.
Old URLs for www.tapor.uvic.ca were being used to call on resources on the home1t filesystem; I've now switched all of those to hcmc.uvic.ca. This should also be done for other projects such as Mariage.
I've finished the first draft of my bits -- over to ECH...
Worked with CB to figure out how to handle new roles in the personography, categorization of bibls, and the retirement of the old links page (which will free up the bibl/@type attribute for categorization, because the only place @type='replace' was being used is in the links page).
Found this website today while hunting for sources on obscure characters: http://mythagora.com/who.html
This site isn't official and has no University affiliation, but is pretty comprehensive and a useful counterpoint to Theoi
In 5.76 of the Iliad, the term "δῆμος" appears mid-sentence. Other instances of Greek terms tagged as "term" have had a replacement English word that appears in the reader. This instance, however, does not. The line in the reader appears as follows:
"Dolopion, who had been made priest of the river Skamandros, and was honored in the as though he were a god"
In the Butler translation from Perseus, "δῆμος" appears as "Davos" (and was honored in the davos as though..) Other versions of the Butler translation that I've found on various school sites have rewritten the phrase to say "and was honored among the people as though..."
I left it as is for now, not knowing whether to use the Perseus Anglicized Greek or the other options.
Needed quiet time to work on article draft.
Redrafting the third part of the article encouraged me to do even more refinement of the schema; I've now nailed down the use of @type and @subtype more thoroughly, and tweaked the encoding of
<gloss> as a result. It uses @subtype="i" instead of @type, for consistency with
<phr>, even though it makes no use of @type (at the moment). The article is coming along, and I hope to get the first draft of my bits finished tomorrow.
There are a handful of passages from the original Frazer translation of Apollodorus that have been withheld from our version due to their context being unclear (i.e. block quotations which interrupt narrative flow). They are as follows:
Epitome 1.1.21: There is a sizeable portion here on the marriage of Pirithous and the fight of Theseus with the centaurs that has been removed. Frazer's footnote on this section comments that:
"This passage concerning the fight of Theseus with the centaurs at the marriage of Pirithous does not occur in our text of Apollodorus, but is conjecturally restored to it from Zenobius, or rather from his interpolator, who frequently quotes passages of Apollodorus without acknowledgement. The restoration was first proposed by professor C. Robert before the discovery of the Epitome; and it is adopted by R. Wagner in his edition of Apollodorus."
I checked the Perseus version of this section and the way that they have handled it is to include the omitted section as a quotation from Zenobius, complete with Frazer's explanation above.
Epitome 1.6.15a, b and c: These sections are quoted paragraphs from another work, Tzetzes's Scholia on Lycophron. Frazer includes in his footnotes for this section that:
"The following three paragraphs are extracted from the Scholia on Lycophron of Tzetzes, who seems to have borrowed them from Apollodorus."
Perseus has handled this by having separate tags for sections 15a, 15b and 15c in which they present this material as a quotation, with Frazer's citations, but as distinct from the original narrative.
Though the truncated material is ostensibly from other works, it is attributed to Apollodorus. The missing material in 1.1.21 does actually add useful context to that passage, but for the sake of clarity it may be best to find an unobtrusive way of presenting these as quotations belonging to other authors, who may in turn have borrowed them from Apollodorus.
Battling mightily with a fresh build of eXist on the new Tomcat server. Defeated it eventually.
Built a fresh checkout of eXist trunk, and reworked the app to run in it. Had to change relative paths to XSL files in XQL files to full paths from /db for some reason. Also fixed a bug in the personography rendering, which after much confusion turned out to be caused by my having moved the schemas around in the db. Only thing left to do: add password protection.
Finally registered my annoyance at the presence of the Tomcat favicon, and created a little loupe-based thing based on the banner.
All current Tomcat-based projects have now been moved to Peach, with the help of RE, who has re-pointed all the relevant domains. The IALLTJournal project has been retired, since it's no longer in use, as has the old version of Francotoile, which was built on eXist 1.4. This was a fairly slow and careful migration over a couple of days, and we expect no problems, but Pear will continue to run for a little while just in case.
...for KSW's class, with SA. The biggest problem is the brevity of it (10 mins).
Refreshed the copy of the app on Peach, then updated all the data and tested codesharing. Then worked with RE to get the domain repointed (after server updates), briefed the team before and after the change, and tested the results. All good so far.
Leaving a little early today.
And better tested and documented. I think it's finished now, but we've also learned enough to be able to update it easily. I also learned that if I compile it targeted at Java 1.7, it will cause errors on Pear, so I compiled it for 1.5.
It's remarkably fiddly to get all this stuff right.
On RL's instructions.
In book 5 and elsewhere, the name "Pallas" has been coded alongside the entity
The Pallas tags in book 5 are void, in that Pallas Athena is just Athena, rather than Pallas the Nymph and Athena the Goddess. When I remove these incongruities, do I also delete the entity  ? If it serves a purpose outside of information retrieval for Pallas the Nymph, I wouldn't want to ruin it.
This took way longer than I expected, and I ended up resorting to tables to keep the layout under control. Not ideal, but nothing else would work.
On RL's instructions.
Regular team meeting followed by a discussion of maps, witnesses, layers etc. One new idea is the possibility of allowing users to "punch a hole" through one layer to see a layer beneath, just in one area. That would get around the busy-ness of multiple overlays. JJ and KMF are still working out exactly how the base map should be constructed, although it should definitely be from the pieces we're currently working with.
Cleaning up the codebase and web materials following the migration to the new Allura system on SourceForge.
I think we decided on Monday (talking with Ewa) that you should go ahead and use the full def:segs instead of the glosses in the "get related words" and "other entries containing this morpheme" lists. The glosses are only intended as headwords for the English-Nx word list.
Our example was cìqqnúnn, with the def:seg "I accidentally dug up something". The gloss tags are "dig", "dug", and "accidentally". We want the Nx word to appear (with its def) under all three headwords in the English-Nx word list, but we don't want a learner to look at the "get related words" list and think the word can be used to mean just "accidentally".
KMF and I have been working on the Metropolis Coronata, which is going to be our first model for transcription of primary sources, both for encoding practices and for rendering templates. We have the first couple of pages working, with models for encoding the following features:
The rendering pipelines now fork into two, with one set of templates being applied to all our regular born-digital documents and a new set of templates applying to primary sources, using a mode="primarySource" attribute, and the primary source templates have been spun off into a separate XSLT file. At the moment, this fork is triggered by the presence of a
<titlePage> element inside the
<front>; born-digital documents obviously don't have title pages, but primary sources do.
When we've finished the Metropolis Coronata, we'll turn it into a tutorial piece for everyone working on the primary sources. It's quite short, so it shouldn't take long.
Worked on handout documents as we Skyped; next meeting tomorrow to discuss inclusion of MK book chapter in handout.
While fiddling with the images we'll use for the stitching I decided that having them transparent will provide all sorts of advantages at every step.
Being as how it isn't an entirely transparent (sorry) process, I figured I'd document it:
These instructions are for the GIMP (I'm using 2.8)
1. Open line art image. Select all and copy.
2. Choose the Channels tab in the toolbox and create a new channel (it will show up as 'New Channel' in the region below the red, green and blue channels). Set opacity to 100%. The window will turn black.
3. Paste clipboard content to new channel (your image should re-appear).
4. Make the channel called 'Alpha' (in the top region containing red, blue and green channels) invisible. Your image should now appear to have a transparent background.
5. Choose the Layers tab in the toolbox and right-click on the layer titled 'Floating Selection'. Choose 'New from Visible'. A new layer called 'Visible' will show up in the layers list.
6. Back on the 'Channels' tab, make the 'Alpha' channel visible again and make the 'New Channel' (in the list below red, green, blue and alpha) invisible.
7. On the 'Layers' tab delete the layers titled 'Floating Selection' and 'Background'.
8. Save as PNG.
Fighting with Tomcat 6. We really have to get moved over to Peach asap...
I've now defined the list of mandatory components in the response XML, leaving the rest as optional, and created a schema and RNG file which incorporates Schematron which insists on their presence. I've tweaked some variable names, fixed some more bugs, provided more helpful components on the web page, and moved the core files into their own directory. I'm now trying to get it working on the server, and although the XQuery is working and XML results are coming back (albeit slowly), the XSLT transform is not working on the server; I think this is because of path interference from the site configuration, which places /site/ at the root in the controller-config.xml. That's a bit annoying, but I can probably work around it somehow if I have to. It could also be caused by Tomcat 6, which is increasingly revealing bugs that are not there in 7. On tomcat-devel, it works fine.
Met with GN and KMF to begin the process of creating a new stitch-up of the map pieces. KMF will now discuss the history of the pieces we have with JJ and decide on a strategy. Most likely we will not attempt to match edges exactly, but rather leave enough space between sheets so that there are no jarring effects. Tested the use of Hugin to auto-stitch, but it was not able to identify control points.
Met with JL and Jim Kempling to discuss A City Goes to War project. They have obtained almost $100K.
They want a WordPress site based on the Victoria's Victoria model. A framework into which student projects will be integrated. So issue #1 is how to migrate student projects from a dev instance(s) of WP to the production instance.
They want to have a centralized searchable repository of photos with metadata and images of documents with metadata.
They want to have a database of military records (attestation papers, which includes sections: Questions put, Declaration by Man, Oath taken by Man, Certificate of magistrate, Description on Enlistment, Certificate of Medical Examination, Certificate of Officer) which is searchable/filterable by various criteria (to be provided by JK).
Data structure ideally is interoperable with the Canadian Great War Project.
Ideally the search interface will search both the db of military records and the metadata of the repository of images, if possible.
I've moved the NetBeans project for the RuleBasedCollator class MosesCollation into our SVN tree, and then I rewrote it to include orthographic characters and English (upper and lower case) so that we can sort all three types of string using the same collation.
I had to do this twice because the current version of NetBeans from the Precise repo, which was 7.0.1, has the most disastrous bug imaginable: no file changes are saved to disk. This is undetectable while you're running it, because file buffers are changed, and jar files are built from the buffers. I lost all my work to this bug, and had to repeat it after removing the repo NetBeans and installing 7.3 from the download installer.
On late duty, and meetings all day so needed to make some actual progress with code before going home.
Discussion of print dictionaries, collation for orthography, vacations and timing, and other issues.
Fixed several bugs, added a returns-per-page setting, began more detailed documentation (not sure how to handle that yet -- ODD/RNG seems impractical), added handling for embedded egXML elements, and prettied-up the interface a bit.
Another meeting with KFM and PS.
Opening old NetBeans 6 projects in NetBeans 7 resulted in errors because the JUnit libraries couldn't be found. I had to right-click on Test Libraries, choose Add JAR/Folder, then choose /usr/share/java/junit4.jar to resolve the missing dependency. Took a while to figure that out, curses...
We sort our Moses entries currently based on the phonemic representation, using a Java comparator I wrote specifically for the project. Now we're going to have orthographical representations, the sort order will have to be amended to take account of that. I'm therefore reviving the NetBeans project for the MosesCollation, and beginning to update it.
This is the current sort order.
We'll need to add the following characters to the list:
I'm now trying to get the Gentium Plus font working with FOP, to handle our non-ASCII characters. It is possible to use the "simple method", which involves giving the FOP processor a special config file telling it to parse the system fonts to find the font it needs. However, because I want to be able to make PDF generation a part of the portable webapp, I need to do it the "hard" way as well, so I've started figuring it out.
The process is confusing, because FOP behaviour has changed. It seems to involve three steps:
java -cp lib/fop.jar:lib/avalon-framework-4.2.0.jar:lib/xercesImpl.jar:lib/commons-logging-1.1.1.jar:lib/commons-io-1.3.1.jar:lib/xmlgraphics-commons-1.5.jar:lib/xml-apis.jar org.apache.fop.fonts.apps.TTFReader /usr/share/fonts/truetype/gentium-plus/GentiumPlus-R.ttf GentiumPlus-R.xml
java -cp lib/fop.jar:lib/avalon-framework-4.2.0.jar:lib/xercesImpl.jar:lib/commons-logging-1.1.1.jar:lib/commons-io-1.3.1.jar:lib/xmlgraphics-commons-1.5.jar:lib/xml-apis.jar org.apache.fop.fonts.apps.TTFReader /usr/share/fonts/truetype/gentium-plus/GentiumPlus-I.ttf GentiumPlus-I.xml
Still working on the second and third steps...
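For the record, the "hard" method in FOP of this vintage involves a user config file registering the generated metrics, passed to FOP with -c. Something like the following fragment, though the family name and paths here are assumptions, not the final setup:

```xml
<!-- Hypothetical fop.xconf fragment: register the generated metrics files
     so font-family="GentiumPlus" resolves in PDF output. -->
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <fonts>
        <font metrics-url="GentiumPlus-R.xml"
              embed-url="/usr/share/fonts/truetype/gentium-plus/GentiumPlus-R.ttf">
          <font-triplet name="GentiumPlus" style="normal" weight="normal"/>
        </font>
        <font metrics-url="GentiumPlus-I.xml"
              embed-url="/usr/share/fonts/truetype/gentium-plus/GentiumPlus-I.ttf">
          <font-triplet name="GentiumPlus" style="italic" weight="normal"/>
        </font>
      </fonts>
    </renderer>
  </renderers>
</fop>
```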
Wrote, tested and delivered a utility XSLT file for handling problems in the glottal.xml file, per SMK's request.
One late duty, and then engrossed by XSL:FO.
I've started work on the XSL:FO/PDF generation code, working with some test files for which I've auto-generated the orthographies. I have a basic layout done, for letter-sized paper, and a parameter system built in which enables me to add other paper sizes later. I'm working with FOP, because if we can get what we want with it (and it's looking good so far -- columns work) then we can deploy anywhere. The biggest hurdle right now (after reminding myself of how page-masters and sequences work) is getting the Unicode characters to display correctly. That's next on my list.
We have a large number (126) of HBC images in our incoming collection. Of these TB has processed eleven, and marked up five so far. Those have been added to the maps folders, both locally on my machine (and therefore backed up to Rutabaga) and in the coldesp account on home1t. This is documentation of how such images should be processed prior to being marked up with the Image Markup Tool.
process_map_images.sh, which you run as follows:
the trunk folder in their svn repo (without adding it to svn, of course). Then they can clone an existing image markup file, tweak the metadata in Oxygen, then, in the Image Markup Tool, delete all the boxes and replace the image. Then they're ready to start annotating.
I've written some XSLT to scale the image markup for any maps which were marked up with larger-scale images than we conventionally use on the site. This may be handy in future too. I still need to do the actual scaling of all remaining HBC maps that have been added to the system. That'll have to be handled through a script.
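The arithmetic behind the scaling is simple: multiply every box coordinate by the ratio between the markup image size and the site's conventional size. A minimal sketch (the rect key names here are my own, not necessarily what the Image Markup Tool files use):

```python
def scale_rects(rects, factor):
    """Scale annotation boxes by the ratio between the image size used
    for markup and the size used on the site.

    rects: list of dicts of pixel coordinates (hypothetical ulx/uly/lrx/lry keys).
    """
    return [{k: round(v * factor) for k, v in r.items()} for r in rects]
```

For example, boxes drawn on an image twice the site's size would be scaled with `factor=0.5`.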
TB has been working on image markup, but the HBC images on which he's working aren't yet in the system. I've processed and uploaded the 11 he's picked out, but I should probably convert all the remaining ones to the right sizes too, for future use.
Takes longer than you think, every time.
Following TEI SourceForge conversion to the new system, with new URLs, reworked the Jenkins setup and build script, and started setting up to test the latter.
Fixed some bugs introduced by the transformation from rend to style (intrusive default attribute values), and implemented a Schematron schema with three rules (so far) to help catch typos. Fixed many of the typos it caught.
Wrestling with security issues around the Facebook Like button.
Fixed a bug in the search results whereby a result found in the personography was linking to a page that could not display. At the same time, I decided to improve the search functionality so that now each hit in a document links to the anchor of that hit (as in Mariage). The only wrinkle here is that hits in editorial notes are not displayed, but we'll wait for a clearer idea of the new layout before we work on that; I think we should probably be including notes in static form in a list at the bottom, as we do in Mariage.
Skype call with SB, and then spent an hour preparing a second version of a handout for Oxygen because we cannot get a straight answer on which version of Oxygen will be available in the lab we'll be teaching in. This is incredibly frustrating.
Met with KFL and PS. Still working on the layout and design, but we're getting closer. PS will come back next week with more mockups, and meanwhile I'll build the dev area in the webapp.
I've converted all the documents and updated all the XSLT, and the site appears to be working fine both on Peach and on Pear, but the app won't start on my local machine, which is worrying. I need to port this to a new eXist asap.
Spoke to LS about deploying a WP instance on web.uvic.ca and moving away from DWT setup.
Recommended that they put in a proposal for HCMC to port the DWT to WP theme.
Leaving a little early.
Met with JT and PAB about the Beck site. All of us, particularly PAB, are busy right now so we've decided to postpone the Cascading of the Beck Trust site until the summer, when there will be more time.
More refinements to the orthography generation. We've now decided that we should base the orthography on the hyph rather than the pron, because that way we can perhaps insert intrusive schwas more easily; morpheme boundaries appear to be significant for this purpose. I'm also orthing the phonemic phrases in citations, and since these are partly hyphenated, I'm handling them slightly differently, splitting on the morpheme boundaries, but using the same conversion code. I have a hook in for the schwa insertion if we can formalize the rules for it.
Both HT and JT are open to the idea of rolling publication. We're still talking about whether it should be just reviews, or everything.
All the articles in volume 19 of the Scandinavian-Canadian Studies Journal are now available on the journal website http://scancan.net.
This is a group of characters and they have a group entry, but they also have a character entry, and both are used: the former in the text and the latter to list them as the children of Apollo. I spoke to Greg about it; initially we assumed it was a mistake, but it seems that both are used because there's currently no obvious way of linking genealogically to groups.
Edit: Greg has written new code so that groups may now be listed as having a genealogical connection to a character.
Edit 03/27/13: This list encompasses Apollodorus Library 1-3 and Epitome. As I begin to work on the Iliad, I will add the cases I find there as well.
[Original Post] I've encountered some instances in the text where siblings are listed, and then it is suggested that, rather than being siblings, one of the individuals was in fact the offspring of one of the others. For example, Europa is listed as both the sister of Phoenix and his daughter, based on Apollodorus saying her father was Agenor, while others (Apollodorus doesn't say who, but it's substantiated in the Iliad) claim it was Phoenix. I have expanded the descriptions for both of these characters in an attempt to disambiguate their relationship.
Some of the children of Minos (Catreus, Deucalion, Glaucus, Androgeus, Acalle, Xenodice, Ariadne, and Phaedra) are listed as having two mothers - either Pasiphae or Crete. He does have other children by other women, but these are the offspring specifically by his wife, who is alternately Pasiphae or Crete depending on the source. Apollodorus gives it as Pasiphae, but notes that Asclepiades claims she was Crete. Do we want to include information about conflicting accounts like these, and if so, how do we best present it clearly? Additionally, Crete is currently associated with event 618, "Minos marries Pasiphae and has children." This seems a little confusing.
I know there's a number of similar conflicts, and I've spoken with Greg about it at some length. At this stage I don't think we're making any significant changes, but I'm going to document the cases I find for future reference.
Edit 03/12/13: Children of Oedipus (Antigone, Eteocles, Ismene, Polynices) are born to him by his mother, Jocasta; Apollodorus makes a vague reference to another version of the tale in which the children's mother is Eurygania, who according to the source is either a daughter of Hypherphas, an alias of Jocasta, or a woman Oedipus married after Jocasta's death. She is currently listed as one of two mothers of Oedipus' children, but based on my research she's used extremely infrequently and most sources (including Apollodorus) agree it was Jocasta.
Edit 03/15/13: Callisto listed alternately as the daughter of Lycaon according to Apollodorus, Nycteus according to Asius, and Ceteus according to Pherecydes.
Elatus and Aphidas are listed with multiple mothers. Their mother is either Leanira, Meganira, or according to Eumelus, Chrysopelia.
Lycaon is listed with multiple mothers: Meliboea and Cyllene.
Atalanta is listed as being the daughter of three fathers - Iasus, Maenalus, and Schoeneus. Additionally, Euripides attests that her husband was Hippomenes and not Melanion, as Apollodorus claims. Further, Apollodorus comments that the father of her son Parthenopaeus was either Melanion or Ares.
Idas is listed as the son of two fathers: Aphareus and Poseidon.
Aesculapius is listed as the son of two mothers: Arsinoe and Coronis.
Edit 03/19/13: Helen is listed as the daughter of two mothers, Leda and Nemesis.
Tithonus, Lampus, Clytius, Hicetaon, Priam, Hesione, Cilla and Astyoche all have three mothers - Strymo, Placia, and Leucippe.
Hecuba is the daughter of three fathers; alternately Cisseus, Dymas and Sangarius. Additionally, her mother is given as Metope, but only in the pairing of Metope with the river Sangarius. I don't think Metope is inferred or mentioned as her mother if either of the other two men is her father.
Telamon has two mothers: Endeis and Glauce, and likewise two fathers, Actaeus and Aeacus.
Edit 03/20/13: Menesthius has two fathers, Peleus and the river Sperchius.
Patroclus has three mothers: Periopis, Polymele, and Sthenele.
Adonis has three mothers: Alphesiboea, Metharme, and Smyrna. He likewise has three fathers: Cinyras, Phoenix and Thias.
Erichthonius has two mothers: Athena and Atthis.
Phineus has two fathers: Agenor and Poseidon.
Amphitrite has two mothers: Doris and Tethys, and two fathers, Nereus and Oceanus.
Erectheus has two mothers: Gaia and Zeuxippe.
Aegeus has two fathers: Pandion and Scyrius.
Edit 03/22/13: Sciron has two fathers: Pelops and Poseidon.
Hippolytus has two mothers: Antiope and Hippolyte.
Persephone has two mothers: Demeter and Styx.
Palamedes has three mothers: Clymene, Hesione and Philyra.
Agamemnon and Menelaus have two fathers: Atreus and Plisthenes.
Tydeus has two mothers: Gorge and Periboea.
Edit 03/26/13: Tenes has two fathers: Apollo and Cycnes.
Hippothous has two fathers: Lethus and Pelasgus.
Sarpedon has two mothers: Europa and Laodamia.
Briseis has two fathers: Briseus and Chryses.
Rhesus has two mothers: Calliope and Euterpe.
Tisamenus has two mothers: Erigone and Hermione.
Circe has two mothers: Perse and Perseis.
Edit 03/27/13: Pisinoe, Aglaope, and Thelxiepia have two mothers: Melpomene and Sterope.
Scylla has two fathers: Phorcus and Trienus.
Pan has two mothers: Penelope and Hybris, and two fathers: Zeus and Hermes.
Hellen has two fathers: Zeus and Deucalion.
Endymion has two fathers: Zeus and Aethlius.
Aegialia has two fathers: Adrastus and Aegialeus.
Opheltes has two mothers: Amphithea and Eurydice.
Amphitryon and Anaxo have three mothers: Astydamia, Hipponome, and Laonome.
Augeas has three fathers: Helios, Phorbas and Poseidon.
Hippomedon has two fathers: Aristomachus and Talaus.
Perieres has two fathers: Aeolus and Cynortes.
Too busy auto-orthographizing to go home on time...
Wrote the orthography-generating XSLT, and tested and refined it with SMK working on the l-affric file. The only remaining outstanding questions are: what to do with dotted n (change to nn, or keep as-is, or remove the dot); and how/whether to insert the extra schwas we see in older examples of the orthography.
Also tested some XQuery to determine how practical it will be to link some morphemes in hyphs automatically to their source morpheme. There are many instances where a particular string has only one existing morpheme link, so there are lots of candidates. My XQuery could be used to build a lookup table for all instances of a string which has a single existing corresp, and we could use that to auto-link a lot of m elements.
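The logic of that lookup table can be sketched outside XQuery as well: collect every existing link for each morpheme string, and keep only the strings whose links all point to one place, since only those are safe to auto-link. A minimal Python sketch of the idea (the data shapes and morpheme IDs are invented):

```python
from collections import defaultdict

def build_lookup(morphemes):
    """morphemes: (string, corresp) pairs harvested from hyph elements,
    with corresp=None where no link exists yet.

    Returns a table of strings that are safe to auto-link because every
    existing link for that string points to the same morpheme."""
    seen = defaultdict(set)
    for s, corresp in morphemes:
        if corresp is not None:
            seen[s].add(corresp)
    # keep only strings with exactly one distinct existing link
    return {s: links.pop() for s, links in seen.items() if len(links) == 1}
```

Strings with conflicting links, or no links at all, drop out of the table and would still need hand linking.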
Met with JJ, NP and SM re dates. Fantastic example from Stow of 1200 (Julian) + regnal date. Resolved that NP and SM will build a table of Stow's understanding of regnal dates, and use that in their encoding, and we will then document and map that as appropriate. Lots of meat in this example for the presentation -- find it and mark it in the 1598 text!
Working through changes.
Attended with SA information session re news & events issue.
BKB will reply when process is clarified.
Manipulating photos requires: downloading from the "old" site to Cascade; organizing 72 photos of varying sizes; resizing; uploading resized photos to the appropriate Cascade folder (images/photos/misc); inserting photos; updating links.
Continued refining the names page, in response to feedback, including adding print settings in the stylesheet to hide the menu stuff. Also added counts of entries into the status page, so that we can see where we're at.
Catching up after conference trip.
Working through changes, inserting photos, editing research page, correcting right-column location
SA, JN attending Cascade session tomorrow to sort out site's news & events issues
On the plane yesterday I worked on the rewrite of the Names page, which is now working. It's a sortable table, showing all things tagged as names, along with links to their entries. In the process I brought in some JS for the table sorting, and modularized some of the entry link display code. I also found some bits of code which look obsolete, that I might be able to get rid of; I should do that asap. Uploaded the new code this morning, then worked on a bug with the menu display caused by my changing some of the GET params.
Took one day of vacation (Monday) following the ICLDC3 conference. Tuesday was travel-home day.
main content points at the block
the box in the right column points to the accordion body block
Received batch #15 from DR
- changes to navigation implemented
- manipulating photos
- news & events issues in progress
- editing spelling
- inserting new content
Batch #14 changes/additions received from DR.
Inputting content; editing.
Navigation changes/confirmation in progress.
All T4's now distributed or picked up in person.
Received request from SB (EMRC) for new page accompanied by new content and
new plate image(s).
Created a new page; chose plate image, inserted image and graduate and postdoctoral research information.
Sent SB email (cc'd EK, SA) with site updates. Waiting for approval before making new page public.
Need to create a new departmental account for Hispanic and Italian to host all their practice tests and exercises. The current instance is in their departmental account. When the site migrates to Cascade, the URLs pointing to that site will be automatically bounced to the Cascade site. There are zillions of files and we don't want them all appearing to be sitting inside the department site, so we'll write absolute URLs from the new dept site to the new instance of the exercise files.
- additions to Undergraduate section (confirmation pending re layout of awards and scholarships additions)
- new arrangement and content inserted in alumni page
- reviewed news and events issues
- internal site review scheduled for next week
Iliad 3.39 "Did you not, such as you are, get your following together and sail beyond the seas? Did you not from a far country carry off a lovely woman wedded among a people of warriors - to bring sorrow upon your father, your city, and your whole district, but joy to your enemies, and hang-dog shamefacedness to yourself? And now can you not dare face Menelaos and learn what manner of man he is whose wife you have stolen?" In terms of an event that occurs in the temporal run of the text, this is Hektor shaming Paris for his cowardice, but it refers to the series of events in which Paris sparked the war.
3.181 "The old man marveled at him and said, "Happy son of Atreus, child of good fortune. I see that the Achaeans are subject to you in great multitudes. When I was in Phrygia I saw many horsemen, the people of Otreus and of Mygdon, who were camping upon the banks of the river Sangarios; I was their ally, and with them when the Amazons, peers of men, came up against them, but even they were not so many as the Achaeans."
This, while more vague, seems to refer to Priam allying with the people of Otreus and Mygdon in a battle against the Amazons, but is again temporally irrelevant to the proceedings of book 3 proper.
How do we identify moments like this, in which past events are obliquely referred to?
Continuing with building site.
Batch #13 received today re right-column content.
Working with Alumni and Research pages content, layout, photos
We're down to 45 characters that do not have a brief description.
To find them, run this xpath filter in the text field called 'XPath 2.0' at the top of the oXygen window: //person[not(note[@type='description'])]
We're also down to 42 places without co-ordinates. Run this filter to find them:
On another note, there are 57 places that have 'beyond' as their co-ordinates.
Leaving early to prepare for conference trip tomorrow.
Fixed a rendering bug in the bibls, tweaked the XInclude for the historical personography so it excludes people with no data, and made a couple of other minor fixes.
Worked on the layout CSS for the Le Blanc and Ville Thierry documents. The difficulty of setting the page width adequately so that text isn't grouped too far to the left, without triggering the occasional line-wrap, is a continual problem, but we're getting to a good approximation for these two documents. I still need to move from @rend to @style -- that will be done when I get back from ICLDC3.
Received request from BAK for site updates re
upcoming Lansdowne lecture and LARG workshop
Added information to all pages; sent confirmation email to BAK
Trying to get stuff finished ahead of the conference trip...
Had to move back to LibreOffice 3.6 because of a major bug in handling of custom animations in 4.0. My bits are now done, and ECH is finishing the last parts of hers.
All timesheets have gone off today, ahead of my trip this week.
Met with PS and KMF to discuss the redesign. PS will work on some aspects which are already clear, while we make final decisions on menu contents and behaviour.
I'm now handling the whitespace issues in the output as well as could be expected, and I've made a start on prettying up the page a bit. I still have to add some JS enabling-and-disabling of controls based on the values of other controls, and then we're done.
I have put a list of feature structure questions for further consideration into the docs folder in the SVN repository. It's called feature_structure_questions_Jan2013.odt.
ECH's paper copy of these questions, with our notes on it, is in the blue folder in the top box of Lexware data!
Received batch #12 of changes/additions from DR for site.
- renaming of navigation
- sorting out news & events
- moving photos
- creating folders within photo folder for future incoming photos
- creating new layout for alumni
- creating new layout for research
- obtaining JavaScript for special "more" & "less" features on specific pages
- applying JavaScript to research section
Uncaught TypeError: Object #<Document> has no method 'write'
So, for OL pages, use this serialization option:
declare option exist:serialize "method=html5 media-type=text/html encoding=utf-8 indent=yes";
The obvious problem with serving as text/html is that, according to the spec:
The 'text/html' media type [RFC2854] is primarily for HTML, not for XHTML. In general, this media type is NOT suitable for XHTML except when the XHTML conforms to the guidelines in Appendix A. In particular, 'text/html' is NOT suitable for XHTML Family document types that add elements and attributes from foreign namespaces, such as XHTML+MathML [XHTML+MathML].
According to the spec, then, if you want to use something like SVG directly in the document, you may not be able to on pages like this. Using SVG graphics in CSS seems to be OK, though.
I'm not yet sure what impact this will have on OL maps that include SVG via JS.
Received T4's for staff; distributed to those on campus; advised those off campus to pick theirs up in person, or the T4 will be mailed this week in accordance with the deadline.
Balanced and confirmed PC for year end; forwarded completed report to BK (Acctng)
Received new batch of changes/additions from DR.
Updating site accordingly; more content arriving next week
Feb. 21: attended Cascade support session for advanced features advice
Feb. 25: SA & JN meeting to discuss site changes
Our new feature structures for names and loanwords are displaying on the website as NAME:TRUE and LOANWORD:TRUE.
Please hide the :TRUE part of these features from the website display.
Completed a handout sheet for DHSI. My Brown login is now working so I can use the SVN.
Finally managed to get some data into LEXUS, and out in XML format -- it's pretty ugly, and seems to be in its own namespace, although it nods at LMF. Did more work on the presentation (new diagram). I think my bits are basically done.
I have a simple form interface working, but I'm fighting with the HTML and CSS to display the code snippets. I can't seem to preserve the whitespace from the original without a lot of analyze-string sorts of thing. But we'll get there.
Met with PAB and added more data to Beck; also initiated discussions with Communications re Beck moving to Cascade. Discussed future changes to Myndir.
Not sure where the time went today...
Managed one more diagram today, along with some thinking and ideas for showing how ODDs work.
Had a regular meeting, resulting in a couple of tasks for me on Flow. Also arranged meeting with PS next Monday.
Sent emails to DR re site updates.
Updated site with content and most recent structural changes/additions.
DR currently reviewing site.
On a coding roll -- kept going till I finished!
Everything is in place and working for the XML web service. Now I need to graft on the HTML rendering and the form/AJAX response stuff.
Creating photos, diagrams and screenshots, and putting the first half of the presentation together.
Just came across an example where:
<phr type="n">√ƛʼécʼ-p=ap Račəméntən</phr>
got autophonemicized to:
<phr type="p" subtype="i">√ƛʼícʼ-p=ap ʔḥacmíntn</phr>
The capital R was correctly changed to ḥ, but where did the glottal stop come from? Did the "insert glottal stop before a word-initial vowel" rule somehow apply between the removal of the R and the addition of the ḥ?
There are 15 other instances of word initial ʔḥ generated from R in the data. I can just fix them with find-and-replace, but I'm curious about how they were generated.
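One guess at how this could happen is a rule-ordering problem in the cascade: if the glottal-stop rule runs on the intermediate form, after the cover symbol R has been removed but before the ḥ has been written in, the temporarily vowel-initial word triggers the insertion. The sketch below is NOT the project's actual autophonemicizer, just a minimal Python demonstration of that hypothesized ordering:

```python
VOWELS = "aeiouə"

def phonemicize(word):
    """Hypothesized rule ordering reproducing the spurious word-initial ʔḥ."""
    # Step 1: strip the cover symbol R, leaving the word vowel-initial.
    had_r = word.startswith("R")
    if had_r:
        word = word[1:]
    # Step 2: insert a glottal stop before a word-initial vowel --
    # on the intermediate, R-less form this fires spuriously.
    if word and word[0] in VOWELS:
        word = "ʔ" + word
    # Step 3: only now is ḥ written in for the removed R, landing
    # after the freshly inserted glottal stop.
    if had_r:
        i = 1 if word.startswith("ʔ") else 0
        word = word[:i] + "ḥ" + word[i:]
    return word
```

Run on an R-initial word, this yields the observed ʔḥ prefix; on a genuinely vowel-initial word, the glottal-stop rule applies correctly. Ignoring the vowel changes, this matches the pattern in the 16 affected entries.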
Please adjust the display of the contributors' roles. Currently, "Fluent Speaker" and "Elder" appear to have a space after them, so for contributors who are both, we see Fluent Speaker space comma space Elder space.
Meeting ran a little late, and I've also got deeply into my CodeSharing tool.
Today I've made considerable progress with my TEI CodeSharing idea; I've finished specifying the basic API, and I'm working out the details of the implementation of the XML part of the service. Once that's working, I'll finish documenting the API, and then I'll be able to build a form-based interface on top of it. I've committed to a presentation on this at DHOXSS in July.
Several hundred emails to deal with...
All our current values for @role have now been documented in the ODD file, and I've also put a system in place for reading that data out of the ODD file at transform time (using the document() function) and using it to create the web display text. This will be handy in future, and it's a good approach because it encourages us to do more documentation, in more detail, in the ODD file, with the sense that it's going to be made public. The ODD file is now in the db, of course, which is also probably good. It might make sense to put the rng file there as well, and make both forms of documentation available from the site.
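The lookup itself (role value to display text, read from ODD documentation) amounts to building a small map. Here is a Python sketch of the same idea; the valList/valItem/desc structure is standard TEI ODD, but the values and descriptions below are invented stand-ins for the real file:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Toy stand-in for the real ODD file; invented values and descriptions.
ODD = """<valList xmlns="http://www.tei-c.org/ns/1.0">
  <valItem ident="editor"><desc>General editor</desc></valItem>
  <valItem ident="encoder"><desc>Markup editor</desc></valItem>
</valList>"""

def role_descriptions(odd_xml):
    """Build a @role-value -> display-text map from ODD documentation."""
    root = ET.fromstring(odd_xml)
    return {v.get("ident"): v.find(TEI + "desc").text
            for v in root.iter(TEI + "valItem")}
```

The same map, built at transform time, is what drives the web display text; keeping the documentation in the ODD means there is one authoritative source for both the schema and the site.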
1) In the email that pops up when EB or SC clicks on an Email link, please add the following fields to the template:
3) The question in the entry for "Paschal Sherman" is not showing up on the website, although it's there in the XML file. SMK can't figure out why ...
4) ECH needs a way to prioritize the Questions for Elders – e.g., questions on lexical suffixes are low priority.
5) If we start to get a large volume of feedback coming from the community, ECH would like a tool for speeding up entry of responses. Here are some initial ideas ...
-responses to Questions for Elders would appear on a new page on the website, instead of coming as email responses
-for each response, the editor (likely ECH or SMK) could select from options like
-add to existing entry as editorial note
-add to existing entry as new <def>
-add to existing <def> as new <bibl>
-add to existing entry as new pron:seg type="n"
-add whole new entry in new-data-2013.xml
-Please add a menu button for accessing the Verbs page.
-Please make the page title “Verb list” instead of “Word list”.
-Add an explanatory line below the title: This list includes the verb forms recorded in Dale Kinkade's Nxaʔamxcín materials.
-We don't want to combine the verb list with the Questions for Elders page. But we do need a way to mark which verb forms EB, SC, and PCS have already reviewed. SMK suggests the following:
If an entry contains psn:EB, psn:SC, or psn:PCS (i.e., they have commented on it, and we have recorded their input in the entry), flag it on the verbs page as reviewed – perhaps with an icon (a checkmark?) next to the editing status “traffic light”?
-We will eventually add a separate verb paradigms page.
-We've figured out a better way to identify verbs in the data, based on the transitive object and subject endings. Now we just need to finalize our list of those endings, and figure out how to work around endings like “-n” which could be one of many morphemes.
Process date tags such as the following to show the @when attribute when the date tag has no text in it.
<bibl corresp="psn:PCS">PCS <date when="2013-02-06"></date></bibl>
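The requested behaviour is a simple fallback: when the date element has no text content, display its @when value instead. The real fix will be in the XSLT, but the logic can be sketched in Python:

```python
import xml.etree.ElementTree as ET

def date_display(bibl_xml):
    """Display a bibl's date: its text content if any, else its @when."""
    date = ET.fromstring(bibl_xml).find("date")
    text = (date.text or "").strip()
    return text if text else date.get("when", "")
```

So the example above would display "2013-02-06", while a date that already carries text is left alone.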
2) Roles of contributors:
Please make the full description appear after each person's name on the Contributors page.
EB distinguishes "Elders" who have significant cultural knowledge, and "Fluent Speakers" who have language knowledge. So some of the contributors will be one or the other, and some could be both.
Working on recent batch of changes/additions. Inputting content and making some
structural changes as per DR's request.
Sent email to DR re another possible structure change.
- photo display
- incoming content
- verification of links
Regenerated all the OAI records locally, then uploaded them to the db, and provided a zip for the library to process into whatever Canadiana requires (it can't ingest OAI effectively, apparently).
Added a new xml file to the svn repository. It's called new-data-2013.xml, and it's for new words PCS contributes, beyond her comments on words already in the data.
Took Feb 12-15 as vacation.
On the current site made changes to the Study Abroad Program as requested by DR (e.g. reordered items). Confirmed with SA, then sent email to DR advising the request was completed.
Received correspondence from DR re site changes/additions.
Working my way through these with batch #11 next in line.
Sent confirmation email to DR (cc SA) with update.
- additional content incoming (photos)
- decisions re photo display, location, selection, arrangement
- possible structure changes/suggestions
Now that I am working on the feature structures for the affix files using the new feature system, I am noticing changes that are needed. I have documented them in comments at the top of the feature system file. Item 3 is already implemented.
1. Need to add new daughter to name="affix", i.e. value="syntactic"
2. Need to add new fDecl name="syntactic" category, and list of symbol values that are daughter of "syntactic"; one of these latter is going to be "interrogative" (it is a type of enclitic particle with syntactic function).
3. Moved attributiveHabitual and repetitive to secondaryAspectual category.
Please note: The categories are determined on the basis of grammatical function and meaning. Therefore, for example, the outofControl morpheme is included with Control morphemes rather than with Reduplication on the basis of its function as a control marker rather than on the basis of its formal characteristics as a reduplicative morpheme. For the most part the categories reflect those provided in the Nxa'amxcín grammar of Marie Louise Willett (2003), which in turn is based on the work of Dale Kinkade.
Leaving a little early.
Finished building the new webapp, and deployed it to Pear. Notes:
<root pattern="/*" path="xmldb:exist:///db/site"/>
This means that the MoEML app runs in the root, and access to the dashboard, eXide etc. is not available. If we want to make them available in future, we can just reverse the order of declarations in the config file so that the apps we want to use appear before the /* pattern. I tested this locally and it works.
New video, fixes to titles, and addition of a FB link to the menu.
MoEML webapp broke, and I'm on vacation next week, so I've been doing an emergency port to a new version of eXist. Should be ready to go early tomorrow morning.
The webapp for some reason stopped allowing members of the editor group to see and upload files; only admin could do it. After casting around for a fix, I ended up building a new webapp from a fresh trunk build of eXist, and restoring the data to it. The problem was still there. I started again, and this time recreated an editor user from scratch, rather than restoring. This seems to have worked, so I'm going to recreate all the other users tomorrow and then deploy the new webapp.
I've also invoked the snowball analyzer, so we now have stemmed searches. Nice.
SMK assigned me these tasks:
1) Please sort the names under the following headings, using the tags in the pron:seg type="p".
Personal Names (persName)
Place Names (placeName)
Tribes (orgName)
People in Stories (name type="storyPeople")
Animals (name type="fauna")
Plants (name type="flora")
2) Printable PDF view of the names page for MM and SB to work from - Full view, with the entries expanded. MM needs to be able to see:
She will also need a printable view of the Contributors page.
3) Get the orthography working. (Note: MM does not use the community orthography, but rather MDK's! ECH will ask GM what orthography SB uses.)
4) DONE One thing which arises from yesterday's feedback from SC is that we should hide glosses with type="i" from the Moses-English displays.
For example, in the verbs list we currently see:
ʔacḥámˀsn I stopped him from getting in a fight, forbid, stop
"Forbid" and "stop" are inferred glosses that I added for the purpose of creating the English-Moses word list, but they're not direct translations of ʔacḥámˀsn. So of course SC and PC replied that "forbid" and "stop" should be removed. Can you please hide the inferred glosses from all the Moses-English pages?
The English-Moses word list is working correctly, in that ʔacḥámˀsn appears under the English headwords "forbid" and "stop", as well as the more specific "stopped".
Received request from Accounting re vac. days owing to PEA.
Copy filed and original sent back to Accounting.
Received request from SA (RelSt) to post on their site a job listing.
Completed request and sent confirmation email to SA (cc'd SA/HCMC also)
Received request from SA (MedSt) to remove a sessional posting on their site.
Completed request and sent confirmation email to SA (cc'd SA/HCMC)
Cascade Hispanic & Italian Site meeting: Feb. 6, 2013 (DR,SA,JN)
- Reviewed the site together. DR approved the new Study Abroad Program information, which is now on the current site as well as Cascade.
- Composite photo discussion: DR provided several photos which may be manipulated into a grouping and displayed in various ways throughout the Cascade site. DR will choose a few photos for us to experiment with. He will get back to us re this.
- DR has a colleague who will proof-read the Cascade site for them
Received a request list from DR for changes to the current (Hispanic & Italian) site and for the new Cascade site.
Updated both sites with changes (Study Abroad Program information etc.)
- cropped, resized, inserted additional photos on page
- reordered list as requested
- inserted new text with links
LATIN AMERICAN STUDIES:
- added new text
- deleted old text; inserted new text with links
- created table and inserted faculty profiles (4)
- sent confirmation email to DR (cc'd SA) advising changes completed
- will pursue how to create "News and Events" section
Leaving a little early.
Text inside person/persName has now all been tagged, so my old XSLT, expecting text nodes, was not showing the name. Wrote a quick hack to fix this -- since reg now exists alongside surname, forename etc., it's no longer possible simply to render the contents, so I'm reverse-engineering the reg element text to get a normal rendering of the name.
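The hack amounts to: if a reg child is present, undo its inverted "Surname, Forename" form to get a display name; otherwise fall back to concatenating the child text nodes. A Python sketch of that logic (the assumption that reg holds a comma-inverted form is mine, based on the usual TEI convention):

```python
import xml.etree.ElementTree as ET

def display_name(persname_xml):
    """Render a fully-tagged persName: reverse-engineer reg if present,
    else join whatever text the child elements carry."""
    el = ET.fromstring(persname_xml)
    reg = el.find("reg")
    if reg is not None and (reg.text or "").strip():
        # assumed "Surname, Forename" -> "Forename Surname"
        parts = [p.strip() for p in reg.text.split(",")]
        return " ".join(reversed(parts)) if len(parts) == 2 else reg.text.strip()
    return " ".join(t.strip() for t in el.itertext() if t.strip())
```

The fallback keeps the old behaviour for any names that happen not to carry a reg element.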
SS finally decided not to go ahead with the netlink thing, so I sent my script to sysadmin to set up the system. It worked, but I'd forgotten a couple of people and spelt a couple of netlinks wrong, so I sent a follow-up script which has now run. Sent a message to the MVP list with instructions for use.
Finished uploading all the changed files to the db -- it took about five hours. In the meantime, I've started a Schematron file with some basic constraints, and shown KSW how to use it; he'll generate some more ideas for constraints from the documentation, and we'll then add the Schematron constraint to the top of the files so it's in force for everyone editing.
A bit of work on TEI tickets (removing data.code, which is now obsolete).
The process has been running for about 90 minutes so far, and is less than half done.
I still have to update the entire contents of the database with the newly-changed files. That'll take all day tomorrow...
Telco with SB and plans for changes to materials and course outline.
Set up meeting with DR & SA to discuss composite photo idea on cascade site.
Meeting Date: Wednesday, Feb. 6th, 10:00 am, HCMC office
Note: Previous blogs in Admin. & ALL - will enter in CASCADE from now on (Feb. 5, 2013)
CASCADE - Hispanic and Italian website update:
- uploaded new photos; inserted more faculty photos
- updated text within cells
Most of this work was done last night, at home, to avoid working on the data when everyone else might also be editing.
Board telco this morning, followed by a review of the current state of the white paper.
Lots of stuff to get done before vacation next week...
Finally finished my bits of the paper relating to feature structures and interoperability. Now I can focus on the presentation...
Received changes/additions to site from BAK
STUDY ABROAD PROGRAM:
- uploaded current information
- edited new text
- added Study abroad in Quito, Ecuador information in announcements (clickable)
Sent confirmation email to BK (cc'd SA)
Received batch #4 of changes/additions to Cascade website from DR.
- Switched order of paragraphs 3 & 4 etc.
- "Master of Arts program" linked to GRADUATE page
- deleted 3 tabs
- added new text as provided
UNDERGRADUATE/HISPANIC - PROSPECTIVE STUDENTS
- in left column, removed 2 items ("Student Profiles"; "Programs and Courses")
UNDERGRADUATE/HISPANIC - PROGRAMS AND COURSES
- added new provided text; links activated
- added new text to tabs; activated links to courses
- added new text
- created table and added 6 testimonials (2 side by side); arranged order to balance best on page
- more faculty photos to come
- need to explore how to implement News and Events feed
Sent email to DR (cc'd SA) re these updates.
MDH has posted a list of entries whose hyphs include verb morphology at:
Here are the criteria we used to generate the list:
EB wants to work on the subject/object suffixes after particular transitive suffixes or suffix combinations. So first, let's list entries which include each of the following:
-m:n-CTL followed by m:t-TR
-m:ɬ-DIR followed by m:t-TR
Here's a more detailed list of morphemes found in verbs vs. those found in nouns.
A verb entry's hyph will include one or more of:
A verb entry's hyph will NOT include any of the nominalizing morphemes:
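The "followed by" criteria above are essentially ordering checks over each entry's hyph; a rough Python sketch, assuming hyphs can be treated as strings in which one morpheme label occurs after another (the sample hyphs and the `=` delimiter are invented):

```python
def followed_by(hyph, first, second):
    """True if morpheme label `second` occurs somewhere after `first`
    in the hyph string. This is a loose reading of "followed by";
    tighten it to immediate adjacency if that's what the criteria mean."""
    i = hyph.find(first)
    return i != -1 and hyph.find(second, i + len(first)) != -1

# Invented sample hyphs, just to exercise the check
entries = {
    'e1': 'root=m:n-CTL=m:t-TR',
    'e2': 'root=m:lh-DIR=stem',
}
hits = [eid for eid, h in entries.items() if followed_by(h, 'm:n-CTL', 'm:t-TR')]
print(hits)  # -> ['e1']
```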
Judy and I went on tour of possible replacement furniture:
32" big square chairs - welcome center
29" medium square chairs - Mac's
26" small square chairs - Finnerty's
3' at back 1/10 pie chairs - advising
32" "winged" chairs - Finnerty's
cost of $800 - $1200 / chair
pedestal table $400
Leaving a little early.
Worked on some TEI tickets.
We did a preliminary intro to SVN, eXist, and timesheets, and then discussed the issue of map images and reconstruction.
Tomcat has now been configured so that a slash not followed by a filename is handled by the webapp, not trapped as an error by Tomcat itself. I've also uploaded and tested the new Moses app on Peach, confirming that while the eXist dashboard and eXide fail under Tomcat 6, they work under Tomcat 7.
Added an embedded video on RL's instructions.
I've also implemented "Get related words", which is more subtle than getting all the entries for a morpheme, and I've cleaned up some other iffy code and extended the schema a bit.
Met with the whole team. Key points:
Following that, CB and I worked through some schema changes he needed, and also fixed some errors we discovered in XML files when validating with the new schema.
Hispanic and Italian Studies Cascade:
Inputting new content:
Assets : created new folders for images - photos - (originals, thumbnails, final);
- created new individual pages
- inserted photos on people-index page; updated index page with activated links to individual's webpages
- incoming : more photos yet to come
Research : content from DR re individual projects TBA
CC wishes to move her "use of software in language learning" lab from B047 to C251 if possible.
I OK'd that change of use for B047 for her to take to the dean, and with Judy, Martin and Greg worked up a proposal for how we'll use our overall space now that we have that room back. We'll put A/V station(s) and a workbench station in there, and possibly move our backup server(s) in as well.
Later, met with network person, research accounting person, CC and Greg in the room to discuss the feasibility of using the room for its intended purpose, implementation and administrative implications, etc. Some concerns about the implications of using the room for research purposes and for dept meetings, even though the dept meetings would not be using the research equipment/resources.
Contacted DR (Hisp) requesting content for new Cascade site.
Received from DR lots of content with more yet to come.
Last couple of weeks have been working daily on this site inputting incoming materials from DR.
More content is currently being prepared by DR and colleagues which will be forwarded to me
once completed. Meanwhile, working on site daily.
Have reworked the structure according to DR's instructions re changes.
Updated with new content within various sections with newly created pages; news & events to be added
Received series of new photos to be inserted in People section and on individual newly
created faculty pages.
Manipulated photos (cropped, resized, scaled, etc.); inserted them; more photos to come
Eliminated 2 sections (Hispanic and Italian) as requested by DR, leaving the Graduate section only
Received interview #33; downloaded and filed copies.
Completed file and paperwork.
Had to resort to removing all the existing link elements following the first (main) stylesheet, then inserting a new link for the supplementary stylesheet. Also had to eschew the use of @title on link.
I have the cookies and session settings working OK, and abstracted to a little module, so that will be handy in future. I also have the widget controls on the page working, and retaining the correct settings. Now the problem is that I'm trying to turn stylesheets on and off using their "disabled" property, but it's really not working at all. Both Firefox and Chrome have problems, but they exhibit different behaviour. The spec says I can write to the disabled attribute, but it doesn't seem to work on Chrome if the stylesheet was disabled initially in the HTML code; with Firefox, the first disabled stylesheet seems to be active when the page first loads. Frustrating and annoying. I may have to resort to deleting and inserting link elements in the header, which is very crude, but might actually work.
SD sent me a spreadsheet with the 1820 to 1824 data in it. Found a little "how-to" file which explained the steps to turn that into an XML data file with a schema for validating. Did the process and noticed that two of the fields had values swapped. Checked the XSLT and sure enough
which I corrected to this:
Also noticed that the import changed all integer values to floating point (e.g. 16 became 16.0), and only integers are valid in the various fields (age, weeks, months, years etc.). Just did a grep search-and-replace to fix those.
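The float-to-integer cleanup is a single regex pass; a sketch, assuming the values appear as element content like <age>16.0</age> (the element names here are guesses based on the fields listed above):

```python
import re

def strip_trailing_zero(xml_text):
    """Turn whole-number floats like 16.0 back into integers (16) inside
    element content. Only matches digits immediately inside a tag, so
    mixed content like "v2.0" in prose is left alone."""
    return re.sub(r'>(\d+)\.0<', r'>\1<', xml_text)

print(strip_trailing_zero('<age>16.0</age><weeks>3.0</weeks>'))
# -> <age>16</age><weeks>3</weeks>
```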
The huge majority of the 100+ remaining invalid instances are mercy appeals where Simon has entered something like jury/prosecutor, whereas the XML requires a separate mercy appeal for each proponent.
XML file now with SD to make remaining corrections, then return to me, at which point I'll follow the rest of the how-to procedure to render back to relational data and upload to db.
Catching up with backups and email after Moses meeting and making copies of key documents during the afternoon.
Last round of MoEML timesheets for me before KFM takes over; then I'll only have three to do.
Wrote, tested and deployed some XSLT that converts or comments out some now-obsolete types of feature structures.
Made a start on three different views of entries. I have pipelines set up, and need to tweak the XSLT for each case; following that, I have to decide how to save the setting (session variable, cookie or URL parameter, or combination).
Writing up notes from day-long meetings.
Met for most of the day with EB, SC and the rest of our team, and made some sound decisions about different views of the database. These are some notes:
Elders' view (perhaps not needed, because EB and SC interface between them and the dictionary):
We should gather and include information on informants (genealogy, dates, etc.) so that we can generate relationship info and trees, and provide a good credits page if we need to, but keep that info confidential for the moment. One source for info on deceased participants would be here, especially the St. Mary's Mission, Nespelem City Cemetery, Little Nespelem Cemetery, and Nespelem Catholic Cemetery (Sacred Heart?) records. Other cemetery records which might be helpful are Cashmere Cemetery and ____ Creek Cemetery in Monse.
Portal page - with photo of Moses Mountain from EB? The web database should have the same title as the community's print dictionary: Nxaʔamxčín Nwwáwəlxtənt. EB and SC will confirm the spelling with PC.
A good map of the area, on which we could also put the placenames and link to their entries, would be good.
Names need to be divided into personal, place, story characters, orgs, flora and fauna. For flora, we could try to get a copy of Ethnobotany of the Southern Okanagan. Personal names will eventually be hidden in entries.
<f name="category"> <symbol value="proper-noun"/> </f>to:
<f name="name"> <binary value="true"/> </f>
<f name="baseType"> <symbol value="loanword"/> </f>to:
<f name="loanword"> <binary value="true"/> </f>
<fs> <f name="baseType"> <symbol value="compound"/> </f> </fs>
<fs> <f name="baseType"> <symbol value="suffix"/> </f> <f name="derivational"> <symbol value="lexical-suffix-compound"/> </f> </fs>
Leaving a little early.
Following the first few steps on the Google Code wiki for Tesseract to learn how to train it for a new language, I've used the moshpytt box editor on a sample file, and read through the other sample data. It looks like we may be able to do something like the following for any sufficiently large run of a journal which has consistent page-images, fonts, print quality etc.:
Adding examples for attributes that don't have them, and working on a couple of tickets.
Fixed some persistent broken links on the History site.
Met with AC to plan for an application in the fall. My notes cover the technical side of the project, and we'll meet again in March to produce a draft of that bit for discussion with some colleagues she'll be meeting.
Entered HT's corrections to Hale, Paulson and Young.
Meetings eating up the day.
Met with ECH and SMK to rework the whole feature structure thing, and we went back to a flat model. We'll work with a flat model until we're sure everything is functional and accounted for, and then we'll start adding structure and dependencies as appropriate. This seems the cleanest approach, and will let us get productive again. I've encoded a clean new version of the feature system with no @dcr: attributes or comments, and we'll go from there. I'll start adding back the @dcr attributes soon.
Met with MJ and talked at length about the current toolchain and the options for moving towards TEI. Everything could be TEI quite easily, but converting the toolchain and the rendering engine in synchronization will be hard. He's going to start reading the Guidelines as he also works on cleaning up the toolchain and the current output. It looks as though we'll be able to start from something a bit simpler than what is currently rendered, with fewer join ids, and we should be able to encode everything there is, including annotations and critical apparatus, in TEI. This is obviously a very long-term goal.
Following this morning's discussion of feature structures, we are no longer treating names and loanwords as baseTypes (since names, and potentially loanwords too, can be multi-morphemic).
The feature structure for names is being changed from
Monomorphemic names will also have the <fs> of a root:
The feature structure for loanwords is being changed from:
A loanword stem entry would have this feature structure:
AC wants to put pseudonyms into the authors table so that they can be searched. Problem with that is there is no way to distinguish a real author name from a pseudonym. So, I've tweaked the display code to produce the following behaviour:
A poem written by "Jane Smith" using the pseudonym "Minnie Mouse". Create record in poems table for the poem with "Minnie Mouse" in the pseudonym field. Create records for Jane Smith and Minnie Mouse in the authors table, and associate them both with the poem.
A search for either name will return that poem, and the initial report will show that it has two authors ("Jane Smith" and "Minnie Mouse").
If you put "Minnie Mouse" into the pseudonym field, then the initial list of hits is the same and the detailed report would show "Jane Smith (using pseudonym Minnie Mouse)"
If you put nothing in the pseudonym field for the poem, then the initial list of hits is the same and the detailed report would show ("Jane Smith" and "Minnie Mouse"). The user has no way of knowing that one of the authors is in fact a pseudonym of the other.
If you put "Donald Duck" into the pseudonym field, then the initial list of hits is the same and the detailed report would show "Jane Smith (using pseudonym Donald Duck); "Minnie Mouse (using pseudonym Donald Duck)".
Here's the logic:
For the initial listing of hits:
- displays title and whatever value(s) it finds in the displayname
field in the author table for authors associated with the poem
- does not display the value in the pseudonym field.
For the display of details for one poem:
If the poem has a value in the pseudonym field, then
- if the value is identical to the value of the displayname field then
- - if there is only one author name associated with this poem, then
display the author's name followed by "(pseudonym)".
- - if there is more than one author name associated with this poem,
then don't display this pseudonym author, as the pseudonym will be
displayed with the other author name (see immediately below)
- if the value in the pseudonym field is not identical to the value of
the displayname field then display the author's name, followed by "using
the pseudonym" and the pseudonym
Spent the day tidying up the last of the things on my list. I've implemented authentication through eXist rather than Tomcat, which makes things cleaner in the deployment department, but slightly more messy when it comes to users, groups and permissions in the db itself. I've also got the Snowball analyzer working with English stemming, and it works out of the box, with syntax highlighting and everything. Things I'll need to remember for the next one:
<svg> element doesn't have
<lucene>
  <analyzer class="org.apache.lucene.analysis.snowball.SnowballAnalyzer">
    <param name="name" type="java.lang.String" value="English"/>
    <param name="stopWords" type="java.util.Set">
      <!-- using set from StopAnalyzer.ENGLISH_STOP_WORDS_SET -->
      <value>a</value>
      <value>an</value>
      [...]
    </param>
  </analyzer>
</lucene>
To create a visualization with gource (on Linux), first install gource (it's in the Ubuntu repo), then:
cd [the SVN directory you want to work on]
svn log -r 11019:11452 --xml --verbose --quiet > svnlog.xml [Choose the revision numbers you want.]
gource -1280x720 -o - svnlog.xml | ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i - -vcodec libx264 -preset ultrafast -crf 1 -threads 0 -bf 0 video.mp4
Remarkably productive day. On a roll and reluctant to stop.
Today I ported the old Cocoon app to a fresh build of eXist, and got everything working. There are two or three little niggles to sort out, but basically everything is up and running. In the process, I've cleaned out a whole pile of obsolete code, cleaned up a lot of things, and simplified a bit. Another few hours and we'll be ready to deploy this as a replacement for the current app, which is pretty flaky sometimes, and which I suspect is responsible for the death of Tomcat once in a while.
Notes for future ports:
We've established mvp.uvic.ca, and I've also asked for the removal of all DNS entries for modernistversions.* domains from UVic DNS. At domainsatcost, I've set these domains to 301-redirect to mvp.uvic.ca. When all this filters through the system, we should have what we want.
AC noticed that the unsigned status is not reported in the vpn-single-record output. The query in vpn-search.php did not collect that field, so I added it to the SELECT clause. The field name is po_unsigned, and Jamie's convention dictates using "poems.po_unsigned AS unsigned" in the SELECT clause. Turns out unsigned is a reserved word, so the query failed. I used "poems.po_unsigned AS isunsigned" instead, and that worked. The output code now displays "unsigned: yes" or "unsigned: no" based on the value of the isunsigned variable in the poem object.
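The fix amounts to aliasing the reserved column name; a small sketch (the query is abbreviated from the notes above, and the poem object is stood in for by a dict):

```python
# "AS unsigned" fails because UNSIGNED is a reserved word in MySQL;
# aliasing to isunsigned sidesteps it. Query abbreviated for illustration.
QUERY = "SELECT poems.po_id, poems.po_unsigned AS isunsigned FROM poems"

def unsigned_label(poem):
    """Render the yes/no line from the aliased field."""
    return 'unsigned: yes' if poem['isunsigned'] else 'unsigned: no'

print(unsigned_label({'isunsigned': 1}))  # -> unsigned: yes
```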
Sent AC a listing of all poems with pseudonyms and the author value associated with each poem, and a list of all poems with no pseudonym and with unconventional author names (initials, odd words etc.). Fundamentally the problem is that they are inconsistent in how they use the pseudonym field: at some points they want it treated as the real author's name, and at other times as the name a real author of another name used for this specific poem.
Piles of work. Piles of it.
Reindexing with the Snowball analyzer is taking a phenomenal amount of time, and crashed eXist on the test box, so we need to give Tomcat more RAM (if there is any) and see if that solves the problem. If not, perhaps we'll have to work with local copies of eXist. The indexing is so slow, though, that perhaps the analyzer isn't working properly. Meanwhile, we've continued working on ways to encode character lists, and to present them in different orders and combinations.
Board meeting, then emails with MvH who fielded my request for mvp.uvic.ca. That should be up soon. Then we can redirect the old domains to that, which should help a bit with traffic and SEO. The domain host is using 301s to "forward" to the 1922 domain; 301s are exactly what we want.
"Questions for elders" now come with an email link that pre-populates an email message with the id and the question.
Meeting to discuss what we'll do during the visit of the CT folks next week. Agreed on a dinner on Sunday, presentation of the db on Monday morning, and discussions re orthography, dictionary components, a user view of the db, etc.
Need to make some decisions about how to indicate transitive/intransitive distinctions here. The categories in green are the affixes that need to be added to the feature system.
Imperfective-aspect ʔac (allomorph c): Imperfective Aspect (under Aspect Property)
This aspect prefix is used with transitive stems
Imperfective-aspect sac (allomorph sc)
This aspect prefix is used with intransitive stems and cooccurs with -mix. However, there are a few examples where it shows up on transitive stems (see MLW 2003: 309).
This aspect prefix is used with intransitive stems and usually cooccurs with -mix; it is not clear whether its meaning/usage is different from that of sac-. There are a few examples where it shows up on transitive stems (see MLW 2003: 309).
Imperfective-aspect ʔas This aspect prefix may be composed of a combination of ʔac- and s-IMPF. It occurs on transitive predicates and is rare (see MLW 2003: 310).
This aspect suffix is used with two non-perfective constructions: imperfectives (prefixed by sac- or s- according to MDK 1982) and irrealis/unrealized forms (prefixed by kas- (i.e. kaɬ-s)).
CC is moving her proposed lab from B047 to C251. I got a copy of the network infrastructure estimate for B047 and it includes things that are likely not needed in C251 (namely a 48-port switch). The network DB shows that C251 has 3 ports (don't know how many are physically connected to a switch) and that B047 has 7 ports (3 connected). Not sure how many ports CC wants, so have asked her. Other possible issues: wireless connectivity (probably not a problem), special requirements for video conferencing (again probably not a big deal), use of research vs standard network (feasibility, cost).
Have asked CC if she wants me to get a revised estimate, and in any case, to review technical needs before getting new estimate.
If a poem has a pseudonym but no author (e.g. "The Excuse", poem id 3107, has no record in the poems-to-authors table, but the value 'by author of "the Castilian"' in the pseudonym field of the poem table), then the pseudonym does not appear. The search to get the authors occurs at lines 372 to 387 and 491 to 506 of vpn-search.php. I'm not sure yet how to modify that so that, if no value is found for an author, it checks the pseudonym field, nor what the implications might be (i.e. it looks like subsequent code assumes that the author is a real author, so I don't know if I should even make that kind of modification).
I did modify the vpn-single-record output (in the VPN theme) so that if the current poem has a pseudonym and no associated authors, then it outputs the message "no author name found", followed by the pseudonym and the word "(pseudonym)".
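That fallback can be sketched in a few lines (function and field names invented; the real change is in the VPN theme's PHP):

```python
def single_record_authors(authors, pseudonym):
    """Fallback for poems with a pseudonym but no associated authors,
    as described above; otherwise just join the author names."""
    if not authors and pseudonym:
        return f'no author name found; {pseudonym} (pseudonym)'
    return '; '.join(authors)

print(single_record_authors([], 'by author of "the Castilian"'))
# -> no author name found; by author of "the Castilian" (pseudonym)
```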