The one remaining problem was ironed out by an XSLT fix, and the discovery that one of the thorny bugs was caused by a flaw in the original data, where "back" was typed as "bac-k" in the page of a reference target. All now seems to be working. The bug in page number duplicates is also fixed (where the original document duplicated a page number, the output rendering created duplicate ids).
Category: "Activity log"
The editorial team has decided that the 1609 Sonnet document is actually a composite document containing two texts. This means it needs to be marked up in a more complex structure than the rest of our texts:
TEI
  teiHeader (header for the whole document)
  text (container text)
    front (frontispiece for main document)
    group (group element which contains constituent texts)
      text (first constituent text: Satire Menipée)
        front (first text frontispiece)
        body (first text body)
      text (second constituent text: Thimithélie)
        front (second text frontispiece)
        body (second text body)
    back (back matter containing references, created by XQuery)
This requires a rewrite of the XQuery and the XSLT that process documents, because they're not expecting this complex structure. I've rewritten the XQuery successfully, so the complete XML with references is being retrieved; I've edited LSPW's source document to add the rendition="#layoutDesc" attribute to both of the constituent texts; and I've confirmed that rendering works OK on my local copy of the db. However, I'm now struggling with a validation issue caused by the need to number each reference link in the text. These get unique ids created by counting the preceding ref elements with the same target. However, in this structure, the preceding:: axis doesn't seem to be working properly, so I'm getting lots of duplicate ids in the output. Still puzzling over this.
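As a point of comparison for the id numbering, here is a minimal Java sketch that avoids the preceding:: axis altogether by walking the ref elements in document order and keeping a running count per target. It is not the XSLT used in the project: the class name RefNumbering, the method numberRefs, the id pattern (target plus "_" plus counter) and the sample XML are all assumptions made for illustration. As long as no ref is nested inside another ref, the counter equals count(preceding::ref[@target = the same target]) + 1, because the preceding:: axis excludes only ancestors.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RefNumbering {
    /** Returns one generated id per ref element, in document order. */
    static List<String> numberRefs(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        // getElementsByTagName returns matches in document order,
        // regardless of how deeply the text elements are nested.
        NodeList refs = doc.getElementsByTagName("ref");
        Map<String, Integer> seen = new HashMap<>();
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < refs.getLength(); i++) {
            String target = ((Element) refs.item(i)).getAttribute("target");
            // Running count of refs seen so far with this target (1-based).
            int n = seen.merge(target, 1, Integer::sum);
            // Hypothetical id pattern: target without '#', underscore, counter.
            ids.add(target.replace("#", "") + "_" + n);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Invented sample mirroring the composite structure: two texts in a group.
        String xml = "<text><group>"
                + "<text><body><p><ref target='#titus'/><ref target='#vulcain'/></p></body></text>"
                + "<text><body><p><ref target='#titus'/></p></body></text>"
                + "</group></text>";
        System.out.println(numberRefs(xml));
    }
}
```

Because the count runs over the whole document rather than within each constituent text, the second "titus" link in the sample gets a different id from the first even though it sits in a different text element.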
The two files of references are now unified, and the old 1621 refs file has been deleted.
AM's Shingle Cloud algorithm is something I'd like to test against the Universal Similarity Metric for the DH 2010 presentation. JC sent me a link to an abstract on it, which will be very useful when I prepare my presentation.
I spent last Thursday and this morning proofreading Agreemens 1, 2, 3 and 4; Complaintes des mal maries; Comtesse d’Isembourg; Comtesse de Candale 1 and 2; and Les espines du mariage.
I made notes on what needs fixing for each xml file.
Before examining the details of each file, I would like to make some general notes.
- Many of the references (most of which I wrote) could use some serious reduction. I do not want to cut any crucial information, so if I get the chance to revise these, I will err on the side of keeping too much rather than cutting too much; that said, there are definitely some parts that can go.
- Leanna and I both noticed that sometimes an errant “font-style: italic;” shows up in the bibliographic citation at the bottom of the reference. All we have to do to get it to leave, however, is to refresh the page. On this subject, sometimes the space disappears between the end of the written paragraph and the bibliographic citation following it. Once again, when we refresh the page, the spacing goes back to normal.
- Some references have links for more information to Wikipedia; ccarlin, are you comfortable keeping these? Would you prefer I delete them?
- Some .xml files have larger fonts than others – for instance, the Varin shows up much larger on both Leanna’s and my monitors... is there a reason for this? I find the Varin more visually appealing because of the larger font. Perhaps this larger style of font would be better for readers?
Here are my specific notes on each file:
agremens_1:
- “Sophronie” does not yet have a reference
- Madame de Cleves, Monsieur de Nemours and the Princesse of Cleves reference could all use reworking – it’s the same reference copy-pasted for each.
- The “fin” on the last page needs to be centered. I’m not sure how to encode this: I don’t think it would be a head tag... would <hi rend="font-style: italic; text-align: center; margin-left: auto; margin-right: auto;"> work?
agremens_2:
- After the “Extrait du Privilege du Roy”, I encoded two page breaks, more or less one after the other, to represent a blank page with the catchword “Philogame” at the bottom, but it just looks weird on the website. Is it important that we represent this blank page with the “Philogame” catchword, or would it be better to delete it?
- On p. 38, “Histoire de Fraudelise” should be centered and all on one line. Is there something wrong with the code?
  <head rend="center; margin-left: auto; margin-right: auto;"><hi rend="font-size: 110%; font-style: italic;">HISTOIRE</hi><lb/>
  <hi rend="font-size: 90%; font-style: italic;">DE FRAUDELISE.</hi></head>
- The “Titus” reference was not showing up when I was clicking on it, so I went into the references file to see what was wrong with it. It had the tag “atticus” instead of “titus” (the tag used in the agremens_2 file), so I changed the references.xml tag to “titus”. Hopefully this will fix the problem.
- There was also a page break in the wrong place at the bottom of p. 100, so I relocated it.
- The “Vulcain” reference also was not displaying anything when I clicked on it, so I went into the references file and changed the xml:id from “vulcan” to “vulcain”.
- The “Fin” at the end of the document also needs centering; I had encoded it as a <hi rend="text-align: center; margin-left: auto; margin-right: auto;"> and yet it is not showing up as centered.
agremens_3:
- On the dedicace to the “Mesdemoiselles”, there is a large amount of space between the “M” and the rest of the word. Is there a way to fix this?
- On p. 47 there is a note (note 3) about the “Tiers Livre” by Rabelais. Ccarlin, I’m wondering if you just want to check this and make sure it’s right.
- “Fin” needs centering.
agremens_4:
- On p. 65, “Bachique” did not have a reference showing up for it, so I ended up changing the xml:id from “bachique” to “bacchus”.
- The “Pistoles” reference should now be showing up as well. (The reference id in the agremens_4 file was “pistoles” but in the references.xml file, it was “pistole”).
complaintes:
- P. 499: “Ananie” needs a reference.
- The reference for Jesus Christ needs one more guillemet around ‘n’est pas de ce monde’ (right now, it’s n’est pas de ce monde).
isembourg:
- The catchwords and the sig tags need to be moved closer to the left – they’re indented too far right.
- “Fin” needs to be centered.
candale_1:
- The “Roy Charles VIII” reference should now show up; in the references.xml it was encoded as “charlesVIII” and in the candale_1, it had been encoded as “royCharlesVIII”.
- I fixed the references “Marguerite de Flandre”, “Amboise”, “Loches”, “Foix” and “Cesar”.
candale_2:
- Fixed the “Duc d’Orleans” reference.
- I added a brief reference for Naples as well.
- “Fin” needs centering.
varin:
- Catchwords and sigs could be moved to the left – the indents do not correspond to those in the original text.
- For the reference “Coeneus”, I deleted the link, since there is not yet a corresponding reference. I kept the note that I had added previously, saying that most sources talk about Lichas’ fall from “mont OEta” but not from “mont Coeneus”.
- There is a dotted line (like the dotted line for references) on p.52 from the reference “Amphiaraus” through until the middle of p. 53.
- When I click on note #24, it pulls up notes #25-37. I checked in the varin.xml file but was unable to see the cause of this in the code.
- Also, notes 32, 33, 34 and 35 are showing up in italics because they are wrapped in a foreign tag. Is there any way to get rid of this?
- “Fin sans bout” at the end needs to be centered.
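Several of the fixes above (Titus, Vulcain, Pistoles, Roy Charles VIII) came down to an id in a text file not matching the xml:id in references.xml. A rough Java sketch of how such mismatches might be listed automatically follows. It assumes that links in the text files are ref elements whose target attribute is "#" plus the id, and that every entry in references.xml carries an xml:id; the class name DanglingRefCheck, the method missingTargets and the sample XML are invented for illustration and are not part of the project code.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DanglingRefCheck {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns the ids that the text links to but references.xml does not define. */
    static Set<String> missingTargets(String textXml, String referencesXml) {
        // Collect every xml:id declared anywhere in the references file.
        Set<String> known = new HashSet<>();
        NodeList all = parse(referencesXml).getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            String id = ((Element) all.item(i)).getAttribute("xml:id");
            if (!id.isEmpty()) {
                known.add(id);
            }
        }
        // Check each local link (assumed form: target="#id") against that set.
        Set<String> missing = new TreeSet<>();
        NodeList refs = parse(textXml).getElementsByTagName("ref");
        for (int i = 0; i < refs.getLength(); i++) {
            String target = ((Element) refs.item(i)).getAttribute("target");
            if (target.startsWith("#") && !known.contains(target.substring(1))) {
                missing.add(target.substring(1));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Invented sample reproducing the "pistoles" vs "pistole" mismatch.
        String text = "<body><p><ref target='#titus'>Titus</ref> "
                + "<ref target='#pistoles'>Pistoles</ref></p></body>";
        String references = "<list><item xml:id='titus'/><item xml:id='pistole'/></list>";
        System.out.println(missingTargets(text, references));
    }
}
```

Running something like this over each text file before proofreading would flag links that display nothing when clicked, although it cannot tell which side of the mismatch is the correct spelling.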
Several things emerged out of a frustrating hour:
- Macs can't do SFTP for some reason, so we have to use SMB to connect to taporshare.tapor.uvic.ca.
- When logging on, remember the prefix UVic\ before the user name.
- There's an ACL on the UVicDefault wireless setup that blocks SMB. That means Macs have no way of mounting a network share. The Help desk have promised to investigate this.
CC's laptop also has no ethernet, so it was doubly difficult to get connected.
Needed to see the XML myself -- presumably others would too.
On CC's instructions, made a set of changes to the title elements in the header of the Agreemens files, to reflect what's actually in the documents. Uploaded the changes to the working area on home1t as well.
Having implemented the similarity metric in C++ under Qt, I'm now experimenting with the results I get, and comparing them with the results from the same data using my Java implementation. There are some interesting issues:
- I have noticeably different results between the Java (GZip) implementation and the C++ (zlib) version. Differences are of the order of 0.12 in some cases (12% of the range). This is both intriguing and worrying, although it may not be an issue if the only use of the values is for relative comparisons.
- The order in which the strings are dealt with (when concatenated as part of the calculation algorithm) affects the score, on the order of about 0.02 (2% of the range). This is interesting. Right now, my object calculates scores using both sequences, and averages them out, but it may be more "correct" (whatever that means) to take the larger or the smaller of the values in each case. I'll have to do some thinking about this.
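To make the order question concrete, here is a small Java sketch of a compression-based score computed in both concatenation orders, with the three ways of combining them mentioned above (average, smaller, larger). It assumes the score is the normalized compression distance, (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), which is a common way of approximating this kind of metric but may not be the exact formula in either implementation; the class and method names are hypothetical. One possible contributor to the Java/C++ gap, not confirmed here, is the container format: a gzip stream carries at least 18 bytes of header and trailer, against 6 bytes for a zlib stream, and that fixed overhead weighs more heavily on short strings.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class CompressionSimilarity {
    /** Size in bytes of the gzip-compressed UTF-8 encoding of s. */
    static int gzipSize(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the trailer has been written.
        return buffer.size();
    }

    /** Assumed formula: normalized compression distance for the order x then y. */
    static double ncd(String x, String y) {
        int cx = gzipSize(x);
        int cy = gzipSize(y);
        int cxy = gzipSize(x + y);
        return (double) (cxy - Math.min(cx, cy)) / Math.max(cx, cy);
    }

    // Three ways of turning the two order-dependent scores into one value.
    static double averaged(String x, String y) {
        return (ncd(x, y) + ncd(y, x)) / 2.0;
    }

    static double smaller(String x, String y) {
        return Math.min(ncd(x, y), ncd(y, x));
    }

    static double larger(String x, String y) {
        return Math.max(ncd(x, y), ncd(y, x));
    }

    public static void main(String[] args) {
        // Invented sample strings, for illustration only.
        String a = "The quick brown fox jumps over the lazy dog near the river bank.";
        String b = "A quick brown dog jumps over the lazy fox near the river bend.";
        System.out.printf("a then b: %.4f%n", ncd(a, b));
        System.out.printf("b then a: %.4f%n", ncd(b, a));
        System.out.printf("average:  %.4f%n", averaged(a, b));
        System.out.printf("smaller:  %.4f, larger: %.4f%n", smaller(a, b), larger(a, b));
    }
}
```

All three combinations are symmetric in their arguments, so any of them removes the order dependence; they differ only in whether the final score leans toward the more or the less favourable concatenation.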