Started some detailed reading on this topic, with some pointers from friends and people on TEI-L. It looks like a flurry of activity happened around 2005-2007, and there are some working examples such as EEBO with fully implemented systems, as well as lots of surveys of approaches, and some tools. It looks useful and interesting. Haven't found anything resembling a dictionary of variants for Early Modern French, though.
One of the problems we face in building our next-generation search engine is the issue of archaic spellings and modern equivalents. In an effort to understand the scale of the problem before we begin tackling it, I've written some scripts which are in the process of compiling a list of all the distinct word-like tokens in the corpus which do not appear in a modern spelling dictionary. Right now, it's up to the Rs, and at around 35,000. I'll stay this evening till it completes, because I want to see the final tally.
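As a rough illustration of the kind of tally described here (not the actual scripts, which aren't reproduced in this post), the following Python sketch collects the distinct word-like tokens that are missing from a modern word list. The function name `unknown_tokens`, the definition of "word-like" as a run of letters, and the lowercasing are all assumptions made for the example.

```python
import re
import unicodedata

# Assumption: a "word-like token" is any unbroken run of letters.
# Apostrophes and hyphens therefore split tokens (d'vne -> d, vne).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def unknown_tokens(texts, modern_words):
    """Return the sorted distinct tokens in texts that are absent from modern_words."""
    # NFC keeps precomposed letters such as e-with-tilde as single letters;
    # a base letter with no precomposed form would still be split at the mark.
    known = {unicodedata.normalize("NFC", w).lower() for w in modern_words}
    found = set()
    for text in texts:
        for match in TOKEN_RE.finditer(unicodedata.normalize("NFC", text)):
            token = match.group().lower()
            if token not in known:
                found.add(token)
    return sorted(found)
```

Calling `len(unknown_tokens(corpus_texts, dictionary))` on the whole corpus would give the final tally; `corpus_texts` and `dictionary` are hypothetical names for whatever the real inputs are.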
Once we have the complete list, we'll be able to work out how many of them could be dealt with by means of normalization algorithms (such as switching long s to s, and normalizing other spelling variant patterns known to be common). Following that, we'll have an idea of how many tokens will actually have to be provided with equivalents by a human reader.
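To make the triage concrete, here is a minimal Python sketch, assuming a small set of illustrative rules. Only the long-s rule comes from the description above; the tilde expansions and the initial v/u swap are assumed examples of "common patterns", and the names `candidates` and `triage` are hypothetical.

```python
LONG_S = "\u017f"  # the long s character

# Assumption: a tilde over a vowel abbreviates a following "n".
TILDE_EXPANSIONS = {
    "\u00e3": "an",  # a with tilde
    "\u1ebd": "en",  # e with tilde
    "\u00f5": "on",  # o with tilde
}


def candidates(token):
    """Return possible modern spellings for token, most conservative first."""
    base = token.replace(LONG_S, "s")
    for abbreviated, expanded in TILDE_EXPANSIONS.items():
        base = base.replace(abbreviated, expanded)
    forms = [base]
    # Assumption: an initial "v" may stand for a modern "u" (vne -> une).
    if base.startswith("v") and len(base) > 1:
        forms.append("u" + base[1:])
    return forms


def triage(tokens, modern_words):
    """Split tokens into those resolved by the rules and those left for a human."""
    resolved, unresolved = {}, []
    for token in tokens:
        match = next((c for c in candidates(token) if c in modern_words), None)
        if match is None:
            unresolved.append(token)
        else:
            resolved[token] = match
    return resolved, unresolved
```

Because every candidate is checked against the modern dictionary before it is accepted, an over-eager rule cannot invent a spelling; at worst the token falls through to the list that a human reader has to handle.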
I've finished the process of converting uses of <argument> for marginal labels to the <label> tag. I had to regenerate the schema again, because in the documents affected (the Sonnets, Forest, Le Bon Mariage and Ville-Thierry), there are now occurrences of <label> where it did not appear before, so I regenerated the ODD file:
java -jar /home/mholmes/saxon/saxon9he.jar -it:main -o:/home/mholmes/WorkData/French/Claire_data/mariage_5/mariage_2012-01-30.odd /home/mholmes/WorkData/tei/sf_repo/trunk/Stylesheets/tools/oddbyexample.xsl corpus=`pwd`/
then edited the file manually to add @type to <label> (I'm doing this in the TEI namespace, although strictly speaking I probably shouldn't, but I don't see why <label> doesn't have @type in the first place).
I've written to EGB and GM to explain the change, and I'm now going to look at the documentation to see what needs changing there.
Fixed a typo in the menu reported by CC.
I added the following files to the Documentation folder on the Mariage server:
css_in_mariage.pdf – MH’s instructions on how to use CSS in Mariage
Editing_Guidelines.doc – guidelines and important points on editing transcribed text and markup.
Reference_and_Note_Writing_Manual.doc – step-by-step instructions on how to write references and notes.
XMLcodes_Master_List.doc – contains commonly used tags in the Mariage project with definitions and examples for each.
Allusions_manquantes.doc – contains a list of references in references.xml that still need content written for them, as well as literary allusions that have yet to be identified. RAs can add to this list if they come across allusions in their text that they have trouble identifying. These will be revisited in the future.
Mariage_bookmarks.html – bookmarks of useful websites for doing research for reference and note writing.
Mariage_Oxygen_Prefs.xml – RAs can import these preference settings in Oxygen.
Suggestion: MH, can you add your “Notes for initial basic markup” document to the documentation folder? I think it will be useful for teaching new RAs how to do basic markup of their texts.
1. le_blanc.xml:
Page number “76” in the t.o.c. should be in alignment with “ſon Mary.” in the chapter title, “La Conſolation & la Direction d'vne/Femme , qui n'eſt point aimée de/ſon Mary.” There seems to be a bug that’s creating a line break between the chapter title and page #.
2. maladies_des_femmes.xml:
a) Links to notes are still causing formatting issues in t.o.c. but I believe most of these problems should be fixed when the notes are replaced with <sic>/<corr> tags.
b) The note next to page number “75” (“Anomalie de pagination de la part de l'imprimeur; à la place de « 74 », il a mis « 75 ».”) is interfering with the first line of that page (“engendre la gonorrhee: il aduiẽt auſsi que la quãtité ou la quali-”). It’s creating an indent that shouldn’t be there.
3. References.xml file:
a) Link references within references. (Reminder: if a term appears more than once within the same reference, only tag the first occurrence.)
b) Remove <ref> links in all <cit>/<quote>s. (We decided not to put <ref> links in quoted material to avoid giving the erroneous impression that the source of the quote came from one of our references.)
c) References need to be reviewed because there are still some, such as for word definitions, which should be converted into notes.
4. Review and standardize usage of <cit>/<quote> -> See MH’s blog post “Quotes and cits -- need to do a review and standardize” (15/06/11)
5. All <argument> tags need to be changed into <label> tags.
6. Include a “back” button (⇐) for references and notes? This could be helpful for the user because when you click on a link within a reference or a note, the only way to return to the original reference/note is by going back to the text and clicking on the link. By including a “back” button in the window of a reference/note, the user will be able to return to the original reference/note more easily. CC and MH will have to discuss this.
7. MH has to upload changes to the Mariage site for GMM and EGB through Exist Client.
Reminders for CC:
1. Questions to ask Evelyne for Marinello text:
a) Does she want all abbreviated terms for medications written in Latin to be translated into French and then have references written for them?
b) Definitions for “thym,” “maladie de nymphe” and “ansules”?
2. Ask Hélène Cazes to:
a) transcribe missing Greek phrase in Sonnet 1609 (“Et voſtre Muſe eſt tanquam [missing greek text] vous ne portez...”)
b) verify that transcription of Greek word in Des maladies des femmes (“à raiſon dequoy les Grecs l'on appellé μπτοα”) is correct.
MH also suggested that he could ask a student who knows ancient Greek to complete these two tasks.
Met with LSPW to get an outline of the current state of play. This is the summary:
- LSPW has created several files in the /documentation/ directory which provide full editing guides, tag lists, and reports on e.g. references that haven't yet been identified. These will be complete at the end of this week, and provide a very solid grounding on our markup practice and tag usage.
- The Le Blanc text is in the following state:
  - Transcription and textual markup complete.
  - References done up to the entry for Ch VI of Livre IV in the TOC.
- GMM is working on Ville-Thierry, and has almost completed the transcription and basic markup. Detailed markup (CSS, annotation etc.) will need to be done after that, and he'll need some help getting started with that. I'll also have to take over sending his hours to SL in the French department, who's doing his timesheets.
- EGB is working on Le Bon Mariage. The transcription and basic markup is done, and she's now adding CSS and references.
- There are some outstanding issues, decisions and tasks which LSPW will put into a blog post and assign to me as a task.
I have cleaned up all the notes in the following files (see MH's post "Notes versus choice/sic/corr" (20/12/11)):
consolation.xml
forest_nuptiale.xml
homme_est_bien_malheureux.xml
la_louange_du_mariage.xml
la_louenge_des_femmes.xml
le_blanc.xml
sonnet_1609.xml
sonnet_1621.xml
The notes in the following files still need to be reviewed/revised by MH:
DONE 2014-05-02: le_contre_mariage.xml
DONE 2014-05-02: maladies_des_femmes.xml
DONE 2014-05-02: misogame.xml
DONE 2014-05-02: opinion_des_poetes.xml
DONE 2014-05-02: stances_chrestiennes.xml
DONE 2014-05-02: stances_du_mariage_blanchon.xml
DONE 2014-05-02: stances_du_mariage_trellon.xml
varin.xml
This task has been outstanding for a while, but I've managed to solve it in a very simple way using CSS columns. The solution is specialized to lists at the moment, but it should work identically for any other element on which we want to implement it. Notes/limitations:
- The implementation is triggered by the list having a child::cb, but it uses descendant::cb for counting the number of columns, on the assumption that some column breaks may occur within list items. It would fail to work if there were no <cb> that was a direct child of the <list>, though. If we were implementing a more general solution, we would need to figure out what level the columnar layout should be implemented on, and we'd probably have to require @type="columnar" or something like that on a block-level element, to trigger the appropriate CSS.
- The CSS still has to have -moz- and -webkit- prefixes.
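The trigger-and-count logic can be sketched outside the stylesheet. The Python below is an illustration only, not the project's actual code: the function name `column_css` is hypothetical, and it assumes that every <cb> marks the start of a column, so that the column count equals the number of descendant <cb> elements.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
CB = f"{{{TEI_NS}}}cb"


def column_css(list_element):
    """Return CSS declarations for a TEI <list>, or None if no child <cb> triggers columns."""
    # Trigger: at least one <cb> that is a direct child of the list.
    if not any(child.tag == CB for child in list_element):
        return None
    # Count: all descendant <cb>s, since breaks may fall inside list items.
    # Assumption: one <cb> per column, including the first column.
    count = sum(1 for _ in list_element.iter(CB))
    prefixes = ("-moz-", "-webkit-", "")
    return "; ".join(f"{prefix}column-count: {count}" for prefix in prefixes)
```

For example, a list parsed with `ET.fromstring` that has one <cb> as a direct child and another inside an <item> yields a two-column declaration, while a list whose only <cb> sits inside an <item> returns None, which is the failure case noted in the first limitation.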
Fixed this bug, which was caused by the XSLT not expecting that notes would be placed inside page number <fw> tags. In the process of fixing it, I realized that links from TOC page numbers to the pages concerned would not work in the continuous view because we're not showing page numbers in that view, so I added code to create empty anchors in the text; this allows links from TOCs to work as expected.