I've edited the ODD file to take account of changes in the latest TEI, and built a new schema. Caught a few errors in the process, in the ODD and consequently in transcription files (usually bad @type values). Wrote to CC to see about finally merging the cab_sat fragment files into the transcriptions they belong in, and getting rid of them. I also fixed some odd spacing around @rend attributes, in preparation for the slightly complicated move from @rend to @style. I'll need to convert only those @rend values which contain a colon, and the XSLT will have to be updated in advance (preferably in such a way that it still handles CSS in @rend, although Schematron should help with trapping for that too).
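The real conversion will presumably live in XSLT. Purely to illustrate the rule described above (only @rend values containing a colon are treated as CSS and moved to @style), here is a minimal Python sketch. The function name and the sample markup are hypothetical, and it assumes no element already carries a @style attribute.

```python
import xml.etree.ElementTree as ET

def move_css_rend_to_style(root):
    """Move CSS-like @rend values (those containing a colon) to @style.

    Returns the number of attributes converted. @rend values without a
    colon are assumed to be plain keywords and are left untouched.
    """
    converted = 0
    for el in root.iter():
        rend = el.get("rend")
        if rend is not None and ":" in rend:
            del el.attrib["rend"]
            el.set("style", rend.strip())
            converted += 1
    return converted

# Hypothetical sample: one CSS-style @rend, one keyword-style @rend.
sample = ET.fromstring(
    '<p><hi rend="font-style: italic">a</hi><hi rend="italic">b</hi></p>'
)
count = move_css_rend_to_style(sample)
```

A Schematron rule flagging any remaining @rend that contains a colon would serve as the safety net mentioned above.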
Met with CC to discuss the grant application and the TRUTH presentation in September, and also fixed a couple of things in the db (publishing Le Blanc).
Met with CC to go over plans for the application, and tweak the French translation of the technical description we wrote the other week.
Met with CC to write a preliminary draft of a section of the grant application dealing with the proposed normalization and search functionality. This was a useful exercise, forcing me to make all the details explicit, and explain them in clearer terms than I have been doing to myself. The plan still looks good, and I'm looking forward to making more detailed plans based on this (especially plans for the creation of normalization rules, and an automated system for testing them and evaluating the results).
I now have my XSLT module successfully reconstituting a line-broken word on both sides of the break, like this:
<ab corresp="mar:textnode#xpath(/*/*/*/*/*/text())"><seg> </seg><w corresp="mar:offset#xpath(substring(., 22, 3))"><choice><orig>ant</orig><reg type="joined-2">imagiant</reg></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 26, 3))"><choice><orig>que</orig></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 30, 4))"><choice><orig>Vous</orig></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 35, 5))"><choice><orig>pren-</orig><reg type="joined-1">prendrez</reg></choice></w></ab><ab corresp="mar:textnode#xpath(/*/*/*/*/*/text())"><seg> </seg><w corresp="mar:offset#xpath(substring(., 22, 4))"><choice><orig>drez</orig><reg type="joined-2">prendrez</reg></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 27, 7))"><choice><orig>quelque</orig></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 35, 8))"><choice><orig>intereſt</orig></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 44, 1))"><choice><orig>à</orig></choice></w></ab>
It's nasty-ugly but it's only intended for machines to read. Having the full form of the word on both sides of the linebreak means we'll be able to do n-grams properly, and having the two joined forms labelled differently (joined-1 and joined-2) means we'll be able to ignore one of them if we're reconstituting a continuous string.
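To make the two uses concrete, here is a minimal Python sketch of how a consumer might read this structure. It assumes the element names shown above, with the @corresp attributes and most words trimmed from the sample for brevity; the function names are hypothetical. For a continuous string it keeps the joined-1 form and skips the word carrying joined-2; for per-line n-grams it uses the full form wherever a reg is present.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the output above (attributes on <ab>/<w> omitted).
SAMPLE = (
    "<div>"
    "<ab><w><choice><orig>Vous</orig></choice></w><seg> </seg>"
    '<w><choice><orig>pren-</orig><reg type="joined-1">prendrez</reg></choice></w></ab>'
    '<ab><w><choice><orig>drez</orig><reg type="joined-2">prendrez</reg></choice></w>'
    "<seg> </seg><w><choice><orig>quelque</orig></choice></w></ab>"
    "</div>"
)

def continuous_words(root):
    """Words for a running string: keep joined-1, drop the joined-2 duplicate."""
    words = []
    for w in root.iter("w"):
        reg = w.find("choice/reg")
        if reg is None:
            words.append(w.findtext("choice/orig"))
        elif reg.get("type") == "joined-1":
            words.append(reg.text)
        # joined-2 repeats the same word on the next line, so it is skipped
    return words

def line_words(ab):
    """Words for one line, using the full joined form on either side of a break."""
    words = []
    for w in ab.iter("w"):
        reg = w.find("choice/reg")
        words.append(reg.text if reg is not None else w.findtext("choice/orig"))
    return words

root = ET.fromstring(SAMPLE)
running = " ".join(continuous_words(root))
per_line = [line_words(ab) for ab in root.iter("ab")]
```

This sketch treats any reg as a joined form; if other kinds of reg are added later, the type values would need checking explicitly.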
I've been working out my ideas a little more clearly, and beginning to evolve the idea of a working pipeline and a target format for my documents. It would look something like this:
<ab> element. At this stage,
@xml:id, like this
<ab> element points back to the location of the original text node which gave rise to it, using a TEI pointer structure, something like this:
<w> tag, and that tag is linked back to the original source using XPath again:
<w corresp="xpath1(substring(., 36, 10))">.
<w> tag. It is also stored in an attribute (possibly @n, or more likely a custom attribute), so that when the text content is normalized and modernized, the original form is still available.
<w> tags are run through a series of normalization rules which do things such as replacing long s.
<w> tags. This is going to require some serious processing, and will include algorithmic spelling modernization, dictionary lookups, etc.
@lemma attribute on the
For this, we'll need a range of tools, some of which exist and some of which appear not to exist yet (or, as in the case of the lemmatizer, not in an open-source form we can adapt for a Java web application).
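As a rough illustration of the early normalization step, here is a minimal Python sketch that keeps the original form in an attribute and then applies a rule list to the text of each <w>. Only the long-s rule comes from the notes above; the use of @n, the function names, and the sample markup are assumptions for the sake of the example, and the later stages (spelling modernization, dictionary lookup, lemmatization) are not attempted.

```python
import re
import xml.etree.ElementTree as ET

# Ordered list of (pattern, replacement) rules. Only long s is taken
# from the notes; further rules would be appended here.
RULES = [
    (re.compile("ſ"), "s"),
]

def normalize(form):
    """Apply every rule in order to a single word form."""
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form

def normalize_words(root, orig_attr="n"):
    """Store each <w>'s original text in an attribute, then normalize it.

    The attribute name is an assumption; a custom attribute may be used instead.
    """
    for w in root.iter("w"):
        original = w.text or ""
        w.set(orig_attr, original)
        w.text = normalize(original)

doc = ET.fromstring("<ab><w>intereſt</w> <w>à</w></ab>")
normalize_words(doc)
```

Keeping the rules as data rather than code should make it easier to test them one at a time against known word pairs.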
I now have a collection of a dozen or so papers I'm reading and annotating, and some ideas are getting clearer. At the moment (although I still have a lot of reading and consulting to do), this kind of approach looks promising:
Started some detailed reading on this topic, with some pointers from friends and people on TEI-L. It looks like a flurry of activity happened around 2005-2007, and there are some working examples such as EEBO with fully implemented systems, as well as lots of surveys of approaches, and some tools. It looks useful and interesting. Haven't found anything resembling a dictionary of variants for Early Modern French, though.
I added the following files to the Documentation folder on the Mariage server:
css_in_mariage.pdf – MH’s instructions on how to use CSS in Mariage
Editing_Guidelines.doc – guidelines and important points on editing transcribed text and markup.
Reference_and_Note_Writing_Manual.doc – step-by-step instructions on how to write references and notes.
XMLcodes_Master_List.doc – contains commonly used tags in the Mariage project with definitions and examples for each.
Allusions_manquantes.doc – contains a list of references in references.xml that need to have content written for them, as well as literary allusions that have yet to be identified. RAs can add to this list if they come across allusions in their text that they have trouble identifying. These will be revisited in the future.
Mariage_bookmarks.html – bookmarks of useful websites for doing research for reference and note writing.
Mariage_Oxygen_Prefs.xml – RAs can import these preference settings in Oxygen.
Suggestion: MH, can you add your “Notes for initial basic markup” document to the documentation folder? I think it will be useful for teaching new RAs how to do basic markup of their texts.
Page number “76” in the t.o.c. should be in alignment with “ſon Mary.” in the chapter title, “La Conſolation & la Direction d'vne/Femme , qui n'eſt point aimée de/ſon Mary.” There seems to be a bug that’s creating a line break between the chapter title and page #.
a) Links to notes are still causing formatting issues in t.o.c. but I believe most of these problems should be fixed when the notes are replaced with
b) The note next to page number “75” (“Anomalie de pagination de la part de l'imprimeur; à la place de « 74 », il a mis « 75 ».”) is interfering with the first line of that page (“engendre la gonorrhee: il aduiẽt auſsi que la quãtité ou la quali-”). It’s creating an indent that shouldn’t be there.
3. References.xml file:
a) Link references within references. (Reminder: if a term appears more than once within the same reference, only tag the first occurrence.)
<ref> links in all <quote>s. (We decided not to put <ref> links in quoted material to avoid giving the erroneous impression that the source of the quote came from one of our references.)
c) References need to be reviewed because there are still some, such as for word definitions, which should be converted into notes.
4. Review and standardize usage of
<quote> -> See MH’s blog post “Quotes and cits -- need to do a review and standardize” (15/06/11)
<argument> tags need to be changed into
6. Include a “back” button (⇐) for references and notes? This could be helpful for the user because when you click on a link within a reference or a note, the only way to return to the original reference/note is by going back to the text and clicking on the link. By including a “back” button in the window of a reference/note, the user will be able to return to the original reference/note more easily. CC and MH will have to discuss this.
7. MH has to upload changes to the Mariage site for GMM and EGB through the eXist client.
Reminders for CC:
1. Questions to ask Evelyne for Marinello text:
a) Does she want all abbreviated terms for medications written in Latin to be translated into French and then have references written for them?
b) Definitions for “thym,” “maladie de nymphe” and “ansules”?
2. Ask Hélène Cazes to:
a) transcribe missing Greek phrase in Sonnet 1609 (“Et voſtre Muſe eſt tanquam [missing greek text] vous ne portez...”)
b) verify that transcription of Greek word in Des maladies des femmes (“à raiſon dequoy les Grecs l'on appellé μπτοα”) is correct.
MH also suggested that he could ask a student who knows ancient Greek to complete these two tasks.
Should one marry? Panurge's question proves unavoidable in the West, especially from the Counter-Reformation onward. From the opening of the Council of Trent in 1545 to the end of the reign of Louis XIV, the attempt to renew marriage in France runs up against the growing intervention of the monarchy in an institution previously dominated by the Church. The encounter between these two authorities was tumultuous, but it fostered the abundance of documents that are the subject of this site: the "nuptial imaginary" is made up of various textual genres, each with its own character, but all dealing with the fears, desires, and fantasies that became increasingly visible in Ancien Régime society thanks to the debates raised by the new problem of conjugal union. For the moment, the emphasis is on the misogamous texts and images that belong to a revival of the Querelle des femmes during the first 25 years of the seventeenth century.