Category: Academic

18/01/13

Permalink 02:44:07 pm, by mholmes, 125 words, 71 views   English (CA)
Categories: Academic; Mins. worked: 90

Update of schema, and plan to move to @style

I've edited the ODD file to take account of changes in the latest TEI, and built a new schema. Caught a few errors in the process, in the ODD and consequently in transcription files (usually bad @type values). Wrote to CC to see about finally merging the cab_sat fragment files into the transcriptions they belong in, and getting rid of them. I also fixed some odd spacing around @rend attributes, in preparation for the slightly complicated move from @rend to @style. I'll need to convert only those @rend values which contain a colon, and the XSLT will have to be updated in advance (preferably in such a way that it still handles CSS in @rend, although Schematron should help with trapping for that too).
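
A minimal sketch of the @rend-to-@style conversion rule described above, written in Python rather than the project's XSLT. It assumes that "contains a colon" is a good enough test for CSS, and that an existing @style should be appended to rather than overwritten; the function name and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def move_css_rend_to_style(root):
    """Move @rend to @style wherever the @rend value contains a colon.

    Returns the number of elements changed. Values without a colon
    (e.g. rend="italic") are treated as non-CSS and left untouched.
    """
    changed = 0
    for el in root.iter():
        rend = el.get("rend")
        if rend is None or ":" not in rend:
            continue
        existing = el.get("style")
        if existing:
            # Assumption: keep any existing @style and append the old @rend.
            rend = existing.rstrip("; ") + "; " + rend
        el.set("style", rend)
        del el.attrib["rend"]
        changed += 1
    return changed


# Hypothetical sample: one CSS-style @rend, one keyword-style @rend.
sample = ET.fromstring(
    '<p><hi rend="font-style: italic">a</hi><hi rend="italic">b</hi></p>'
)
count = move_css_rend_to_style(sample)
```

Only the first `<hi>` is converted here; the second keeps its @rend, which is the behaviour the Schematron check would then need to allow for.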

07/08/12

Permalink 05:31:26 pm, by mholmes, 27 words, 109 views   English (CA)
Categories: Activity log, Academic; Mins. worked: 120

Meeting to go over grant app and presentation

Met with CC to discuss the grant application and the TRUTH presentation in September, and also fixed a couple of things in the db (publishing Le Blanc).

03/07/12

Permalink 04:30:10 pm, by mholmes, 24 words, 180 views   English (CA)
Categories: Activity log, Academic; Mins. worked: 60

Meeting to review application plans

Met with CC to go over plans for the application, and tweak the French translation of the technical description we wrote the other week.

20/06/12

Permalink 11:21:49 am, by mholmes, 85 words, 184 views   English (CA)
Categories: Activity log, Academic; Mins. worked: 60

Meeting to write project description

Met with CC to write a preliminary draft of a section of the grant application dealing with the proposed normalization and search functionality. This was a useful exercise, forcing me to make all the details explicit, and explain them in clearer terms than I have been doing to myself. The plan still looks good, and I'm looking forward to making more detailed plans based on this (especially plans for the creation of normalization rules, and an automated system for testing them and evaluating the results).

19/03/12

Permalink 04:09:28 pm, by mholmes, 365 words, 151 views   English (CA)
Categories: Activity log, Academic; Mins. worked: 120

Basic tokenizing now working

I now have my XSLT module successfully reconstituting a line-broken word on both sides of the break, like this:

<ab corresp="mar:textnode#xpath(/*[1]/*[2]/*[2]/*[1]/*[2]/text()[10])"><seg>
                    </seg><w corresp="mar:offset#xpath(substring(., 22, 3))"><choice><orig>ant</orig><reg type="joined-2">imagiant</reg></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 26, 3))"><choice><orig>que</orig></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 30, 4))"><choice><orig>Vous</orig></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 35, 5))"><choice><orig>pren-</orig><reg type="joined-1">prendrez</reg></choice></w></ab><ab corresp="mar:textnode#xpath(/*[1]/*[2]/*[2]/*[1]/*[2]/text()[11])"><seg>
                    </seg><w corresp="mar:offset#xpath(substring(., 22, 4))"><choice><orig>drez</orig><reg type="joined-2">prendrez</reg></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 27, 7))"><choice><orig>quelque</orig></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 35, 8))"><choice><orig>intereſt</orig></choice></w><seg> </seg><w corresp="mar:offset#xpath(substring(., 44, 1))"><choice><orig>à</orig></choice></w></ab>

It's nasty-ugly but it's only intended for machines to read. Having the full form of the word on both sides of the linebreak means we'll be able to do n-grams properly, and having the two joined forms labelled differently (joined-1 and joined-2) means we'll be able to ignore one of them if we're reconstituting a continuous string.
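
To illustrate how the two labels might be consumed, here is a rough Python sketch (not the XSLT module itself). It assumes a simplified, namespace-free fragment shaped like the output above, with the @corresp pointers omitted; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modelled on the output above (pointers omitted).
SAMPLE = """<div>
<ab><w><choice><orig>Vous</orig></choice></w><seg> </seg><w><choice><orig>pren-</orig><reg type="joined-1">prendrez</reg></choice></w></ab>
<ab><w><choice><orig>drez</orig><reg type="joined-2">prendrez</reg></choice></w><seg> </seg><w><choice><orig>quelque</orig></choice></w></ab>
</div>"""


def line_words(ab):
    """Words of one line, using the full joined form on either side of a break."""
    words = []
    for w in ab.iter("w"):
        reg = w.find("choice/reg")
        words.append(reg.text if reg is not None else w.findtext("choice/orig"))
    return words


def continuous_words(root):
    """Words of the whole text, counting each broken word only once."""
    words = []
    for w in root.iter("w"):
        reg = w.find("choice/reg")
        if reg is None:
            words.append(w.findtext("choice/orig"))
        elif reg.get("type") == "joined-1":
            words.append(reg.text)
        # A joined-2 form repeats the word already emitted, so it is skipped.
    return words


root = ET.fromstring(SAMPLE)
per_line = [line_words(ab) for ab in root.iter("ab")]
continuous = continuous_words(root)
```

The per-line view keeps "prendrez" on both lines (useful for n-grams within a line), while the continuous view keeps it once.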

13/03/12

Permalink 11:36:33 am, by mholmes, 463 words, 141 views   English (CA)
Categories: Activity log, Academic; Mins. worked: 120

More work on modernization/searching etc.

I've been working out my ideas a little more clearly, and beginning to evolve the idea of a working pipeline and a target format for my documents. It would look something like this:

  • Original document is processed into a sort of generic structure where each text node is expressed as an <ab> element. At this stage,
    • The root text element in the new file points back to the source document using a private URI system based on the source document's @xml:id, like this: xml:base="mar:maladies_des_femmes"
    • The <ab> element points back to the location of the original text node which gave rise to it, using a TEI pointer structure, something like this: <ab corresp="xpath1(*[20]/*[4]/*[3]/text()[2])">.
    • The contents of the text node are tokenized. It's not clear to me yet whether we need to tag punctuation, but we definitely need to tag words, so we'll need a good tokenizer that can handle this.
    • Words broken across linebreaks are reconstituted in the context of the text node preceding the linebreak, and ignored in the one following it. The reconstituted word is linked (see below) back to the original character strings in both locations, though.
    • Each word is marked up with a <w> tag, and that tag is linked back to the original source using XPath again: <w corresp="xpath1(substring(., 36, 10))">.
    • The original form of the word (reconstituted in the case of a broken word) is included as the text content of the <w> tag. It is also stored in an attribute (possibly @n, or more likely a custom attribute), so that when the text content is normalized and modernized, the original form is still available.
  • The resulting file is then processed again, and the text contents of <w> tags are run through a series of normalization rules which do things such as replacing long s.
  • Further processing attempts to modernize the contents of the <w> tags. This is going to require some serious processing, and will include algorithmic spelling modernization, dictionary lookups, etc.
  • The now-hopefully-modernized form is lemmatized, and the lemma is stored in an @lemma attribute on the <w> tag.
  • These documents can now be stored in the db and indexed for searching and analysis; search hits will have available to them the original spelling of the form, and will also be able to get back to the exact place in the original document where the form is located.
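
The tokenizing and pointer steps in the list above could be sketched like this in Python. It is only an illustration: the regular expression is a deliberately naive assumption (runs of letters, with an optional trailing hyphen for line-broken words), and the pointer format simply mirrors the substring(., start, length) pattern, using XPath's 1-based offsets.

```python
import re

# Naive assumption: a word is a run of letters, optionally followed by a
# hyphen marking a word broken at the end of a line. Punctuation is ignored.
WORD = re.compile(r"[^\W\d_]+-?")


def tokenize(text):
    """Return (word, pointer) pairs for one text node.

    The pointer uses XPath's substring(), which counts from 1,
    so the regex match offset is shifted by one.
    """
    return [
        (m.group(), f"substring(., {m.start() + 1}, {len(m.group())})")
        for m in WORD.finditer(text)
    ]


# A text node with the same leading whitespace as an indented source line.
node = "\n" + " " * 20 + "ant que Vous pren-"
tokens = tokenize(node)
```

A real tokenizer would also need to decide what to do with apostrophes, abbreviations and punctuation, which this sketch does not attempt.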

For this, we'll need a range of tools, some of which exist and some of which appear not to exist yet (or, as in the case of the lemmatizer, not in an open-source form we can adapt for a Java web application).
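
The simplest of these tools to sketch is the normalization-rule step from the pipeline above. The following Python fragment is an assumption-laden illustration, not a proposed rule set: the rules (long s, and tilde abbreviations expanded to vowel + n) are hypothetical examples, and the small check against expected forms only hints at how rules might be tested automatically.

```python
import unicodedata

# Hypothetical rules: (character to replace, replacement).
RULES = [
    ("\u017f", "s"),   # long s
    ("\u00e3", "an"),  # a with tilde (sometimes "am" in practice)
    ("\u1ebd", "en"),  # e with tilde
    ("\u00f5", "on"),  # o with tilde
]


def normalize(word):
    """Apply each rule in order to an NFC-normalized copy of the word."""
    word = unicodedata.normalize("NFC", word)
    for old, new in RULES:
        word = word.replace(old, new)
    return word


def evaluate(pairs):
    """Return (accuracy, failures) for (original, expected) pairs."""
    failures = []
    for original, expected in pairs:
        got = normalize(original)
        if got != unicodedata.normalize("NFC", expected):
            failures.append((original, expected, got))
    return 1 - len(failures) / len(pairs), failures


# Hypothetical test pairs; the last needs a u/v rule this sketch lacks.
GOLD = [
    ("au\u017fsi", "aussi"),
    ("qu\u00e3tit\u00e9", "quantit\u00e9"),
    ("adui\u1ebdt", "advient"),
]
accuracy, failures = evaluate(GOLD)
```

The one reported failure is the point of the exercise: it shows which rule is missing, here the context-dependent u/v distinction.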

09/03/12

Permalink 01:55:15 pm, by mholmes, 180 words, 123 views   English (CA)
Categories: Activity log, Academic; Mins. worked: 60

More research on historical spelling variance

I now have a collection of a dozen or so papers I'm reading and annotating, and some ideas are getting clearer. At the moment (although I still have a lot of reading and consulting to do), this kind of approach looks promising:

  • Run XSLT on collection to create parallel collection in which each significant block (not clear what a block is yet) is converted to a modernized textual representation with an XPath pointer that points back to the original block in the original doc. In this process, linebreaks would be dealt with.
  • Each modernized block includes the original variants as attributes or elements (if the latter, the modern indexer can be instructed to ignore them).
  • Modern blocks may also be stemmed.
  • Search is done on modern blocks.
  • KWIC hits from search can be shown EITHER as modern OR as original sequence (reconstructed from original variants stored in modern block).
  • Clicking on the hit takes you to the original text, with hits highlighted based on a new search done using the original tokens stored in the modern block as search terms.
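
As a rough illustration of the last three points, here is a Python sketch of a modern block that carries its original variants. Everything in it is hypothetical (the class names, the pointer value, the in-memory search); it only shows how one stored structure could serve both KWIC views and supply the original tokens for the follow-up search.

```python
from dataclasses import dataclass


@dataclass
class Token:
    modern: str
    original: str


@dataclass
class Block:
    pointer: str  # hypothetical pointer back to the original block
    tokens: list


def kwic(block, term, width=2, show="modern"):
    """Search the modern forms; display hits as 'modern' or 'original'."""
    hits = []
    for i, tok in enumerate(block.tokens):
        if tok.modern.lower() == term.lower():
            window = block.tokens[max(0, i - width): i + width + 1]
            hits.append(" ".join(getattr(t, show) for t in window))
    return hits


def original_terms(block, term):
    """Original tokens to use as search terms when highlighting the source."""
    return [t.original for t in block.tokens if t.modern.lower() == term.lower()]


block = Block(
    pointer="xpath1(*[20]/*[4]/*[3])",
    tokens=[
        Token("prendrez", "prendrez"),
        Token("quelque", "quelque"),
        Token("int\u00e9r\u00eat", "intere\u017ft"),
        Token("\u00e0", "\u00e0"),
    ],
)
modern_hits = kwic(block, "int\u00e9r\u00eat")
original_hits = kwic(block, "int\u00e9r\u00eat", show="original")
```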

01/03/12

Permalink 03:56:25 pm, by mholmes, 68 words, 140 views   English (CA)
Categories: Activity log, Academic; Mins. worked: 120

Research on handling historical spelling variants

Started some detailed reading on this topic, with some pointers from friends and people on TEI-L. It looks like a flurry of activity happened around 2005-2007, and there are some working examples such as EEBO with fully implemented systems, as well as lots of surveys of approaches, and some tools. It looks useful and interesting. Haven't found anything resembling a dictionary of variants for Early Modern French, though.

14/01/12

Permalink 11:02:54 am, by lspwong, 192 words, 125 views   English (CA)
Categories: Academic; Mins. worked: 0

Mariage Project Guidelines Documents

I added the following files to the Documentation folder on the Mariage server:

css_in_mariage.pdf – MH’s instructions on how to use CSS in Mariage

Editing_Guidelines.doc – guidelines and important points on editing transcribed text and markup.

Reference_and_Note_Writing_Manual.doc – step-by-step instructions on how to write references and notes.

XMLcodes_Master_List.doc – contains commonly used tags in the Mariage project with definitions and examples for each.

Allusions_manquantes.doc – contains a list of references in the references.xml that need to have content written for them as well as literary allusions that have yet to be identified. RAs can add to this list if they come across allusions in their text that they have trouble identifying. These will be readdressed in the future.

Mariage_bookmarks.html – bookmarks of useful websites for doing research for reference and note writing.

Mariage_Oxygen_Prefs.xml – RAs can import these preference settings in Oxygen.

Suggestion: MH, can you add your “Notes for initial basic markup” document to the documentation folder? I think it will be useful for teaching new RAs how to do basic markup of their texts.

Permalink 10:40:09 am, by lspwong, 493 words, 200 views   English (CA)
Categories: Tasks, Academic; Mins. worked: 0

Mariage Project To Do List

1. le_blanc.xml:
Page number “76” in the t.o.c. should be in alignment with “ſon Mary.” in the chapter title, “La Conſolation & la Direction d'vne/Femme , qui n'eſt point aimée de/ſon Mary.” There seems to be a bug that’s creating a line break between the chapter title and page #.

2. maladies_des_femmes.xml:
a) Links to notes are still causing formatting issues in t.o.c. but I believe most of these problems should be fixed when the notes are replaced with <sic><corr> tags.

b) The note next to page number “75” (“Anomalie de pagination de la part de l'imprimeur; à la place de « 74 », il a mis « 75 ».”) is interfering with the first line of that page (“engendre la gonorrhee: il aduiẽt auſsi que la quãtité ou la quali-”). It’s creating an indent that shouldn’t be there.

3. References.xml file:
a) Link references within references. (Reminder: if a term appears more than once within the same reference, only tag the first occurrence.)

b) Remove <ref> links in all <cit><quote>s. (We decided not to put <ref> links in quoted material to avoid giving the erroneous impression that the quote came from one of our references.)

c) References need to be reviewed because there are still some, such as for word definitions, which should be converted into notes.

4. Review and standardize usage of <cit><quote> -> See MH’s blog post “Quotes and cits -- need to do a review and standardize” (15/06/11)

5. All <argument> tags need to be changed into <label> tags.

6. Include a “back” button (⇐) for references and notes? This could be helpful for the user because when you click on a link within a reference or a note, the only way to return to the original reference/note is by going back to the text and clicking on the link. By including a “back” button in the window of a reference/note, the user will be able to return to the original reference/note more easily. CC and MH will have to discuss this.

7. MH has to upload changes to Mariage site for GMM and EGB through Exist Client.

Reminders for CC:
1. Questions to ask Evelyne for Marinello text:
a) Does she want all abbreviated terms for medications written in Latin to be translated into French and then have references written for them?

b) Definitions for “thym,” “maladie de nymphe” and “ansules”?

2. Ask Hélène Cazes to:
a) transcribe missing Greek phrase in Sonnet 1609 (“Et voſtre Muſe eſt tanquam [missing greek text] vous ne portez...”)

b) verify that transcription of Greek word in Des maladies des femmes (“à raiſon dequoy les Grecs l'on appellé μπτοα”) is correct.

MH also suggested that he could ask a student who knows ancient Greek to complete these two tasks.


Mariage

Should one marry? Panurge's question became unavoidable in the West, especially from the Counter-Reformation onward. From the opening of the Council of Trent in 1545 to the end of the reign of Louis XIV, the attempt to renew marriage ran up in France against the growing intervention of the monarchy in an institution previously dominated by the Church. The encounter between these two authorities was tumultuous, but it favoured the proliferation of the documents that are the subject of this site: the "nuptial imaginary" is made up of various textual genres, each with its own character, but all dealing with the fears, desires and fantasies that became increasingly visible in Ancien Régime society thanks to the debates raised by the new question of conjugal union. The emphasis for the moment is on the misogamous texts and images that form part of a revival of the Querelle des femmes during the first 25 years of the seventeenth century.
