1. ES met with Pierre for recording, on Friday, May 25. All 8 videos have been edited. 2 have been chosen to go on the site. Transcripts still to be done.
2. The next recording session was supposed to happen on Friday, June 1. However, the subject cancelled our meeting; it will be rescheduled for later in the month (i.e. mid-June 2012).
3. 15 new transcripts have been added to Oxygen. CC has agreed to upload them to the site. To be continued.
After much testing and experimenting, I think I've got the search page working in eXist 2.1 (running under Jetty). The newer version of eXist (and/or the Lucene extensions) handles default or implicit namespaces differently.
Original code looked something like this:
for $match in $utter//exist:match
let $summary := kwic:get-summary($expanded, $match, <config xmlns="" width="40"/>)
for $line in $summary//self::p
return
The p element is introduced by the kwic:get-summary function, and the question is what namespace that element is deemed to be in. In the older version of eXist, the code above worked. In eXist 2.1, that code returned nothing. I don't know what namespace that p is in, so Martin suggested the wildcard namespace:
for $match in $utter//exist:match
let $summary := kwic:get-summary($expanded, $match, <config xmlns="" width="40"/>)
for $line in $summary//self::*:p
return
and that worked.
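The difference between the two name tests can be seen in a small standalone query. This is only a sketch: the XHTML namespace on the p element below is an assumption made for illustration, since I don't know which namespace kwic:get-summary actually puts its output in, and the sample text is made up.

```xquery
xquery version "1.0";

(: Stand-in for the kwic:get-summary output. The XHTML namespace and the
   sample text are assumptions for illustration only. :)
let $summary :=
    <p xmlns="http://www.w3.org/1999/xhtml">
        <span class="previous">il a dit </span>
        <span class="hi">bonjour</span>
        <span class="following"> à tout le monde</span>
    </p>
return (
    (: unprefixed name test: looks for a p in no namespace, so it does not
       match the namespaced element :)
    count($summary/self::p),
    (: wildcard name test: matches a p in any namespace :)
    count($summary/self::*:p),
    (: the same wildcard is needed for the child span elements :)
    count($summary/*:span[@class = 'hi'])
)
```

The wildcard is a blunt tool, as it would also match a p from some unrelated namespace, but for the output of a single function call that seems a safe trade-off.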
I had to work through similar issues with the span elements that kwic embeds within the p element. They too are in some limbo namespace, so my code had to include the wildcard namespace selector. In addition, the span output to the page included an xmlns="" attribute, and that caused the CSS to fail to select it.
Original code (worked in eXist 1.4)
for $line in $summary//self::p
let $before := $line/span[@class='previous']
let $match := $line/span[@class='hi']
let $after := $line/span[@class='following']
return
<li>
<a href="player.xql?id={$id}&amp;start={$startTime}" title="{$start}"> {$before} {$match} {$after} </a>
</li>
Modified code (works in eXist 2.1)
for $line in $summary//self::*:p
let $before := $line/*:span[@class='previous']
let $match := $line/*:span[@class='hi']/text()
let $after := $line/*:span[@class='following']
return
<li>
<a href="player.xql?id={$id}&amp;start={$startTime}" title="{$start}"> {$before} <span class="hi">{$match}</span> {$after} </a>
</li>
As the span is coming from the Lucene kwic extension, it will contain only text, so I don't think explicitly grabbing only the text should cause me any problems. It also lets me code the containing span myself, which renders properly on the page.
I discovered that I can append an option to the ft query that allows wildcards at the start of the search string. I added a searchClauseOptions variable
let $searchClauseOptions := '<options><leading-wildcard>yes</leading-wildcard></options>'
and then passed that in as an argument to the search clause:
fn:concat('[tei:text/tei:body[ft:query(.,"', $searchterm, '",', $searchClauseOptions, ')]]')
There's a lot of escaping of string delimiters as that search clause itself ends up as a string which is eval'd to generate the results.
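To make the escaping easier to follow, here is a sketch that just builds the clause and returns it as a string, without evaluating it. The search term *jour is a hypothetical example; the two variables and the concat call are the ones shown above.

```xquery
xquery version "1.0";

(: Hypothetical search term with a leading wildcard. :)
let $searchterm := '*jour'
let $searchClauseOptions := '<options><leading-wildcard>yes</leading-wildcard></options>'
let $searchClause :=
    fn:concat('[tei:text/tei:body[ft:query(.,"', $searchterm, '",', $searchClauseOptions, ')]]')
return $searchClause
(: yields the string:
   [tei:text/tei:body[ft:query(.,"*jour",<options><leading-wildcard>yes</leading-wildcard></options>)]] :)
```

The options are held as a string here, so they only become an element once the whole clause is eval'd as part of the query.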
The system/config/db/site/data/collection.xconf file controls which lucene analyzer to use when indexing the data collection. It was set to use the WhitespaceAnalyzer. When I changed that to use the StandardAnalyzer instead and re-indexed the files, then the upper-case/lower-case issues went away.
I've done a bit of testing and the change does not seem to have introduced any problems, so I'm going to stick with it.
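For reference, the relevant part of a collection.xconf along these lines might look like the fragment below. This is a sketch rather than a copy of my file: the tei:body qname is an assumption based on what the search clause queries, and the class names are the ones I understand Lucene 3.x to use.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:tei="http://www.tei-c.org/ns/1.0">
        <lucene>
            <!-- was: org.apache.lucene.analysis.WhitespaceAnalyzer, which
                 does not lower-case tokens -->
            <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <!-- assumed: index the TEI body that the search clause queries -->
            <text qname="tei:body"/>
        </lucene>
    </index>
</collection>
```

The StandardAnalyzer lower-cases tokens at index time, which would explain why the case problems disappeared after re-indexing.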
I also ran across the SnowballAnalyzer, which looks interesting, but I'll postpone investigating that until I get eXist working within Tomcat, as the arrangement with Jetty is annoying - particularly the implications for SVN.
I poked around the log files for a while to see what I could learn about the problems launching eXist 2.1 in Tomcat. After I posted the log files showing the errors to the eXist list, a guy on the list posted the following in response. I haven't yet taken any action on it.
From this [see below] I read that one of exist-db extensions (betterFORM) tries to initialize the SAXON xslt library without success…. a method is missing.
Since the error is not about a missing class, but about a missing java method, I think a different (older or newer) version of saxon.jar is installed.
The solution is….. either to change saxon.jar (endorsed directory or somewhere else) to the version expected by betterFORM (actually bF depends on version 9.2.x.y; for a newer version the bF code needs to be changed), or to disable bF in the configuration files [need to check; it is in web.xml I think]
The localhost log includes this:
May 30, 2012 9:20:13 AM org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter XFormsFilter
javax.servlet.ServletException:
de.betterform.xml.config.XFormsConfigException:
java.lang.reflect.InvocationTargetException
at de.betterform.agent.web.filter.XFormsFilter.init(XFormsFilter.java:71)
and
Caused by: de.betterform.xml.config.XFormsConfigException:
java.lang.reflect.InvocationTargetException
at de.betterform.xml.config.Config.initSingleton(Config.java:135)
Caused by: java.lang.NoSuchMethodError:
net.sf.saxon.sxpath.IndependentContext.setFunctionLibrary(Lnet/sf/saxon/functions/FunctionLibrary;)V
Encountered various problems with the Lucene indexing and reporting in Francotoile, so decided to upgrade from eXist 1.5 to 2.1 in the hope that improvements to Lucene between those two versions would solve the problems.
I am successfully running an eXist instance on
a Mac running OS 10.7.4
java version 1.6.0_31
Java(TM) SE Runtime Environment (build 1.6.0_31-b04-415-11M3635)
apache tomcat 7.0.21
eXist 1.5.0
I tried downloading and running exist-2.1-dev-rev16458 but was unable to get it to start.
The catalina log says
"org.apache.catalina.core.StandardContext startInternal
SEVERE: Error filterStart"
If I use the tomcat manager to start exist 2.1 I get
"FAIL - Application at context path /exist21 could not be started"
and the same error in the catalina log
I installed a newer version of tomcat (7.0.27) on that same computer and got the same results when I tried to launch eXist 1.5 (worked) and eXist 2.1 (failed), so I don't think that's the issue.
I then went to a Mac running OS 10.6.8
java version 1.6.0_31
JRE build 1.6.0_31-b04-415-10M3646
apache tomcat 7.0.27
I tried to run exist-2.1-dev-rev16458 and that worked, so it appears to be an issue with the JRE or (less likely) the OS on the first Mac.
Over the past 10 days or so, have worked through the to-do list and made a number of improvements on the dev site. If CC approves, I'll migrate these to the production site. Estimate about 20 hours in all.
improvements to searching:
- if user types in upper-case search string, the search now works as
well as if they typed in a lower-case search string
- the selected item in the "gender" and "topic" dropdowns on the search page remains visible
- all apostrophes have been normalized to straight (') rather than smart (’)
- quotation marks around a phrase now match the entire phrase rather
than any word within the phrase
- if the user puts a colon (:) in the search string, the page removes it
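As a rough illustration of the apostrophe and colon items, the clean-up could be applied to the incoming search string with a helper like the one below. This is a sketch, not the code on the site: the function name is hypothetical, and the whitespace tidying at the end is my own addition.

```xquery
xquery version "1.0";

(: Hypothetical helper, not the site's actual code. :)
declare function local:clean-search-string($raw as xs:string) as xs:string {
    (: map curly apostrophes (U+2019, U+2018) to the straight one :)
    let $straight := translate($raw, '&#x2019;&#x2018;', "''")
    (: drop colons, which Lucene's query syntax reads as a field separator :)
    let $no-colons := replace($straight, ':', '')
    (: assumed extra step: trim and collapse whitespace :)
    return normalize-space($no-colons)
};

(: sample input with a curly apostrophe and a stray colon :)
local:clean-search-string('l&#x2019;école: Paris')
```

Doing this before the string reaches the search clause keeps the query syntax safe from stray colons.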
improvements to markup and presentation:
- if you put a <title> element into an <utterance>, a <reference>, a <note>, or into <person><trait><p> in the teiHeader, it will be rendered in italics on the page
- if you add an <incident who="#interviewer"><desc> element into an utterance or a <u who="#interviewer"> element, it will be rendered in grey on the page
- there is a show/hide control on the full transcript
remaining issues:
There are still problems with searching for words that happen to be
upper case in the transcript (e.g. search for bonjour and you'll see
there are 4 hits, the two that are lower-case in the transcript show a
link to the occurrence in the transcript, the two that are upper-case
don't show the link). To fix those I need to upgrade the version of the database engine and for some reason I'm unable to do that on my computer (though I can on others). So, until I sort that out, we're stuck on that issue.
We still don't match instances of the search string that occur in the
notes. I think this may be related to the upper-case/lower-case problem,
so a solution to it will have to wait for a newer version of the database
engine.
The underline of the space following a reference results from the way
the machinery I'm relying on handles whitespace and is virtually
impossible to fix reliably, so I'm leaving it for now.
The indexing engine (Lucene) used by the database does not allow wildcard
characters (? or *) at the start of the search string. No way around that.
1. With iMovie, which was installed on POMME on Wednesday, 8 video files have been edited. ES will write a document to explain the procedure for future use.
2. ES prepared two posters for recruitment. To be displayed at GSS and other significant places on campus.
3. ES will record two new subjects on Friday, May 25 and Wednesday, May 30 (respectively a male 60+, from south of France, and a female 20-30, from south of France).
4. Real Player was installed on POMME as it currently is the only player that will display one digit beyond the seconds and therefore provide enough precision for establishing the utterances for subtitles.
5. New transcripts for accf1, ancf1, and ancf2 were entered on Oxygen. To be continued with all files.
1. Transcripts for "Gary 1" and "Gary 2" are now complete. Once iMovie is installed on POMME and training has been provided, ES will proceed with video editing and timeline setup for subtitles.
2. All transcripts as they currently appear on the website have been saved in a folder named "TranscriptionsOLD" on Dropbox and in a folder named "Old Transcripts" on POMME. Each file contains the timeline and the transcript in plain text format. This would allow us to quickly copy and paste the information back into Oxygen, should we need to revert to these versions in the future. The following three files did not have any transcripts: "sngl1", "sngl2", and "mixm1".
3. Agreed with CC today: Interviewer utterances and incidents within utterances will display in a grey color (no italics)
4. Agreed with CC today: Titles of books and films, for example, will show within utterances and within notes in italics (no change of color)
5. Next steps are:
a. contact the two potential contacts for recording;
b. edit "Gary 1" and "Gary 2" (see 1. above);
c. prepare a poster for recruitment that can be posted at GSS;
d. start entering the new annotated transcripts in Oxygen.
The prime objective of this project is to create a prototype of a searchable digital video library representing francophone culture. It is to be implemented in French 262.