Having fixed the problems with http vs https protocol instances, we're still left with the problem that the site has not set up a Google account with an API key and a credit card number, so the map interface appears with a "site not using Google Maps properly" alert box and a "development only" watermark on the map. Wrote to CC and outlined the two options: (1) get such an account and give us the details, and we'll modify code as needed; (2) replace the Google map with an OpenLayers map and rewrite the blocks of code interacting with the map to provide the same features, namely pins on the map that, when clicked, launch a little popup window with a link to the actual video in the player.
Category: "Activity log"
Various features in the UI of the site were failing because a number of links to imported libraries etc. used the "http" protocol and the new server infrastructure will respect only those using the "https" protocol. I downloaded the entire site from the server and replaced the old files in the svn repository with the files on the server, then searched all the files for "http" and replaced instances that were actual links with "https" and left instances that were identifiers for namespaces alone. Uploaded those revised files using the JNLP client and it seems to be working.
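For the record, a rough way to list candidate links while skipping the namespace identifiers (a sketch only, not the exact command used; each replacement was still checked by hand):
grep -rn 'http://' . | grep -v xmlns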
1. ES added the xml files for fraf7, cltf5, pscf6.
2. ES added transcripts for fraf7, cltf5, pscf6
3. SA uploaded all the latest changes to the website and all seems to work fine. Thumbnails have to be created for fraf7, cltf5, pscf6.
In the kerfuffle last week with the eXist server and its contained webapps, the Francotoile webapp was somehow corrupted. After an hour or two we got the instance going again, but then discovered that the password for the admin client no longer worked, so we wouldn't be able to update the webapp. The solution was to replace the instance of the webapp on the server with a copy of it on my local machine.
Basic procedure to replace a corrupted instance of a webapp e.g. francotoile
- log in as tapor to tomcat manager on server (peach)
- undeploy webapp
- go back in browser to safe URL (one without undeploy instruction in it)
- ftp in as hcmc to server (peach.hcmc.uvic.ca)
- cd up and down to /usr/local/tomcat-instances/devel/webapps/
- delete old folder
- upload new folder (same name as old folder)
- refresh webapp listing in tomcat manager
- app should appear, click deploy
ES added transcripts for accf3, fraf8, cltf6
1. ES corrected location coordinates for cltf6, accf3, fraf8
2. ES added transcripts (non-annotated) for fraq7, fraq8, fraq9
ES added about ten new videos and XML data files, so I had to create a thumbnail image for each. I ran each file through player.xql, stopped the video, captured a bit of the screen to a png file, edited that to 88x66 px (the size all of them seem to be), added them to the SVN repository, and uploaded them to the production site and the copy of the site on my Mac.
While doing that, I noticed extraneous thumbnail files in the images (as opposed to the images/thumbnails) folder, so deleted those from the servers and from the repository.
ES noted that recent changes she'd made weren't appearing on the production site at francotoile.uvic.ca.
I had a connection in the exist admin client that used pear.hcmc.uvic.ca as the domain. I thought that would be dead, but when the connection succeeded, I assumed that domain name was forwarding to the current instance. Wrong. Obviously there is another instance somewhere on "pear" that is still running.
Created a new connection in the admin client using tomcat-devel.hcmc.uvic.ca as the domain and that worked. Also, the webapp in the new instance is francotoile and not francotoile21 as it was in the old instance.
In poking through the files, also noticed a connection string using lettuce.uvic.ca, so changed that to hcmc.uvic.ca and it seems to be working.
Updated the lastpass records.
1. ES added transcripts for fraq10, fraf6
2. ES has edited Liette's video, and given it to SA. Corresponding xml file has also been added.
3. ES asked SA to upload all new additions to the production site in order to see if editing with Audacity works fine.
1. SA found a solution with regards to cutting the soundtrack to the millisecond: use Audacity! The program was installed on POMME.
2. ES entered & committed the transcripts for cltq3, fraq11, fraq12, fraq13
Back after a wee while!!
1. Links to video files were sent to Essen for his review before publishing them on the site.
2. SA and ES discussed the need to find editing software that allows cutting at the millisecond or hundredth of a second. ES suggested "Video Edit Master", which seems to be free software. Ongoing.
3. Files in the "media" folder on POMME have been updated with the latest information existing on the server.
4. New xml files entered for pscf5, accf3, cltf6, fraf8
5. Transcript for cltq3 is done in txt format. Needs to be entered into xml file.
Last session on Francotoile (end of contract) - ES will continue to work on various aspects of the site on an irregular basis from now on.
Still to be done:
1) add the missing thumbnails to the following files on the website: fraf4, fraf5, fraf6, fraq7, fraq8, fraq9, fraq10, fraq11, fraq12, fraq13, cltq2, cltq3, accq1, eduq1, edubc1
2) obtain the filter for the search function (transcript only or transcript+annotations). Once this is sorted, ES will prepare the three help files to go on the "info" page of the site.
3) annotations are to be added on the following transcripts: accf2, franb1, franb2, franb3, fraq3, fraq4, fraq5, fraq6, pscf4, accq1, cltq2, cltq3, edubc1, eduq1, fraf4, fraf5
4) annotated transcripts are to be added for the following files: fraf6, fraq7, fraq8, fraq9, fraq10, fraq11, fraq12, fraq13
5) The information displaying on the "info" page of the Francotoile should be updated (i.e. the help files and the RA names)
6) ES got a reply from another young lady interested in being filmed for the project. She is currently in Québec and will get in touch when she returns, early November. Still waiting for a response from a Quebecois writer.
7) At revision 218. Latest changes committed should be uploaded to the site.
1. Fifteen new and converted videos, shot in August and September 2012, were given to SA and added to the site.
2. ES created an xml file (no transcript) for each of them: accq1, cltq2, cltq3, edubc1, eduq1, fraf4, fraf5, fraf6, fraq7-13.
3. The following were reported and solved:
- the non-annotated transcripts for franb1 and franb2 were not displaying on the website (solved)
- the thumbnail for frac1 was missing (solved)
- the location for mixbc1 was indicating Québec instead of BC (solved)
- the latitude and longitude for fraf2 and eduf1 were those of France instead of Switzerland (solved)
4. At this stage, all videos have a corresponding xml file and should display on the website:
5. The following videos have a non-annotated transcript: accf2, franb1, franb2, franb3, fraq3, fraq4, fraq5, fraq6, pscf4
6. ES has prepared two help videos to be inserted on the Intro page of Francotoile. Checked with SA how the video player works, in order to prepare the last help video. However, there are some URL issues with the bookmarks and search functions. SA will look into this.
7. ES will send an email to Gary, Emilie and France to inform them that their video has been uploaded to the site.
Remaining hours on this contract: 7 hours 30
Old instance of francotoile was running Exist 1.5 and the structure required an odd "extra" instance of "francotoile/" in the URL to work properly.
New instance of francotoile runs in Exist 2.1 and has been restructured to not require the extra "francotoile/" in the URL.
Had sys-admins create a definition for francotest.uvic.ca and point it at the new instance and tested that. Then got them to use that same definition for francotoile.uvic.ca. Then they got rid of the definition for francotest.uvic.ca.
I still have to shut down the web-app of the old instance, as it's just wasting resources on the exist server now. I'll do that when Greg returns - likely we'll shut that instance down, get rid of all the files and archive them somewhere.
1. The problem of duplicated files mentioned in the previous blog post has been solved.
2. The inverted transcript for frac1 and frac2, as mentioned in previous blog post, has been corrected.
3. The problem with prmf6 mentioned in previous blog post has been solved.
4. ES and SA encountered a strange problem with the list of items in Keywords. Some items (grandes écoles, système éducatif français, classes préparatoires) are displaying in the list while not appearing anywhere in the xml files. These words have to be removed from the list, and the items "musique" and "vie personnelle" should be translated into English.
SA solved the problem.
5. ES added the transcript for all videos that are available and on the site. Nine of them need to be annotated.
6. The [age] section of the search function, when selected on its own, leads to an error message. However, when selected with another filter (e.g. [male] + [10-25]), the age filter functions properly. SA will look into this.
7. SA contacted Pat with regards to thumbnails. If this can be fixed quickly, it will be done before SA goes on vacation at the end of this week. Otherwise, the site will go live and thumbnails will be added at a later stage. SA will inform ES and CC.
8. ES noted an issue with the display of "there are no other video from". While the syntax in French has been fixed, now only "from Mali" or "du Mali" is displaying.
9. ES asked SA whether it could be possible to make a search within annotations optional so that the site user can choose to look for a word in the transcripts only or within the transcripts+annotations. In theory, this could be done. More discussion to follow if CC agrees with this idea.
Hours worked since beginning of August: 15
SA uploaded latest changes to Francotoile21.
(1) ES noticed the following problems to be fixed before moving to the production site:
- veqf1 should be removed from server as it was renamed/replaced, as follows: vepf2.
- mixc1 should be removed from server as it was renamed/replaced, as follows: mixbc1.
- eduf2 and eduf3 should be removed from server as they were renamed/replaced, as follows: pscf2 and pscf3.
- prmf5 appears twice on the map (?) Both files should be deleted from server, as the video is missing.
- lafc1 should be removed from server as it was renamed frac2. NOTE: ES noticed that the transcripts are inverted - frac1 has the transcript of frac2 and frac2 has the transcript of frac1. This has to be amended.
- cltca1 and lacfa1 should be removed from server as they were renamed/replaced, as follows: cltq1 and vepq1.
- prmf6 (Valentin's video) displays as xxxx1 on the map (??) The video is not playing. Will need to look into this.
(2) In response to SA's questions, here is the list of videos that are missing thumbnails:
mixbc1 ; fraq3 ; fraq4 ; fraq5 ; fraq6 ; vepq1 ; cltq1 ; franb1 ; franb2 ; franb3 ; cltc1 ; frac1 ; frac2 ; edum1 ; pscf2 ; pscf3 ; pscf4; prmf6 ; cltf4 ; vepf2 ; accf2 ; fraf3 ; vepf1 ; lsrf3
1. The following videos have been transformed, edited and given to SA for upload on the website:
Émilie (cltf4), Jennifer (fraq5 & 6), Rémi (accf2) and Rémi&Émilie (pscf4)
2. xml files have been created and added to the database for the above files, with the exception of pscf4. Since this video involves two subjects, and because CC mentioned that it could be a recurring format in the future, it will be necessary to discuss in more detail how to:
- enter a transcript differentiating between Subject 1 and Subject 2,
- locate each character separately on the map,
This discussion should ideally take place with CC present, as these will be her decisions to make, and upon SA's return from vacation, early September.
3. various edits on xml files were made (fixes on dates, ages, etc.). Changes have been committed to the server.
4. The transcript and timeline for subtitles for fraq3 (Jordan) have been done. The file still needs to be annotated. Committed to the server.
5. ES reviewed the "theme" tags that exist on the server to amend the xml files appropriately, add new ones if necessary and clean up the dropdown search list on the site.
6. SA will upload all new changes + additions to the test server Francotoile 21 so that ES can check them next week. Thumbnails still have to be inserted on the Google map. The site should go live next week.
To be continued: transcripts and annotations for fraq3 (annotations only), fraq4, franb1, franb2, franb3, accf2, pscf4, cltf4
Hours worked since July 1st: 74 hours
ES gave me franb1.mp4, franb2.mp4, franb3.mp4.
I processed those through handbrake from mp4 to mp4, because the source files generated "unknown option" errors when I tried to process them with ffmpeg2theora, but the mp4's generated by handbrake worked. I've uploaded the files to the media folder in the florevid account.
Get source mp4 file on desktop
In Handbrake
- create mp4 output in a temp folder that is a stub
In terminal client
- cd to temp folder
- execute
find . -name *.mp4 -exec ffmpeg2theora --videoquality 6 {} \;
should end up with a .ogv file
Upload both to media folder in florevid account
I've also found that if I used the *.mp4 argument I would get "unknown option" errors, but if I specified the actual file name, I didn't. Obviously that means I have to do the files one at a time rather than as a batch.
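The likely cause (an assumption, not verified) is that the unquoted *.mp4 pattern gets expanded by the shell whenever matching files sit in the current directory, so find receives extra file names it doesn't understand. Quoting the pattern should let find do its own matching and make batch processing work:
find . -name '*.mp4' -exec ffmpeg2theora --videoquality 6 {} \;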
Coordinates on Google map for all subjects under 19 have been amended, as follows:
1. For lsrq1, prmq1, prmq3, prmq4, prmq5, prmq6, vacq1, vacq2, vacq3 - Nationality settlement and Residence settlement tags were amended to "Saguenay-Lac-Saint-Jean" and geographic coordinates were changed to 48.547869, -71.648631
2. For lsrf1, lsrf2, prmf1, prmf2, prmf3, prmf4, prmf6, vacf1, vacf2, vacf3, secf1, secf2 - Nationality settlement and Residence settlement tags were amended to "La-Roche-Sur-Yon" and geographic coordinates were changed to 46.66169, -1.447642
3. Pierre's files were copied to DVD. ES will make sure that the DVD gets to him on his return from France.
4. Videos from Acadie were edited and added to the site. They have been named as follows:
franb1 (Véronique) / franb2 (Guy) / franb3 (Kim)
Transcripts to be done.
5. Transcript and annotations for vepf1 completed.
6. Xml files have been added for Jordan's videos (fraq3 & fraq4). Transcripts to be done.
7. Received new videos from CC, in .mpg format. This format cannot be opened by iMovie for editing. They will have to be transformed into .mov files. SA is looking into it.
1. After discussing filenames with CC, the following changes were made and the videos were added to the site:
Jean-Philippe (pscf 2 & 3) ; Pierre (lsrf3 & vepf1) ; Josette (vepf2) ; Elvis (frac2) ; Gary (cltq1 & vepq1) ; Natasha (mixbc1)
2. Annotations were added for lsrf3
3. Transcript + annotations started for vepf1 (to be continued)
4. New videos made in Acadie collected. To be transferred to POMME.
5. To be done: change location for children's videos.
Went through rest of Francotoile pages and made changes needed to get the pages to validate.
Most errors were because the code was calling the version of the i18n:echo method that returns an element (which was an element in the wrong namespace) rather than the version that returns text.
There were a couple of instances of code producing tei elements in the output xhtml, which of course didn't validate.
One instance where the html element included a style namespace which didn't validate and does not appear to be in use, so I commented out that html line and replaced it with a simpler one.
in player.xql I've added "/text()" to the end of these three lines to force the query to return a string rather than an element.
{$movie//tei:person/tei:trait[@type='description']/tei:p/text()}
let $title := $movie//tei:titleStmt/tei:title/text() | $movie//tei:titleStmt/tei:title/tei:title/text()
let $relatedTitle := $related//tei:titleStmt/tei:title/text() | $related//tei:titleStmt/tei:title/tei:title/text()
If I return a string:
- I do not get the text of any embedded elements (e.g. the title of a publication within the title of the interview), which is why the $title line has the union condition that at least returns the text of the embedded title, though not as a title element that can be styled
- the output validates
If I return the element:
- I get a validation error "element from an unrecognized namespace" in the output xhtml
- I'm not sure what happens with embedded elements, presumably more validation errors.
1. Jordan's videos have been edited and passed on to SA for upload on website
2. ES sent a few emails to SA with regards to recent fixes:
a. ES noticed that mixc1 was no longer showing on the google map in the Browse page.
b. Film titles are still not showing in the video titles for sngl1 and sngl2
c. ES noticed that both the live and test website behave differently on different versions of browsers. ES asked SA if it would be possible to update Firefox and other browsers on POMME to their newer versions.
d. ES will do more testing during the weekend on newer versions of Firefox, Chrome, Opera, Safari and IE to find out whether there are major differences in behaviour.
e. The extension of the full transcript has helped with reading some of the annotations, but various files are still affected by the "flickering" and "scroll down bar" problem. ES sent a list of those files by email to SA.
f. The thumbnails for newly added videos are not showing on the website.
3. Five new transcripts have been created, added, and committed to the database, one of which still needs to be annotated:
a. cltc1, frac1, lafc1 (Elvis)
b. fraf3 (Maeva)
c. cltf4 (Pierre) - annotations still to be created
Hours worked since July 1: 50 hours
Any annotation that goes beyond the size of the full transcript box causes a scrollbar to appear. However, it is impossible to scroll down; the screen and annotation flicker, which makes it impossible to read the content of the annotation.
Simplest thing was to add more bottom-padding to the detailPanel in global.css, so that's what I did.
Any title (film, book, etc.) inserted in a <title> element within the titleStmt/title element does not show on the screen (see files sngl1 & sngl2)
Temporary improvement:
Modified the code at line 40 of player.xql from
let $title := $movie//tei:titleStmt/tei:title/text()
to
let $title := $movie//tei:titleStmt/tei:title/text() | $movie//tei:titleStmt/tei:title/tei:title/text()
Problem is that I eventually want to style any embedded titles, and the current approach isn't going to support that. I'll have to do some research to figure out how to parse the titleStmt element rather than just grab its text.
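One possible approach for later (just a sketch, not in player.xql yet; the function name and class name are made up): walk the titleStmt/title node recursively and wrap any embedded tei:title in a span that can be styled.
declare namespace tei = "http://www.tei-c.org/ns/1.0";

declare function local:renderTitle($n as node()) as item()* {
  typeswitch($n)
    (: plain text passes through unchanged :)
    case text() return $n
    (: embedded titles get a styleable wrapper :)
    case element(tei:title) return <span class="embeddedTitle">{ for $c in $n/node() return local:renderTitle($c) }</span>
    default return for $c in $n/node() return local:renderTitle($c)
};

(: usage would be something like: for $c in $movie//tei:titleStmt/tei:title/node() return local:renderTitle($c) :)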
When generating an item in the list of related videos, the relatedList code in player.xql looked for
$relatedTitle := $related//tei:title
which I changed to
$relatedTitle := $related//tei:titleStmt/tei:title
so that only the title in the titleStmt would be returned and not all the title elements (e.g. of books, productions etc.) in the utterances
The space inserted between </ref> and <ref> is automatically deleted, which results in consecutively annotated words being run together in the output xhtml. E.g.:
http://pear.hcmc.uvic.ca:8081/francotoile21/player.xql?id=accf1 (end of 6th paragraph)
In transcript.xsl, added <xsl:text> </xsl:text> at the end of the template for tei:ref[@type='info'] to force a space to appear after the anchor in the xhtml generated from each tei:ref element.
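For reference, the template presumably now ends up looking something like this (reconstructed from the version of it recorded further down this log, with the trailing space added):
<xsl:template match="tei:ref[@type='info']">
  <xhtml:a href="#" class="tooltip">
    <xsl:value-of select="./child::text()"/>
    <xhtml:span class="hover_off">
      <xsl:apply-templates/>
    </xhtml:span>
  </xhtml:a>
  <!-- trailing space so consecutively annotated words don't run together -->
  <xsl:text> </xsl:text>
</xsl:template>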
1. ES was informed by SA that amending the names of video files after they are uploaded on the server is fine, as long as we change all references to these files elsewhere (i.e. changing all references in the corresponding xml files). Therefore, all new videos have been added to the server and are visible on http://lettuce.tapor.uvic.ca/~florevid/media/ under the following (temporary) names: cltca1 & lafca1 (Gary), eduf2 & eduf3 (Jean-Philippe), prmf6 (Valentin), cltc1, frac1 & lafc1 (Elvis), vepf1 & cltf4 (Pierre), fraf3 (Maeva).
2. To add a new xml file, type the following at the command line: svn add [name of file].xml
3. ES created a document with the list of the principal codes that are used for transcriptions (i.e. italics, interviewer's intervention, etc.) for future use. Document is saved on POMME and has been uploaded to Dropbox (cf. Liste des codes de transcription pour FrancoToile).
3. Transcripts for cltca1, lafca1, eduf2, eduf3 have been created in Oxygen and committed. These files are visible on FrancoToile21. The transcript for cltc1 has been created but not added to the server's database, as ES is awaiting CC's permission to do so (more details were sent by email to CC).
4. Latitude and longitude details for sngl1 and sngl2 were amended to reflect the country where these two speakers come from (i.e. Mali & Burkina Faso). ES also asked Dr. Niang where the recordings took place and he confirmed that he was in Burkina Faso and not Senegal as originally entered in the xml files.
5. New requests have been forwarded to SA with regards to the Google map:
a. Change the language from English into French in video descriptions visible on Google map for French interface (i.e. "name" becomes "nom", "location" becomes "lieu", "gender" becomes "sexe")
b. Change the word "location" for "lieu" on the Search page of the French interface.
c. In the xml files, ES added a comma between the [settlement] tag and the [country] tag for both residence and nationality, so that they display better within the video descriptions available on the Google map. SA believes that the comma should be part of the code rather than a text addition to the xml file, and he will look into creating the appropriate rule.
d. It was suggested that being able to choose several topics & age brackets from the drop-down lists of the Search page could be a good idea. It would allow more precise selection of videos from the database. For example, one could find all videos where someone between "10-14" and "65+" is talking about "music", "language", and "cinema". The Search function would use AND (not OR).
e. ES got confused by the [stages of life] tag that is available in the xml files. The purpose of this tag is to be discussed with CC, and SA will double-check whether it is currently being used on the site (for example, as part of a site Search, etc.)
f. In the "topic/thème" drop down list on Search page, all the new topics that ES added as part of the newly added videos only display in the language they were entered (i.e. French) both on the French and English interfaces. Most of these topics will be deleted anyway as they make the list too long, and not easy to read and manage. However, one question remains: should there be a need to add a new [item] tag for the topic list (for example: music/musique), how should it be entered in Oxygen so that it displays and translates correctly on both interfaces of the website? Also, would it be a good idea to have these topics classified in alphabetical order in the drop down list?
6. To be discussed with CC: what information is to be used for the Google map? Should it be where the person grew up, where the person lives, or where the recording took place? ES will discuss this with CC during the next meeting and notify SA accordingly.
Note 1: Currently, the information regarding "location" that is displayed on Google map is what is entered under "residence" in the xml file.
Note 2: ES wanted to check the Excel sheet that CC created and that contains some information on the interviewed people (i.e. where they live, their age, etc.), but Excel documents cannot be opened on the HCMC's computer.
7. Also to be discussed at next meeting with CC:
a. the topics (les thèmes) that best describe the videos and that provide a list that is neither too long nor too difficult to choose from.
b. the new videos (Jordan) to be edited and added to the site + transcripts to be done
c. create a CD with Pierre's videos
d. identify the type of subjects that would be the most needed to expand the video database
e. retrieve, if possible, missing video (Yoanna, education primaire, prmf5)
8. Next step is the creation of transcripts for all remaining new videos.
1. All transcripts have now been reviewed and committed to the server. ES asked SA to upload them on FrancoToile21.
2. ES sent an email to SA (copy to CC) with a list of all the technical problems noticed while reviewing transcripts.
3. 10 new videos have been given to SA and are to be uploaded onto the site. ES sent an email to CC with suggestions of IDs for the videos (e.g. cltf2, etc.). Will pass on CC's decision to SA as soon as it is received.
4. One video (prmf5 - primaire France 5 - Yoanna) is still missing. Once received ES will review and update the transcript that is currently on the website.
5. Next steps are :
a. create new Oxygen files and enter transcripts for new videos (9 transcripts to be done from scratch + 1 to be entered in Oxygen)
b. upload new videos and transcripts on website
c. Receive Jordan's videos from CC, edit, transcribe and add them to the site.
d. copy Pierre's video to CD.
Following Jamie's earlier posts to this blog (here and here), I downloaded and installed ffmpeg2theora and handbrake. Not sure exactly where the installer put ffmpeg2theora as it didn't tell me, but I was able to invoke it. I installed handbrake into the applications directory.
To create the ogv file, I executed from the command line:
find path/to/mov/folder -name *.mov -exec ffmpeg2theora --videoquality 6 {} \;
To create the mp4 file, I used handbrake's GUI, set the codec to H.264, the Video Quality to Constant Quality 20, and the framerate to Same as Source. I was unable to execute the same thing from the command line.
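For reference, the command-line equivalent would presumably be something like the following with HandBrakeCLI, if it were installed; this is an untested sketch, not what was actually run (the framerate defaults to same-as-source when -r is omitted):
HandBrakeCLI -i source.mov -o output.mp4 -e x264 -q 20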
1. ES recorded Maeva and edited her video (can be viewed by CC on Dropbox; also has been saved on POMME)
2. All new transcripts have now been entered on Oxygen and uploaded on FrancoToile21
3. ES is now reviewing all transcripts on FrancoToile21 to correct all possible mistakes, and identify potential technical problems. So far 4 files have been reviewed (accf1, ancf1, ancf2, cltf1). The following problems have been identified and reported to SA:
a. the space situated between a closing [ref] tag and an opening [ref] tag is automatically deleted.
b. the [title] tag does not work when inserted in the title of the video: in sngl1 and sngl2, the title of the films (entered between [title] tags in Oxygen) does not display in the title of the videos. The [title] tag also does not display in italics when inserted in the description of the video for those same two files.
c. when an annotation goes beyond the [div] box of the full transcript, a scrollbar appears and interferes with the display of the annotation. Basically, it keeps flickering and it is impossible to read the content of the annotation. See the last annotation in the full transcript of file accf1 to see this problem in action.
4. ES had a conversation with CC today. The next steps are to (a) correct the newly added transcripts on FrancoToile21 so that they can go live; (b) add the newly recorded and edited videos; (c) create and enter the transcripts on Oxygen for these new videos; (d) identify with CC the type of subjects that would be the most needed to expand the video database.
Entered the following new annotated transcripts on Oxygen :
lsrq1, mixc1, mixh1, mixm1, prmf1, prmf2, prmf3, prmf4, prmq1, prmq3, prmq4, prmq5, prmq6, pscf1, secf1.
ES asked SA to upload these files on the FrancoToile21 server for later review.
Note: One video is still missing (i.e. prmf5.mov) - ES will prepare and enter a new annotated transcript when the video is provided.
After a couple of failed attempts on my own, Greg gave me some tips on what needs to be done to create a new instance of an eXist app within tomcat. Here are my notes on the overall process:
1) Get new instance of eXist into tomcat environment
- Get the exist folder from the most recent release (or build it yourself).
- Rename it (e.g. Francotoile21)
- Shutdown tomcat (cd to bin directory, ./shutdown.sh)
- Put Francotoile21 folder into the webapps folder
- startup tomcat (cd to bin directory, ./startup.sh)
- browse to http://localhost:8080/manager, you should see francotoile21 in the list
- browse to http://localhost:8080/francotoile21/index.xml, you should see the default exist home page
2) Get your stuff into the new exist instance
- you'll probably want to clear out all the rubbish that comes with the default exist instance: shutdown tomcat, delete everything but the WEB-INF folder in your francotoile21 folder, restart tomcat.
- use the exist Admin client to create the structure for your site and populate it with your files. You can start the client from the exist instance you've just created by browsing to:
http://localhost:8080/francotoile21/webstart/existAdminClient.jnlp
- in the eXist admin client, ensure that permissions on the xql files and folders are rwxr-xr-x, permissions on all other files are rwxr--r-- The owner and group should have the default values (typically admin and dba).
3) If your entire site is in the database (rather than files in the filesystem) you need to redirect requests based on the server name (e.g. localhost:8080/francotoile21) to a specific collection within the database (i.e. so exist knows where to look for your site's files).
- Shutdown tomcat
- In the file tomcat/webapps/francotoile21/WEB-INF/controller-config.xml find this line:
<root pattern=".*" path="/"/>
and modify it to the collection containing the root of your site, e.g.
<root pattern=".*" path="xmldb:exist:///db/site"/>
- Save the file and startup tomcat
You should now be able to browse to a page stored in the new eXist instance within tomcat, e.g.
localhost:8080/francotoile21/search.xql
1. ES met with Pierre for recording, on Friday, May 25. All 8 videos have been edited. 2 have been chosen to go on the site. Transcripts still to be done.
2. The next recording session was supposed to happen on Friday, June 1. However the subject cancelled our meeting. To be rescheduled later in the month (i.e. mid-June 2012).
3. 15 new transcripts have been added to Oxygen. CC has agreed to upload them to the site. To be continued.
After much testing and experimenting, I think I've got the search page working in eXist 2.1 (running under jetty). The newer version of eXist (and/or the lucene extensions) handle default or implicit namespaces differently.
Original code looked something like this:
for $match in $utter//exist:match
let $summary := kwic:get-summary($expanded, $match, <config xmlns="" width="40"/>)
for $line in $summary//self::p
return
The p element is introduced by the kwic:get-summary function, and the question is what namespace that element is deemed to be in. In the older version of eXist, the code above worked. In eXist 2.1, that code returned nothing. I don't know what namespace that p is in, so Martin suggested the wildcard namespace:
for $match in $utter//exist:match
let $summary := kwic:get-summary($expanded, $match, <config xmlns="" width="40"/>)
for $line in $summary//self::*:p
return
and that worked.
I had to work through similar issues with the span elements that kwic embeds within the p element. They too are in some limbo namespace so my code had to include the wildcard namespace selector. In addition, the span outputted to the page included an xmlns="" attribute, and that caused the css to fail to select it.
Original code (worked in eXist 1.4)
for $line in $summary//self::p
let $before := $line/span[@class='previous']
let $match := $line/span[@class='hi']
let $after := $line/span[@class='following']
return
<li>
<a href="player.xql?id={$id}&start={$startTime}" title="{$start}"> {$before} {$match} {$after} </a>
</li>
Modified code (works in eXist 2.1)
for $line in $summary//self::*:p
let $before := $line/*:span[@class='previous']
let $match := $line/*:span[@class='hi']/text()
let $after := $line/*:span[@class='following']
return
<li>
<a href="player.xql?id={$id}&start={$startTime}" title="{$start}"> {$before} <span class="hi">{$match}</span> {$after} </a>
</li>
As the span is coming from the lucene kwic extension, it will only be text, so I don't think explicitly grabbing just the text should cause me any problems, and it allows me to then code in the containing span, which renders properly on the page.
I discovered that I can append an option to the ft query that allows wildcards at the start of the search string. I added a searchClauseOptions variable
let $searchClauseOptions := '<options><leading-wildcard>yes</leading-wildcard></options>'
and then passed that in as an argument to the search clause:
fn:concat('[tei:text/tei:body[ft:query(.,"', $searchterm, '",', $searchClauseOptions, ')]]')
There's a lot of escaping of string delimiters as that search clause itself ends up as a string which is eval'd to generate the results.
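Roughly how the pieces fit together (a sketch only; the collection path and the surrounding query are assumptions, and only the concatenation of the options into the clause reflects the actual change):
declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $searchterm := 'bonjour'
(: allow a leading wildcard in the lucene query :)
let $searchClauseOptions := '<options><leading-wildcard>yes</leading-wildcard></options>'
(: the predicate is built as a string, with escaped quotes around the search term :)
let $searchClause := fn:concat('[tei:text/tei:body[ft:query(.,"', $searchterm, '",', $searchClauseOptions, ')]]')
(: ...which is then spliced into a larger query string and eval'd; util:eval inherits the namespace declarations above :)
return util:eval(fn:concat('collection("/db/site/data")//tei:TEI', $searchClause))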
The system/config/db/site/data/collection.xconf file controls which lucene analyzer to use when indexing the data collection. It was set to use the WhitespaceAnalyzer. When I changed that to use the StandardAnalyzer instead and re-indexed the files, then the upper-case/lower-case issues went away.
I've done a bit of testing and the change does not seem to have introduced any problems, so I'm going to stick with it.
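Assuming the index configuration still looks like the collection.xconf recorded further down this log, the change amounts to dropping the analyzer="ws" binding on the body index so that the default StandardAnalyzer applies:
<!-- before: whitespace analyzer bound to the body index -->
<text qname="tei:body" analyzer="ws">
    <ignore qname="tei:note"/>
</text>
<!-- after: no analyzer attribute, so the default StandardAnalyzer is used -->
<text qname="tei:body">
    <ignore qname="tei:note"/>
</text>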
I also ran across the SnowballAnalyzer, which looks interesting, but I'll postpone investigating that until I get eXist working within Tomcat, as the arrangement with Jetty is annoying - particularly the implications for SVN.
I poked around the log files for a while to see what I could see about the problems launching eXist 2.1 in Tomcat. Someone on the eXist list posted the following in response to the log files I posted showing the errors. I haven't yet taken any action on it.
From this [see below] I read that one of the eXist-db extensions (betterFORM) tries to initialize the SAXON xslt library without success... a method is missing.
Since the error is not about a missing class, but about a missing java method, I think a different (older or newer) version of saxon.jar is installed.
The solution is... either to change saxon.jar (endorsed directory or somewhere else) to the version expected by betterFORM (actually bF depends on version 9.2.x.y; for a newer version the bF code needs to be changed), or to disable bF in the configuration files [need to check; it is in web.xml I think].
The localhost log includes this:
May 30, 2012 9:20:13 AM org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter XFormsFilter
javax.servlet.ServletException:
de.betterform.xml.config.XFormsConfigException:
java.lang.reflect.InvocationTargetException
at de.betterform.agent.web.filter.XFormsFilter.init(XFormsFilter.java:71)
and
Caused by: de.betterform.xml.config.XFormsConfigException:
java.lang.reflect.InvocationTargetException
at de.betterform.xml.config.Config.initSingleton(Config.java:135)
Caused by: java.lang.NoSuchMethodError:
net.sf.saxon.sxpath.IndependentContext.setFunctionLibrary(Lnet/sf/saxon/functions/FunctionLibrary;)V
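If disabling betterFORM turns out to be the easier route, the bits to comment out are presumably the XFormsFilter entries in WEB-INF/web.xml (the filter-name and class appear in the stack trace above); the url-pattern below is a guess and this is untried:
<!--
<filter>
    <filter-name>XFormsFilter</filter-name>
    <filter-class>de.betterform.agent.web.filter.XFormsFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>XFormsFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
-->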
Encountered various problems with the lucene indexing and reporting in Francotoile, so decided to upgrade from eXist 1.5 to 2.1 in the hope that improvements to lucene between those two versions would solve the problems.
I am successfully running an eXist instance on
a Mac running OS 10.7.4
java version 1.6.0_31
Java(TM) SE Runtime Environment (build 1.6.0_31-b04-415-11M3635)
apache tomcat 7.0.21
eXist 1.5.0
I tried downloading and running exist-2.1-dev-rev16458 and was unable to.
The catalina log says
"org.apache.catalina.core.StandardContext startInternal
SEVERE: Error filterStart"
If I use the tomcat manager to start exist 2.1 I get
"FAIL - Application at context path /exist21 could not be started"
and the same error in the catalina log
I installed a newer version of tomcat (7.0.27) on that same computer and got the same results when I tried to launch eXist 1.5 (worked) and eXist 2.1 (failed), so I don't think that's the issue.
I then went to a Mac running OS 10.6.8
java version 1.6.0_31
JRE build 1.6.0_31-b04-415-10M3646
apache tomcat 7.0.27
I tried to run exist-2.1-dev-rev16458 and that worked, so it appears to be an issue with the JRE or (less likely) the OS on the first Mac.
Over the past 10 days or so, I have worked through the to-do list and made a number of improvements on the dev site. If CC approves, I'll migrate these to the production site. Estimate about 20 hours in all.
improvements to searching:
- if the user types in an upper-case search string, the search now works as well as if they had typed in a lower-case search string
- the selected item in the "gender" and "topic" dropdowns on the search page remains visible
- all apostrophes have been normalized to straight (') rather than smart (’)
- quotation marks around a phrase now match the entire phrase rather than any word within the phrase
- if the user puts a colon (:) in the search string, the page removes it
improvements to markup and presentation:
- if you put a <title> element into an <utterance>, a <reference>, a <note>, or into <person><trait><p> in the teiHeader, it will be rendered in italics on the page
- if you add an <incident who="#interviewer"><desc> element into an utterance, or a <u who="#interviewer"> element, it will be rendered as grey on the page
- there is a show/hide control on the full transcript
remaining issues:
There are still problems with searching for words that happen to be upper case in the transcript (e.g. search for bonjour and you'll see there are 4 hits; the two that are lower-case in the transcript show a link to the occurrence in the transcript, and the two that are upper-case don't show the link). To fix those I need to upgrade the version of the database engine, and for some reason I'm unable to do that on my computer (though I can on others). So, until I sort that out, we're stuck on that issue.
We still don't match instances of the search string that occur in the notes. I think this may be related to the upper-case/lower-case problem, so a solution to it will have to wait for a newer version of the database engine.
The underline of the space following a reference results from the way the machinery I'm relying on handles whitespace and is virtually impossible to fix reliably, so I'm leaving it for now.
The indexing engine (lucene) used by the database does not allow wildcard characters (? or *) at the start of the search string. No way around that.
1. With iMovie, which was installed on POMME on Wednesday, 8 video files have been edited. ES will write a document to explain the procedure for future use.
2. ES prepared two posters for recruitment. To be displayed at GSS and other significant places on campus.
3. ES will record two new subjects on Friday, May 25 and Wednesday, May 30 (respectively a male 60+, from south of France, and a female 20-30, from south of France).
4. RealPlayer was installed on POMME as it is currently the only player that will display one digit beyond the seconds and therefore provide enough precision for establishing the utterances for subtitles.
5. New transcripts for accf1, ancf1, and ancf2 were entered on Oxygen. To be continued with all files.
1. Transcripts for "Gary 1" and "Gary 2" are now complete. Once iMovie is installed on POMME and training has been provided, ES will proceed with video editing and timeline setup for subtitles.
2. All transcripts as they currently appear on the website have been saved in a folder named "TranscriptionsOLD" on Dropbox and in a folder named "Old Transcripts" on POMME. Each file contains the timeline and the transcript in plain text format. This will allow us to quickly copy/paste the info back into Oxygen, should we need to revert to these versions in the future. The following three files did not have any transcripts: "sngl1", "sngl2", and "mixm1".
3. Agreed with CC today: Interviewer utterances and incidents within utterances will display in a grey color (no italics)
4. Agreed with CC today: Titles of books and films, for example, will show within utterances and within notes in italics (no change of color)
5. Next steps are:
a. contact the two potential contacts for recording;
b. edit "Gary 1" and "Gary 2" (see 1. above);
c. prepare a poster for recruitment that can be posted at GSS;
d. start entering the new annotated transcripts in Oxygen.
Used svn switch --relocate oldURL newURL to point the local copy to the new URL for the svn repo.
example:
svn switch --relocate https://revision.tapor.uvic.ca/svn/reponame https://revision.hcmc.uvic.ca/svn/reponame
updated my local files, then used the exist admin client to upload 4 modified data files to the database.
Root of svn tree is at https://revision.hcmc.uvic.ca/svn/hcmc/
This structure in the xml data file:
<ref type="info">pépés<note> : <mentioned>Pépé<mentioned> est généralement utilisé par les enfants.</note></ref>
Was originally processed by this xsl:
<xsl:template match="tei:ref[@type='info']">
<xhtml:a href="#" class="tooltip">
<xsl:value-of select="./child::text()"/>
<xhtml:span class="hover_off">
<xsl:value-of select="tei:note"/>
</xhtml:span>
</xhtml:a>
</xsl:template>
Generating this output (note that "Pépé" is passed through as plain text, whereas the user wants it italicized):
<a class="tooltip" href="#">pépés<span class="hover_off">Pépé est généralement utilisé par les enfants.</span></a>
I modified the xsl to this:
<xsl:template match="tei:ref[@type='info']">
<xhtml:a href="#" class="tooltip">
<xsl:value-of select="./child::text()"/>
<xhtml:span class="hover_off">
<!--<xsl:value-of select="tei:note"/>-->
<xsl:apply-templates/>
</xhtml:span>
</xhtml:a>
</xsl:template>
Which generates this output (note the "pépés" appears in the span as well as outside it):
<a class="tooltip" href="#">pépés<span class="hover_off">pépés : <em>Pépé</em> est généralement utilisé par les enfants.</span></a>
I've got to come up with some xsl that gives me this output from the given input, but ran out of time today:
<a class="tooltip" href="#">pépés<span class="hover_off"> : <em>Pépé</em> est généralement utilisé par les enfants.</span></a>
When I do, I can delete the leading " : " which is only there as a kludge around this problem.
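One possible approach (a sketch, not yet tried): apply templates only to the note's children inside the span, so the ref's own text isn't repeated there; the existing template that already renders <mentioned> as <em> would be left as is.
<xsl:template match="tei:ref[@type='info']">
  <xhtml:a href="#" class="tooltip">
    <xsl:value-of select="./child::text()"/>
    <xhtml:span class="hover_off">
      <!-- only the note's content, so "pépés" isn't duplicated inside the span -->
      <xsl:apply-templates select="tei:note/node()"/>
    </xhtml:span>
  </xhtml:a>
</xsl:template>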
With critical input from Martin on the syntax of the java command, I managed to create a new rng file derived from the existing data files using the oddbyexample utility from TEI.
Here are my notes.
minimal instructions here: http://tei-l.970651.n3.nabble.com/ODD-by-example-utility-td2344937.html
download for saxon jar files : http://saxon.sourceforge.net/#F9.4HE
download for oddbyexample.xsl and getfiles.xsl : http://tei.svn.sourceforge.net/viewvc/tei/trunk/Stylesheets/tools/
my setup:
in folder: /System/Library/Java/Extensions (which is in the java classpath)
- saxon9he.jar (working jar file in System)
- saxon9-unpack.jar (working jar file in System)
all other files in folder: /Users/sarneil/Documents/Projects/french/FrancoToile/oddbyexample/
- data folder containing all the data files to use in creating the odd file (I removed child values folder)
- oddbyexample.xsl
- getfiles.xsl
- saxon9he.jar (backup of jar file in System, not used otherwise)
- saxon9-unpack.jar (backup of jar file in System, not used otherwise)
- ftodd (file created by running the java command below)
- francotoile.rng (file created by running ftodd file through Roma as detailed below)
- this readme file.
command I issued:
java -jar /System/Library/Java/Extensions/saxon9he.jar -it:main -o:/Users/sarneil/Documents/Projects/french/FrancoToile/oddbyexample/ftodd /Users/sarneil/Documents/Projects/french/FrancoToile/oddbyexample/oddbyexample.xsl corpus=/Users/sarneil/Documents/Projects/french/FrancoToile/oddbyexample/data
Everything (i.e. paths) is spelled out explicitly as otherwise there's just too much voodoo magic for me.
Tell java to run the jar file specified in the following argument (i.e. saxon9he.jar)
The -it switch tells Saxon which named template to invoke first (here, main).
The -o switch provides the path and file name for the output file (e.g. /root/path/path/path/nameOfODDfile)
The next argument provides the path and file name of the oddbyexample.xsl file to run
The corpus= argument provides the path to the folder containing the tei data files to run the oddbyexample.xsl against to generate the ftodd file
Once you have the odd file:
Go to http://www.tei-c.org/Roma/
Click the Open existing customization button and browse to the odd file you've just created
Click the start button
In the Customize tab, change the filename to what you want your schema's filename to be (e.g. francotoile) without any extension
Click the save button
In the Schema tab, select RELAX NG schema (XML syntax) not compact syntax
Click the generate button
Roma will generate the file francotoile.rng (using the name you provided and the extension based on the schema format you selected)
Save that file and move it wherever you want it to go.
Where the data files are expecting that rng file to be for francotoile:
<?oxygen RNGSchema="http://pear.hcmc.uvic.ca:8081/francotoile/rest/db/site/schema/francotoile.rng" type="xml"?>
Will test shortly.
The first line in each of the xml data files appeared as:
<?oxygen RNGSchema="http://francotoile.uvic.ca/includes/schema/francotoile.rng" type="xml"?>
Oxygen (and anything else) is unable to access the rng file using that URL. Greg worked out that this URL is publicly accessible (though never seen by the public) so can be used:
<?oxygen RNGSchema="http://pear.hcmc.uvic.ca:8081/francotoile/rest/db/site/schema/francotoile.rng" type="xml"?>
Edited all the data files to change the first line of the file, updated the svn repository.
The files don't validate against that schema file, so I'll have to figure out what's with that.
Getting used to using an svn repository. Exported the site from eXist using the jnlp client, put the contents in the trunk folder on local drive, cleaned out extraneous files (__comment__), added to repository.
The francotoile SVN repository is at
https://revision.tapor.uvic.ca/svn/francotoile/
The data folder (for francotoileeditors) is at
https://revision.tapor.uvic.ca/svn/francotoile/trunk/db/site/data
Greg created a francotoileeditors group with CC and ES as members, they have rw access to the data folder, but no access beyond that.
To create a copy of the data folder on local drive (instructions for a Mac)
- start : Terminal in Applications/Utilities
- type : cd Documents
- press : enter key (to move to the Documents directory)
- type : mkdir FrancotoileData
- press : enter key (to make a folder called "FrancotoileData" in the Documents folder)
- type : cd FrancotoileData
- press : enter key (to move to the FrancotoileData folder)
- type : svn checkout https://revision.tapor.uvic.ca/svn/francotoile/trunk/db/site/data/ . (note the space and period at the end, those are critical)
- press : enter key (that will make a copy of the data folder on the server to the FrancotoileData folder on your computer and set up the version control system)
You're now ready to edit the files on your local computer just as you would normally.
At end of each session :
- click : on the icon for the running Terminal application or start Terminal and cd to your data folder
- type : svn commit -m "brief description of what you did here"
- press : enter key (to synchronize files on the server with those from your computer)
- quit Terminal
At start of each session (other than the very first, when you checked out the repository) :
- start : Terminal in Applications/Utilities
- type : cd Documents/FrancotoileData (or whatever you called your data folder)
- press : enter key (to move to your data folder)
- type : svn update
- press : enter key (to synchronize files on your computer from those on the server)
CC noticed that in a full transcript, if there is a link (<a href="#" class="tooltip">) on the bottom line of text that causes a popup tooltip thingee to appear below the text, that div appears outside the viewport on the page. The page generates a scrollbar to (in principle) allow the user to scroll down to see the gloss, but if the user moves the mouse off the link, then the gloss window disappears.
I added some padding to the bottom of the detailpanel div (which contains the full transcript) - enough to accommodate a three or possibly four line gloss. The popup handling in the individual subtitles positions itself properly, so is not an issue.
You can't put xql logic up with the declare statements in the head of the xql file. So, if you need to modify any of those dynamically, here's what you do:
1) hard code in the default case
2) write a function that tests the condition(s) you care about
3) then in the part of the file that is actually processed (and that seems to be anywhere within the root element (e.g. html)), invoke the function and then for each value returned by the function use a util:declare-option statement.
For example, if I want to write out one exist:serialize argument for IE and another one for all other browsers:
1) hard-code the default case (i.e., the one that works for all browsers other than IE) as usual near the top of the file :
declare option exist:serialize "method=html5 media-type=application/xhtml+xml encoding=utf-8 indent=yes doctype-public=''";
2) write a function that tests the condition you care about and put that up at the head of the file too (declare the namespace if necessary):
declare function local:isIE() as xs:boolean{
let $user-agent := request:get-header("user-agent")
return
if (fn:contains($user-agent,"MSIE"))
then (true())
else (false())
};
3) somewhere within the html element invoke the function and re-declare the option:
{if (local:isIE())
then (util:declare-option("exist:serialize", "method=html5 media-type=text/html encoding=utf-8 indent=yes doctype-public='' "))
else ()
}
I'm not sure if there are any constraints or best-practices regarding where to put the test and redeclaration, but in my brief testing it doesn't seem to matter.
In the interests of only maintaining one set of up-to-date XML files - at least for now - I've moved the data files on Pomme from LCC's (/Users/lauren) and CC's (/Users/ccaws) accounts to /Users/admin/Documents/francotoile_old so that they're safely tucked away.
The most current version of the data is on the website: http://francotoile.uvic.ca
I put the schema online at: http://francotoile.uvic.ca/includes/schema/francotoile.rng
XSLT's for-each-group is handy in case something like XPath's distinct-values function isn't sufficient (for example, when you want to group a series of elements by a certain value but you still want to access all of the other nodes in the elements):
<?xml version="1.0" encoding="UTF-8"?>
<markers xsl:version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0">
<xsl:for-each-group select=".//tei:teiHeader" group-by=".//tei:nationality/text()">
<xsl:variable name="nationality" select=".//tei:nationality/text()"/>
<xsl:variable name="geotags" select="normalize-space(.//tei:geo/text())"/>
<xsl:variable name="lat" select="substring-before($geotags, ' ')"/>
<xsl:variable name="lng" select="substring-after($geotags, ' ')"/>
<xsl:element name="marker">
<xsl:attribute name="name" select="$nationality"/>
<xsl:attribute name="address" select="$nationality"/>
<xsl:attribute name="lat" select="$lat"/>
<xsl:attribute name="lng" select="$lng"/>
</xsl:element>
</xsl:for-each-group>
</markers>
Added <geo> elements containing latitude and longitude for all the videos so that they can be efficiently mapped without having to resort to on-the-fly geotagging. As per the TEI schema I used the <geoDecl> element within <encodingDesc>.
To look up the latitude and longitude of each location, I used this tool: http://universimmedia.pagesperso-orange.fr/geo/loc.htm
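Given the XSLT recorded above, which splits the value on a space, each <geo> element presumably holds latitude then longitude like this (coordinates borrowed from another entry in this log, purely for illustration):
<geo>48.547869 -71.648631</geo>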
declare option exist:serialize "method=html5 media-type=application/xhtml+xml encoding=utf-8 indent=yes doctype-public=''";
The important bits are the method and doctype-public options. Without both of those set, the doctype will not output properly.
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://exist-db.org/collection-config/1.0" xmlns:tei="http://www.tei-c.org/ns/1.0">
<index>
<lucene>
<analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
<analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
<text qname="tei:body" analyzer="ws">
<ignore qname="tei:note"/>
</text>
</lucene>
</index>
</collection>
The important notes are:
- You need to declare the namespaces you use in your documents in the <collection> element (e.g. xmlns:tei above)
- Use the full element names with namespaces when adding the indexes (e.g. tei:body)
Used these Exist docs as guides:
The structure of the utterances for the FrancoToile transcripts looks something like this:
<u>
Text of an utterance here...
</u>
<u>
Some more text here. <ref type="info">Special keyword <note>with an annotation</note></ref> here.
</u>
Each utterance is within a <u> element. Some of the words and phrases in utterances are marked by annotations. These special phrases are inside <ref> elements, and the annotations to go along with them inside <note> elements.
When searching the utterances for a keyword, I needed a way to exclude all text within <note> elements from the search, since they're not part of the actual utterance text. Martin and I spent a long (long!) time coming up with a good solution, but couldn't find anything satisfying. Then, thanks to Stack Overflow, I finally found something that works:
//textNodeToSearch//text()[not(ancestor::note) and contains(., "searchTerm")]
Phew. This will search your text node (whatever you use for textNodeToSearch) for the search term but exclude all <note> elements from the search.
The complete XQuery used in the FrancoToile search, which orders the results by number of utterances found and also returns those utterances, is:
declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace exist = "http://exist.sourceforge.net/NS/exist";
for $result in collection('francotoile/data')//tei:TEI[tei:text/tei:body//text()[not(ancestor::note) and contains(., "searchTerm")]]
return
<match>
{
let $articleBody := $result//tei:body
let $id := $result//tei:TEI/@xml:id
let $articleTitle := $result//tei:titleStmt/tei:title
let $timeline := $result//tei:TEI/tei:text/tei:body/tei:timeline
return
<info>
<title>{data($articleTitle)}</title>
<refid>{data($id)}</refid>
<count>{text:match-count($result)}</count>
<timeline>{data($timeline)}</timeline>
</info>
}
<utterances>
{
for $utter in $result//tei:u
let $start := $result//tei:timeline/tei:when[@xml:id=$utter/@start]/@absolute
let $end := $result//tei:timeline/tei:when[@xml:id=$utter/@end]/@absolute
where matches($utter, 'searchTerm')
return
<utterance>
<start>{data($start)}</start>
<end>{data($end)}</end>
<text>{data($utter)}</text>
</utterance>
}
</utterances>
</match>
Made the following changes today from my to-do list from the last meeting (earlier today, April 07/11):
- Moved the 'dev' site to http://francotoile.uvic.ca
- Made off-site links in transcript annotations open in a new window
- Fixed the "Browse" link from the help page so that it goes to the correct map page
- Removed timestamps from the search results page and replace with a simple ordered list of results (#1, #2, etc.)
- Removed superfluous controls from the Google map
- Added a "No related videos" message when there are no matches found for a video's "related" sidebar
CC, PS and I met this morning to discuss the state of the website and any changes that need to be made before CC presents the site at CALICO. Our next meeting is scheduled for Monday, May 9th at 10:00 AM. Our task list before that meeting looks like this:
Jamie will:
- Replace the old version of the website with the "dev" site, so that http://francotoile.uvic.ca points to the new version
- Make off-site links in transcript annotations open in a new window
- Investigate and fix "pulsating" utterance hover boxes
- Fix the "Browse" link from the help page so that it goes to the correct map page
- Remove timestamps from the search results page and replace with a simple ordered list of results (#1, #2, etc.)
- Ensure that search queries do not search annotations within transcripts
- Remove superfluous controls from the Google map
- Add a "No related videos" message when there are no matches found for a video's "related" sidebar
Pat will:
- Change the size and location of the UVic and HCMC logos on the Help page
- Make a new 'Reel' favicon
- Add a copyright notice to the site-wide footer
Quirky little gotcha when using the mb_* suite of PHP functions, such as mb_substr(), with UTF-8 data: if you don't explicitly set the encoding to UTF-8, then multi-byte characters can be chopped up incorrectly because the mb_* functions assume a different (default) encoding. Found this out the hard way when constructing the excerpt() function for the Francotoile search results page. So, before using the mb_* functions with your UTF-8 data, remember to set the encoding:
mb_internal_encoding("UTF-8");
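A quick illustration of the problem (my own minimal example, not code from the site; the exact fallback behaviour depends on the PHP version and its default internal encoding):
$text = "élève";   // "é" and "è" are two bytes each in UTF-8
// With a single-byte default internal encoding, mb_substr() effectively counts
// bytes, so asking for four characters can cut an accented character in half.
echo mb_substr($text, 0, 4), "\n";   // likely mangled output
// With the internal encoding set to UTF-8, characters are counted correctly.
mb_internal_encoding("UTF-8");
echo mb_substr($text, 0, 4), "\n";   // "élèv"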
Throughout the day yesterday and this morning I made some significant progress on the development website, completing most of my tasks. All of the changes are documented in the Subversion log, but here's a rundown:
- Completed the QuickTime -> HTML5 video player conversion. While the player itself has been functional for a couple of weeks, the peripheral functionality (bookmarks, searching, etc.) has been added gradually. Bookmarks are now in and working with more or less the same functionality as in the old site, but with less JavaScript.
- The behavior of the "Subtitle" tab has also changed to become a little more streamlined. Rather than opening up a separate tab with a "show/hide subtitles" option, the Subtitle tab link itself simply toggles the subtitles.
- The Google map has been moved from a development page to the main map page. The functionality of the map is complete (pending any changes CC wants to make); the map markers just need some styling magic from PS. I also removed the search column from the map as requested by CC.
While writing the function to grab an 'excerpt' of text from an utterance (i.e. a chunk of text surrounding a search term), I discovered that substr_replace() is not multibyte-safe. So, if you use it with a multibyte string you might get strange results. The excerpt() function, which is otherwise very solid and flexible, is adapted from CakePHP's text helper. I had to replace these two lines:
$excerpt = substr_replace($excerpt, $ending, 0, $phraseLen);
$excerpt = substr_replace($excerpt, $ending, -$phraseLen);
With these multibyte-safe lines:
$excerpt = mb_substr($excerpt, 0, 0) . $ending . mb_substr($excerpt, $phraseLen + 1);
$excerpt = mb_substr($excerpt, 0, -$phraseLen) . $ending . mb_substr($excerpt, $textLen);
So, in a nutshell, here's how to convert a substr_replace() call to a multibyte version (for non-negative $start and $length):
// Original call
$string = substr_replace($text, $replacement, $start, $length);
// MB-safe call
$string = mb_substr($text, 0, $start) . $replacement . mb_substr($text, $start + $length);
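If this conversion is needed in more than one place, the pattern can be wrapped in a small helper. A minimal sketch, assuming UTF-8 input and non-negative offsets (mb_substr_replace is my own name for it; PHP has no such built-in):
// Hypothetical multibyte-aware stand-in for substr_replace(): keep everything
// before $start, insert $replacement, then resume $length characters later.
function mb_substr_replace($text, $replacement, $start, $length)
{
    return mb_substr($text, 0, $start, "UTF-8")
        . $replacement
        . mb_substr($text, $start + $length, mb_strlen($text, "UTF-8"), "UTF-8");
}
// Example: replace three characters starting at offset 2.
echo mb_substr_replace("élévation", "...", 2, 3), "\n";   // "él...tion"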
I've migrated the new version of the website into a Subversion repository. Since there are multiple people editing the site, this'll eliminate the risk of overwriting changes, having things fall through the cracks, etc. The videos are not under version control but all other files within /dev are.
The repository is at: /svn/francotoile
The Google maps PHP class I'm using relies on document.write(), which isn't available in XHTML. While there's a workaround that involves using AJAX to load the map (there's a good discussion of it online), the work required to change the class itself would be significant. Instead, I found a script to emulate document.write() in XHTML. I'm not sure if this is the 'best' solution, but it's time-effective and doesn't break validation.
Martin, Greg and I spent quite a bit of time getting the video page (player.php) to validate as XHTML5. The validation started out as part of the debugging process while figuring out why the HTML5 video player wasn't displaying in Firefox.
So, first we changed the document to XHTML5:
- Changed the doctype to html
- Changed the content-type header to: application/xhtml+xml;charset=utf-8
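In player.php, those two changes presumably amount to something like this (a sketch; the actual code may well differ):
// Serve the page as XHTML and emit the HTML5 doctype before any markup.
header('Content-Type: application/xhtml+xml;charset=utf-8');
echo "<!DOCTYPE html>\n";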
But that was causing hitherto unnoticed character encoding problems to throw outright validation errors. The characters came from the XML data and were mostly French accent characters. The XML itself was fine, so I finally boiled it down to an incorrect usage of PHP's XSLTProcessor library by the PHP eXist database library (which we didn't write). Following a comment made on the PHP page for XSLTProcessor::__construct, I changed this:
$xml_result = $xslt->transformToDoc($dom);
To this:
$xml_result = $xslt->transformToXML($dom);
In other words, from a DOMDocument object to plain old XML string. The DOMDocument transformation was resulting in improper character conversion.
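For context, the transformation step looks roughly like this (a sketch with illustrative variable and file names, not the eXist library's actual code):
// Illustrative sketch: build a DOM from the XML returned by eXist, load a
// stylesheet, and transform. $xmlFromExist stands in for the raw query result.
$dom = new DOMDocument();
$dom->loadXML($xmlFromExist);
$xsl = new DOMDocument();
$xsl->load('includes/xslt/player.xsl');   // hypothetical stylesheet path
$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsl);
// transformToDoc($dom) produced a DOMDocument whose serialization mangled the
// accented characters; transformToXML($dom) returns a string and keeps them intact.
$xml_result = $xslt->transformToXML($dom);
echo $xml_result;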
Switching to transformToXML worked, EXCEPT that now, self-closing tags (e.g. img) generated by the XSLT stylesheets weren't being closed in the transformed XHTML output, even though they were closed in the XSL files (weird). The solution, after some trial and error, was to change this tag in the XSL file:
<xsl:output method="html" omit-xml-declaration="yes" />
To:
<xsl:output method="xml" omit-xml-declaration="yes" />
So, simply changing the output method from html to xml did the trick. The page was validating as XHTML5! Yay! Except for one problem...
The OGV videos still weren't playing in Firefox. D'oh. Then something clicked in Greg's mind and he remembered a similar problem on another project that turned out to have something to do with MIME types. Lo and behold, Apache doesn't have an OGV mime type (at least, not in the version running on the Francotoile server). So, I added this to the .htaccess file (along with MP4 types for good measure):
AddType video/ogg .ogv
AddType video/mp4 .mp4
AddType video/x-m4v .m4v
And that was it - video now working in Firefox.
Greg and I met with CC and PS to check in on the progress since our last meeting (two weeks ago) and to discuss some refinements and changes to the website leading up to the April 20th deadline.
A number of tasks emerged from the meeting:
For PS:
- Remove the search sidebar from the map page. The map will simply be a way to 'browse' all the videos; searching will be done on its own page.
- Update the navigation so that the options are: Browse, Search, Help
- Work on validating the site in HTML5
- Minor UI tweaks throughout the site
For CC (and/or assistant):
- Write about us/help text
- Provide translation text throughout site where appropriate
- Decide on various "related videos" categories (nationality, age, gender, subject, etc.)
For Greg:
- Investigate TEI standards for nationality and residence
- Ensure, with CC, that the nationality and residence data follows the TEI standards
For Jamie:
- Investigate (and hopefully implement) Google Maps API for video map points
- Change format of video text search results so that, instead of a timestamp, the search term with ~2 words before and after is displayed
- Ensure the HTML5 video player works in Firefox, IE (8), Safari, and Chrome
- Finish HTML5 version of video captions
No further face-to-face meeting was scheduled, though we'll be in email contact with each other.
find media -name '*.mov' | xargs -I file HandBrakeCLI -e x264 -q 20.0 --input file --output file.mp4
find media -name '*.mov.mp4' | xargs rename 's/\.mov.mp4$/\.mp4/'
Be sure to specify "-e x264" when using HandBrakeCLI or else it'll use ffmpeg.
find media -name '*.mov' -exec ffmpeg2theora --videoquality 6 {} \;
Finished the first pass of the dynamic 'related videos' sidebar, which displays a random list of videos related to the one the user is currently watching. The code is contained within a findRelatedVideos() function to keep the player.php page decently clean. The function is fully documented and fairly simple to use. Though the function currently selects videos based on the <nationality> element within the <person> element, it can switch to any other element value within <person> simply by changing the function arguments.
Here's the function doc verbatim:
/**
* Retrieves a list of videos related to $id based on the contents of $field.
* The field - for example, 'nationality' or 'residence' - should be within the
* <person> element in the video XML file. The default number of videos returned
* is 5, but can be configured with the optional $options array. If the number
* of related videos is greater than the number of videos to be returned, then
* they're chosen randomly.
*
* The videos are transformed into <li> elements in includes/xslt/related.xslt.
*
* Due to the structure of the system, the eXist $db object and the results of
* the XML query must also be passed.
*
* The function returns an associative array:
* - videos: the formatted video <li> elements, transformed via XSLT
* - field: the value of $field
*
* @param string $id Video ID
* @param object $db DB object
* @param mixed $xmlResult Result of query from player.php
* @param string $field Name of related field within <person>
* @param array $options Optional settings OPTIONAL
* @return array An array with the videos and the value of $field
* @author Jamie Nay
* @date 2011-01-31
*/
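For reference, a call from player.php might look something like this (a sketch based on the doc block above; the 'limit' option key and the surrounding markup are my own illustrations, not necessarily the real names):
// Hypothetical usage: $db and $xmlResult are the eXist object and query result
// described in the doc block; relate videos by the <nationality> field.
$related = findRelatedVideos($id, $db, $xmlResult, 'nationality', array('limit' => 5));
// The function returns the transformed <li> elements plus the field used.
echo '<ul class="related-videos">' . $related['videos'] . '</ul>';
echo '<p>Related by: ' . $related['field'] . '</p>';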
Jamie and I met with CC and PS to hammer out a priority list of tasks to complete before CALICO.
By April 20th PS will have
1) fixed 2 or 3 small interface bugs we ID'd
2) completed a basic shell for the map (just a GUI wrapper)
3) tweaked the search results page so it produces thumbnailed links (timestamped) into each video
4) converted new pages to HTML5
5) embedded an HTML5 video player (with Jamie's help)
By April 20th HCMC will produce
1) a basic map displaying all locations of speakers currently in the db, with each location providing a hook into the search system to retrieve a list of speakers from that location. It will use OpenLayers and KML to begin with.
2) The "Other videos" panel to the right of the video player panel will be dynamic, providing, at first, a selection of other speakers from the same region as the speaker currently in the video player panel. Other possible categorizations were discussed, but not finalized.
After April 20th we will discuss any leftover issues and decide how to proceed. The goal is to present a mostly feature-complete version at CALICO (May 17-21).
Bug reported:
Search “culture” with no other criteria, one result is Hugues with 7 matches.
Search "culture" with location=Canada, one result is Hugues with 8 matches.
Problem is caused by use of the "&=" operator. That eXist operator registers its hit as a full-text match, so the match on the residence filter gets added to text:match-count() along with the matches in the transcript text, which is not what I want. I could rejig a bunch of code so that only the matches on the text are counted, or use the standard contains() function to filter the returned nodes without affecting the match count. The latter is easier, so that's what I did.
Example before:
for $result in collection('francotoile/data')
//tei:TEI [tei:text/tei:body[. &= 'culture']]
[tei:teiHeader/tei:profileDesc/tei:particDesc/tei:person/tei:residence[ . &= "Canada"]]
order by text:match-count($result) descending
return ...
Example now:
for $result in collection('francotoile/data')
//tei:TEI [tei:text/tei:body[. &= 'culture']]
[tei:teiHeader/tei:profileDesc/tei:particDesc/tei:person/tei:residence[ contains (., "Canada")]]
order by text:match-count($result) descending
return ...
Thanks to Martin for clarifying this difference between the two operators.
Met with CC on future of FrancoToile, which has just obtained a SSHRC grant.
The single long list of subjects is getting unwieldy. Discussed using a map interface to allow users to select which subject they want to see. We'll probably also create a tabular representation allowing people to sort by various attributes. Map will be based on what Greg's done for GRS - which we're also going to use for Medieval.
She's going to do some usability tests on the search interface, report back, and then I'll look into making the search engine a little more sophisticated.
She hopes to have significant enough modifications to present at EUROCALL 2011.
In discussing this with Greg and Martin, we're considering a rewrite from the current PHP-based front-end to a Cocoon-based front-end.
We'd also like to support more video formats than the current QuickTime (e.g. Flash as well).
Once we've got the map going, we can use that interface for more than simply looking people up. We can use it to display the locations of people with specified attributes etc.