As the optimal structuring of the data may entail presenting the same blocks of content in more than one place, research into how best to do that in a way compatible with Contribute.
PHP with html includes
Task completed 30 jan 2007
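For reference, a minimal sketch of the include approach that was tested (the file names and block content here are hypothetical, not the actual Humanities site files). A shared block of content lives in one file:
<?php
// shared/announcements.php -- hypothetical shared content block,
// kept in one place and re-used on several pages.
?>
<div class="shared-block">
  <h2>Announcements</h2>
  <p>...</p>
</div>
and each page that needs it pulls the block in with a single PHP include, so the duplicated presentation only has to be edited once:
<?php
// students/index.php (hypothetical) -- the shared HTML is spliced in
// server-side at render time.
include 'shared/announcements.php';
?>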
Analysis of current content and proposed new structure for the Humanities site, with particular emphasis on targeting all, and only, the information relevant to each of three audiences: students, faculty, public/alumni
deadline moved to Feb 12 to allow for a meeting with Jennifer and Andrew a few days earlier
Regular bloggers (not privileged) didn't see the HCMC Stats link in the admin page. This was due to two things:
1. The HCMC Stats link was set to the same privilege level as the user controls, instead of being set to the "stats" level.
2. Bloggers didn't have permission to access "stats" level material. (I gave them View perms.)
INITIATION:
Dean Rippin's request
OBJECTIVES:
Goal is to update the website for the Faculty of Humanities to ease maintenance and to optimize the content and structure for each of three specific audiences
PARTICIPANTS:
Andrew Rippin to set objectives and review prototypes, Jennifer Lefler to review for maintainability as she'll be maintaining the content of the site.
Robin Sutherland of web coordinator's office providing templates for UVic standard site.
Stewart will take those templates and create the site within them to meet the objectives. He will also use the project to become familiar with the templates and with the Contribute editor, specifically testing Contribute's abilities with php. Content will be similar to the current site, but updated where necessary and centralized where possible.
COMMITMENT:
Andrew Rippin and Jennifer Lefler already maintaining current site (using Contribute) and intend to continue to do so as much as possible.
Jennifer and Andrew willing to apply their current knowledge of content editing to this new site and to learn enough to do that.
TECHNOLOGY:
Based on the UVic standard template, though it may be modified to support php includes.
Site will be hosted on unix.uvic.ca
FUNDING AND RESOURCES:
HCMC time and resources out of normal operating budget.
MILESTONES AND END OF THE PROJECT:
Analysis of content and proposed new structure: end of Dec 06
Research into Contribute and how best to use it for this site: end of Dec 06
Prototype approved: end of January 07
Completion of site: end of February 07
Training on maintenance of content: March 07
Likely target date for end of project: March 2007
CRITERIA FOR CESSATION OF PROJECT:
In the unlikely event that the web coordinator announces new templates during the course of this project, this project will be suspended to determine which templates to use.
FIT WITH EXISTING WORK:
Will allow Stewart to become re-familiarized with the templates and to be in a better position to comment on changes needed for new templates
Will allow Stewart to become familiar with capabilities and limits of Contribute, and with writing pages (in whatever editor) that are amenable to content editing in Contribute
Met with Andrew Rippin on his requirements for the Humanities website:
Having problems getting the templates from Robin Sutherland; once I do, I'll investigate how they support modularization on their own and with php, and will see how DW/Contribute deals with php.
Also need to go through current content and work out how to reorganize it suitably
Link: http://lettuce.tapor.uvic.ca/~dbadke/scrapbook/pages/spread.php
Project Leaders:
Research: Chris Petter and John Durno
Technical: David Badke
Other Participants:
Marnie Swanson, Ken Cooley, Scott Gerrity, Alouette Canada
3 month term funding for David Badke’s salary. Dates: January – March, 2007. David will remain HCMC employee on term PEA appointment. Library will transfer funds to HCMC account to pay for David’s salary.
The general purpose of the project is to create a collaborative virtual environment (tools) in which students and library staff can do mark-up, with approved metadata standards, of scrapbooks and of the Canadian Scottish WW I regimental newspaper, The Brazier.
More specifically, this project will equip cataloguers with tools to create granular, item-level metadata for various collections of graphic and text materials. The scrapbook prototype and the Image Markup Tool are the primary technologies for the project. Data sets created during the project will be archived in a database and grown in future iterations of the project.
Both the educational process and project outcomes are important to the project. With these in mind, the tool feature sets are being developed with an eye towards a user-friendly GUI, version and permissions control, and easy administration of utilities to suit a collaborative project environment. The outcomes will be housed in a project repository on the TAPoR servers.
The McPherson library has funded the creation of the tool based on its potential for crossover application. The library has various books, scrapbooks, photo albums, archives, manuscripts and newspapers to which this technology could be applied.
July-Aug 2006: Work was scheduled for summer 2006 using the Image Markup Tool for the Dodds Scrapbook, which appears very promising on a first pass.
In the past two months, David Badke has undertaken substantial troubleshooting and debugging of the Dodds site. The Engine, help files and a better administrative utility still need to be added.
In scope (see project plan below) for his remaining time on the project:
Fully develop the administration module to provide editing capability for the metadata and other configuration and maintenance functions.
David will initiate tasks and document activity within the HCMC Project Blog in the same manner as all HCMC staff. He will update his activities daily, and break down tasks based on the plan.
There are three major parts to the scrapbook software: Engine, Administration, User Interface. Each of these will need full documentation.
The Dodds scrapbook images and metadata will be used as a test case and first application for the Scraps software.
The Engine is the code that interacts with the database and artifact images and renders “components” in the browser. It is partly PHP code and partly Javascript code, and uses AJAX techniques.
The Engine will have a set of “published” functions that can be called by the Administration program and the User Interface to interact with the data and render web pages – that is, the Engine is an Application Program Interface (API).
The Engine will have “components” that can be plugged into a user interface to render web page fragments (e.g. a component to display the artifact page images, a search component, a metadata display component).
The Engine will handle the hierarchical data transparently, allowing drill down from pages to items to subitems to any level.
The Engine should be able to work with more than just PostgreSQL databases (e.g. MySQL, eXist) by segregating database functions in a separate, replaceable module; a sketch of what that might look like follows.
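Purely as an illustration of that segregation (the interface and class names below are hypothetical, not the actual Engine code), the replaceable module might look something like this in PHP:
<?php
// Hypothetical data-store interface: the Engine calls only these
// methods and never talks to PostgreSQL, MySQL or eXist directly.
interface ScrapsDataStore {
    // Metadata for one artifact page, as an associative array.
    public function getPage($projectId, $pageId);
    // Children of a page or item, to support drill-down from
    // pages to items to sub-items to any level.
    public function getChildren($projectId, $parentId);
    // Search the metadata, returning matching item ids.
    public function search($projectId, $query);
}

// One concrete implementation per back end; swapping databases
// means writing another class, not changing the Engine.
class PostgresDataStore implements ScrapsDataStore {
    private $conn;
    public function __construct($dsn, $user, $pass) {
        $this->conn = new PDO($dsn, $user, $pass);
    }
    public function getPage($projectId, $pageId) { /* SQL query here */ }
    public function getChildren($projectId, $parentId) { /* SQL query here */ }
    public function search($projectId, $query) { /* SQL query here */ }
}
?>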
The user needs to be able to configure the application, load data from Image Markup Tool files (possibly other files), maintain records, and perform other administrative tasks. The prototype administration application has facilities to load and maintain data and to produce reports. The following features are missing or incomplete:
The program will work with “projects” that encapsulate one complete artifact, to allow a single installation of the application to manage multiple projects. (This is similar to what IMaP does; some IMaP code can probably be used for this.)
Functions will be created to allow the user to configure the scrapbook application. This includes the definition of metadata structure and database fields; set up of file locations, image types, and options; possibly assembly of user interface.
The existing program can only load Dublin Core metadata from Image Markup Tool files. It will be extended to handle metadata in any valid XML format (see the sketch after this list).
Functions to establish links between artifact pages and items will be created.
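Regarding the metadata-loading item above: as a sketch of what handling arbitrary XML might look like (the field map and file name below are invented for illustration, not the Image Markup Tool's actual format), the import could be driven by a configurable table of XPath expressions rather than hard-coded Dublin Core element names:
<?php
// Hypothetical metadata import: the mapping from database fields to
// XPath expressions would come from project configuration, so any
// valid XML format can be accommodated without code changes.
$fieldMap = array(
    'title'   => '//dc:title',
    'creator' => '//dc:creator',
);

$doc = new DOMDocument();
$doc->load('page001.xml');                    // hypothetical input file
$xp = new DOMXPath($doc);
$xp->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

$record = array();
foreach ($fieldMap as $field => $expr) {
    $nodes = $xp->query($expr);
    if ($nodes->length > 0) {
        $record[$field] = trim($nodes->item(0)->textContent);
    }
}
// $record would then be handed to the database module for storage.
?>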
The existing user interface is a rough prototype, suitable only as a proof of concept. The interface will be built from “plug in” components rendered by the Engine, so that the user can assemble an interface from standard parts.
The interface is the user's responsibility; there will be no single fixed interface. However, there will be at least one fully functional sample interface, and more would be better.
The user interface will call functions in the Engine API to render data on screen. It will not access any data directly.
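To give a feel for what is meant (the class, component and function names here are hypothetical, not the published Engine API), a sample interface page might be assembled like this:
<?php
// Hypothetical sample interface: the page only asks the Engine to
// render fragments; it never queries the database itself.
require 'engine.php';                        // hypothetical Engine entry point
$engine = new ScrapsEngine('dodds');         // project identifier

echo $engine->renderComponent('pageImage', array('page' => 12));
echo $engine->renderComponent('metadata',  array('page' => 12));
echo $engine->renderComponent('search',    array());
?>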
Informal meeting with Scott to discuss the degree to which writing papers and presenting at conferences should be part of the job responsibilities of HCMC staff, and thus whether travel budgets should be allocated for them. Arose out of two examples:
1) Catherine Caws and Scott both proposing to present at different conferences on the FrancoToile project and asking for my involvement in one way or another.
2) Scott wishing to present on the blog as an administrative tool, and the question of which staff, if any, should be involved in writing or presenting that.
Hi there,
I fixed the following problems with Sonnet de Courval:
1. The docTitle element was put inside the body. docTitle has to be contained in a front element, before the body, like this:
<front>
<docTitle>
...
</docTitle>
</front>
<body>
...
</body>
2. There was no div element inside the body. There must be a containing block element inside the body; inline elements or plain text cannot be direct children of the body tag. So I've added a div tag like this:
<body>
<div>
...
</div>
</body>
3. Markup of stanzas was wrong. I see whole stanzas marked up as lines, like this:
<l> v. La Flegmatique : Luy tournera le dos tout au long de la nuict,
L’appellera vilain, lubrique, deshonneste,
Refrongnera le front en luy tournant la teste :
Le mary amoureux fasché de ce refus,
Caresse la servante & veut monter dessus.
La femme devient jalouse et il doit quitter la maison pour un temps.</l>
The <l> tag is a line rather than a stanza; stanzas are <lg>. I think it should appear like this:
<lg>
<l>v. La Flegmatique : Luy tournera le dos tout au long de la nuict,</l>
<l>L’appellera vilain, lubrique, deshonneste,</l>
<l>Refrongnera le front en luy tournant la teste :</l>
<l>Le mary amoureux fasché de ce refus,</l>
<l>Caresse la servante & veut monter dessus.</l>
<l>La femme devient jalouse et il doit quitter la maison pour un temps.</l>
</lg>
4. Plain text appears between page breaks, like this:
<pb n="2"/>Pourquoi vouloir nous emprisonner? Pire que le joug des forçats.
<pb n="3"/>Notre paradis devient un enfer. La pire des conditions.
<pb n="3"/>
That's not allowed by the schema; text must be in a container of some kind, such as a paragraph (p). I've supplied p tags in these cases.
I've commented out all the unmarked or partially-marked-up text, in order to get the document to validate, which it does now. When you're working on a large document, I'd recommend that you work this way:
1. Comment out all the text that you haven't marked up, except for the small section you're working on. Work on one paragraph or one stanza at a time.
2. When you've finished marking a section, validate the document. If it won't validate, it's best to fix the problem immediately; if you continue, you'll just store up more problems for yourself. Validate each section before moving on.
3. When the document validates, un-comment the next small section, and work on that.
I've found, after years of doing XML markup, that this is by far the best way to proceed. If you do a lot of work without doing any validation, the chances are you'll spend hours trying to figure out what the validation problems are at the end, and you may have to re-do a lot of your work (for instance, if you made the same sort of mistake several times).
I've also removed this document from the database, because it's not fully marked up yet.
...as requested.
This work grows out of the conversions I've been doing from TEI P4 to P5 on several projects (including ScanCan, EMLS and ultimately ACH). The TEI provides some sample stylesheets whose approach to conversion keeps the output free of any namespacing until right at the end, when a final stylesheet attempts to add the namespace. I was having trouble with this stylesheet, written by Syd Bauman, and began working with him on developing a test case we can use to get some serious advice about the best approach. This morning I worked through some basic tests, and reported as follows to Syd:
I've been trying to figure this one out, and a core problem is that you can't create a valid TEI P5 document which links to a schema (XSD file) but is not already in a namespace. I've done that for the purposes of testing.
Here's the minimal document:
<?xml version="1.0" encoding="UTF-8"?>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title>Minimal test document</title>
</titleStmt>
<publicationStmt><p>Unpublished</p></publicationStmt>
<sourceDesc><p>This electronic file is the original document.</p></sourceDesc>
</fileDesc>
</teiHeader>
<text><body>
<head>Minimal test document</head>
<p>This is an absolute minimal test document for P5 XSLT processing.</p>
</body></text>
</TEI>
Here's the minimal stylesheet:
<?xml version="1.0"?>
<!-- One variation is to switch between version 2.0 and 1.0. -->
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.tei-c.org/ns/1.0">
<!-- Another variation is to uncomment the template below. -->
<!--
<xsl:template match="TEI">
<xsl:element name="TEI" namespace="http://www.tei-c.org/ns/1.0">
<xsl:apply-templates />
</xsl:element>
</xsl:template>
-->
<!-- XSLT Template to copy anything, priority="-1" -->
<xsl:template match="@*|node()|text()|comment()|processing-instruction()" priority="-1">
<xsl:copy>
<xsl:apply-templates select="@*|node()|text()|comment()|processing-instruction()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Here are my results, testing under oXygen:
1. With ONLY the copy-anything template:
-No xmlns attribute is added to the root under any of the following circumstances:
-XSLT 1.0 under Xalan, xsltproc, or Saxon 6.
-XSLT 2.0 under Saxon 8.
2. With the TEI match template enabled:
-XSLT 2.0 under Saxon 8: The xmlns attribute IS added to the root, but empty xmlns attributes are also added to its two child nodes (teiHeader and text).
-XSLT 1.0 under Saxon 6: Ditto.
-XSLT 1.0 under xsltproc: Ditto.
-XSLT 1.0 under Xalan: YES! "Correct" result; the xmlns attribute is added to the root, but NO empty xmlns attributes appear below.
So the situation seems to be that only with XSLT 1.0 under Xalan can we get the result we want, and we can only achieve that by matching the root node and adding a namespace attribute to the xsl:element tag.
Now we try using the apparently-wrong (according to our research) method, where the stylesheet looks like this:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="TEI">
<xsl:element name="TEI">
<xsl:attribute name="xmlns">http://www.tei-c.org/ns/1.0</xsl:attribute>
<xsl:apply-templates />
</xsl:element>
</xsl:template>
<!-- XSLT Template to copy anything, priority="-1" -->
<xsl:template match="@*|node()|text()|comment()|processing-instruction()" priority="-1">
<xsl:copy>
<xsl:apply-templates select="@*|node()|text()|comment()|processing-instruction()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Results:
-XSLT 2.0 under Saxon 8: Invalid XSLT -- xmlns is not allowed as an attribute name.
-XSLT 1.0 under Saxon 6: No attribute is added. File is unchanged.
-XSLT 1.0 under Xalan: Ditto.
-XSLT 1.0 under xsltproc: Namespace IS added.
So in this case, the only working setup is with xsltproc.
It seems to me there's no reliable way to do this right now, so practically speaking, perhaps the whole approach of generating elements not in a namespace and then trying to put them in a namespace at the end is, if not wrong, then impractical. Perhaps all the stylesheets should carry the xmlns attribute in their root elements just so it's always the default namespace for output right through the process. I haven't tested that, though.
I hope this helps. Let me know if you find any different results, or if the gurus can give you a straight answer about this!