Introduction to Humanities Text-Encoding
Using XML-based Tools on Humanities Texts
Martin Holmes
Stewart Arneil
Greg Newton
Session 1 Overview
- Technology in literature: there's nothing new here...
- Example projects: what might be possible with documents written in XML
- Brief look at the origins of markup
- Document analysis: preparing to mark up a text
- Basic mechanics of XML
- Marking up a text
==>
Examples of Humanities Computing projects from UVic
Two definitions of "markup" from the OED
- The amount added by a seller to the cost price of goods to cover overheads and provide profit.
- The process of embedding tags in an electronic text so as to distinguish the text's logical, syntactic, or structural components.
==>
Markup is an ancient concept
- Take a look at the extract from Jane Austen's Emma. What instances of "markup" (broad definition) can you see?
==>
Problems with this kind of "markup"
- What is metadata and what is not?
- What do italics actually mean?
- What does a period mean?
- Traditional texts are not machine-readable.
==>
Preparing for markup: Document analysis
- Pick your text
- Identify your audience
- Choose your focus: what features or aspects of the text are important?
- Analyse your document: List the features of the text that you wish to capture in your encoding.
- Example: Page 1 of manuscript of Jane Eyre
==>
XML
- XML consists mainly of tags and attributes. (See the example.)
- XML represents a document as a hierarchical tree structure. The structure can only branch in one direction, and elements cannot overlap.
- Open tag: <name>
- Close tag: </name>
- Complete element: <name>Joe</name>
- Empty element: <pb />
- Attribute: <name type="first">Joe</name>
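To make the tree structure concrete, here is a minimal sketch of a well-formed document. The element names (letter, salutation, emph) are hypothetical and chosen only for illustration; they do not come from any particular schema. The comment at the end shows the kind of overlap that XML does not allow.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical element names, used only to illustrate nesting. -->
<letter>
  <!-- An element with mixed content: text plus a child element with an attribute. -->
  <salutation>Dear <name type="first">Joe</name>,</salutation>
  <!-- An empty element: it has no content, only a position in the tree. -->
  <pb />
  <p>Every element must close <emph>inside</emph> its parent.</p>
  <!-- NOT well-formed, because the two elements overlap:
       <p>some <emph>text</p> more</emph> -->
</letter>
```

Each element sits wholly inside its parent, so the document can be drawn as a tree with letter at the root.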
==>
Rulesets for document structure
- <abbr expan="United Nations">UN</abbr>
- <abbreviation full="United Nations">UN</abbreviation>
- <acronym meaning="United Nations">UN</acronym>
- Machines can't cope with this kind of variation. They must know what to expect.
- SCHEMAS (or DTDs) define which tags and attributes are used, and where.
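As a rough illustration of what such a ruleset looks like, the sketch below is a tiny RELAX NG schema in XML syntax. It is an assumption made for this example, not an extract from the TEI schema: it permits only the first of the three encodings above, an abbr element with a required expan attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: a toy ruleset, not part of the TEI schema. -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="abbr">
      <!-- expan is required here; wrapping it in <optional> would relax that rule. -->
      <attribute name="expan">
        <text />
      </attribute>
      <!-- The content of abbr is plain text. -->
      <text />
    </element>
  </start>
</grammar>
```

Checked against this schema, <abbr expan="United Nations">UN</abbr> is valid, while the abbreviation and acronym versions are rejected because those element names are not defined.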
==>
Schemas
- Background: HTML page showing a simple XML document and associated ruleset
- Real rulesets (WARNING: Large file!), a.k.a. DTDs and schemas, are complex, but comprehensible by computers (and humans if need be)
- The Text Encoding Initiative (TEI) provides a set of standard modular rulesets and tools for creating them
==>
Start New Document with oXygen
- Start the oXygen application as you would any other program.
- Choose New... from the File menu, choose XML Document in the New Document dialog box, and click OK.
- Check the "Use DTD or schema" checkbox.
- Choose "RelaxNG" and select "XML syntax".
- Type this into the URL box at the top: http://hcmc.uvic.ca/tei.rng
- Make sure "TEI" is selected in the "Document root" box, then click "OK".
==>
Edit Your Sample Document in oXygen
- Red wiggly lines indicate errors in the file.
- Insert appropriate tags, and note how oXygen helps and constrains you, based on the schema.
- Build a minimal document structure.
- Copy and paste your text from http://hcmc.uvic.ca/presentations/xml/material/sonnet_130.htm.
- Confirm that the document is well-formed (blue check) and valid (red check).
- Save your file.
- That's the end of this session.
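For reference, the minimal document structure built in this exercise might look roughly like the sketch below. It assumes the standard TEI P5 namespace and the usual TEI header elements; the workshop schema may allow or require something slightly different, and the wording of the publicationStmt and sourceDesc paragraphs is invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sonnet 130</title>
      </titleStmt>
      <publicationStmt>
        <!-- Placeholder wording, assumed for this sketch. -->
        <p>Unpublished workshop exercise.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Text copied from the workshop web page.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- lg is a line group; each l is one line of verse. -->
      <lg type="sonnet">
        <l>My mistress' eyes are nothing like the sun;</l>
        <l>Coral is far more red than her lips' red;</l>
        <!-- The remaining lines follow the same pattern. -->
      </lg>
    </body>
  </text>
</TEI>
```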
==>
Recap: TEI XML Encoding
- Set of modular schemas for XML encoding of humanities documents
- Used for: manuscripts, verse, drama, prose, dictionaries, linguistic data, academic articles, etc.
- Archival format
- Interchange format
- NOT a rendering format
==>
Session 2 Overview
- Introduction to Cascading Style Sheets (CSS)
- Using CSS to style XML
- CSS rulesets, selectors and properties
- A strategy for writing CSS to style XML
- Limitations of CSS
- The next level: XSLT
==>
Setting up your workspace
A CSS Ruleset
A CSS Ruleset: Selectors
A CSS Ruleset: Properties
A CSS Ruleset: Values
A CSS Ruleset: Punctuation
A CSS Ruleset: Recap
Selectors we will use
- div{...} (type selector)
- div p{...} (descendant selector)
- title[level="m"]{...} (attribute selector)
- quote:before{...} and quote:after{...} (pseudo-elements)
- More details: http://www.w3.org/TR/CSS21/selector.html
==>
Properties we will use
- display: block | inline | none; (hiding and showing elements)
- width: 60%; (sizing elements)
- margin-top: 1em; (space around elements)
- text-align: left | right | center | justify;
- font-size: 150%;
- font-family: georgia, "times new roman", serif;
- font-style: italic; font-weight: bold;
- color: black; background-color: white;
==>
Steps in building a stylesheet
1. Specify which elements to hide.
2. Specify which elements are blocks.
3. Set margins on block elements.
4. Set text alignment on block elements.
5. Set font size on block elements.
6. Style inline elements.
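The six steps above can be sketched as one short stylesheet. This is only an illustration: the element names (teiHeader, text, body, div, head, p, lg, l, title, quote) assume a TEI-style document and may not match the elements in css_intro.xml, and the particular sizes and margins are arbitrary.

```css
/* Sketch only: element names assume a TEI-style document. */

/* 1. Hide elements that should not appear on the page. */
teiHeader { display: none; }

/* 2. Declare which elements are blocks. */
text, body, div, head, p, lg, l { display: block; }

/* 3. Set margins (and width) on block elements. */
body { width: 60%; margin-left: 20%; }
div, p, lg { margin-top: 1em; }

/* 4. Set text alignment on block elements. */
head { text-align: center; }
p { text-align: justify; }

/* 5. Set font family and size on block elements. */
body { font-family: georgia, "times new roman", serif; }
head { font-size: 150%; font-weight: bold; }

/* 6. Style inline elements. */
title[level="m"] { font-style: italic; }
quote:before { content: "\201C"; } /* opening double quotation mark */
quote:after { content: "\201D"; }  /* closing double quotation mark */
```

Working in this order means the page takes shape from the outside in: first what is visible, then the layout of blocks, and only then the details of inline text.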
==>
CSS Task 1
- Write your own CSS file to make the css_intro.xml document look exactly like the printout we've given you.
==>
Limitations of CSS
- There's no interactivity
- We can't display images or other embedded content.
- CSS is useful for display of simple documents, or for proofing our markup, but not much more.
==>
XSLT: eXtensible Stylesheet Language Transformations
- XSLT is an XML language
- The purpose of XSLT is to turn XML into something else.
- XSLT can produce XML, HTML, or text output.
- We will be writing XSLT to produce XHTML output.
==>
Getting started with XSLT
- Create a stylesheet: File / New / XSL Stylesheet / Version 1.0.
- Save your file as "css_intro.xsl".
- Link your XML file to this stylesheet. Replace the old xml-stylesheet instruction with this one:
- <?xml-stylesheet href="css_intro.xsl" type="text/xsl"?>
- Now simplify your XML file by removing the xmlns attribute in the root element (with the namespace in place, unprefixed paths such as TEI/text/body would not match anything). Also remove the schema declaration.
- Open the XML file in your browser.
==>
Building the XSLT file
- First, we need to tell the processor what kind of output we want to create.
- Add this between the open and close tags of the xsl:stylesheet root element:
<xsl:output method="html" />
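At this point the whole file might look like the sketch below. It assumes the new file starts from the standard XSLT 1.0 root element; the exact text that oXygen generates may differ slightly.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of css_intro.xsl before any templates are added. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Tell the processor what kind of output to create. -->
  <xsl:output method="html" />

  <!-- Templates will be added here. -->

</xsl:stylesheet>
```

With no templates of its own, the stylesheet falls back on XSLT's built-in rules, which simply copy out the text of the document, so the browser shows unformatted text.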
==>
Our first template:
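<xsl:template match="/">
  <!-- Match the document root and build the page skeleton. -->
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <!-- Take the page title from the TEI header. -->
      <title><xsl:value-of select="TEI/teiHeader/fileDesc/titleStmt/title/text()" /></title>
    </head>
    <body>
      <!-- Hand the body of the text on to the other templates. -->
      <xsl:apply-templates select="TEI/text/body" />
    </body>
  </html>
</xsl:template>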
==>
The rest is just templates...
- ...which we really have to demonstrate one at a time in oXygen.
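As a taste of what those templates look like, here is a hedged sketch. It assumes the source document contains head, p and title level="m" elements with no namespace, and the choice of h2, p and em as output is only one possibility. The default namespace on the root element is an addition made for this example, so that the generated elements share the XHTML namespace used by the html element in the first template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="html" />

  <!-- The root template (match="/") shown earlier would go here. -->

  <!-- Turn each heading into an HTML heading. -->
  <xsl:template match="head">
    <h2><xsl:apply-templates /></h2>
  </xsl:template>

  <!-- Turn each paragraph into an HTML paragraph. -->
  <xsl:template match="p">
    <p><xsl:apply-templates /></p>
  </xsl:template>

  <!-- Italicise monograph titles, like the CSS attribute selector title[level="m"]. -->
  <xsl:template match="title[@level='m']">
    <em><xsl:apply-templates /></em>
  </xsl:template>

</xsl:stylesheet>
```

Each template follows the same pattern: match one kind of element, write out the HTML wrapper, and call xsl:apply-templates so that the element's children are processed in turn.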
==>
Wrapup: How our projects actually work
- The XML documents are stored in an XML database called eXist.
- The website is managed by Cocoon.
==>