How to edit the TEI Guidelines

This document is intended to set out the way things are currently managed in the editing of the TEI Guidelines. General notes on the rationale for this state -- why it is the way it is -- may be added here later. The intention is to provide information for Council members wishing to contribute actively to the continued development and maintenance of the text of the Guidelines.

1. Logical organisation of the Guidelines

It cannot have escaped your notice that almost every chapter of the Guidelines defines a distinct module. In theory at least, each chapter is organised in more or less the same way:

it begins with a paragraph explaining what the module is for, and containing a lot of links to the individual subsections it contains;
each subsection introduces a (small) group of elements, usually beginning with a <specList>;
each element is then introduced in turn, usually including an appropriate usage example (on examples, see further 3.2. Examples);
a <specGrp> for each group of elements defined may be given at the end of each section;
a <specGrp> for the whole module is given at the end of the chapter: it includes the other specifications either directly (by means of an entity reference) or indirectly (by means of a <specGrpRef> pointing to a preceding <specGrp>).

The only chapters not organised in this way are those which do not introduce or define particular modules.
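As a rough sketch of this layout, a chapter for an imaginary module might look like the following. The identifiers XX, XXA, XXASPEC, XXSPEC and XXALL and the elements foo and bar are placeholders invented for illustration, not names from the Guidelines source, and the entity references assume matching declarations in the driver file.

```xml
<div xmlns="http://www.tei-c.org/ns/1.0" xml:id="XX">
  <head>An Imaginary Module</head>
  <!-- opening paragraph: what the module is for, with links to its subsections -->
  <p>This chapter describes elements for imaginary purposes, discussed in
    <ptr target="#XXA"/>.</p>

  <div xml:id="XXA">
    <head>A Small Group of Elements</head>
    <!-- the subsection starts by listing the elements it introduces -->
    <p>The following elements are discussed in this section:
      <specList>
        <specDesc key="foo"/>
        <specDesc key="bar"/>
      </specList>
    </p>
    <!-- prose and a usage example for each element would follow here -->
    <!-- optional specGrp for this group: entity references pull in the spec files -->
    <specGrp xml:id="XXASPEC" n="A small group of elements">
      &foo;
      &bar;
    </specGrp>
  </div>

  <div xml:id="XXSPEC">
    <head>The Imaginary Module</head>
    <!-- specGrp for the whole module, including the group above indirectly -->
    <specGrp xml:id="XXALL" n="The imaginary module">
      <specGrpRef target="#XXASPEC"/>
    </specGrp>
  </div>
</div>
```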

2. Physical organization: the ODD files

Each element, class, and macro defined in the Guidelines is declared within its own XML file, containing an <elementSpec>, <classSpec>, or <macroSpec> as appropriate. These files are in the directory Source/Specs. For example, the file Source/Specs/abbr.xml contains the element spec for the <abbr> element.

Note that all translations share a single file in Specs. As a general rule, don't update a translation for any language of which you are not a native speaker. If you feel confident enough to adjust the translation, leave the @versionDate attribute unchanged in order to ensure the translation will be reviewed eventually.
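To illustrate, the descriptions inside a spec file might look something like the fragment below. The dates and the wording shown are placeholders, not copied from the actual abbr.xml. If you adjusted the French text here, you would leave its @versionDate exactly as it was, so that the translation still shows up as due for review.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" module="core" ident="abbr">
  <!-- the English original: versionDate records when this text last changed -->
  <desc versionDate="2012-01-01" xml:lang="en">contains an abbreviation of any sort.</desc>
  <!-- a translation in the same file: edit the text if you must, but keep versionDate -->
  <desc versionDate="2007-06-12" xml:lang="fr">contient une abréviation quelconque.</desc>
  <!-- classes, content model and examples omitted from this sketch -->
</elementSpec>
```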

Each chapter of the Guidelines is stored in a file called Source/Guidelines/xx/YY-name.xml where xx is the language (currently only en or fr), YY is the two-letter identifier for each chapter (see 7.1. Chapter codes) and name is the name of the module being defined by that chapter.

The file Source/guidelines-xx.xml (where xx is either en or fr) is the ‘driver file’ for the whole shebang. It contains system entity declarations for each of the documents making up the P5 source. These entities are then referenced throughout the source to embed the required component at the right place.¹
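In outline, a driver file of this kind might be sketched as follows. The entity name used for the chapter (ND) and the overall shape of the body are assumptions made for illustration; consult the real driver file for the actual declarations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEI [
  <!-- one system entity per component file; paths are relative to the driver file -->
  <!ENTITY ND   SYSTEM "Guidelines/en/ND-NamesDates.xml">
  <!ENTITY abbr SYSTEM "Specs/abbr.xml">
]>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader omitted from this sketch -->
  <text>
    <body>
      <!-- the chapter is embedded here; spec entities such as &abbr; are
           referenced from inside the chapter files themselves -->
      &ND;
    </body>
  </text>
</TEI>
```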

Hence, to add a new element (say <saintName>) you might proceed as follows:

Write a new file saintName.xml containing an <elementSpec> for your new element and add it to the Specs folder.
Add a declaration like this to the existing driver file:
<!ENTITY saintName SYSTEM "Specs/saintName.xml">
Edit the source of the relevant chapter (presumably ND-NamesDates.xml) to include a reference to the element spec (like this &saintName;), and also some discussion of its usage. The former can appear anywhere, but good practice is to include it in an alphabetic list of such declarations near the end of the relevant section. You can also use a <specList> to reference the description from your new spec within the body of the text, like this:
<p>This module also defines the following canonical element:
  <specList>
    <specDesc key="saintName"/>
  </specList>
</p>
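For the first of these steps, a minimal saintName.xml might look something like the sketch below. The class memberships, content model, dates and example are guesses made purely for illustration. The content model is written with <macroRef>; older spec files may express the same thing in embedded RELAX NG instead, so follow whatever the neighbouring specs do.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical Specs/saintName.xml: every value below is illustrative -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" module="namesdates" ident="saintName">
  <gloss versionDate="2012-09-06" xml:lang="en">saint's name</gloss>
  <desc versionDate="2012-09-06" xml:lang="en">contains the name of a saint.</desc>
  <classes>
    <!-- assumed memberships: global attributes, usable inside persName -->
    <memberOf key="att.global"/>
    <memberOf key="model.persNamePart"/>
  </classes>
  <content>
    <!-- assumed content model: a sequence of phrase-level content -->
    <macroRef key="macro.phraseSeq"/>
  </content>
  <exemplum xml:lang="en">
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
      <persName><saintName>Jerome</saintName></persName>
    </egXML>
  </exemplum>
</elementSpec>
```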

3. Style Notes

3.1. General

The Guidelines are a reference manual, not a tutorial. You should not talk down to the reader, but assume they have a reasonably well-informed knowledge of the subjects under discussion. Make copious use of cross references, rather than repetition.

Bear in mind, however, that your reader may not have English as their first language. Avoid needlessly complex sentences and unnecessarily obscure terminology. Make sure that technical terms are glossed on their first appearance: this should be in the chapter on XML in the case of XML-related terminology. If you want to provide other references, do so as footnotes, using the <note> element.

Provide bibliographic citations for any other standards (etc) referenced, following the existing style. Do not introduce bibliographic citations simply in order to demonstrate your learning.

See the Style Guide for Editing the TEI Guidelines, which attempts to state preferred practice on vexed issues about spelling, punctuation, etc. The goal of these rules is to avoid inconsistency, and also (wherever possible) to avoid producing text which is markedly either British or American English.

3.2. Examples

The purpose of an example is to illustrate a specific element or feature. Do not include irrelevant encoding which does not contribute to this primary goal. If such encoding is unavoidable (e.g. to make your example valid), then it must be explained in the supporting text.

Wherever possible, choose your examples from real documents and provide bibliographic citations for them in the file BIB-Bibliography.xml. Use the @corresp attribute on the <egXML> element to link an example to its source note. Note that the @xml:lang attribute is mandatory on <exemplum>: this is to ensure that the ODD processor knows which examples to choose in a given context.
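A sketch of how the pieces fit together is given below. The identifier HYPO-EX1 and the bracketed bibliographic details are placeholders, and the two fragments would live in different files (the element spec and BIB-Bibliography.xml respectively).

```xml
<!-- in the element spec: xml:lang says which language this example is for -->
<exemplum xml:lang="en">
  <!-- corresp points at the bibliography entry for the source of the example -->
  <egXML xmlns="http://www.tei-c.org/ns/Examples" corresp="#HYPO-EX1">
    <p>[text quoted from the source, containing an <abbr>abbr.</abbr>]</p>
  </egXML>
</exemplum>

<!-- in BIB-Bibliography.xml: the entry that the example points to -->
<bibl xml:id="HYPO-EX1">
  <author>[author of the source]</author>
  <title>[title of the source]</title>
  <date>[date]</date>
</bibl>
```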

All examples should be valid against a modified TEI schema in which any element can act as a root element: this validity is checked during the build process.

3.3. Good encoding practice

Good encoding practice will ensure not only valid but also highly functional Guidelines.

When referring to figures or to other sections of the Guidelines, use <ptr>, not <ref>, to ensure that the title and number of the referenced item are automatically inserted when the Guidelines are compiled.
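The difference can be seen in a fragment like this one, in which the target identifier XXA and the section number are invented for illustration:

```xml
<!-- preferred: the build supplies the current number and title of the target -->
<p>For further discussion, see <ptr target="#XXA"/>.</p>

<!-- avoid for internal references: the hand-written link text will go stale
     if the section is renumbered or retitled -->
<p>For further discussion, see <ref target="#XXA">section 99.1</ref>.</p>
```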

The build process validates cross-references. Since the Guidelines is compiled into a single XML document at build time, IDs must be unique across the text and the examples. Consequently, any @xml:id attribute values appearing in your examples must be unique within the text of the whole of the Guidelines. Furthermore, any @target (etc.) values which do not point to anything in the source will be flagged with a warning during the build process.
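For instance, an identifier inside an example shares the same ID space as the section identifiers. In this sketch the XXA-eg- prefix is an assumed convention and all values are hypothetical; the point is only that an @xml:id in an example should be distinctive enough never to clash with another one anywhere in the Guidelines.

```xml
<egXML xmlns="http://www.tei-c.org/ns/Examples">
  <!-- a chapter-specific prefix keeps this ID from colliding with other examples -->
  <p xml:id="XXA-eg-p1">A paragraph that is pointed to.</p>
  <!-- the target resolves to an ID that exists in the compiled source -->
  <ptr target="#XXA-eg-p1"/>
</egXML>
```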

4. Making a change to the Guidelines

Use a Subversion client to update your local copy of the source text before you make any changes in it.
Make your changes. Make sure your source is still valid against the p5odds.rnc schema.
If you have a locally installed P5 build environment, make sure you can still build, and that the examples are still valid.
If you don't, just use Subversion to check your updated version back in and wait for our two Jenkins Continuous Integration servers (http://tei.oucs.ox.ac.uk/jenkins/ and http://teijenkins.hcmc.uvic.ca:8080/) to assess your work.

Error messages may appear at any stage. Please do not leave the source in an invalid state (it makes life unnecessarily difficult for others). If you cannot immediately fix a validity error, revert your change while you think about it.

The Jenkins servers monitor the Subversion repository, and when they detect a change, they check it out and commence building several targets, just as you would build them on your local machine. There are a couple of advantages to letting the Jenkins servers check your build for you:

You don't have to have all the TEI packages and other software required for a build installed on your system. This means you can make a quick fix to the Guidelines on any computer you happen to be using, without installing a lot of extra software.
The TEI packages on the Jenkins servers tend to be updated regularly, and we're watching them to make sure they work properly.
Jenkins lets you know by email if there's a problem, and provides useful debugging tools.

If you submit a change, and later get an email from one of the Jenkins servers telling you that the build failed, it will provide a link to the build information on the server. Here's what to do:

First, check that the build is broken on both Jenkins servers. If it's only broken on one of them, it may have been caused by a lag in updates to packages on that server.
If both servers have completed a build since your commit, and both are showing an error, then you need to check where the error is occurring. On the page for that build on the Jenkins server site, click on ‘Parsed Console Output’ on the left menu. You'll see links to ‘Errors’ and ‘Warnings’; these will show you the exact point in the build script where the errors or warnings occurred. This may give you a useful clue to the cause of the failure.
If you still can't figure out the problem, email the Council list with a link to the build information, and someone will be able to help.
Once you know what the problem is, fix it by editing the source again and committing the change to Subversion. Jenkins will then do its stuff, and you'll know whether your fix worked as expected.

Error messages appearing during the make test phase (the ‘TEIP5-Test’ job on Jenkins) usually indicate that your changes are in conflict with the Birnbaum Doctrine, which decrees that changes in the Guidelines schemas should not invalidate existing documents. You may wish to discuss the specific issue with other Council members.

5. Adding Schematron constraints to specifications

The TEI ODD system is primarily concerned with generating schemas in the form of RelaxNG or XML Schema. However, there are often circumstances in which you want to apply constraints to elements and attributes which cannot easily be captured by normal XML schemas. For instance, you might want to apply a co-occurrence constraint on some attributes. The @targetLang attribute is a good example. @targetLang is an optional attribute which ‘specifies the language of the content to be found at the destination referenced by @target, using a ‘language tag’ generated according to BCP 47.’ Obviously, there is no point in using @targetLang if you're not also using @target. However, many such co-occurrence constraints are difficult to express in RelaxNG schemas, and may not survive conversion to other schema formats such as XML Schema or DTD.

For this reason, we often use ISO Schematron to express constraints like this. If you look in att.pointing.xml, where the @targetLang attribute is defined, you'll find this constraint, inside the <attDef> for @targetLang:

<constraintSpec ident="targetLang" scheme="isoschematron" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <constraint>
    <sch:rule context="tei:*[not(self::tei:schemaSpec)][@targetLang]">
      <sch:assert test="count(@target)">@targetLang can only be used if @target is specified.</sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>

This Schematron rule is an assertion that if @targetLang is used, @target should also be present. <constraintSpec> has an attribute @scheme (normally set to isoschematron). Inside <constraintSpec>, each <constraint> contains Schematron rules whose <assert> elements carry @test attributes holding XPath expressions; if the XPath evaluates to false, the assertion fires, and its contents will appear on the console when you build or validate. There is also a <report> element, which is similar but fires when its test is true. In Roma, you can also generate a Schematron schema against which to test your documents. This schema is essentially a compilation in Schematron of all the TEI constraints.
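To make the difference between the two concrete, the same check could be written with a report instead of an assertion by inverting the test. This variant is only a sketch and is not in the TEI source; the ident value is invented.

```xml
<constraintSpec ident="targetLang-report" scheme="isoschematron"
                xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <constraint>
    <sch:rule context="tei:*[not(self::tei:schemaSpec)][@targetLang]">
      <!-- a report fires when its test is TRUE, so we test for the error condition -->
      <sch:report test="not(@target)">@targetLang can only be used if @target is specified.</sch:report>
    </sch:rule>
  </constraint>
</constraintSpec>
```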

<constraintSpec> can appear as a child of <attDef>, <classSpec>, <elementSpec>, <macroSpec>, and <schemaSpec>. We'll go through the process of adding a constraint like this. The constraint we're going to add relates to dating elements (<date>, <birth> etc.) and the @calendar attribute. @calendar ‘indicates the system or calendar to which the date represented by the content of this element belongs.’ In other words, @calendar should only be used if the dating element has textual content. This makes sense (assuming that @calendar points at a valid <calendar> element):

<date calendar="#julian">January, 1622</date>

whereas this does not:

<date calendar="#julian" when="1622-01"/>

because the <date> element has no textual content to which the @calendar attribute could apply. We're going to express this in the form of a Schematron constraint, along the lines of the one we've examined above. First, we open the att.datable.xml file, and find the <attDef> element which defines @calendar. We can add the <constraintSpec> element immediately after the <datatype> element, like this:

<constraintSpec ident="calendar" scheme="isoschematron" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <constraint>
    <sch:rule context="tei:*[@calendar]">
      <sch:assert test="string-length(.) gt 0">@calendar indicates the system or calendar to which the date represented by the content of this element belongs, but this element has no textual content.</sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>

(Obviously, by the time you're reading this, the <constraintSpec> is already in the TEI source, so you'll see it there.) We'll also have to make sure we add the Schematron namespace to the <classSpec> root element, so that the sch: prefix is defined: xmlns:sch="http://purl.oclc.org/dsdl/schematron". Then we commit our changes, and let the TEI build process build all the products, and make sure that we didn't get anything wrong.
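Putting these two steps together, the edited att.datable.xml would have roughly the following shape. This is an outline only: the other attributes and children of <classSpec>, and the contents of <desc> and <datatype>, are omitted here and should be left as they are in the real file.

```xml
<!-- the sch: prefix is declared once, on the root element of the spec file -->
<classSpec xmlns="http://www.tei-c.org/ns/1.0"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron"
           type="atts" ident="att.datable">
  <!-- desc, classes etc. omitted -->
  <attList>
    <attDef ident="calendar">
      <desc><!-- existing description, unchanged --></desc>
      <datatype><!-- existing datatype (data.pointer), unchanged --></datatype>
      <!-- the new constraint goes immediately after datatype -->
      <constraintSpec ident="calendar" scheme="isoschematron">
        <constraint>
          <sch:rule context="tei:*[@calendar]">
            <sch:assert test="string-length(.) gt 0">@calendar indicates the system or calendar to which the date represented by the content of this element belongs, but this element has no textual content.</sch:assert>
          </sch:rule>
        </constraint>
      </constraintSpec>
    </attDef>
  </attList>
</classSpec>
```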

That should do the job. However, it's quite difficult for us to test whether this constraint is in fact doing exactly what it should be, unless we build a new copy of Roma and use it to generate a Schematron schema, then validate a test document against it. This is probably not practical for most of us. Fortunately, the TEI build system provides a way for us to do this; in fact, we can put in place a couple of tests that will always be run whenever P5 is built, checking that our schematron constraint is intact and functioning as we expect.

The first thing we're going to do is add a couple of tests that should pass. We'll add a dating element which has both @calendar and some textual content, as well as an empty dating element with no @calendar attribute. If these tests pass, then we know that our constraint is not doing anything wrong. (We don't yet know whether it's doing anything at all, of course; that comes later.)

If you look at trunk/P5/Test, you'll see there is a whole folder full of files whose purpose is to test various aspects of the TEI build process and products. We want to add our tests to one of these files. The question is which one? We'll add it to the basic test file, which is testbasic.xml; this is tested against schemas generated from testbasic.odd, which should contain all the dating features we're interested in testing. If we look at that file, we find there are already several date elements in there, so we can try adding our calendar attribute to one of those. Let's choose the date of 1685 on a dictionary entry sense:

<sense>
  <date calendar="http://en.wikipedia.org/wiki/Julian_calendar">1685</date>
  <form>
    <orth>pamplemousse </orth>
  </form>
</sense>

We could go to the trouble of adding <calendarDesc> and <calendar> to the header of the file so we can point to a calendar element in the same document, but since @calendar is a data.pointer, we can point to an external source of calendar information.

We also want to add, somewhere, a date element which has no textual content and no @calendar attribute. We might as well do this in the header, by adding a simple <revisionDesc> element, which gives us the added bonus of being able to describe our change:

<revisionDesc>
  <change><date when="2012-09-06"/>MDH: Added @calendar to one date, and the date element in here, for testing a new Schematron constraint.</change>
</revisionDesc>

Now we can commit our change, and see if the build of TEIP5-Test completes successfully on our Jenkins servers.

If that build completes successfully, we haven't broken anything. But we still don't know whether our constraint will actually fire when something is wrong. To find that out, we have to use the ‘detest’ system. In trunk/P5/Test, you'll find the following files:

expected-results/detest.log
detest.odd
detest.xml

detest.odd and detest.xml are test files like the ones we've seen above, but the purpose of the ‘detest’ files is to introduce deliberate errors and make sure that the testing process throws up the expected error results. What happens is basically this:

Schemas are built from detest.odd (including a Schematron schema).
The file detest.xml is validated against those schemas.
Resulting error messages are collected in a file called detest.log (in the Test directory).
That file is compared with the detest.log file in the expected-results subdirectory.
If they are not identical, the test build fails.

So what we need to do is to add some new markup to detest.xml which is designed to fail our Schematron test. The problem is that we cannot reliably predict how it will fail—in other words, we can't know in advance what the resulting detest.log file should look like, because we can't know in what order the tests will run, and what the precise error messages might be. We could find this out if we had a working local build environment of our own, but it's far simpler to let Jenkins do the job for us. So this is what we'll do:

Add our new test to detest.xml.
Commit the change to SVN.
Let Jenkins run the build (which should fail).
Examine the resulting detest.log on Jenkins, and copy it to our local expected-results/detest.log.
Commit that change to SVN.
Let Jenkins build again, and make sure that the build completes successfully.

We'll add this div to the detest.xml file:

<div>
  <p>Added by MDH. This tests the Schematron constraint that any element with @calendar must have some textual content.</p>
  <p>
    <date when="2012-09-06" calendar="http://en.wikipedia.org/wiki/Gregorian_calendar"/>
  </p>
</div>

Now we commit the change to SVN, and Jenkins will start building. The build should fail, and it does. If we now go to the Jenkins workspace here: http://bits.nsms.ox.ac.uk:8080/jenkins/job/TEIP5-Test/ws/Test/ we'll see a file called detest.log, and if we look inside it, we'll find this bit, generated by our constraint: ‘@calendar indicates the system or calendar to which the date represented by the content of this element belongs, but this element has no textual content. (string-length(.) gt 0)’ This line is obviously missing from expected-results/detest.log, so the build failed when the two files were compared. We can fix that very simply:

Download the detest.log file from the TEIP5-Test workspace on the Jenkins server (job/TEIP5-Test/ws/Test/).
Copy its contents into our local file expected-results/detest.log.
Commit this change to SVN.
Watch Jenkins build TEIP5-Test again, and make sure it completes successfully.

6. Building the release

Note: the original content of this section has been removed, because a longer document dedicated to documenting the release process has been created. Please refer to TCW22: Building a TEI Release.

7. Reference section

7.1. Chapter codes

Following a lengthy debate in the Council as to whether the two-character codes originally used to identify individual chapters should be dropped in favour of longer, more human-readable names, a compromise solution was reached in which the two-character codes were retained as prefixes to longer human-readable names. The same two-character codes are also used to identify the HTML and PDF files generated during the release process.

The following table shows the correspondence between the printed organization of the Guidelines and the corresponding filenames. The order is determined by the driver file Source/guidelines-xx.xml, from which the table is derived.

Section	Title	Filename
[i]	Releases of the TEI Guidelines	TitlePageVerso.xml
[ii]	Dedication	Dedication.xml
[iii]	Preface and Acknowledgments	FM1-IntroductoryNote.xml
[iv]	About These Guidelines	AB-About.xml
[v]	A Gentle Introduction to XML	SG-GentleIntroduction.xml
[vi]	Languages and Character Sets	CH-LanguagesCharacterSets.xml
[1]	The TEI Infrastructure	ST-Infrastructure.xml
[2]	The TEI Header	HD-Header.xml
[3]	Elements Available in All TEI Documents	CO-CoreElements.xml
[4]	Default Text Structure	DS-DefaultTextStructure.xml
[5]	Representation of Non-standard Characters and Glyphs	WD-NonStandardCharacters.xml
[6]	Verse	VE-Verse.xml
[7]	Performance Texts	DR-PerformanceTexts.xml
[8]	Transcriptions of Speech	TS-TranscriptionsofSpeech.xml
[9]	Dictionaries	DI-PrintDictionaries.xml
[10]	Manuscript Description	MS-ManuscriptDescription.xml
[11]	Representation of Primary Sources	PH-PrimarySources.xml
[12]	Critical Apparatus	TC-CriticalApparatus.xml
[13]	Names, Dates, People, and Places	ND-NamesDates.xml
[14]	Tables, Formulæ, and Graphics	FT-TablesFormulaeGraphics.xml
[15]	Language Corpora	CC-LanguageCorpora.xml
[16]	Linking, Segmentation, and Alignment	SA-LinkingSegmentationAlignment.xml
[17]	Simple Analytic Mechanisms	AI-AnalyticMechanisms.xml
[18]	Feature Structures	FS-FeatureStructures.xml
[19]	Graphs, Networks, and Trees	GD-GraphsNetworksTrees.xml
[20]	Non-hierarchical Structures	NH-Non-hierarchical.xml
[21]	Certainty, Precision, and Responsibility	CE-CertaintyResponsibility.xml
[22]	Documentation Elements	TD-DocumentationElements.xml
[23]	Using the TEI	USE.xml
[A1]	Model Classes	REF-CLASSES-MODEL.xml
[A2]	Attribute Classes	REF-CLASSES-ATTS.xml
[A3]	Elements	REF-ELEMENTS.xml
[A4]	Attributes	REF-ATTRIBUTES.xml
[A5]	Datatypes and Other Macros	REF-MACROS.xml
[A6]	Bibliography	BIB-Bibliography.xml
[A7]	Prefatory Notes	PrefatoryNote.xml
[A8]	Colophon	COL-Colophon.xml

In most chapters, the two-character code is also used as a prefix for the @xml:id values given to each <div> element. Note that every <div> element carries an @xml:id value, whether or not it is actually referenced explicitly elsewhere in the Guidelines.

Note that files with names beginning REF contain only <divGen> elements: their content, which provides the reference documentation (sections A1 to A5 inclusive), is automatically generated during the build process.

7.2. Naming conventions

TEI naming conventions have evolved over time, but remain fairly consistent.

generic identifiers

Element and attribute identifiers should each be a single natural language word in lowercase if possible. If more than one word is conjoined to form a name, then the first letter of the second and any subsequent word should be uppercased. Hyphens, underscores, dots etc. are not used within element or attribute names.

class names

Class names are made up of three parts: a name, constructed like an element name, with a prefix and optionally a suffix. The prefix is one of model. or att. and indicates whether this is a model or an attribute class. The suffix, if present, is used to indicate subclassing: for example att.linking.foo is the foo subclass of the attribute class att.linking.
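The following fragment gathers a few identifiers that follow these conventions. It is a sketch for comparing the name shapes only: the specs are shown empty, grouped in a wrapper purely so that the fragment is well formed, and att.linking.foo is the made-up subclass mentioned above rather than a real class.

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0">
  <!-- element names: one lowercase word, or camel case when words are joined -->
  <elementSpec ident="abbr"/>
  <elementSpec ident="persName"/>

  <!-- attribute names follow the same pattern as element names -->
  <classSpec type="atts" ident="att.pointing">
    <attList>
      <attDef ident="targetLang"/>
    </attList>
  </classSpec>

  <!-- class names: prefix (model. or att.), a name, and an optional suffix -->
  <classSpec type="model" ident="model.persNamePart"/>
  <classSpec type="atts" ident="att.linking"/>
  <classSpec type="atts" ident="att.linking.foo"/>
</specGrp>
```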

xml:id values

The conventions for these vary somewhat. Most of the older chapters of the Guidelines have consistently constructed identifiers, derived from the individual section headings. Identifiers must be provided for:

every <div>, whether or not it is explicitly linked to elsewhere
every bibliographic reference in the BIB-Bibliography.xml file

7.3. File release structure

Currently, the organisation of the /usr/share/xml/tei and /usr/share/doc/tei-* directories on the TEI web site is as follows:
