General Encoding Practices

Introduction

This manual contains TEI instructions for encoding situations that are common to both born-digital documents and transcriptions of primary sources. The encoding instructions for primary sources link to this manual when relevant. When in doubt, always check with a senior member of the MoEML team.

Use the <div> Element

Whether encoding a primary source document or authoring a born-digital document, we follow TEI practice by using <div> elements to divide distinct sections and subsections of text from one another. In the case of born-digital documents, it is important to assign an @xml:id for each <div> element so that the rendering system can automatically generate a table of contents for the document. The @xml:id assigned for a section or subsection <div> should take the form of the @xml:id for the whole TEI document, followed by an underscore (_) and then a "descriptive word" for the section or subsection.
The born-digital document linking.xml serves as an example of how to use <div> elements. This TEI document, which has been assigned an @xml:id of "linking", consists of six sections, the last of which contains three subsections. The author of this document uses <div> elements as follows:
<body>
  <div xml:id="linking_intro">
    <head>Introduction</head>
    <p>Section content.</p>
  </div>
  <div xml:id="linking_external">
    <head>Link to External Web Pages</head>
    <p>Section content.</p>
  </div>
  <div xml:id="linking_internal">
    <head>Link to Other MoEML Pages</head>
    <p>Section content.</p>
  </div>
  <div xml:id="linking_youtube">
    <head>Link to Youtube Videos</head>
    <p>Section content.</p>
  </div>
  <div xml:id="linking_graphics">
    <head>Graphics</head>
    <p>Section content.</p>
  </div>
  <div xml:id="linking_markup">
    <head>Markup (Tag) and Pull Data from Databases</head>
    <p>Section content.</p>
    <div xml:id="linking_locations">
      <head>Linking to Toponyms (Location Files)</head>
      <p>Subsection content.</p>
    </div>
    <div xml:id="linking_people">
      <head>Linking to People in PERS1.xml</head>
      <p>Subsection content.</p>
    </div>
    <div xml:id="linking_reference">
      <head>Linking to Reference Material in BIBL1.xml</head>
      <p>Subsection content.</p>
    </div>
  </div>
</body>
Because the author has assigned @xml:id attributes for each <div> element in this example document, the rendering system will automatically generate a table of contents for this document when it appears on the website. The table of contents for linking.xml appears as follows when it is rendered on the live website:
[Figure: linking_toc.jpg, the table of contents for linking.xml as rendered on the live website]
Note the similarity between the structural hierarchy of <div> elements in the XML code and the structural hierarchy of the table of contents.

Add Draft Content to a Published Page

To add draft content to a published page, tag the draft content using the <div> element with a @rend value of "hidden". For example,
<body>
  <div>
    <p>This is published content that is visible to the user.</p>
    <div rend="hidden">
      <p>This is draft content that is invisible to the user.</p>
    </div>
    <div>
      <p>This is published content that is visible to the user.</p>
    </div>
  </div>
</body>
Content tagged using the <div> element with a @rend value of "hidden" appears neither on the rendered site nor in the document’s table of contents. It is, however, possible to see the hidden <div> on the rendered site by adding ?showDraft=true to the webpage’s URL. Note that the hidden <div> will not appear in the document’s table of contents until the document is properly published.
For more information about document statuses, see documentation on revision descriptions.

Encode Spaces Truthfully

It is important that encoders do not add extra spaces inside TEI tags. Note the extra space at the beginning of the <ref> tag in the following example:
the Hall of<ref target="mol:STHE1"> St. Helens Priory</ref>,
This line of XML code claims that the name of the priory begins with a space. It does not, of course, any more than it includes the trailing comma. Furthermore, should this code be uploaded to the site, it would output a hyperlink that begins with a space.
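For comparison, a corrected version of the same line might look like the following sketch, in which the space sits outside the <ref> element so that only the name itself is tagged:

```xml
the Hall of <ref target="mol:STHE1">St. Helens Priory</ref>,
```

The comma likewise stays outside the closing tag, as in the original line.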

Encode Editorial Notes

We define editorial notes as notes written by MoEML authors, editors, and contributors. These are encoded using the <note> element, with @type="editorial". They will be rendered as clickable footnote numbers in the text, each of which causes a popup containing the note to appear; the notes themselves are also rendered as a numbered list at the foot of the document. Use the @resp attribute to assign responsibility for the note using the person’s @xml:id. Make sure the person’s entry in the personography has an <abbr> element inside <persName> containing their initials; these initials will then be appended to the note.[1] For example,
<p><gap reason="editorial"/> an ingenious Say-Maister,<note type="editorial" resp="mol:JENS1">I.e., assay-master.</note> with his Furnaces <gap reason="editorial"/></p>
Notes and marginal fragments that form part of the original text of a primary source document are encoded slightly differently. In many cases, they are not in fact notes at all but marginal labels that serve as finding aids for the reader. See Use the <rendition> Element and @rendition Attribute to learn how to encode marginal notes.

Encode Words and Phrases in a Language other than English

Mark foreign language strings with the element <foreign> and add the attribute @xml:lang="XX(X)" where XX(X) is the two- or three-character code for the language. Note that the content of the <foreign> tag must contain only a text string without mark-up (e.g., no <p>, <title>, or other tags).
MoEML follows the Internet Engineering Task Force guidelines, whose Language Subtag Registry is constructed based on the recommendations in BCP 47. In most cases, this means that where the ISO Standard 639-1 provides a two-letter language code, that code is used, but in the absence of a two-letter code, a three-letter code is chosen from ISO 639-2 (this conforms to the current practice outlined in the TEI Guidelines).
For example,
<p>In the Gréeke a Cittie is tearmed <foreign xml:lang="el">ϖόλις</foreign>.</p>
<p>CIties and well peopled places bee called <foreign xml:lang="la">Oppida</foreign>, in Latine <gap reason="editorial"/></p>
The following language codes occur frequently in MoEML’s early modern texts:
  • Old English or Anglo-Saxon (ca. 450–1100): ang
  • Latin: la
  • Ancient Greek (–1453): grc
  • Modern Greek (1453–): el
  • Middle English (1100–1500): enm
  • French: fr
  • Middle French (ca. 1400–1600): frm
  • Italian: it
  • Spanish: es
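As an illustration of a three-letter code from the list above, the following sketch tags a Middle English phrase. The surrounding sentence is invented for demonstration and does not come from a MoEML document:

```xml
<p>Chaucer opens the General Prologue with the words <foreign xml:lang="enm">Whan that Aprille with his shoures soote</foreign>.</p>
```

As required above, the <foreign> element here contains only a text string, with no nested mark-up.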

Encode Special Characters

Some Unicode characters that are integral to XML code cannot be used directly in a text string. Our project uses only four such characters: &, <, >, and ". To use one of these special characters in a text string, you must encode it using the specific code given in the following table. Note that these characters are prohibited by the MoEML Style Guide and therefore should only be used in primary source transcriptions or when otherwise absolutely necessary. Double quotation marks (") are generated in the rendered output by a variety of elements (<title> with @level="a", <soCalled>, and <quote>) and thus should never be typed explicitly, other than for demonstration purposes or in primary-source transcriptions.
Character                Symbol  Code    Example
Ampersand                &       &amp;   <p>Janelle Jenstad, Kim McLean-Fiander, &amp; Martin Holmes are MoEML’s project directors.</p>
Less-than Character      <       &lt;    <p>The cost of a bible in early modern London was &lt; twenty pennies.</p>
Greater-than Character   >       &gt;    <p>The cost of a bible in early modern London was &gt; five pennies.</p>
Straight Quotation Mark  "       &quot;  <p>Tye said <quote>this is how to encode straight quotation marks</quote>.</p>

Encode Non-standard Characters

The TEI Consortium defines non-standard characters as characters not represented in the published repertoire of available characters [in Unicode] (5. Non-standard Characters and Glyph). Therefore, before encoding a non-standard character, always check to ensure that the Unicode Consortium has not already published encoding standards for the character.
The set of practices used to encode a non-standard character may be divided into two parts:
  1. Adding a non-standard character metadata entry to the <teiHeader> of the document in which the non-standard character appears.
  2. Tagging a non-standard character in the <text> of the document and thereby linking the instance of the non-standard character to the character’s metadata entry in the <teiHeader>.

Declare Non-standard Characters in the <teiHeader>

To encode a non-standard character, nest a <charDecl> element within the <encodingDesc> element in the <teiHeader> of the document. Next, nest a <char> element with an @xml:id attribute inside the <charDecl> element. The value of the @xml:id attribute should begin with the document’s @xml:id followed by an underscore (_) and a simplified representation of the character being encoded. For example:
<!-- <teiHeader> --> <encodingDesc>
  <charDecl>
    <char xml:id="DIXI2_ye">
      <!-- [Descriptive content goes here. See below.] -->
    </char>
  </charDecl>
</encodingDesc>
Each non-standard character in the document should correspond with an individual <char> element; if there are five non-standard characters in the document, there should be five individual <char> elements inside the <charDecl> element.
Within the <char> element, nest the following three elements:
  • <desc>
  • <localProp>
  • <mapping>
  1. Use the <localProp> element to tag the name of the character, borrowing form and terminology from the Unicode character database. For example:
    <char>
      <localProp name="name" value="LATIN SMALL LETTER Y WITH REVERSED HOOK ABOVE"></localProp>
    </char>
  2. Use the <desc> element to tag an extended description of the character. Your description should include the history of the form, variant forms of the glyph, and its relationship with similar typographical features or characters. For example:
    <char>
      <desc>An abbreviated form of <mentioned>the</mentioned>. This character takes the form of a small latin letter y with a reversed hook above. The closest Unicode character we have to represent this is a small latin letter y with a combining left half ring above. This character appears only twice in the text, which is in black letter gothic.</desc>
    </char>
    Note that, because there is very little published scholarship on early modern non-standard characters and glyphs, you should consult with the Project Director before writing an extended description of the character.
  3. The <localProp> element with @name="name" encodes the non-standard character’s name (i.e., what contemporary typographers call it). Use another <localProp> element with @name="entity" to provide the entity value. For example:
    <char>
      <localProp name="entity" value="yesup"></localProp>
    </char>
  4. Use <mapping> elements to tag and label the various forms in which the non-standard character may appear or has appeared. Each <mapping> element should have a corresponding @type attribute with one of the following values:
    Value         Explanation
    "standard"    the character as it appears in the document being encoded
    "simplified"  the simplified form of the standard character, without accents or ornamentation
    "medieval"    the medieval equivalent of the standard character
    "modern"      the modern equivalent of the standard character
    The following series of <mapping> elements serves as an example:
    <char>
      <mapping type="standard"></mapping>
      <mapping type="simplified">ye</mapping>
      <mapping type="medieval">þe</mapping>
      <mapping type="modern">the</mapping>
    </char>
Combined, the code for a non-standard character (<char>) entry looks like this:
<encodingDesc>
  <charDecl>
    <char xml:id="DIXI2_ye">
      <localProp name="name" value="LATIN SMALL LETTER Y WITH REVERSED HOOK ABOVE"></localProp>
      <desc>An abbreviated form of <mentioned>the</mentioned>. This character takes the form of a small latin letter y with a reversed hook above. The closest Unicode character we have to represent this is a small latin letter y with a combining left half ring above. This character appears only twice in the text, which is in black letter gothic.</desc>
      <localProp name="entity" value="yesup"></localProp>
      <mapping type="standard"></mapping>
      <mapping type="simplified">ye</mapping>
      <mapping type="medieval">þe</mapping>
      <mapping type="modern">the</mapping>
    </char>
  </charDecl>
</encodingDesc>

Tag Non-standard Characters in the <text>

Use the <g> element to tag the non-standard character in the document text. Add a @ref attribute to the <g> element pointing to the @xml:id of the character, as defined by the <char> element in the <teiHeader>. For example:
<g ref="#DIXI2_ye"></g>
In some cases, a non-standard character functions as an abbreviation (e.g., characters involving a breve [˘]). Mark up such instances using the <g> element as described above, but also include the <choice> and <abbr> elements per the instructions for encoding abbreviations. For example:
<choice><abbr>Lond<g ref="#DIXI2_breve">ŏ</g></abbr><expan>London</expan></choice>

Encode Roman Numerals

To tag a roman numeral, use the <num> element with a @type value of "roman" and a @value attribute that gives the arabic equivalent of the tagged roman numeral. For example:
Henry <num type="roman" value="8">VIII</num>
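The same pattern applies to longer numerals. The following sketch is a hypothetical example (the phrase is invented for illustration); the @value attribute records the arabic equivalent, here 1000 + 600 + 3 = 1603:

```xml
printed in the year <num type="roman" value="1603">MDCIII</num>
```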

Encode a Table

A table may be nested inside most elements in a born-digital document. To encode a table, use the <table> element with the @rows and @cols attributes. The value of the @rows attribute specifies how many rows are in the table you are encoding. Likewise, the value of the @cols attribute specifies how many columns are in the table you are encoding. For example, a table with five rows and two columns would be encoded thus:
<table rows="5" cols="2">
  <!-- Rows go here. -->
</table>
Next, nest <row> elements inside the <table> element. The number of <row> elements should correspond with the number of rows in your table (specified by the @rows attribute attached to the <table> element). Each <row> element should have a @role attribute with a value of either "label" or "data". Use the @role value of "label" to indicate that a row functions as a header (i.e., that its contents do not function as data but rather as descriptive labels of the data in other rows). Normally, the first row of a table will function as a header, so the first <row> element nested in a <table> element will have a @role value of "label". For example, a table with five rows, the first of which is a header, and two columns would be encoded thus:
<table rows="5" cols="2">
  <row role="label"><!-- Cell entry. --> </row>
  <row role="data"><!-- Cell entry. --> </row>
  <row role="data"><!-- Cell entry. --> </row>
  <row role="data"><!-- Cell entry. --> </row>
  <row role="data"><!-- Cell entry. --> </row>
</table>
Finally, nest <cell> elements inside each <row> element. The number of <cell> elements should correspond with the number of columns in the table (specified by the @cols attribute attached to the <table> element). Therefore, if a table has two columns, there should be two <cell> elements inside each <row> element. Like the <row> element, each <cell> element must also have a @role attribute with a value of either "data" or "label". In general, the @role value for a <cell> element should match the @role value of its parent <row> element. For example, a table with five rows, the first of which is a header, and two columns would be further encoded thus:
<table rows="5" cols="2">
  <row role="label">
    <cell role="label"><!-- Cell entry. --> </cell>
    <cell role="label"><!-- Cell entry. --> </cell>
  </row>
  <row role="data">
    <cell role="data"><!-- Cell entry. --> </cell>
    <cell role="data"><!-- Cell entry. --> </cell>
  </row>
  <row role="data">
    <cell role="data"><!-- Cell entry. --> </cell>
    <cell role="data"><!-- Cell entry. --> </cell>
  </row>
  <row role="data">
    <cell role="data"><!-- Cell entry. --> </cell>
    <cell role="data"><!-- Cell entry. --> </cell>
  </row>
  <row role="data">
    <cell role="data"><!-- Cell entry. --> </cell>
    <cell role="data"><!-- Cell entry. --> </cell>
  </row>
</table>
Insert text content inside each <cell> element. You may mark up the text content of each <cell> using most XML tags, such as <ref> and <name>. The text content will render in table form in accordance with the code structure inside the <table> element. You may also nest a <head> element above the first <row> element. Use the <head> element to tag a text string that functions as a title or other description for your table. Consider the following table:
Label A          Label B
Data Point 1A    Data Point 1B
Data Point 2A    Data Point 2B
Data Point 3A    Data Point 3B
Data Point 3A    Data Point 3B
Example Table
This table has been encoded thus:
<table rows="5" cols="2">
  <head>Example Table</head>
  <row role="label">
    <cell role="label">Label A</cell>
    <cell role="label">Label B</cell>
  </row>
  <row role="data">
    <cell role="data">Data Point 1A</cell>
    <cell role="data">Data Point 1B</cell>
  </row>
  <row role="data">
    <cell role="data">Data Point 2A</cell>
    <cell role="data">Data Point 2B</cell>
  </row>
  <row role="data">
    <cell role="data">Data Point 3A</cell>
    <cell role="data">Data Point 3B</cell>
  </row>
  <row role="data">
    <cell role="data">Data Point 3A</cell>
    <cell role="data">Data Point 3B</cell>
  </row>
</table>
Note that MoEML’s style sheet does not support more complex tables at this time (e.g., tables with vertical header labels or tables with vertical and horizontal header labels). In most instances, you should be able to display data in a simple table form with a single row of header labels. If you must display data in a more complex table form, consult with the project’s lead programmer.
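As a further illustration of marking up cell content, the following minimal sketch nests a <ref> and a <name> inside data cells. The table title, the labels, and the pairing of this location with this person are invented for demonstration; the identifiers are borrowed from other examples in this manual:

```xml
<table rows="2" cols="2">
  <head>Hypothetical Table with Tagged Cells</head>
  <row role="label">
    <cell role="label">Location</cell>
    <cell role="label">Person</cell>
  </row>
  <row role="data">
    <cell role="data"><ref target="mol:STHE1">St. Helens Priory</ref></cell>
    <cell role="data"><name ref="mol:HENR5">Henry VII</name></cell>
  </row>
</table>
```

Note that the @rows and @cols values still match the number of <row> elements and the number of <cell> elements per row.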

Use Split Tags to Represent Overlapping Hierarchies

Occasionally, it may be necessary to split an interrupted referring string into two or more tags. For example, it is possible for a page break, along with running headers or footers, to interrupt a <name> tag in a transcribed text, as follows (encoding simplified here from TRIU2.xml):
<gap reason="editorial"/> Margaret, eldeſt daughter to king <name ref="mol:HENR5" xml:id="TRIU2_HENR5_1" next="#TRIU2_HENR5_2">Henrie </name><lb/>
<fw type="catchword" style="text-align: right;">the</fw>
<pb/>
<fw type="header">re-vnited Britannia</fw>
<name style="font-style: italic;" ref="mol:HENR5" prev="#TRIU2_HENR5_1" xml:id="TRIU2_HENR5_2">the ſeauenth</name>, to Iames the fourth king of Scotland <gap reason="editorial"/>
Here, the name Henrie the ſeauenth is divided by the formwork and page break encoding. To include the formwork and page break information as part of the <name> tag would be untruthful, so the tag must be split. However, to tag Henrie and the ſeauenth as two <name> tags would be equally misleading since it erroneously suggests that each name part is actually a separate mention. The solution shown above is to assign each <name> tag an @xml:id in order to link them with the @next and @prev attributes, which indicate that the two separate tags are related.
In each <name> tag, add an @xml:id attribute with a value that uses the following formula:
[xml:id of document]_[xml:id of person mentioned]_[lowest possible unique integer (i.e., unique to the document)].
Then, add a @next attribute to the first <name> with a value that uses a pound sign (#) to point to the @xml:id of the second (next) <name> element. Finally, add a @prev attribute to the second <name> element with a value that points to the first (previous) <name> element. These attributes link the two separate tags as one.
When the split occurs at a genuine space in the referring string (as in the above example), include the space in the tag (either at the end of the tag before the split or at the beginning of the tag after the split). Note that this is an exception to the ordinary rule that <ref>, <name>, and <hi> elements do not begin or end with spaces. Including the space is especially important when tagging toponyms with <ref>. Without the manually entered space, the processor has no way of knowing that the toponym identified by the <ref> includes a space at the split, and it therefore generates an erroneous variant toponym with no space in our gazetteer, such as LondonBridge for London Bridge.
Note that there is no reason to use split tags when a reference occurs across a line break, because a single <ref> tag can contain one or more self-closing <lb> element(s).
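A minimal sketch of a reference that crosses a line break without being split might look like this. The target value mol:LOND1 is a placeholder assumed for illustration, and the surrounding words are invented:

```xml
ouer <ref target="mol:LOND1">London <lb/>Bridge</ref> into Southwarke
```

Because the <lb/> sits inside a single <ref>, no @xml:id, @next, or @prev attributes are needed here.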

Index Praxis Documentation

When adding new documentation to praxis, always encode a list of index terms associated with the new documentation. To do this, insert an <index> element below the heading (<head>) for each new <div>. Add an @indexName attribute with a value of "documentation_manual" to the <index> element. Nest a series of terms tagged with the <term> element inside the <index> element. For example,
<div xml:id="new_praxis_documentation">
  <head>New Praxis Documentation</head>
  <index indexName="documentation_manual">
    <term>Term 1</term>
    <term>Term 2</term>
    <term>Term 3</term>
    <term>Term 4</term>
  </index>
  <p>Documentation text.</p>
</div>
Your list of index terms should be consistent with terms already used in the index, although it will likely be necessary to use new terms as well. All new terms should be lowercase and plural.
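The following sketch shows what lowercase, plural index terms might look like in practice. The @xml:id value and the terms themselves are hypothetical and may not match terms already present in the index:

```xml
<div xml:id="praxis_tables">
  <head>Encode a Table</head>
  <index indexName="documentation_manual">
    <term>tables</term>
    <term>rows</term>
    <term>cells</term>
  </index>
  <p>Documentation text.</p>
</div>
```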
See Applications for Encoders for information on using and regenerating the index file.

Tag an Interesting Snippet

MoEML’s v.6 website now displays interesting snippets on its homepage. Interesting snippets are short one- or two-sentence passages from MoEML library texts or encyclopedia articles that are in some way provocative, compelling, or humorous. Should you come across such a passage in your work as a MoEML encoder, we encourage you to tag it using the <seg> element with a @type value of "interestingSnippet" and a unique @xml:id. The following interesting snippet from The Shoemaker’s Holiday by Thomas Dekker (SHOE1.xml) serves as an example:
<seg type="interestingSnippet" xml:id="SHOE2_argument">The argument of the play I will set down in this epistle: Sir Hugh Lacy, Earl of Lincoln, had a young gentleman of his own name, his near kinsman, that loved the Lord Mayor’s daughter of London;</seg>
Note that the @xml:id should be the document’s @xml:id followed by an underscore (_) and a unique descriptor. Moreover, the text string inside the <seg> tag must be under 400 characters and be contained by a single block-level element such as a <p>.
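To make the containment requirement concrete, the following sketch places a <seg> entirely inside one <p>. The identifier DOCID1_descriptor and the text content are placeholders, not taken from a MoEML document:

```xml
<p>Preceding text. <seg type="interestingSnippet" xml:id="DOCID1_descriptor">A short, compelling passage of fewer than 400 characters.</seg> Following text.</p>
```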

Add the MoEML Decorative Daisy as a Block Element

It is possible to add the MoEML decorative daisy as a block element in between paragraphs as follows:
<figure type="decorativeFlower"></figure>
Note that we should be judicious in our use of the decorative daisy (i.e., only use it in born-digital, front-end pages).
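In context, the daisy sits between two paragraph elements, as in this minimal sketch (the paragraph text is placeholder content):

```xml
<p>First paragraph.</p>
<figure type="decorativeFlower"></figure>
<p>Second paragraph.</p>
```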

More Encoding Practices

MoEML also offers further encoding information specific to dates, primary source transcriptions, and mayoral shows.

Notes

  1. This is an example note written by Martin Holmes (HOLM3, initials MDH). (MDH)


Cite this page

MLA citation

Landels-Gruenewald, Tye, Martin D. Holmes, and Cameron Butt. General Encoding Practices. The Map of Early Modern London, Edition 7.0, edited by Janelle Jenstad, U of Victoria, 05 May 2022, mapoflondon.uvic.ca/edition/7.0/encoding_practices.htm.



