In our XML encoding, identifiers are the xml:id attributes that appear on elements. Our objective in creating identifiers is to provide a project-wide unique id for every item in the collection that we need to find, process, link to or otherwise manipulate. In general, transcribers/encoders should not have to worry about identifiers, because most will be assigned by the programmer or generated through automated processes. However, our policies on identifiers are documented here.
The following section covers transcribed manuscripts. Other document and element types are covered in a subsequent section.
Each manuscript has a document id (the xml:id attribute on its root <TEI> element) beginning with "ms", for example ms59, ms60 or msPotier1751. These ids are intended to be short while remaining unique in the project and sufficient to identify an MS. Where the xml:id is a little longer than we would wish, the MS will also have a shorter version of its id for use in linking contexts; for cases such as ms59, this will be identical to the main id, but for cases where the id is longer, it will be an abbreviation, so msSagard becomes msSag, and msPotier1751 becomes msP51. The shorter id will be encoded in the n attribute on the manuscript's root <TEI> element.
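As an illustration of how a processing script might read the two identifiers, here is a minimal Python sketch. The sample document is hypothetical and heavily trimmed, the function name manuscript_ids is invented for this example, and the fallback to the full id when n is absent is an assumption rather than project policy.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical, heavily trimmed manuscript file used only for illustration.
SAMPLE = f'<TEI xmlns="{TEI_NS}" xml:id="msPotier1751" n="msP51"><text/></TEI>'


def manuscript_ids(tei_source):
    """Return (document id, short id) for a TEI manuscript document."""
    root = ET.fromstring(tei_source)
    # ElementTree exposes xml:id under the XML namespace URI.
    doc_id = root.get(f"{{{XML_NS}}}id")
    # Assumption: fall back to the full id if no n attribute is present.
    short_id = root.get("n", doc_id)
    return doc_id, short_id


print(manuscript_ids(SAMPLE))  # ('msPotier1751', 'msP51')
```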
Each entry in a document (i.e. each <entryFree> element), as well as each <form> element, will have a unique xml:id attribute, but these will be generated automatically after the document has been transcribed and its encoding is judged to be relatively stable. The identifiers for <entryFree> elements will be constructed as follows:
The ids for <form> elements will be constructed as follows:
Note that we use four-digit counters for entries and six-digit counters for forms, based on our experience of the likely numbers of each in a manuscript file. The counter widths may change in future if longer or more complex manuscripts are encoded; such a change would be trivial to make.
Examples:
The fourth entry in msJCB:
ef_msJCB_0004
The 22nd form in msJCB:
fef_msJCB_000022
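The pattern visible in these two examples can be sketched in Python as follows. This is not the project's actual generation script: the prefixes ef_ and fef_ and the counter widths are inferred from the examples above, while numbering in document order starting from 1 and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def entry_id(ms_id, n):
    """Id for the n-th <entryFree>: four-digit, zero-padded counter."""
    return f"ef_{ms_id}_{n:04d}"


def form_id(ms_id, n):
    """Id for the n-th <form>: six-digit, zero-padded counter."""
    return f"fef_{ms_id}_{n:06d}"


def assign_ids(root, ms_id):
    """Number <entryFree> and <form> elements in document order.

    Intended for a first assignment only; existing ids are overwritten.
    Returns the number of entries and forms that were numbered.
    """
    entries = forms = 0
    for el in root.iter():  # depth-first, i.e. document order
        if el.tag == f"{{{TEI_NS}}}entryFree":
            entries += 1
            el.set(XML_ID, entry_id(ms_id, entries))
        elif el.tag == f"{{{TEI_NS}}}form":
            forms += 1
            el.set(XML_ID, form_id(ms_id, forms))
    return entries, forms


print(entry_id("msJCB", 4))   # ef_msJCB_0004
print(form_id("msJCB", 22))   # fef_msJCB_000022
```

Because the counters are separate and ignore nesting, a nested <entryFree> simply receives the next entry number in sequence.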
The advantages of using this system will be:
Note that we do not encode anything related to the structural location of an element in its id; the id does not tell us whether an <entryFree> is nested inside another <entryFree>, or which <entryFree> a <form> element appears in. This is not necessary because this information can be discovered instantly during any processing, and whenever we list out or render items based on their id, we can provide any relevant or required information about their context.
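To show how structural context can be recovered at processing time rather than stored in the id, here is a small Python sketch. The sample fragment, its ids and the function name enclosing_entries are hypothetical; the standard-library ElementTree module keeps no parent pointers, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ENTRY = f"{{{TEI_NS}}}entryFree"

# Hypothetical fragment with one <entryFree> nested inside another.
SAMPLE = f'''<TEI xmlns="{TEI_NS}" xml:id="msJCB"><text><body>
<entryFree xml:id="ef_msJCB_0001">
  <form xml:id="fef_msJCB_000001">a</form>
  <entryFree xml:id="ef_msJCB_0002">
    <form xml:id="fef_msJCB_000002">b</form>
  </entryFree>
</entryFree>
</body></text></TEI>'''


def enclosing_entries(root, target_id):
    """Return the ids of the <entryFree> ancestors of an element, innermost first."""
    parents = {child: parent for parent in root.iter() for child in parent}
    node = next((el for el in root.iter() if el.get(XML_ID) == target_id), None)
    if node is None:
        raise KeyError(target_id)
    found = []
    node = parents.get(node)
    while node is not None:
        if node.tag == ENTRY:
            found.append(node.get(XML_ID))
        node = parents.get(node)
    return found


root = ET.fromstring(SAMPLE)
print(enclosing_entries(root, "fef_msJCB_000002"))
# ['ef_msJCB_0002', 'ef_msJCB_0001']
```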
Once ids are assigned, they will (we hope) never be modified. However, it may be necessary to interpolate new ids if it is discovered (for instance) that a specific <form> or <entryFree> element should actually be split into multiple elements, or a section of text was missed during transcription. These are the protocols for interpolating ids.
In cases where elements need to be deleted for some reason, they may just be deleted, leaving an id in the sequence unused.
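One consequence of allowing deletions is that a validation step should check ids for uniqueness but not for an unbroken sequence. The following Python sketch shows the idea. The function names and sample ids are hypothetical, and ids that do not end in a plain numeric counter (for instance interpolated ids, whose format is not described here) are simply ignored by the gap report.

```python
from collections import Counter


def duplicate_ids(ids):
    """Return every id that occurs more than once; this should always be empty."""
    counts = Counter(ids)
    return sorted(i for i, c in counts.items() if c > 1)


def unused_counters(ids, prefix):
    """Return counter values missing from the sequence for one prefix.

    Gaps are informational only: deleted elements legitimately leave
    unused ids behind.
    """
    used = set()
    for i in ids:
        rest = i[len(prefix):]
        if i.startswith(prefix) and rest.isdigit():
            used.add(int(rest))
    if not used:
        return []
    return [n for n in range(1, max(used) + 1) if n not in used]


# Hypothetical ids after the third entry has been deleted.
sample_ids = ["ef_msJCB_0001", "ef_msJCB_0002", "ef_msJCB_0004", "fef_msJCB_000001"]
print(duplicate_ids(sample_ids))                  # []
print(unused_counters(sample_ids, "ef_msJCB_"))   # [3]
```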