Refactoring of longer documents: issues to deal with
Posted by mholmes on 17 Oct 2011 in Activity log
I've now worked through the Henslowe Diary document, which is my pilot for the restructuring of longer documents, and there are a couple of things to report, good and not so good:
- Rendering the whole document as a single page with an auto-generated table-of-contents works fine. I've written some XSLT that does a reasonable job of handling separate <text>s in a <group>, along with nested <div>s down two more levels.
- There is some very odd linking going on inside the document. Here's an example:
Henslowe's Diary is a <ref type="internal" target="render_page.php?id=HENS2&n=16&searchterm=manuscript">manuscript</ref> written by...
[In these examples I'm showing the original code from the live site, rather than the updated and slightly simplified code in my current version of the documents.]
You can find this as the first link on this page:
http://mapoflondon.uvic.ca/render_page.php?id=HENS2&n=4
This link points to a subsection of the document (The Structure of Henslowe's Diary), and then runs a search on it, so that the word "manuscript" is highlighted. In this case the word "manuscript" occurs only once in that subsection, and not in a context which particularly illuminates its use in the original location. In fact, if you click on one of these links, you lose your place. That would be OK if you were being taken to a specific explanation of the word you clicked on, but that's not what happens. I think the intention was to link to the whole subsection titled "The Structure of Henslowe's Diary", which is a detailed description of the manuscript.
In many cases, links like these (links that run searches instead of actually linking to something specific) are interspersed with useful links to real information, so that the same word or name is in some places linked to a vague search, and in others to a personography entry or a bibliography item. For instance, we see this:
<ref type="internal" target="render_page.php?id=HENS2&n=6&searchterm=Greg">W.W. Greg</ref>
which links the name "W.W. Greg" to a search for "Greg" in one part of the document, alongside this:
<ref type="bibl" target="GREG1">Greg, <name type="book_title">Diary</name> 2.80-103</ref>
which actually points to an entry in the bibliography:
Greg, Walter W., ed. Henslowe's Diary. By Philip Henslowe. 2 vols. London: A.H. Bullen, 1904.
I think these "link-to-a-search" tags are confusing and unhelpful, for the following reasons:
- You lose your place when clicking on them, without gaining anything in the way of useful information.
- In many cases, it's not clear whether the intention is to illuminate a person's name, or a book by them, or perhaps both.
- Any reader who wants to search for something in the current document can already do it using Control + F. (This is especially so now that we're rendering the whole document as a single page, rather than a series of segments.)
- It's hard for the reader to distinguish these link-to-search links from regular, useful links, because they look the same; after a couple of bad experiences clicking on them, readers may become reluctant to click on any link, and so no longer benefit from "real" links that point to specific information.
Instead, I'd suggest replacing them as follows:
- Names of people should be linked to entries in the personography. Where these don't yet exist, they should be created.
- References to bibliographical items should be linked to entries in the bibliography. Again, where these don't yet exist, they should be created.
- References to specific topics covered elsewhere in the document (or in another document) should be linked to the specific target location in the document. This is done by creating an xml:id attribute on a <div> or <p> element, and then pointing to that.
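To get a sense of how many link-to-search tags a document contains before replacing them, something like the following Python sketch could be used. It is not the XSLT used on the project, just a rough audit helper: it assumes the file is well-formed XML (so the ampersands in target attributes are escaped as &amp;), and the function names and the wrapping <p> in the sample are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# Illustrative sample built from the two snippets quoted above.
# The wrapping <p> is an assumption, not taken from the live document.
SAMPLE = (
    '<p>'
    '<ref type="internal" target="render_page.php?id=HENS2&amp;n=16&amp;searchterm=manuscript">manuscript</ref> '
    '<ref type="internal" target="render_page.php?id=HENS2&amp;n=6&amp;searchterm=Greg">W.W. Greg</ref> '
    '<ref type="bibl" target="GREG1">Greg, <name type="book_title">Diary</name> 2.80-103</ref>'
    '</p>'
)


def local_name(tag):
    """Strip a namespace prefix such as '{http://www.tei-c.org/ns/1.0}' if present."""
    return tag.rsplit("}", 1)[-1]


def find_search_links(xml_text):
    """Return (link text, search term, target) for every internal ref
    whose target carries a 'searchterm' query parameter."""
    root = ET.fromstring(xml_text)
    hits = []
    for el in root.iter():
        if local_name(el.tag) != "ref" or el.get("type") != "internal":
            continue
        target = el.get("target", "")
        query = parse_qs(urlsplit(target).query)
        if "searchterm" in query:
            text = "".join(el.itertext()).strip()
            hits.append((text, query["searchterm"][0], target))
    return hits


for text, term, target in find_search_links(SAMPLE):
    print(f"{text!r} runs a search for {term!r}: {target}")
```

Each hit still has to be looked at by hand, since deciding whether it should become a personography link, a bibliography link, or a pointer to an xml:id is an editorial judgment.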
- Bibliographical references are handled rather inconsistently. For instance, links like this occur throughout the Henslowe document:
<ref type="bibl" target="GREG1">Greg, <name type="book_title">Diary</name> 2.220</ref>
This correctly points at an entry in the central bibliography, BIBL1.xml, and is converted into a link to that document in the central bibliography. However, the last section of the Henslowe document, "Works Cited and Consulted", contains a completely separate copy of the same bibliographical reference (although in this case marked up erroneously using <lg> and <l> tags, for some reason). In this document-specific bibliography, there are also items which do not occur in the main central bibliography, such as this one:
Baines, Paul. “Ireland, Samuel (d. 1800).” Oxford Dictionary of National Biography. Ed. H.C.G. Matthew and Brian Harrison. Oxford: Oxford UP, 2004. 15 October 2006 http://www.oxforddnb.com/view/article/14449.
I'd like to make the following changes, along the lines of what I've done in the Guidelines document, as discussed around September 27:
- All bibliographical items should reside only in the main bibliography.
- All other documents that require structured bibliographies should build them using this kind of formulation:
<bibl type="replace"><ref type="bibl" target="CHAM2"/></bibl>
so that in-document bibliographies consist only of lists of these elements, with no actual data.
- Documents which merely need to include items in their bibliographies which are not explicitly linked from inside the text can do so by simply including empty links in the text like this:
<ref type="bibl" target="mol:CORR1"/>
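As a rough illustration of the proposal above, here is a Python sketch that collects the distinct bibliography targets referenced in a document and writes out the corresponding data-free <bibl type="replace"> entries. Whether the in-document list is typed by hand or generated is not settled here; generating it this way is purely an assumption, as are the function names and the sample markup (which reuses the GREG1 and mol:CORR1 targets quoted earlier).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Illustrative sample; the wrapping <div> and <p> are assumptions.
SAMPLE = (
    '<div><p>'
    '<ref type="bibl" target="GREG1">Greg, <name type="book_title">Diary</name> 2.220</ref> '
    '<ref type="bibl" target="GREG1">Greg, <name type="book_title">Diary</name> 2.80-103</ref> '
    '<ref type="bibl" target="mol:CORR1"/>'
    '</p></div>'
)


def local_name(tag):
    """Strip a namespace prefix such as '{http://www.tei-c.org/ns/1.0}' if present."""
    return tag.rsplit("}", 1)[-1]


def bibl_targets(xml_text):
    """Return the distinct targets of <ref type="bibl"> elements, in document order.
    Empty refs count too, so unlinked items can be pulled into the list."""
    root = ET.fromstring(xml_text)
    targets = []
    for el in root.iter():
        if local_name(el.tag) == "ref" and el.get("type") == "bibl":
            target = el.get("target")
            if target and target not in targets:
                targets.append(target)
    return targets


def replace_stubs(targets):
    """Build one data-free <bibl type="replace"> entry per target."""
    return [
        f'<bibl type="replace"><ref type="bibl" target={quoteattr(t)}/></bibl>'
        for t in targets
    ]


for line in replace_stubs(bibl_targets(SAMPLE)):
    print(line)
```

Because the entries carry only a target, the actual bibliographical data continues to live in one place, the main bibliography.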