Imagine a cargo ship, of the container variety. It is a big boat with an empty space in the middle; that empty space can be filled with containers. The shipping port has facilities to handle containers and load them onto the ship. The ship has facilities to secure the containers, track who owns them, where they are, and where they are going. As long as the containers match some basic specifications - shape, size, weight - only the shippers and receivers of the container care, or possibly even know, what is in it. The shipper probably packed the container in some way that made sense to him based on his knowledge of the product, so that it will arrive intact; the receiver expects to be able to unpack the container and put the contents to use. While that is the primary concern of the shipper and receiver, neither the port nor the ship cares about any of that; they just move containers around, and maintain only enough information to ensure the containers get to their intended destination on time and undamaged. The shipper who packed the container doesn't care how it gets to its destination, only that it gets there on time and undamaged. The receiver doesn't care how it got to him, only that it arrives on time and undamaged. The content of a container is only of interest to the shipper and the receiver, and their interest in it is only linked by the nature of the content, not by how it was moved. If the shipper thinks it is necessary that the cargo ship be painted a certain color and have exactly 37 crew members and that the crew be experts in the manufacture and use of the shipper's product, every shipper will need to build and crew his own ships.
Imagine now the good ship SS Scraps. Its port is the Scraps Administration Program, which has facilities to load "containers" into its database cargo hold. The containers have specific contents, but the Scraps Administration Program does not care about the contents, as long as they are packaged correctly. The "shipper" - the person digitizing and marking up some artifact - knows that the contents are an image file and a block of XML-formatted metadata, and the structure and meaning of that data are of great importance to him. The only data Scraps needs to know the meaning of is the data that lets it track and organize the "containers", and this data is of no interest to the "shipper" or the "receiver", who can just assume it is being taken care of. Scraps neither knows nor cares about the internal structure or inherent meaning of the data; as long as it is packaged in a properly-formatted "container", Scraps will store it and deliver it to the "receiver" intact. What the "receiver" - usually a browser-based Web application - does with the data is of no interest to Scraps. Scraps (the Scraps "engine") does provide tools that the browser-based viewer can use to unload the container and present the contents to a user, but even in that role Scraps does not know or care about the data itself - the internal structure of the metadata (Dublin Core, METS, or anything else), what the image is a digital representation of, what any of it means to the creator or the viewer - none of this matters to Scraps. How SS Scraps stores the "containers", tracks them, and delivers them to the viewer should also be of no importance to the "shipper" or the "receiver" as long as the container's content (the meaning and context of the data) is delivered intact. If the "shipper" wants Scraps to be specifically tailored to his product, there will need to be one version of Scraps for every digitization project, which, while possible, is not practical.
This is why Scraps is being designed to be utterly indifferent to the nature of the data it is working with: the creators of the data can focus on marking up and organizing the data in a way that is meaningful for any particular artifact without needing to concern themselves with how the data is stored and maintained, and Scraps can focus on storing and maintaining the data without needing to know anything about the peculiarities of any particular artifact. Thus is order maintained to the benefit of all.
Since there will be nothing much to see on this project for a couple of weeks, I present some basic concepts for your entertainment.
Scraps works with hierarchical data. This does not necessarily mean that the data is intrinsically hierarchical, or that only intrinsically hierarchical data can be used. The hierarchy is imposed on the data by Scraps to allow artifacts to be marked up and displayed at increasing levels of detail.
The top two levels of the hierarchy are fixed; the remainder of the levels are dynamic. The hierarchy can go to any depth, though in practice an excessive depth would become unmanageable.
The top level is the Collection; the second level is the Artifact. A Collection is a set of one or more Artifacts that normally have some relationship to each other. There can be many Collections, though each is handled separately by Scraps; Scraps does not manage anything higher than the Collection (no collection of Collections). An Artifact encompasses all of the data related to the digitized form of a single, usually cohesive, usually physical object, such as a scrapbook, a photograph album, a newspaper, a manuscript - any object that can be digitized. While an Artifact can be a simple object like a single photograph, it is more likely to be a complex object with multiple pages or parts, each of which has been marked up and has its own metadata. A good example of such an object is a scrapbook, which has multiple pages with multiple items on each page, and possibly multiple parts to some of the items.
An Artifact is a set of Objects, where an Object is any part of the Artifact that has its own metadata and can be treated separately. An Artifact can have an unlimited number of Objects, and each of the Objects can have its own set of Objects, and those Objects can have Objects, ad infinitum. For example, a scrapbook has pages (Object level 1); the pages have pasted-in items - photographs, newspaper clippings, drawings, etc. (Object level 2); some of those items may have multiple parts - pages in a pasted-in booklet, for example (Object level 3); the booklet may have photographs (Object level 4); the photographs may have multiple identifiable persons (Object level 5).
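
The nesting described above can be pictured as a simple tree. What follows is only a minimal sketch in Python, not the Scraps implementation (the Engine is PHP and JavaScript); the `Node` class, the `walk` function and the sample names are all hypothetical and exist only to illustrate the Collection, Artifact, Object levels from the scrapbook example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One entry in the hierarchy: a Collection, an Artifact, or an Object.

    ASSUMPTION: this class is illustrative only; Scraps itself may store
    the hierarchy quite differently.
    """
    kind: str   # "Collection", "Artifact", or "Object"
    name: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, so the next level can be added to it."""
        self.children.append(child)
        return child


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs in top-down, depth-first order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# Build the scrapbook example from the text (names are made up).
collection = Node("Collection", "Scrapbooks")
scrapbook = collection.add(Node("Artifact", "Scrapbook"))
page = scrapbook.add(Node("Object", "Page 1"))               # Object level 1
booklet = page.add(Node("Object", "Pasted-in booklet"))      # Object level 2
booklet_page = booklet.add(Node("Object", "Booklet page"))   # Object level 3
photo = booklet_page.add(Node("Object", "Photograph"))       # Object level 4
photo.add(Node("Object", "Identified person"))               # Object level 5

# Print the hierarchy, indenting one step per level.
for depth, node in walk(collection):
    print("  " * depth + f"{node.kind}: {node.name}")
```

Because every Object holds its own list of Objects, the same few lines handle a single photograph and a five-level scrapbook alike; only the top two levels are treated as fixed.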

Metadata can be attached to Collections, Artifacts and Objects. Metadata is always stored as XML, and the XML must be both well formed and valid. Validation requires a Document Type Definition (DTD), which can be an existing structure (e.g. Dublin Core), built manually or created through the Scraps administration program. Metadata for all of the Objects in an Artifact is defined at the Artifact level. Each Artifact can have a different metadata structure (i.e. a different DTD), but every Object belonging to an Artifact will have the same metadata fields. Before a block of metadata is saved to the database, it will be validated; invalid XML will be rejected.
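
The validate-before-save rule can be sketched roughly as follows. This is an illustration under stated assumptions, not how Scraps does it: Python's standard library can check that XML is well formed but does not validate against a DTD, so the hypothetical `allowed_fields` set below stands in for the Artifact's DTD, and the function name and field names are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Set


def check_metadata(xml_text: str, allowed_fields: Set[str]) -> bool:
    """Return True if the block is well formed and uses only allowed fields.

    ASSUMPTION: `allowed_fields` is a simplified stand-in for the Artifact's
    DTD. It only checks the names of the root's direct child elements; a
    real DTD also constrains order, nesting and attributes.
    """
    try:
        root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    except ET.ParseError:
        return False
    return all(child.tag in allowed_fields for child in root)


# Hypothetical field set shared by every Object in one Artifact.
SCRAPBOOK_FIELDS = {"title", "creator", "date", "description"}

good = "<metadata><title>Page 1</title><date>1916</date></metadata>"
unknown_field = "<metadata><colour>red</colour></metadata>"
not_well_formed = "<metadata><title>Page 1</metadata>"

print(check_metadata(good, SCRAPBOOK_FIELDS))             # accepted
print(check_metadata(unknown_field, SCRAPBOOK_FIELDS))    # rejected: unknown field
print(check_metadata(not_well_formed, SCRAPBOOK_FIELDS))  # rejected: not well formed
```

A block would be saved only when the check passes. Full DTD validation would need a validating parser in place of the field-set comparison.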
Each object has a set of actions that control what happens in the viewer when the object is clicked with the mouse, or the mouse cursor is over it, or several other user interactions occur. For example, when the mouse cursor is over an item on a scrapbook page, the rectangle around the item can light up, the metadata can be displayed, hidden objects can be revealed, a pop-up window can display information about the object, etc. Clicking on an object might display a larger image, or switch to displaying that object and its sub-objects. What action is taken is determined by how the object has been defined.
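
One way to picture "what action is taken is determined by how the object has been defined" is a table that maps each user interaction to a list of actions. The sketch below is an assumption-laden illustration in Python rather than the viewer's actual JavaScript; the event names (`mouseover`, `click`), the `ViewerObject` class and the action functions are all hypothetical.

```python
from typing import Callable, Dict, List

# An action takes the object's name and returns a description of what the
# viewer would do. (Hypothetical: a real viewer would change the page.)
Action = Callable[[str], str]


def highlight(name: str) -> str:
    return f"highlight the rectangle around {name}"


def show_metadata(name: str) -> str:
    return f"display the metadata for {name}"


def show_larger_image(name: str) -> str:
    return f"display a larger image of {name}"


class ViewerObject:
    """An object plus the actions defined for each kind of interaction."""

    def __init__(self, name: str, actions: Dict[str, List[Action]]) -> None:
        self.name = name
        self.actions = actions

    def handle(self, event: str) -> List[str]:
        """Run every action defined for this event; none defined means no effect."""
        return [action(self.name) for action in self.actions.get(event, [])]


# Define one item on a scrapbook page and the actions attached to it.
clipping = ViewerObject("newspaper clipping", {
    "mouseover": [highlight, show_metadata],
    "click": [show_larger_image],
})

print(clipping.handle("mouseover"))
print(clipping.handle("click"))
print(clipping.handle("doubleclick"))  # no action defined, so an empty list
```

Keeping the actions in the object's definition, rather than in the viewer, means two items on the same page can respond differently to the same click.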
Project Leaders:
Research: Chris Petter and John Durno
Technical: David Badke
Other Participants:
Marnie Swanson, Ken Cooley, Scott Gerrity, Alouette Canada
Three months of term funding for David Badke’s salary. Dates: January – March 2007. David will remain an HCMC employee on a term PEA appointment. The Library will transfer funds to the HCMC account to pay for David’s salary.
The general purpose of the project is to create a collaborative virtual environment (a set of tools) in which students and library staff can mark up scrapbooks and The Brazier, the Canadian Scottish WW I regimental newspaper, using approved metadata standards.
More specifically, this project will equip cataloguers with tools to create granular, item level metadata of various collections of graphic and text materials. The scrapbook prototype and Image Mark Up tools are the primary technologies for the project. Data sets created during the project will be archived in a database and grown in future iterations of the project.
Both the educational process and project outcomes are important to the project. With these in mind, the tool feature sets are being developed with an eye towards a user-friendly GUI, version and permissions control, and easy administration of utilities to suit a collaborative project environment. The outcomes will be housed in a project repository on the TAPoR servers.
The McPherson Library has funded the creation of the tool based on its potential for crossover application. The library has various books, scrapbooks, photo albums, archives, manuscripts and newspapers to which this technology could be applied.
July-Aug 2006: Work was scheduled for summer 2006 using the Image Mark-Up Tool for the Dodds Scrapbook, which appears very promising at the first pass.
In the past two months, David Badke has undertaken substantial troubleshooting and debugging of the Dodds site. The Engine, Help Files and a better administrative utility need to be added.
In scope (see project plan below) for his remaining time on the project:
There are three major parts to the scrapbook software: Engine, Administration, User Interface. Each of these will need full documentation.
The Dodds scrapbook images and metadata will be used as a test case and first application for the Scraps software.
The Engine is the code that interacts with the database and artifact images and renders “components” in the browser. It is partly PHP code and partly JavaScript code, and uses AJAX techniques.
The user needs to be able to configure the application, load data from Image Markup Tool files (possibly other files), maintain records, and perform other administrative tasks. The prototype administration application has facilities to load and maintain data and to produce reports. The following features are missing or incomplete:
The existing user interface is a rough prototype, suitable only as a proof of concept. The interface will be built from “plug in” components rendered by the Engine, so that the user can assemble an interface from standard parts.
The Scraps project will produce a web-based system to mark up and display multi-level digitized artifacts, such as scrapbooks, albums, etc. The Image Markup Tool is used to mark up the digitized images. The Scraps administration program uses the IMT files and other user-supplied data to create a hierarchical structure that is displayed by the Scraps viewer. Users can drill down through the layers of the hierarchy to view embedded objects.