Category: Formal Documents


Permalink 08:46:53 am, by David, 723 words, 559 views   English (CA)
Categories: Formal Documents; Mins. worked: 0

An analogy, to dissipate misconceptions

Imagine a cargo ship, of the container variety. It is a big boat with an empty space in the middle; that empty space can be filled with containers. The shipping port has facilities to handle containers and load them onto the ship. The ship has facilities to secure the containers and track who owns them, where they are, and where they are going. As long as the containers match some basic specifications - shape, size, weight - only the shippers and receivers of a container care, or possibly even know, what is in it. The shipper probably packed the container in some way that made sense to him based on his knowledge of the product, so that it will arrive intact; the receiver expects to be able to unpack the container and put the contents to use. While that is the primary concern of the shipper and receiver, neither the port nor the ship cares about any of it; they just move containers around, and maintain only enough information to ensure the containers reach their intended destination on time and undamaged. The shipper who packed the container doesn't care how it gets to its destination, only that it gets there on time and undamaged; the receiver doesn't care how it got to him, only that it arrives intact. The content of a container is only of interest to the shipper and the receiver, and their interest is linked only by the nature of the content, not by how it was moved. If the shipper thinks it necessary that the cargo ship be painted a certain color, have exactly 37 crew members, and that the crew be experts in the manufacture and use of the shipper's product, then every shipper will need to build and crew his own ships.

Imagine now the good ship SS Scraps. Its port is the Scraps Administration Program, which has facilities to load "containers" into its database cargo hold. The containers have specific contents, but the Scraps Administration Program does not care about the contents, as long as they are packaged correctly. The "shipper" - the person digitizing and marking up some artifact - knows that the contents are an image file and a block of XML-formatted metadata, and the structure and meaning of that data are of great importance to him. The only data Scraps needs to know the meaning of is the data that lets it track and organize the "containers", and this data is of no interest to the "shipper" or the "receiver", who can just assume it is being taken care of. Scraps neither knows nor cares about the internal structure or inherent meaning of the data; as long as it is packaged in a properly-formatted "container", Scraps will store it and deliver it to the "receiver" intact. What the "receiver" - usually a browser-based Web application - does with the data is of no interest to Scraps. Scraps (the Scraps "engine") does provide tools that the browser-based viewer can use to unload the container and present the contents to a user, but even in that role Scraps does not know or care about the data itself - the internal structure of the metadata (Dublin Core, METS, or anything else), what the image is a digital representation of, what any of it means to the creator or the viewer - none of this matters to Scraps. How SS Scraps stores the "containers", tracks them, and delivers them to the viewer should also be of no importance to the "shipper" or the "receiver" as long as the container's content (the meaning and context of the data) is delivered intact. If the "shipper" wants Scraps to be specifically tailored to his product, there will need to be one version of Scraps for every digitization project, which, while possible, is not practical.

This is why Scraps is being designed to be utterly indifferent to the nature of the data it is working with: the creators of the data can focus on marking up and organizing the data in a way that is meaningful for any particular artifact without needing to concern themselves with how the data is stored and maintained, and Scraps can focus on storing and maintaining the data without needing to know anything about the peculiarities of any particular artifact. Thus is order maintained to the benefit of all.


Permalink 09:19:30 am, by David, 623 words, 1208 views   English (CA)
Categories: Formal Documents; Mins. worked: 0

Some basic concepts

Since there will be nothing much to see on this project for a couple of weeks, I present some basic concepts for your entertainment.


Scraps works with hierarchical data. This does not necessarily mean that the data is intrinsically hierarchical, or that only intrinsically hierarchical data can be used. The hierarchy is imposed on the data by Scraps to allow artifacts to be marked up and displayed at increasing levels of detail.

The top two levels of the hierarchy are fixed; the remainder of the levels are dynamic. The hierarchy can go to any depth, though in practice an excessive depth would become unmanageable.

The top level is the Collection; the second level is the Artifact. A Collection is a set of one or more Artifacts that normally have some relationship to each other. There can be many Collections, though each is handled separately by Scraps; Scraps does not manage anything higher than the Collection (no collection of Collections). An Artifact encompasses all of the data related to the digitized form of a single, usually cohesive, usually physical object, such as a scrapbook, a photograph album, a newspaper, a manuscript - any object that can be digitized. While an Artifact can be a simple object like a single photograph, it is more likely to be a complex object with multiple pages or parts, each of which has been marked up and has its own metadata. A good example of such an object is a scrapbook, which has multiple pages with multiple items on each page, and possibly multiple parts to some of the items.

An Artifact is a set of Objects, where an Object is any part of the Artifact that has its own metadata and can be treated separately. An Artifact can have an unlimited number of Objects, and each of the Objects can have its own set of Objects, and those Objects can have Objects, ad infinitum. For example, a scrapbook has pages (Object level 1); the pages have pasted-in items - photographs, newspaper clippings, drawings, etc. (Object level 2); some of those items may have multiple parts - pages in a pasted-in booklet, for example (Object level 3); the booklet may have photographs (Object level 4); the photographs may have multiple identifiable persons (Object level 5).


An Object can have any number of sub-Objects, of any type. An Artifact can have any number of Objects. A Collection can have any number of Artifacts.
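The Collection / Artifact / Object hierarchy described above can be sketched as a simple recursive structure. This is a hypothetical illustration only, not the actual Scraps database schema; all names are invented for the sketch:

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    """Any part of an Artifact with its own metadata; nests to any depth."""
    name: str
    metadata: str = ""          # XML block, stored opaquely by Scraps
    children: list["Obj"] = field(default_factory=list)

@dataclass
class Artifact:
    """One digitized (usually physical) object, e.g. a scrapbook."""
    name: str
    objects: list[Obj] = field(default_factory=list)

@dataclass
class Collection:
    """Top level: a set of related Artifacts. Nothing sits above it."""
    name: str
    artifacts: list[Artifact] = field(default_factory=list)

def depth(obj: Obj) -> int:
    """Depth of the Object tree at and below this Object."""
    return 1 + max((depth(c) for c in obj.children), default=0)

# A scrapbook page holding a clipping and a pasted-in booklet with a photograph:
photo = Obj("photograph")
booklet = Obj("booklet page", children=[photo])
page = Obj("page 1", children=[Obj("clipping"), booklet])
```

Here `depth(page)` is 3: page, booklet page, photograph; the scrapbook example in the text would reach depth 5.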


Metadata can be attached to Collections, Artifacts and Objects. Metadata is always stored as XML, and the XML must be both well formed and valid. Validation requires a Document Type Definition (DTD), which can be an existing standard (e.g. Dublin Core), built manually, or created through the Scraps administration program. The metadata structure for all of the Objects in an Artifact is defined at the Artifact level. Each Artifact can have a different metadata structure (i.e. a different DTD), but every Object belonging to an Artifact will have the same metadata fields. Before a block of metadata is saved to the database, it will be validated; invalid XML will be rejected.
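The reject-before-save step can be sketched as follows. This is an illustrative sketch, not the Scraps code (which is PHP); note that Python's standard library only checks well-formedness, and validating against the Artifact's DTD would require a validating parser such as lxml:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Reject a metadata block that is not well-formed XML before it
    reaches the database. (DTD validation is the second gate; it needs
    a validating parser, e.g. lxml, and is omitted from this sketch.)"""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<metadata><title>Dodds Scrapbook</title></metadata>"
bad = "<metadata><title>Dodds Scrapbook</metadata>"  # mismatched tags
```

`is_well_formed(good)` passes; `is_well_formed(bad)` fails and the block would be rejected.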


Each object has a set of actions that control what happens in the viewer when the object is clicked, when the mouse cursor moves over it, or when several other user interactions occur. For example, when the mouse cursor is over an item on a scrapbook page, the rectangle around the item can light up, the metadata can be displayed, hidden objects can be revealed, a pop-up window can display information about the object, and so on. Clicking on an object might display a larger image, or switch to displaying that object and its sub-objects. Which action is taken is determined by how the object has been defined.
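Conceptually, each object's definition amounts to a dispatch table from user events to actions. The sketch below is purely illustrative; the object IDs, event names, and action names are invented, not the actual Scraps configuration format:

```python
# Hypothetical per-object action definitions: each user event maps to the
# list of named actions the viewer should carry out for that object.
ACTIONS = {
    "clipping-07": {
        "mouseover": ["highlight_rectangle", "show_metadata"],
        "click": ["zoom_image"],
    },
    "booklet-02": {
        "mouseover": ["highlight_rectangle"],
        "click": ["descend_to_subobjects"],  # switch view to this object's children
    },
}

def actions_for(object_id: str, event: str) -> list[str]:
    """Look up what the viewer should do for a given object and user event;
    an undefined object or event simply does nothing."""
    return ACTIONS.get(object_id, {}).get(event, [])
```

So clicking the pasted-in booklet drills down to its sub-objects, while an event with no definition falls through harmlessly.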


Permalink 12:22:02 pm, by sgerrity, 1391 words, 594 views   English (CA)
Categories: Formal Documents; Mins. worked: 90

Scraps Project

Scraps Project Charter

Project Leaders:
Research: Chris Petter and John Durno
Technical: David Badke

Other Participants:
Marnie Swanson, Ken Cooley, Scott Gerrity, Alouette Canada


  • Chris Petter: Project lead. Research and digital library objectives; providing quality images in digitized format; along with John Durno, implementing technology in the Library.
  • David Badke: Technical development of database, interface, utilities.
  • John Durno: Technology implementation in the Library.
  • Marnie Swanson: Funding approval.
  • Scott Gerrity: HCMC contact, general management/supervision.
  • Ken Cooley: Prioritization of projects within library. Funding approval.
  • Dodd’s Scrapbook Project: The “Scraps” software incorporates some code from other open source projects, but most of it is new code, unique to the project. The “Scraps” scrapbook software is being developed for Dodds and similar scrapbooks, but also, hopefully, for marking up individual articles in historic newspapers like the WW I Canadian Scottish Newspaper.


3 month term funding for David Badke’s salary. Dates: January – March, 2007. David will remain HCMC employee on term PEA appointment. Library will transfer funds to HCMC account to pay for David’s salary.


The general purpose of the project is to create a collaborative virtual environment (tools) in which students and library staff can mark up scrapbooks and the Canadian Scottish WW I regimental newspaper, The Brazier, using approved metadata standards.

More specifically, this project will equip cataloguers with tools to create granular, item-level metadata for various collections of graphic and text materials. The scrapbook prototype and Image Markup tools are the primary technologies for the project. Data sets created during the project will be archived in a database and extended in future iterations of the project.

Both the educational process and project outcomes are important to the project. With these in mind, the tool feature sets are being developed with an eye towards a user-friendly GUI, version and permissions control, and easy-to-administer utilities to suit a collaborative project environment. The outcomes will be housed in a project repository on the TAPoR servers.

The McPherson library has funded the creation of the tool based on its potential for crossover application. The library has various books, scrapbooks, photo albums, archives, manuscripts and newspapers to which this technology could be applied.


July – Aug 2006: Work was scheduled for summer 2006 using the Image Markup Tool for the Dodds Scrapbook, which appears very promising at first pass.


  • This development benefits the digital library research in question, since few tools are available to provide access and display for complex multilayer items like the Dodds scrapbook.
  • This development will provide archivists, librarians and students with innovative tools for collecting and displaying data for later transmission to a National portal.
  • Project has high potential for application across the country: small archives could display and provide access to multi-layered digital files in this way, especially scrapbooks and newspapers.
  • Project provides opportunity for collaborative networking and development between Library and Archives units interested in digital media.


In the past 2 months, David Badke has undertaken substantial troubleshooting and debugging of the Dodd's site. The Engine, Help Files and a better administrative utility need to be added.

In scope (see project plan below) for his remaining time on the project:

  • Add a module to the prototype viewer to provide access to those items' individual high-resolution image scans.
  • Fully develop the administration module to provide editing capability for the metadata and other configuration and maintenance functions.
  • Add a program to provide page turning for multipage enclosures.
  • Complete documentation, including a workflow manual for the project.
  • Complete development, testing and debugging.
  • Develop export capability to OAI specifications.

Out of scope

  • Enhancements to allow newspaper articles to be linked from two or more pages.


  • Formal change orders will be posted for approval. Scope of project currently demands full 3 months of development. New features or changes to plan may require additional funding and time.
  • If project stalls or meets obstacles, HCMC Coordinator will address primary participants and work out a strategy for completing or ending the project.


  • Image Markup and scrapbook tools are the primary technologies for the project.
  • All images will be provided to the project in a digitized format.
  • HCMC, TAPoR and the Library reserve the right to re-use the Scraps software technologies and code in other UVic related software developments. Scraps is Open Source and re-distributed as such under the Mozilla licence.
  • David Badke will be working full time on this project throughout the 3 month period Jan. – Mar 2007.
  • David will initiate tasks and document activity within the HCMC Project Blog in the same manner as all HCMC staff. He will update his activities daily, and break down tasks based on the plan.
  • Project participants will use the blog for tracking and reviewing the project activity. Participants will be given access to make posts and add comments to David's posts, and should specify types of entries David should post if additional reporting is required.
  • Project wrap up will be posted under formal documents in this blog where all participants can view and comment on it.


  • Use for newspapers has not yet been determined; it may be that complex indexing of newspapers is not feasible. Moderate risk.

Development Plan

There are three major parts to the scrapbook software: Engine, Administration, User Interface. Each of these will need full documentation.

The Dodds scrapbook images and metadata will be used as a test case and first application for the Scraps software.

Conceptual Diagram


Engine

The Engine is the code that interacts with the database and artifact images and renders “components” in the browser. It is partly PHP code and partly Javascript code, and uses AJAX techniques.

  • The Engine will have a set of “published” functions that can be called by the Administration program and the User Interface to interact with the data and render web pages – that is, the Engine is an Application Program Interface (API).
  • The Engine will have “components” that can be plugged into a user interface to render web page fragments (e.g. a component to display the artifact page images, a search component, a metadata display component).
  • The Engine will handle the hierarchical data transparently, allowing drill down from pages to items to subitems to any level.
  • The Engine should be able to work with more than just PostgreSQL databases (e.g. MySQL, eXist) by segregating database functions in a separate, replaceable module.
  • A single installation of the Engine should be callable by multiple applications.
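The segregated, replaceable database module can be sketched as a small interface that the Engine talks to exclusively. This is a hypothetical illustration (the real Engine is PHP, and all class and method names here are invented), showing the design, not the implementation:

```python
from abc import ABC, abstractmethod

class DatabaseModule(ABC):
    """All database access goes through this interface, so the Engine can
    swap PostgreSQL for MySQL or eXist by replacing a single module."""

    @abstractmethod
    def fetch_children(self, object_id: int) -> list[dict]: ...

    @abstractmethod
    def fetch_metadata(self, object_id: int) -> str: ...

class PostgresModule(DatabaseModule):
    """A real implementation would issue SQL here; stubbed for the sketch."""

    def fetch_children(self, object_id: int) -> list[dict]:
        return [{"id": object_id * 10 + i} for i in range(2)]

    def fetch_metadata(self, object_id: int) -> str:
        return f"<metadata><id>{object_id}</id></metadata>"

class Engine:
    """The Engine never touches the database directly; it only calls the
    module, which lets one installation serve multiple applications."""

    def __init__(self, db: DatabaseModule):
        self.db = db

    def drill_down(self, object_id: int) -> list[dict]:
        """Transparent hierarchical drill-down: pages to items to subitems."""
        return self.db.fetch_children(object_id)
```

Supporting a new database then means writing one new `DatabaseModule` subclass, with no changes to the Engine or to any application built on it.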


Administration

The user needs to be able to configure the application, load data from Image Markup Tool files (possibly other files), maintain records, and perform other administrative tasks. The prototype administration application has facilities to load and maintain data and to produce reports. The following features are missing or incomplete:

  • The program will work with “projects” that encapsulate one complete artifact, to allow a single installation of the application to manage multiple projects. (This is similar to what IMaP does; some IMaP code can probably be used for this.)
  • Functions will be created to allow the user to configure the scrapbook application. This includes the definition of metadata structure and database fields; set up of file locations, image types, and options; possibly assembly of user interface.
  • The existing program can only load Dublin Core metadata from Image Markup Tool files. It will be extended to handle metadata in any valid XML format.
  • Functions to establish links between artifact pages and items will be created.
  • An export function to create OAI-compatible output will be added.
  • The program will use functions in the Engine wherever possible.
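Handling "metadata in any valid XML format" essentially means the loader must treat each metadata block as an opaque unit rather than mapping known Dublin Core fields. A minimal sketch of that idea follows; it assumes, purely for illustration, that each item carries a `<metadata>` child element, which may not match the actual Image Markup Tool file format:

```python
import xml.etree.ElementTree as ET

def extract_metadata_blocks(xml_text: str) -> list[str]:
    """Pull out each item's metadata block as an opaque XML string, without
    assuming any particular schema (Dublin Core or otherwise). The element
    name <metadata> is an assumption made for this sketch only."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(m, encoding="unicode") for m in root.iter("metadata")]

sample = """<artifact>
  <item><metadata><title>Page 1</title></metadata></item>
  <item><metadata><creator>Unknown</creator></metadata></item>
</artifact>"""
```

Because the blocks are never interpreted, the same loader works for Dublin Core, METS, or any project-specific DTD; only the validation step (against the Artifact's own DTD) cares what is inside.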

User Interface

The existing user interface is a rough prototype, suitable only as a proof of concept. The interface will be built from “plug-in” components rendered by the Engine, so that the user can assemble an interface from standard parts.

  • The interface is the user's responsibility; there will be no single fixed interface. However, there will be at least one fully functional sample interface, and more would be better.
  • It will be possible for the user to modify a sample interface or create an entirely new one.
  • The user interface will call functions in the Engine API to render data on screen. It will not access any data directly.
  • Page layout will be controlled by CSS code.

Order of development

  1. Engine: 80% of code (6 weeks)
  2. Administration and User interface: (4 weeks)
  3. Documentation (2 weeks)
  4. User Approval and Sign off


The Scraps project will produce a web-based system to mark up and display multi-level digitized artifacts, such as scrapbooks, albums, etc. The Image Markup Tool is used to mark up the digitized images. The Scraps administration program uses the IMT files and other user-supplied data to create a hierarchical structure that is displayed by the Scraps viewer. Users can drill down through the layers of the hierarchy to view embedded objects.


