<?xml version="1.0" encoding="UTF-8"?>
<TEI.2 id="paper_102_da_sylva">
   <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>Using Ancillary Text to Index Web-Based Multimedia Objects</title>
            <author>
               <name reg="Da Sylva, Lyne">Lyne Da Sylva</name>
            </author>
            <author>
               <name reg="Turner, James">James Turner</name>
            </author>
            <respStmt>
               <resp>Marked up by </resp>
               <name reg="Holmes, Martin">Martin Holmes</name>
               <lb/>
               <name reg="Baer, Patricia">Patricia Baer</name>
            </respStmt>
         </titleStmt>
         <publicationStmt>
            <p>Marked up to be included in the ACH/ALLC 2005 Conference Abstracts book.</p>
         </publicationStmt>
         <sourceDesc>
            <p>None</p>
         </sourceDesc>
      </fileDesc>
      <profileDesc>
         <textClass>
            <classCode>paper</classCode>
            <keywords>
               <list>
                  <item>image retrieval</item>
                  <item>indexing multimedia objects</item>
                  <item>multilingual metadata</item>
               </list>
            </keywords>
         </textClass>
      </profileDesc>
      <revisionDesc>
         <list>
            <item>MDH: Created from John Bradley's XML <date value="2005-03">March 2005</date>
            </item>
            <item>MDH: Proofed by Ray Siemens <date value="2005-04-03">3 April 2005</date>
            </item>
         </list>
      </revisionDesc>
   </teiHeader>
   <text>
      <front>
         <docTitle n="Using Ancillary Text to Index Web-Based Multimedia Objects">
            <titlePart>Using Ancillary Text to Index Web-Based Multimedia Objects</titlePart>
         </docTitle>
         <docAuthor>
            <name reg="Da Sylva, Lyne">Lyne Da Sylva</name>
            <address>
               <addrLine>lyne.da.sylva@umontreal.ca</addrLine>
            </address>
         </docAuthor>
         <titlePart type="affil">EBSI, Université de Montréal</titlePart>
         <docAuthor>
            <name reg="Turner, James">James Turner</name>
            <address>
               <addrLine>james.turner@umontreal.ca</addrLine>
            </address>
         </docAuthor>
         <titlePart type="affil">EBSI, Université de Montréal</titlePart>
      </front>
      <body>
         <div0>
            <p>
                <title level="m">PériCulture</title> is a research project at the Université de Montréal that forms part of a larger project based at the Université de Sherbrooke. The parent project aimed to build a research network for managing Canadian digital cultural content. It was funded by Canadian Heritage and carried out during the 2003-2004 fiscal year. <title level="m">PériCulture</title> takes its name from <hi rend="foreign">péritexte</hi> and culture, <hi rend="foreign">péritexte</hi> being one of several terms used in French, our working language, for the ancillary text associated with images and sound. It is a sister project to <title level="m">DigiCulture</title>, which, as another part of the same parent project, studied how users interact with Canadian digital cultural content. The general research objective of <title level="m">PériCulture</title> was to study indexing methods for Web-based nontextual cultural content, specifically still images, video, and sound. Specific objectives included:
<list type="ordered">
                  <item>identifying properties of ancillary text useful for indexing;</item>
                  <item>comparing various combinations of these properties in terms of performance in retrieval;</item>
                  <item>contributing to the development of bilingual and multilingual searching environments;</item>
                  <item>developing retrieval strategies using ancillary text and synonyms of useful terms found therein.</item>
               </list>
            </p>
            <p>In computer science, research on indexing images and sound focuses on the low-level approach, which performs statistical operations on primitives (for images, features such as colour, texture, and shape) in order to infer semantic content. This approach is also known as the <soCalled>content-based approach</soCalled> (e.g. Gupta and Jain; Lew). In information science, research on indexing images and sound focuses instead on associating textual information with the nontextual elements, which often involves manipulating ancillary text. This approach is known as the <soCalled>high-level</soCalled> or <soCalled>concept-based approach</soCalled> (e.g. Rasmussen; O'Connor, O'Connor, and Abbas). Several factors argue for automating the high-level approach as far as possible: the very large volume of material available on the Web, the disparity in cataloguing and indexing methods from one collection to another, and the high cost and relative inconsistency of human indexing.</p>
            <p>Our work in this project focuses on text associated with Web-based still images and builds on previous information science research in this area (e.g. Goodrum and Spink; Jörgensen; Jörgensen et al.; Turner and Hudon). We selected Web sites that met four criteria: they contained multimedia objects; the text associated with these objects went beyond file names and captions; they were bilingual (English and French); and they housed Canadian digital cultural content. We identified keywords useful for indexing and studied their proximity to the object described. We examined the indexing information contained in <hi rend="code">Meta</hi> tags and <hi rend="code">Alt</hi> attributes, and checked whether other tags also contained useful indexing terms. We also studied whether metadata standards such as the <title>Dublin Core</title> were used, and identified Web-based resources for gathering synonyms of the keywords.</p>
            <p>Our study found that the ancillary text of many Web sites with cultural content contains a large number of useful indexing terms. We evaluated various types of ancillary text for their usefulness in retrieval. Our results suggest that automated retrieval systems can exploit these terms in several ways to improve search results. A cross-language comparison of the results supports our previous research, which suggests that indexing in other languages can be generated automatically from indexing in a single language using Web-based tools.</p>
            <p>Rich information that can be used for retrieval is available at many points on Web sites with cultural content: the file name, explicit information in captions, descriptive information in the surrounding text, and the contents of various HTML tags. Algorithms need to be developed to exploit these sources in order to improve retrieval.</p>
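            <p>The sources listed in the preceding paragraph can be harvested mechanically. The sketch below is not the <title level="m">PériCulture</title> procedure, which this abstract does not describe in detail. It is a minimal illustration in Python (standard library only) that assumes HTML pages as input and approximates <soCalled>proximity</soCalled> as a window of words around each image in the page text. For each image it groups candidate index terms by source: the file name, the <hi rend="code">alt</hi> and <hi rend="code">title</hi> attributes, the <hi rend="code">keywords</hi> and <hi rend="code">description</hi> Meta fields, and the nearby text. The names <hi rend="code">candidate_terms</hi>, <hi rend="code">AncillaryTextParser</hi>, and <hi rend="code">tokenize</hi>, and the default window of ten words, are hypothetical.</p>
            <eg><![CDATA[
```python
import re
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical helper names throughout; an illustrative sketch only,
# not the PériCulture implementation.

# Runs of letters (accented letters included); digits and "_" are excluded.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Return lower-case word tokens; digits and punctuation are dropped."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class AncillaryTextParser(HTMLParser):
    """Record page text tokens in order, Meta fields, and image attributes."""

    def __init__(self):
        super().__init__()
        self.tokens = []   # visible text tokens, in document order
        self.meta = {}     # Meta name (lower case) mapped to its content
        self.images = []   # (src, alt, title, index of next text token)
        self._skip = 0     # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and a.get("name"):
            self.meta[a["name"].lower()] = a.get("content") or ""
        elif tag == "img":
            self.images.append((a.get("src") or "", a.get("alt") or "",
                                a.get("title") or "", len(self.tokens)))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.tokens.extend(tokenize(data))


def candidate_terms(html, window=10):
    """Map each image src to candidate index terms, grouped by source.

    Proximity is approximated as the `window` text tokens on either
    side of the image's position in the page text.
    """
    parser = AncillaryTextParser()
    parser.feed(html)
    parser.close()
    meta_terms = tokenize(parser.meta.get("keywords", "") + " "
                          + parser.meta.get("description", ""))
    result = {}
    for src, alt, title, pos in parser.images:
        # "/img/totem_pole-1.jpg" gives the stem "totem_pole-1"
        stem = PurePosixPath(urlparse(src).path).stem
        result[src] = {
            "file_name": tokenize(stem),
            "alt": tokenize(alt),
            "title": tokenize(title),
            "meta": meta_terms,
            "nearby_text": parser.tokens[max(0, pos - window):pos + window],
        }
    return result
```
]]></eg>
            <p>For an image whose <hi rend="code">src</hi> is <hi rend="code">/img/totem_pole-1.jpg</hi>, for instance, the file-name source yields <hi rend="code">totem</hi> and <hi rend="code">pole</hi>. Keeping the sources separate rather than merging them into a single bag of words is a deliberate choice. It lets each kind of ancillary text be weighted or evaluated on its own, in the spirit of the second objective listed above.</p>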
            <p>Finally, we believe our work benefits from the synergy between our two approaches. We share an interest in image indexing but come from different fields: Lyne Da Sylva's expertise is in linguistics and James Turner's is in information science. Working together allows us to pool our knowledge and to develop richer methods for automating the indexing of images and other multimedia objects than either of us could develop alone.</p>
         </div0>
      </body>
      <back>
         <div type="Bibliography">
            <head>Bibliography</head>
            <listBibl>
               <biblStruct>
                  <analytic>
                     <author>
                        <name reg="Goodrum, A.">A. Goodrum</name>
                     </author>
                     <author>
                         <name reg="Spink, A.">A. Spink</name>
                     </author>
                     <title level="a">Image searching on the Excite web search engine</title>
                  </analytic>
                  <monogr>
                     <title level="j">Information Processing and Management</title>
                     <imprint>
                        <biblScope type="vol">37.2</biblScope>
                        <biblScope type="pages">295-312</biblScope>
                        <date value="2001">2001</date>
                     </imprint>
                  </monogr>
               </biblStruct>
               <biblStruct>
                  <analytic>
                     <author>
                        <name reg="Gupta, A.">A. Gupta</name>
                     </author>
                     <author>
                         <name reg="Jain, Ramesh C.">Ramesh C. Jain</name>
                     </author>
                     <title level="a">Visual information retrieval</title>
                  </analytic>
                  <monogr>
                     <title level="j">Communications of the ACM</title>
                     <imprint>
                        <biblScope type="vol">40.5</biblScope>
                        <biblScope type="pages">71-79</biblScope>
                        <date value="1997">1997</date>
                     </imprint>
                  </monogr>
               </biblStruct>
               <biblStruct>
                  <monogr>
                     <author>
                        <name reg="Jörgensen, Corinne">Corinne Jörgensen</name>
                     </author>
                     <title level="m">Image attributes: an investigation</title>
                     <imprint>
                        <publisher>PhD thesis, Syracuse University</publisher>
                        <date value="1995">1995</date>
                     </imprint>
                  </monogr>
               </biblStruct>
               <biblStruct>
                  <analytic>
                     <author>
                        <name reg="Jörgensen, Corinne">Corinne Jörgensen</name>
                     </author>
                     <title level="a">Image attributes in describing tasks: an investigation</title>
                  </analytic>
                  <monogr>
                     <title level="j">Information Processing and Management</title>
                     <imprint>
                        <biblScope type="vol">34.2/3</biblScope>
                        <biblScope type="pages">161-174</biblScope>
                        <date value="1998">1998</date>
                     </imprint>
                  </monogr>
               </biblStruct>
               <biblStruct>
                  <analytic>
                     <author>
                        <name reg="Jörgensen, Corinne">Corinne Jörgensen</name>
                     </author>
                     <author>
                         <name reg="Jaimes, Alejandro">Alejandro Jaimes</name>
                     </author>
                     <author>
                         <name reg="Benitez, Ana B.">Ana B. Benitez</name>
                     </author>
                     <author>
                         <name reg="Chang, Shih-Fu">Shih-Fu Chang</name>
                     </author>
                     <title level="a">A conceptual framework and empirical research for classifying visual descriptors</title>
                  </analytic>
                  <monogr>
                     <title level="j">Journal of the American Society for Information Science and Technology (JASIST)</title>
                     <imprint>
                        <biblScope type="vol">52.11</biblScope>
                        <biblScope type="pages">938-947</biblScope>
                        <date value="2001">2001</date>
                     </imprint>
                  </monogr>
               </biblStruct>
               <biblStruct>
                  <monogr>
                     <author>
                        <name reg="Lew, Michael S.">Michael S. Lew</name>
                     </author>
                     <title level="m">Principles of visual information retrieval</title>
                     <imprint>
                        <publisher>Springer</publisher>
                        <pubPlace>New York</pubPlace>
                        <date value="2001">2001</date>
                     </imprint>
                  </monogr>
               </biblStruct>
               <biblStruct>
                  <analytic>
                     <author>
                        <name reg="O'Connor, Brian C.">Brian C. O'Connor</name>
                     </author>
                     <author>
                         <name reg="O'Connor, Mary K.">Mary K. O'Connor</name>
                     </author>
                     <author>
                         <name reg="Abbas, June M.">June M. Abbas</name>
                     </author>
                     <title level="a">User reactions as access mechanism: an exploration based upon captions for images</title>
                  </analytic>
                  <monogr>
                     <title level="j">Journal of the American Society for Information Science</title>
                     <imprint>
                        <biblScope type="vol">50.8</biblScope>
                        <biblScope type="pages">681-697</biblScope>
                        <date value="1999">1999</date>
                     </imprint>
                  </monogr>
               </biblStruct>
               <biblStruct>
                  <analytic>
                     <author>
                        <name reg="Rasmussen, Edie M.">Edie M. Rasmussen</name>
                     </author>
                     <title level="a">Indexing images</title>
                  </analytic>
                  <monogr>
                     <editor>
                         <name reg="Williams, Martha E.">Martha E. Williams</name>
                     </editor>
                     <title level="j">Annual Review of Information Science and Technology</title>
                     <imprint>
                        <biblScope type="vol">32</biblScope>
                        <biblScope type="pages">169-196</biblScope>
                        <date value="1997">1997</date>
                     </imprint>
                  </monogr>
               </biblStruct>
               <biblStruct>
                  <analytic>
                     <author>
                        <name reg="Turner, James M.">James M. Turner</name>
                     </author>
                     <author>
                         <name reg="Hudon, Michèle">Michèle Hudon</name>
                     </author>
                     <title level="a">Multilingual metadata for moving image databases: preliminary results</title>
                  </analytic>
                  <monogr>
                     <editor>
                         <name reg="Howarth, Lynne C.">Lynne C. Howarth</name>
                     </editor>
                     <editor>
                         <name reg="Cronin, Christopher">Christopher Cronin</name>
                     </editor>
                     <editor>
                         <name reg="Slawek, Anna T.">Anna T. Slawek</name>
                     </editor>
                     <title level="m">L'avancement du savoir : élargir les horizons des sciences de l'information, Travaux du 30e congrès annuel de l'Association canadienne des sciences de l'information</title>
                     <imprint>
                        <pubPlace>Toronto</pubPlace>
                        <date value="2002">2002</date>
                        <biblScope type="pages">34-45</biblScope>
                     </imprint>
                  </monogr>
               </biblStruct>
            </listBibl>
         </div>
      </back>
   </text>
</TEI.2>