The LICHEN Project: The LInguistic and Cultural Heritage Electronic Network

Lisa Lena Opas-Hanninen, lisa.lena.opas-hanninen@oulu.fi, University of Oulu
Jean Anderson, j.anderson@arts.gla.ac.uk, University of Glasgow
Ilkka Juuso, ilkka.juuso@ee.oulu.fi, University of Oulu
Tapio Seppänen, tapio@ee.oulu.fi, University of Oulu

The international, interdisciplinary and multilingual LICHEN project focuses on the languages and cultures of the northern circumpolar region, that is, the region north of the 55th parallel. Its underlying assumption is that language and culture are as important to the survival and well-being of populations as more obvious ecological, social and health issues. We believe that the creation of a digital portal giving access to written and spoken texts in the languages of the region will further its well-being.

Faced with minority languages, governments in the recent past have pursued policies of assimilation. This has applied to indigenous languages in Canada, to Gaelic and Scots in Scotland, and to Finnic minority languages in the circumpolar region: Meänkieli and Swedish Finnish in Sweden; the Kven language in Norway; Viena Karelian, Olonets Karelian and Vepsian in Russia; the Võro and Seto languages in Estonia; and Livonian in Latvia.

LICHEN aims to collect and disseminate information about the languages spoken in the circumpolar region in order to promote the linguistic confidence and self-image of their speakers. It will promote cultural awareness among the peoples of the North, facilitating cross-cultural communication between them in an age of rapid global change. LICHEN will also establish communication between research units in order to promote discussion of the common needs of research on the minority languages of the North.
We are doing this by:
• creating an electronic framework for the collection, management, online display, and exploitation of corpora of the languages of the circumpolar regions;
• creating a website with information on these languages and the peoples speaking them (the LICHEN website will be launched in January 2005);
• creating a virtual learning environment for teaching the linguistic and cultural heritage of the circumpolar region;
• carrying out a pilot project on the Meänkieli and Kven languages;
• identifying research on topics of immediate importance and common interest;
• setting up an inter-institutional doctoral research programme.

LICHEN has existing resources and work has begun. We have Meänkieli tapes totalling about 150 hours and Kven language tapes of about 100 hours for our immediate use. These tapes are now being digitized. We have access to both the structure and contents of the Scottish Corpus of Texts and Speech (SCOTS) at the University of Glasgow, currently totalling 0.5 million words of spoken and written Scots and Scottish English. We have the nucleus of a research team based on the English and Finnish Departments and the Department of Electrical Engineering at Oulu and the English Language Department and SCOTS project at Glasgow. This team has considerable linguistic and computing expertise.

Our first aim in year 1 is to complete the technical specifications for the electronic framework through consultations between language and computing staff team members. We are also consulting other people working in the field of corpus building at meetings and conferences, and by email. In addition to housing the data, the system will accommodate management, administration and programs for concordancing and searching the data.

The ultimate goal of the development of the computing tools is a shell which can be adapted to any language. For many languages at risk there is a need both to preserve existing materials and develop new ones.
During this year, we will design and implement a prototype of this shell. A longer term goal is to provide an interface to the tools which allows the end user to define or rename all functions in their own language.

We will continue work on the Meänkieli and Kven recorded language material. We will work generally on the problem of languages without standard written forms (a worldwide problem as people endeavour to record languages before they vanish), starting with Kven and Scots. As an initial solution to the problem of written forms, it is proposed that several Kven speakers should be asked to transcribe a short passage and the results compared. Scottish Language Dictionaries will be consulted here.

We will investigate the feasibility of working through community groups in minority language areas. In addition to harnessing local knowledge, we hope that a policy of local workshops will stimulate skills development and job creation.

A poster presentation at ACH/ALLC 2005 will enable us to publicise the project to other minority language scholars and to enlist the considerable expertise of the conference participants in the discussion of the design of the online corpus tools. It is our intention that the functions of the corpus shell should include all the basic requirements for a corpus builder and for a corpus user in an easy-to-use environment. We know the end users will include many who are not technically sophisticated and who would not otherwise have avenues for finding advice on digitization or access to an Internet platform to share their materials. Our idea of 'basic' requirements for online use includes corpus browsing, word and phrase searches, wildcard searches and concordancing; for corpus building we will include guidelines on recording, digitization, copyright and data protection. We would welcome this opportunity to discuss our proposed designs with the expert community of ACH/ALLC.
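To make the 'basic' online functions named above more concrete, the following is a minimal illustrative sketch of a wildcard word search combined with a keyword-in-context concordance. It is not the LICHEN implementation: the function names `tokenize` and `concordance`, the `width` parameter and the sample sentence are all assumptions introduced here for illustration, and a real corpus tool would work from marked-up, time-aligned transcriptions rather than a plain string.

```python
import fnmatch
import re


def tokenize(text):
    """Split text into word tokens; \\w also matches letters such as ä and õ."""
    return re.findall(r"\w+", text)


def concordance(text, pattern, width=3):
    """Return keyword-in-context rows for tokens matching a wildcard pattern.

    `pattern` uses shell-style wildcards (* and ?); matching ignores case.
    Each row is a tuple: (left context, keyword, right context).
    Hypothetical helper, not part of any existing corpus toolkit.
    """
    tokens = tokenize(text)
    wanted = pattern.lower()
    rows = []
    for i, token in enumerate(tokens):
        if fnmatch.fnmatchcase(token.lower(), wanted):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Invented sample text, used only to exercise the functions.
sample = (
    "The languages of the North need speakers, "
    "and each language needs a written form."
)

for left, keyword, right in concordance(sample, "lang*"):
    print(f"{left:>25} | {keyword} | {right}")
```

Matching is done on lower-cased tokens so that a search for `lang*` finds both "languages" and "language" regardless of capitalisation. Because the tokenizer is Unicode-aware, the same sketch would in principle handle forms containing characters such as ä or õ, although languages without a standard written form would need additional handling of spelling variants.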
Bibliography

Scottish Corpus of Texts and Speech
Institute for the Languages of Scotland
Anderson, J., et al. The SCOTS Corpus. Models and Methods in the Handling of Unconventional Digital Corpora, Volume 1: Synchronic Corpora. Palgrave Macmillan, Houndsmills. Forthcoming.
Opas, L.L., Tweedie, F.J. Review of Michael P. Oakes, Statistics for Corpus Linguistics. Literary and Linguistic Computing 14.4, 541-543. 1999.
Palander, M., Opas-Hänninen, L.L., Tweedie, F.J. Neighbours or enemies? Competing variants causing differences in transitional dialects. Computers and the Humanities 37.4. 2003.
Thule Institute
MediaTeam Oulu research group
Meänkieli information
Winsa, Birger. Language attitudes and social identity. Oppression and revival of a minority language in Sweden. Applied Linguistics Association of Australia, Occasional paper 17. 1998.
Kven bibliography
Ruija kvenmuseum
Ruijan Kaiku newspaper
TAPoR tools
Newcastle Electronic Corpus of Tyneside English
University Centre for Computer Corpus Research on Language
Linguistic Data Consortium
Linguistic Atlas Projects
Text Encoding Initiative