Discovered I had not incorporated the 1896 data into the database, so did that.
Extracted text from .rtf file provided by JL.
Ran it through the process described in HowToProcessColonistFiles.txt on my Mac, with particular attention to normalizing the contractions Leona uses a lot in the transcripts, and to assigning the correct topic numbers in place of the topic codes she uses.
While importing into the database, noticed that the transcript and cemetery fields were not in the order the db expected, and also that the current version of MySQL requires a carriage return at the end of the last line (the previous version did not), so had to redo the upload after sorting those issues out. Settings for the upload are in the sql_for_load_data.txt file on my Mac.
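For future imports, the two fixes (field order and the terminator on the last line) could be scripted before the upload. This is only a rough sketch: the function name, the tab delimiter, the "\n" terminator and the column names in the usage note are all assumptions, not what is actually in sql_for_load_data.txt, and the terminator would need to match whatever the load settings specify.

```python
def reorder_for_load(text, source_order, target_order, delimiter="\t"):
    """Reorder delimited fields and end every line, including the last,
    with a line terminator.

    source_order and target_order are lists of column names (hypothetical
    here); the delimiter and the "\n" terminator are assumptions.
    """
    # Position of each wanted column within the incoming rows.
    positions = [source_order.index(name) for name in target_order]
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines rather than emit empty records
        fields = line.split(delimiter)
        rows.append(delimiter.join(fields[i] for i in positions))
    # Joining this way guarantees the final record is terminated too.
    return "".join(row + "\n" for row in rows)
```

Usage would be something like reorder_for_load(raw, ["id", "cemetery", "transcript"], ["id", "transcript", "cemetery"]), with the result written out to the file that gets uploaded.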
Tidied up the file topics.txt, which contains all the topic codes and their numerical IDs.
Created the contractions.txt file, which contains all of Leona's contractions and standardized plain-English substitutions.
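With both lookup files in place, the substitution step could in principle be automated along these lines. This is a sketch under assumptions, not the process in HowToProcessColonistFiles.txt: it supposes each file holds one tab-separated pair per line (contraction and substitution, or topic code and number), and the function names are made up for illustration.

```python
import re


def parse_table(text, delimiter="\t"):
    """Read 'key<delimiter>value' lines into a dict (file layout is assumed)."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split(delimiter, 1)
        table[key.strip()] = value.strip()
    return table


def expand_contractions(text, contractions):
    """Replace each contraction with its plain-English substitution."""
    if not contractions:
        return text
    # Try longer contractions first so a short one never pre-empts a longer one.
    keys = sorted(contractions, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in keys)
    # The lookarounds stop a contraction from matching inside a longer word.
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: contractions[m.group(1)], text)


def topic_number(code, topics):
    """Look up the numerical id for a topic code; raises KeyError if unknown."""
    return int(topics[code.strip()])
```

Raising on an unknown topic code is a deliberate choice in this sketch, so that a mistyped code in a transcript gets noticed rather than silently loaded.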
The goal of this project is to take a collection of transcripts of news stories from early editions of the Times Colonist newspaper, which are currently in text files containing special codes for various bits of information, normalize the records, put them into an SQL database, and then write a querying front-end.