write code to process TC transcripts
Posted by sarneil on 16 Jan 2007 in Activity log
Wrote a set of simple search-and-replace operations to normalize the data and add explicit field delimiters
Wrote GREP search-and-replace patterns to extract subject codes, publication dates, event dates and cemetery plots and populate the appropriate field in each record
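As a rough illustration of that kind of field extraction, here is a minimal sketch in modern JavaScript. The record layout, the `SUBJ`/`PUB`/`EVENT`/`PLOT` labels, the `|` delimiter and the function name `extractFields` are all assumptions made up for the example; the actual transcript format and the actual GREP patterns are not recorded in this log.

```javascript
// Hypothetical record layout, for illustration only; the real transcript
// format is not described in this log.
const SAMPLE =
  "SUBJ: MUR | PUB: 14 Mar 1862 | EVENT: 10 Mar 1862 | PLOT: B-12 | Body found near the harbour.";

// One pattern per field. Each captures the text after the (assumed) label,
// up to the next "|" delimiter or the end of the record.
const FIELD_PATTERNS = {
  subjectCode: /SUBJ:\s*([A-Z]+)/,
  publicationDate: /PUB:\s*([^|]+?)\s*(?:\||$)/,
  eventDate: /EVENT:\s*([^|]+?)\s*(?:\||$)/,
  cemeteryPlot: /PLOT:\s*([^|]+?)\s*(?:\||$)/,
};

// Returns an object with one property per field; a field that is not
// present in the record comes back as null.
function extractFields(record) {
  const fields = {};
  for (const [name, pattern] of Object.entries(FIELD_PATTERNS)) {
    const match = record.match(pattern);
    fields[name] = match ? match[1] : null;
  }
  return fields;
}

console.log(extractFields(SAMPLE));
```

Keeping one small pattern per field, rather than one large pattern for the whole record, means a record that is missing a field still yields the others.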
Wrote a quick-and-dirty JavaScript method to supply "derived" publication and event dates for records lacking one or both of those pieces of information, and to normalize the dates to a format acceptable to SQL
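A sketch of what such a date step might look like follows. It is not the method actually written: it assumes input dates of the form "14 Mar 1862", takes the SQL-acceptable format to be `YYYY-MM-DD`, and assumes a missing date is derived by copying the other date on the same record. The names `toSqlDate` and `fillDates` and the `...Derived` flags are hypothetical, and a record with neither date is simply left with both as null.

```javascript
// Month lookup keyed by the first three letters, lower-cased.
const MONTHS = {
  jan: 1, feb: 2, mar: 3, apr: 4, may: 5, jun: 6,
  jul: 7, aug: 8, sep: 9, oct: 10, nov: 11, dec: 12,
};

// Converts "14 Mar 1862" or "3 March 1862" to "1862-03-14" / "1862-03-03".
// Returns null when the text does not match the assumed input format.
function toSqlDate(text) {
  const match = /^(\d{1,2})\s+([A-Za-z]{3})[A-Za-z]*\.?\s+(\d{4})$/.exec(text.trim());
  if (!match) return null;
  const month = MONTHS[match[2].toLowerCase()];
  const day = Number(match[1]);
  if (!month || day < 1 || day > 31) return null;
  const pad = (n) => String(n).padStart(2, "0");
  return `${match[3]}-${pad(month)}-${pad(day)}`;
}

// Assumed derivation rule: a missing date is copied from the other date on
// the same record, and a flag records that the value was derived.
function fillDates(record) {
  const pub = record.publicationDate ? toSqlDate(record.publicationDate) : null;
  const evt = record.eventDate ? toSqlDate(record.eventDate) : null;
  return {
    ...record,
    publicationDate: pub !== null ? pub : evt,
    eventDate: evt !== null ? evt : pub,
    publicationDateDerived: pub === null && evt !== null,
    eventDateDerived: evt === null && pub !== null,
  };
}

console.log(fillDates({ publicationDate: "14 Mar 1862", eventDate: null }));
```

Flagging derived values separately keeps them distinguishable from dates that were actually present in the transcript.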
Still need to run a batch of simple search-and-replace operations on all the topic codes and abbreviations found in the text
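One possible way to handle that remaining batch is a single lookup table applied in one pass, sketched below. The table entries, `ABBREVIATIONS` and `expandAbbreviations` are placeholders invented for the example, not the actual topic codes or abbreviations in the transcripts.

```javascript
// Placeholder table; the real codes and abbreviations would go here.
const ABBREVIATIONS = {
  "Capt.": "Captain",
  "Hon.": "Honourable",
  "HBC": "Hudson's Bay Company",
};

// Escapes characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replaces every table key that appears as a whole token (not embedded in a
// longer run of letters) with its expansion. Longer keys are tried first so
// that a short key never pre-empts a longer one starting with the same text.
function expandAbbreviations(text, table) {
  const keys = Object.keys(table).sort((a, b) => b.length - a.length);
  if (keys.length === 0) return text;
  const alternatives = keys.map(escapeRegExp).join("|");
  const pattern = new RegExp(`(?<![A-Za-z])(?:${alternatives})(?![A-Za-z])`, "g");
  return text.replace(pattern, (found) => table[found]);
}

console.log(expandAbbreviations("Capt. Smith of the HBC", ABBREVIATIONS));
```

A table-driven pass like this keeps the list of substitutions in one place, which makes it easier to review than a long series of individual replaces.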
Wrote to John Lutz with a number of remaining questions that need answers before the data can be normalized further (on page numbers, on records that aggregate several actual newspaper articles across numerous issues, etc.).