filter by topic code, in particular by the second and subsequent codes of records that contain more than one topic code
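A minimal sketch of matching a code in any position of a multi-code record. It assumes, hypothetically, that the "topic" field holds the codes as one string separated by semicolons; the real delimiter is not stated in these notes, and `has_topic` is an illustrative name only.

```python
def has_topic(topic_field, code, sep=";"):
    """Return True if `code` is one of the codes stored in a topic field.

    Assumption: multiple codes are stored in one string, separated by `sep`.
    """
    codes = [c.strip() for c in topic_field.split(sep) if c.strip()]
    return code in codes


# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "topic": "12"},
    {"id": 2, "topic": "7; 12"},   # code 12 is the second code here
    {"id": 3, "topic": "120"},     # must not match a search for 12
]

matches = [r["id"] for r in records if has_topic(r["topic"], "12")]
```

Splitting the field and comparing whole codes avoids the false hit that a plain substring match would give on "120" when searching for "12".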
process data for 1896 and 1897 and upload into database - done Dec. 2007 / Jan 2007
I need to:
- renumber the remaining topics
- in the database do a number of search and replaces so that the values in the "topic" field for each record are correct
- put the text for each topic into the appropriate table.
Then ready to go back to code for query and reporting on this field.
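The renumbering step could be done in one pass rather than as a chain of separate search and replaces, since chained replaces can collide (a value changed by one replace may be changed again by the next). The sketch below is only an illustration: the table name `records`, the semicolon delimiter, and the old-to-new mapping are assumptions, not the project's actual schema or numbering.

```python
import sqlite3


def remap_topics(topic_field, mapping, sep=";"):
    """Rewrite each code in a topic field through `mapping`, in a single pass."""
    codes = [c.strip() for c in topic_field.split(sep) if c.strip()]
    return sep.join(mapping.get(c, c) for c in codes)


# In-memory database with a hypothetical table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, topic TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(1, "2"), (2, "4;5"), (3, "5")],
)

# Hypothetical renumbering: topic 3 was dropped, so 4 becomes 3 and 5 becomes 4.
mapping = {"4": "3", "5": "4"}

rows = conn.execute("SELECT id, topic FROM records").fetchall()
for rec_id, topic in rows:
    conn.execute(
        "UPDATE records SET topic = ? WHERE id = ?",
        (remap_topics(topic, mapping), rec_id),
    )
conn.commit()

result = conn.execute("SELECT id, topic FROM records ORDER BY id").fetchall()
```

Because each code is looked up once in the mapping, the order of the replacements no longer matters.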
Still to do:
improve text search (case-insensitivity for highlighting)
filter by topic code
process results to xhtml or tab delimited text for export
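For the highlighting item, one possible approach is a case-insensitive regular expression that wraps each hit while keeping the original capitalisation of the matched text. This is a sketch only; the `highlight` function and the `hit` class name are made up for the example.

```python
import re
from html import escape


def highlight(text, term):
    """Wrap every case-insensitive occurrence of `term` in a span.

    The matched text keeps its original case; everything is escaped
    so the result is safe to place in an XHTML page.
    """
    if not term:
        return escape(text)
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    parts = []
    last = 0
    for m in pattern.finditer(text):
        parts.append(escape(text[last:m.start()]))
        parts.append('<span class="hit">' + escape(m.group(0)) + "</span>")
        last = m.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


snippet = highlight("Victoria harbour; the VICTORIA wharf", "victoria")
```

`re.escape` keeps punctuation in the search term from being read as regex syntax.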
There are also anomalies in the data, introduced by the proofers, that will have to be removed. John also has a couple more years of data for the general index coming.
Create GUI to allow:
text search OR records with subject code OR all records
restrict date range for any of above queries
return results in xhtml table.
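The query options above could be assembled into a single parameterised SQL statement along these lines. This is a sketch under assumptions: the table `records` and its columns `pub_date`, `body` and `topic` are hypothetical, dates are assumed to be stored as ISO `YYYY-MM-DD` strings so that they compare correctly as text, and the topic filter is shown as a simple exact match.

```python
import sqlite3


def build_query(text=None, topic=None, start=None, end=None):
    """Build (sql, params) for: text search OR topic code OR all records,
    optionally restricted to a date range."""
    clauses, params = [], []
    if text:
        clauses.append("body LIKE ?")
        params.append("%" + text + "%")
    elif topic:
        clauses.append("topic = ?")
        params.append(topic)
    if start:
        clauses.append("pub_date >= ?")
        params.append(start)
    if end:
        clauses.append("pub_date <= ?")
        params.append(end)
    sql = "SELECT id, pub_date, body FROM records"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY pub_date", params


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, pub_date TEXT, body TEXT, topic TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, "1896-03-01", "Fire at the wharf", "7"),
        (2, "1897-06-15", "New wharf opened", "12"),
        (3, "1897-11-02", "Council meeting", "12"),
    ],
)

sql, params = build_query(text="wharf", start="1897-01-01")
text_hits = [row[0] for row in conn.execute(sql, params)]

sql, params = build_query(topic="12")
topic_hits = [row[0] for row in conn.execute(sql, params)]

sql, params = build_query()
all_hits = [row[0] for row in conn.execute(sql, params)]
```

Passing values as parameters rather than pasting them into the SQL string keeps user input from the GUI out of the statement itself.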
By mid-April, had basic and advanced text search, date range and filter by paper name or collection done.
Still to do: improve text search (case-insensitivity for highlighting), filter by topic code, and process results to xhtml or tab delimited text for export.
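The export step might look something like the following sketch, which turns a list of result rows into either tab-delimited text or an XHTML table. The function names and the sample row are illustrative assumptions, not existing project code.

```python
import csv
import io
from xml.sax.saxutils import escape


def to_tsv(rows, header):
    """Return the rows as tab-delimited text, one record per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_xhtml_table(rows, header):
    """Return the rows as a simple XHTML table, escaping &, < and >."""
    lines = ["<table>"]
    lines.append(
        "<tr>" + "".join("<th>%s</th>" % escape(str(h)) for h in header) + "</tr>"
    )
    for row in rows:
        lines.append(
            "<tr>" + "".join("<td>%s</td>" % escape(str(c)) for c in row) + "</tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical result row, for illustration only.
header = ("date", "headline")
rows = [("1897-06-15", "Wharf & harbour")]

tsv_text = to_tsv(rows, header)
table_markup = to_xhtml_table(rows, header)
```

Using the `csv` module for the tab-delimited output means any field that itself contains a tab or a line break is quoted rather than silently breaking the column layout.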
Add field to TC main index to include name of paper.
Look at structure of data files for other datasets and write code to transform them to same data structure.
By the end, there should be one file for each dataset, ready to be imported into an SQL table (one table for each dataset).
Got data from John early April and finalized procedures then.
The goal of this project is to take a collection of transcripts of news stories from early editions of the Times Colonist newspaper, which are currently in text files containing special codes for various bits of information, normalize the records, put them into an SQL database, and then write a querying front-end.