Category: Activity log

31/03/08

Permalink 03:20:21 pm, by sarneil, 147 words, 476 views   English (CA)
Categories: Activity log; Mins. worked: 90

Cleaned up topics listing

In the database, deleted all the records in the topics table that had no instances pointing to them (either never had any or the topic had been collapsed into another topic, e.g. ChHx into Chinese History).

Checked all the ids for the about to be deleted topics to ensure no instances pointing to them in the articles table. Found only one instance of a pointer to id 148 (originally coded as "X" by author), which I deleted from the field in the articles record.

Updated topics.txt, and sent JL and email saying if he wanted to collapse any more topics, I'm pretty confident we could do it without problem, though we are no longer able to use the text files to rebuild the db content if necessary, as they contain pointers to non-existant topic id's, though a bunch of global search and replaces would solve that.

16/01/08

Permalink 02:16:25 pm, by sarneil, 162 words, 499 views   English (CA)
Categories: Activity log; Mins. worked: 240

add 1896 data to database

Discovered I had not incorporated the 1896 data into the database, so did that.
Extracted text from .rtf file provided by JL.
Ran it through the process described in HowToProcessColonistFiles.txt on my Mac, with particular attention to normalizing the contractions Leona uses a lot in the transcripts, and to assigning the correct topic numbers in place of the topic codes she uses.
To import into the database, noticed that the transcript and cemetary fields were not in the order the db expected and also that the current version of mySQL requires a carriage return at the end of the last line (the previous one did not), so had to redo the upload after sorting those issues out. Settings for upload are found in sql_for_load_data.txt file on my Mac.
Tidied up the file topics.txt which contains all the topic codes and numerical ids
Created the contractions.txt file which contains all Leona's contractions and standardarized plain english substitutions.

06/07/07

Permalink 04:07:15 pm, by sarneil, 51 words, 610 views   English (CA)
Categories: Activity log; Mins. worked: 90

More work on finding nth item in topics list in TC

article 52989 has topics field 148,91 article 54104 has topics field 92,148 SELECT * FROM articles WHERE 148 IN (articles.topic) returns record 52989 SELECT * FROM articles WHERE 91 IN (articles.topic) returns no records SELECT * FROM articles WHERE 92 IN (articles.topic) returns record 54104 Still have to figure out a way to find the nth item in that field
Permalink 12:06:33 pm, by sarneil, 110 words, 593 views   English (CA)
Categories: Activity log; Mins. worked: 180

use left-joins to solve multiple-table problem in TC

On David's suggestion, I used a syntax like this:
SELECT * FROM articles LEFT JOIN topics ON topics.topicId = articles.topic
instead of the previous syntax:
SELECT * FROM articles,topics with a complicated AND clauses in the WHERE clause
as the COUNT query encompassing both tables exploded with the old syntax.

Currently the topics machinery presents the topics in alphabetical order and allows the user to select as many as they want (using an OR operator between each). It successfully handles one argument, but can not handle more than one argument.
What I think I'll do is use an "IS IN" operator to handle multiple arguments coming from the query interface.

Permalink 12:00:19 pm, by sarneil, 130 words, 605 views   English (CA)
Categories: Activity log; Mins. worked: 120

update topics list in TC index

Put plain-english values in the name field for each record in the topics table.

Added the following code:
Gold 151

These codes are redundant, so any future instances should be remapped as follows:
Artist 3 --> 2
Ch 31 --> 30
Grk 55 --> 53
LgHx 64 --> 63
Naturalized 81 --> 80
Postal 96 --> 93
RCNX 102 --> 101
RRHx 108 --> 107
School 111 --> 43
SkHx 113 --> 112
SVet 117 --> 115
Teacher 120 --> 43
Vet 132 --> 128
VetLetter 133 --> 135
VXL 139 --> 136
XV 150 --> 138

These codes had no instances, so I'm going to remove them from the table of topics:
BD 19 remove
C 27 remove
Death 37 remove
DM 40 remove
HuHx 57 remove
JT 61 remove
LUSH 69 remove
Misc 74 remove
RW 109 remove
X 148 remove
For now, I've preceded them with "XXX-" so they sort to the bottom of the list. I'll actually delete them when I'm sure no problems will result from doing so.

04/07/07

Permalink 04:36:50 pm, by sarneil, 183 words, 558 views   English (CA)
Categories: Activity log; Mins. worked: 180

sql problem when trying to use topics table

Put GUI elements in for selecting Topics (still haven't updated topic list).
Rudimentary code to take one selected value and write where clause based on that to hidden form element (similar to other filters, but for now passing in the string rather than the index int value).
Modified search.php to include that information in the SQL statement, but get a "can't find topics table" error. When I copy and paste the identical query into the phpMyAdmin sql query window, the query works.

SELECT collections.collectionName, newspapers.newspaperName, authors.authorName, topics.*, articles.* FROM collections,newspapers,authors,topics,articles WHERE articles.transcript LIKE '%teach%' AND (topics.topicName = 'Education') AND collections.collectionId = articles.collectionId AND newspapers.newspaperId = articles.newspaperId AND authors.authorId = articles.authorId AND topics.topicId = articles.topic ORDER BY pubDate ASC LIMIT 20

David suggests using a FROM clause on the articles table with an explicit LEFT JOIN argument for each of the additional tables with an ON clause equivalent to the current AND statement instead of the current FROM with all the tables and numerous AND statements.

Will try that tomorrow.

Permalink 11:05:17 am, by sarneil, 49 words, 567 views   English (CA)
Categories: Activity log; Mins. worked: 90

normalizing and alpha-sorting TC topics

Went through listing of topics, normalizing duplicates, removing items Leona said were errors, and adding entry for Gold. Also thinking about ways to present the list of topics alphabetically to user - may use an alpha field or may just rely on sorting. I'll do some tests and see.

06/06/07

Permalink 04:26:00 pm, by sarneil, 101 words, 611 views   English (CA)
Categories: Activity log; Mins. worked: 90

create new index numbers for TC topics

went through list of topics provided by Leona, assigned new index numbers based on alphabetical order of full text entries and excluding those topics which she instructed should not be included, and adding one new topic "Gold". Emailed her for confirmation before actually making any changes in db. Assuming she's ok, then I'll need to renumber all the current values (and there may be more than one in the topic field of any given record) to a temp value and then renumber those temp values to the final values (which in the case of those topics being eliminated will be null

31/05/07

Permalink 08:23:37 am, by sarneil, 87 words, 370 views   English (CA)
Categories: Activity log; Mins. worked: 60

clean up list of topics for TC index

Started on cleaning up the list of topics based on feedback from author. A number of the topics have to be removed and one "gold" has to be added. I need to renumber the remaining topics, then in the database do a number of search and replaces so that the values in the "topic" field for each record are correct. Also have to put the text for each topic into the appropriate table. Once all that is done, then I'll get back to writing the querying code.

10/05/07

Permalink 10:48:27 am, by sarneil, 117 words, 386 views   English (CA)
Categories: Activity log; Mins. worked: 30

feedback on search strings and dates within transcripts

Dealt with two queries on Newspaper db:
1) a number of the transcripts (e.g. search for modern California bungalow) contain data like this: "modern California bungalow; Apr 25, 10 - 29 buildings ordered destroyed by city... full list of bldg/owners" the question is what the "10" means (year or page number). In any case, almost certainly the fix will be manual editing of the data, as the records contain a publication date in standard format at the start of the record which is successfully parsed and entered in the pubDate field for that record.

2) someone asking about ability to search for variant spellings (for names). I confirmed current capabilities and asked what further capabilities and interface person has in mind.

:: Next Page >>

Times Colonist Transcript Database

The goal of this project is to take a collection of transcripts of new stories from early editions of the Times Colonist newspaper which are currently in text files containing special codes for various bits of information, normalize the records, put them into an SQL database and then write a querying front-end.

Reports

Categories

May 2013
Sun Mon Tue Wed Thu Fri Sat
 << <   > >>
      1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31  

XML Feeds