Archives for: 2011

04/11/11

Permalink 10:41:54 am, by mholmes, 29 words, 129 views   English (CA)
Categories: Activity log; Mins. worked: 60

Wrote a preliminary project proposal for phase 2

Following our meeting last week, I've now drafted a preliminary shot at a project proposal for phase two, and forwarded it to JS-R for additional data, edits and gap-filling.

23/08/11

Permalink 12:16:00 pm, by mholmes, 65 words, 137 views   English (CA)
Categories: Activity log; Mins. worked: 30

Finding duplicate title codes

There are 117. Here's how I found them, for future reference:

SELECT a.`ttl_title_id`, b.`ttl_title_id`, a.`ttl_title_code`
FROM `titles` a, `titles` b
WHERE a.`ttl_title_code` = b.`ttl_title_code`
AND a.`ttl_title_id` <> b.`ttl_title_id`
AND a.`ttl_title_id` < b.`ttl_title_id`
ORDER BY a.`ttl_title_id`
LIMIT 0, 1000
Permalink 11:58:50 am, by mholmes, 199 words, 135 views   English (CA)
Categories: Activity log; Mins. worked: 60

Catching data entry errors using automation

Negative duration values, in the case of 11 records in the VW_trans_composite_eth_prop view, have revealed some data entry errors in the "preceding titles" and "related titles (newer)" fields. This suggests that there may be other such errors which are not apparent, because their dates happen to be in the correct sequence. Therefore we propose creating an automated check on the data, to locate such inconsistencies. This is an outline of how the check would be done:

  • For each distinct property:
    • List all the titles for that property in date order. For each title:
      • Check that its preceding-title links (there can be more than one) point to titles which precede it in date order.
      • Check that its related-title-newer links (again, multiple) point to titles which follow it in date order.

However, I see a problem with this. If a title is created as a result of a merger of two or more preceding titles, each of which has a different date, it will be impossible to tell whether there is an error or not. Ditto in the case of subsequent titles, in the case of a subdivision. Waiting for word from JS-R on how to handle this.

Permalink 08:21:53 am, by mholmes, 539 words, 146 views   English (CA)
Categories: Activity log; Mins. worked: 45

Spreadsheet data now inserted

I have now carried out the insertion, and constructed a detailed audit trail to pass on to JS-R and VG. Here's the opening section of the audit document (the rest is very long, and consists of precise records of every operation):


____AUDIT TRAIL FOR THE AUTOMATED INSERTION OF SPREADSHEET DATA____

Use this document to check the validity of inserted records and trace the source of any errors that may be found.


WHAT HAS BEEN DONE:

 - A new title record has been inserted for each record in the spreadsheet.

 - The title has been linked to its property. All properties referred to in the spreadsheet appear to have already existed in the database. so no new property records were created.

 - All owners recorded in the spreadsheet have either been:
      - discovered as already existing in the database, or 
      - inserted as new records into the database
   and each owner has been linked to their title(s).

 - A new mortgage record has been inserted for each mortgage in the spreadsheet.

 - All lenders recorded in the spreadsheet have either been:
      - discovered as already existing in the database, or 
      - inserted as new records into the database
   and each lender has been linked to their mortgage(s).

 - A simple algorithm has been used to determine whether owners and lenders are individual or institutional, and their records have been configured accordingly. I have strong confidence in this algorithm, but there may be occasional errors.

 - All "other documents" recorded in the spreadsheet have either been:
      - discovered as already existing in the database, or 
      - inserted as new records into the database
   and each document has been linked to its title(s).


WHAT HAS NOT BEEN DONE:

 - No attempt has been made to set values for the ethnicity of lenders or owners. This cannot be done in an automated fashion.

 - No attempt has been made to discover seller information for the new records. This is also best done by humans, but it's possible we might be able to automate it if time permits.



BACKUPS

I have backups of the database taken immediately before and immediately after the process.


AUTO_INCREMENT VALUES PRIOR TO INSERTION OF NEW RECORDS FROM SPREADSHEET

These values represent the FIRST id and LAST id of a record inserted as part of this process. Within this range, data in the table has been inserted by the automated process.

    TABLE               FIRST ID          LAST ID

titles                    2300             2738
owners                    1863             2292
lenders                    553              691
mortgages                  929             1306
other_docs                 388              518
owners_to_titles          9678            10363
mortgages_to_titles       3246             3623
docs_to_titles            1086             1328
lenders_to_mortgages      1130             1564

No new records were inserted in the props table; presumably all the new titles relate to properties already in the database.


FULL INSERTION RECORD

This is the complete record of record creation through the process. Use this to figure out exactly what happened in the case of any given record.

Operation # 1

Processing record with id # d1e1150
Found 1 property records.
Using property id #109
Inserted title record # 2300
Processing owners
Inserted owner id # 1863
Inserted owner-to-title link id # 9678
Inserted mortgage id # 929
Inserted mortgage-to-title link id # 3246
Processing lenders for this mortgage.
Inserted lender id # 553
Inserted lender_to_mortgage link id # 1130
Processing other documents.
Inserted other_doc id # 388
Inserted doc-to-title link id # 1086
Finished processing record d1e1150.

[...]

22/08/11

Permalink 03:40:19 pm, by mholmes, 272 words, 107 views   English (CA)
Categories: Activity log; Mins. worked: 240

Finished the import code

I've now finished and tested the code for importing records into the db. The end result (which creates 438 primary title inserts, but thousands of ancillary records) was too large to run in a single operation, so I'm using <xsl:result-document> to split it into five sub-operations. Each one produces a detailed report on what it's done, so that, combined with the source XML file, should be all we need in terms of an audit trail.

It took a little longer than expected because of two additional aspects I hadn't planned for initially. Both the lenders and the owners tables have mechanisms for distinguishing between individuals and institutions, and I ended up adding some code to determine whether new records being added fell into one or the other category, by examining the name of the new owner or lender. In addition, I decided to parse out the forenames and surnames of individuals.

I would post the script here, but it's very long, very messy, based on invented XML in the source document, and it's a one-shot task that we won't have to do again, so it's not of any general use. However, the techniques I've figured out for this probably will be used again.

I haven't yet run it against the live database, only against the dev db; I don't want to make major changes right at the end of the day, so I'm going to do it tomorrow morning, after taking a fresh backup, and recording the upper bounds of the current db tables, so that I can report to J-SR and VG, and they can look at the results.

19/08/11

Permalink 02:27:26 pm, by mholmes, 53 words, 108 views   English (CA)
Categories: Activity log; Mins. worked: 180

Significant progress

Working with a single record, I've got the following stages done:

  • Property identification or creation.
  • Title creation.
  • Owner identification or creation.
  • Owner linking to title.
  • Mortgage creation.
  • Mortgage linking to title.

I'm now working on lenders. Once I have one record working, I'll start trying blocks of 5, 10, 20, 50, etc. till the whole thing works.

18/08/11

Permalink 03:21:13 pm, by mholmes, 202 words, 170 views   English (CA)
Categories: Activity log; Mins. worked: 120

Going the PHP route

I've decided to go the PHP route for the moment. These two blocks of code would be run before and after running the PHP process, and I'm putting them here so I can get them when I need them. They only need to be run once, and I'll certainly be hacking away with my scripts for many hours in between.

    # Add the extra columns we need.
ALTER TABLE `props` ADD `temp_tracking` VARCHAR(1024);
ALTER TABLE `titles` ADD `temp_tracking` VARCHAR(1024);
ALTER TABLE `owners` ADD `temp_tracking` VARCHAR(1024);
ALTER TABLE `lenders` ADD `temp_tracking` VARCHAR(1024);
ALTER TABLE `mortgages` ADD `temp_tracking` VARCHAR(1024);
ALTER TABLE `other_docs` ADD `temp_tracking` VARCHAR(1024);
    
    # Delete the extra columns we added.
    
ALTER TABLE `props` DROP COLUMN `temp_tracking`;
ALTER TABLE `titles` DROP COLUMN `temp_tracking`;
ALTER TABLE `owners` DROP COLUMN `temp_tracking`;
ALTER TABLE `lenders` DROP COLUMN `temp_tracking`;
ALTER TABLE `mortgages` DROP COLUMN `temp_tracking`;
ALTER TABLE `other_docs` DROP COLUMN `temp_tracking`;

I've written the basic code to generate the SQL to add the core title table records, built in the PHP framework for running queries, and tested with a single record to demonstrate that it's practical. Now I can start building the more complicated queries.

16/08/11

Permalink 04:39:42 pm, by mholmes, 325 words, 338 views   English (CA)
Categories: Activity log; Mins. worked: 180

Problems working with stored procedures in phpMyAdmin

The insertion code I'm writing needs to use stored procedures in order to do slightly complex things such as IF THEN ELSE structures. However, I've discovered that it's impossible to call a stored procedure which has more than one statement in it from inside phpMyAdmin because the CLIENT_MULTI_STATEMENTS flag is not set by phpMyAdmin in its connection string. This is not configurable in phpMyAdmin; you'd need to change the actual source to make this possible.

I confirmed this by running a test at the mysql command line on mysqldev.hcmc.uvic.ca (like this: mysql -u hcmc -p properties_dev). From the CLI, I could successfully define a procedure and call it, but from inside phpMyAdmin I could only define the procedure; attempts to call it result in Error 1312 (0A000): PROCEDURE proc_name can't return a result set in the given context. The same applies to SQL scripts uploaded into phpMyAdmin via the Import function.

So it seems to me that one useful working method would be one whereby I have phpMyAdmin open (for easy checking of the results of my script), and I copy my script from the local machine (where it's generated by XSLT which runs against the XML data from the spreadsheet) over to mysqldev; then I run the script from the command line, and check the results. Not ideal at all. But possible.

Another approach would be to generate not SQL but PHP, with embedded SQL, and run the PHP script; the PHP connection method enables you to set the requisite flag when making the connection. That might be easier, but again I'd need to save it and copy it over to home1t in order to be able to run it; and the extra embedding, given the scale of the operation I want to do, might be a problem. So the first option is probably the best.

It's taken me half the day to figure this out. Curses.

12/08/11

Permalink 08:10:00 am, by mholmes, 1156 words, 121 views   English (CA)
Categories: Activity log; Mins. worked: 120

Plan for generating SQL for insertion into db

Now that I have the data in a usable XML format (an example record is below), I need to plan the process by which SQL statements are generated and sequenced in order to add it to the database. There are a number of key issues that need to be carefully handled:

  • Insertions will be made into the following tables:
    • Properties
    • Titles (the core table)
    • Owners
    • Lenders
    • Mortgages
    • Other documents
  • The insertions will have to be made in such a way that links can be established between related records. For this reason, each record has a unique id, and each mortgage does too (because mortgages and lenders will have to be linked).
  • Because there are lots of one-to-many relationships, and also because many of the items such as owners and lenders being added will be duplicates of existing ones, we need to stage the insertions carefully.

This is the proposed sequence:

  • First, add a temporary "tracking" field to each of the tables we're using. This will be used to store a list of ids to which the record is linked, so we can create links later on.
  • For each record:
    • Handle its property information:
      • Check for an existing record with the same property information as this one. If it exists, add the record id to its tracking field (followed by a space). If it doesn't, create it, and add the id.
    • Create its title record (we assume that these titles don't yet exist in the database). Place the record id in its tracking field.
    • Create the owners. For each owner:
      • Check for an existing owner with the same display name, street number and street name. If it exists, add the record id to its tracking field. If it doesn't exist, create it, and place the record id in its tracking field.
    • Create the mortgages. For each mortgage:
      • Create a new mortgage record (we assume that mortgages are unique to titles, and therefore unique in the db). Store the mortgage id in its tracking field.
      • Create the lenders. For each lender:
        • Check for an existing lender with the same display name, street number and street name. If it exists, add the mortgage id to its tracking field. If it doesn't exist, create it, and place the mortgage id in its tracking field.
    • Handle the "other documents". For each document:
      • Check for an existing record with the same document number. If it exists, add the record id to its tracking field; if not, create it, and place the record id in its tracking field.
  • Create the links. For each title with an id in its tracking field (meaning we've just added it), get its ttl_title_id, and:
    • Get the first record in the property table which has the record's tracking id in its tracking field (there should only be one), and set the title record's ttl_property_id_fk field to its prp_property_id.
    • For each owner which has the same id in its tracking field, get its own_owner_id, and create a new record in the owners_to_titles table.
    • For each mortgage which has the record id as the middle part of its tracking id:
      • Get its mgg_mortgage_id.
      • Create a new record in the mortgages_to_titles table.
      • For each lender which has the mortgage's tracking id in its tracking field:
        • Get its lnd_lender_id.
        • Create a record in the lenders_to_mortgages table.
    • For each record in the other_docs table which has the record's id in its tracking field, get its odc_doc_id and create a record in the docs_to_titles table.
  • Empty all the tracking fields. We may as well keep them, in case they're useful in future.
<record gen-id="d1e30029" orig-id="5701075">
      <property>
         <district>196</district>
         <plan>196</plan>
         <block>70</block>
         <lot>2</lot>
         <census-tract>57.01</census-tract>
      </property>
      <title>
         <title-code>J68726L</title-code>
         <traces/>
         <t1-doc-lots>1, 2 and 3</t1-doc-lots>
         <traces>H9471L,H9472L, na, na, na, na, na, na, na, na, na, na</traces>
         <date>1981-09-02</date>
         <ownership-split>Shek Holdings Co. Ltd. (Incorporation no. 241563): 7/10
A. Youngson Investments Ltd (Incorporation no. 240974): 3/10</ownership-split>
         <notes/>
      </title>
      <owners>
         <owner>
            <name>Shek Holdings Co. Ltd. (Incorporation no. 241563)</name>
            <occupation/>
            <street-num>539</street-num>
            <street>Main St</street>
            <city/>
            <province/>
            <post-code/>
            <undivided-interest>7/10</undivided-interest>
         </owner>
         <owner>
            <name>A. Youngson Investments Ltd (Incorporation no. 240974)</name>
            <occupation/>
            <street-num>940</street-num>
            <street>West King Edward Ave</street>
            <city>n</city>
            <province/>
            <post-code/>
            <undivided-interest>3/10</undivided-interest>
         </owner>
      </owners>
      <mortgages>
         <mortgage gen-id="mort_d1e30029_d1e30236">
            <doc-num>M H6566</doc-num>
            <year/>
            <cancelled>n</cancelled>
            <cancelled-date/>
            <value/>
            <modified/>
            <modified-date/>
            <modified-by/>
            <term-date/>
            <interest/>
            <remarks>Lots 1 and 2</remarks>
            <lenders>
               <lender>The Imperial Life Assurance Company of Canada</lender>
            </lenders>
         </mortgage>
         <mortgage gen-id="mort_d1e30029_d1e30263">
            <doc-num>M H7796</doc-num>
            <year>1980</year>
            <cancelled>y</cancelled>
            <cancelled-date>1981</cancelled-date>
            <value/>
            <modified/>
            <modified-date/>
            <modified-by/>
            <term-date/>
            <interest/>
            <remarks>Lots 1 and 2</remarks>
            <lenders>
               <lender>Canadian Imperial Bank of Commerce</lender>
            </lenders>
         </mortgage>
         <mortgage gen-id="mort_d1e30029_d1e30295">
            <doc-num>M H45708</doc-num>
            <year>1980</year>
            <cancelled>y</cancelled>
            <cancelled-date>1981</cancelled-date>
            <value/>
            <modified/>
            <modified-date/>
            <modified-by/>
            <term-date/>
            <interest/>
            <remarks>Lots 1 and 2</remarks>
            <lenders>
               <lender>Canadian Imperial Bank of Commerce</lender>
            </lenders>
         </mortgage>
      </mortgages>
      <other-docs>
         <doc>
            <doc-num>AR H6567</doc-num>
            <year>1980</year>
            <parties>The Imperial Life Assurance Company of Canada</parties>
            <remarks>Lots 1 and 2</remarks>
         </doc>
      </other-docs>
   </record>


10/08/11

Permalink 04:00:10 pm, by mholmes, 119 words, 105 views   English (CA)
Categories: Activity log; Mins. worked: 120

More work on importing old spreadsheet data

I've now finished the second stage of this work, with nest_data.xsl complete and working; all mortgages, lenders and associated documents are now organized into a structured XML document which can be used to generate SQL. In the process, I had to manually fix some of the column headers in unwrapped_fods.xml, because they were inconsistent in terms of spelling and case; that means the original unwrap_fods.xsl can't be run again to re-run the whole process without doing the same manual intervention. However, I don't think that will be necessary, since the data in unwrapped_fods.xml is complete. So now we have unwrapped_fods_nested.xml, and I can start generating SQL from that.

09/08/11

Permalink 03:51:58 pm, by mholmes, 346 words, 66 views   English (CA)
Categories: Activity log; Mins. worked: 360

Working on importing old spreadsheet data

I've spent most of today figuring out an approach for importing the old spreadsheet data discussed in previous posts. This is what I've got to so far:

  • I opened the file in LibreOffice, deleted all but the first sheet (only the first sheet has corrected data), and saved it as a FODS file.
  • I wrote an transformation called unwrap_fods.xsl to eliminate all elements and attributes we don't need, and to unravel colspans so that we have the same number of cells in each row. At the same time, I eliminated all instances of "na" from cells where it was the only content; this is (I assume) the same as empty.
  • I've written most of a second XSLT stylesheet called nest_data.xsl which is generating a single record from each row, but which uses nesting to manage the owner, mortgage etc. blocks in a logical manner. At the moment, every row has a complete set of cells for up to eight owners, even where there's only a single owner. I'm now processing all this data into a single <owners> element containing only one <owner> element for each owner who actually exists. (The XML format is invented just for this purpose, because it's purely transitional.) I now have to do the same for the 22 mortgage blocks.

Once I have this XML, the plan is to create four XSL stylesheets which produce SQL inserts for the property, mortgage, owner and title tables. Each of these tables will have a new field added to it which can be used to store a unique value for each source record (row in the original table); then these fields can be used to create records in the linking tables. Finally, a process will have to be worked out which can generate lists of potential duplicate records, which can then be checked by a human, and merged if appropriate. This problem is not directly related to the current one, but the issue of duplicates will be much worsened by the automated adding of data through this process.

30/06/11

Permalink 10:13:42 am, by mholmes, 615 words, 91 views   English (CA)
Categories: Activity log; Mins. worked: 120

Completed institution/ethnicity demuxing

The following remaining tasks from yesterday are now completed:

  • DONE: The ethnicities table values have been abused to store information about institutions; we need to separate out that data. Copy data from the owners_to_ethnicities table to the owners table as follows:
    • DONE: If a record exists for a given owner with the ethnicity value 11 (ethnic institution), and there's also one with the value 10 (private institution), delete the latter, since the former supercedes it (for now).
    • DONE: For each record where the ethnicity value is 11, set the corresponding owner field to 2.
    • DONE: For each record where the ethnicity value is 10, set the corresponding owner field to 3.
    • DONE: For each record where the ethnicity value is 9, set the corresponding owner field to 4.
    • DONE: Delete all records in the owners_to_ethnicities table where the ethnicities values are 9, 10 or 11.
  • DONE: Integrate recent changes in the trunk of the AdaptiveDB codebase into Properties. This may give a small increase in performance, and a couple of useful improvements.

This remains to be done:

  • Make changes to the AdaptiveDB code for this and other projects to split the tabs into separate files. That will improve page loading dramatically.

This is the SQL for the db side of the changes:

/**
**  SECOND PHASE: June 30 changes. 
*/


/* Set the institution type field to 1 (the default) for all records. */
UPDATE `owners` SET `own_insttype_id_fk` = "1";

/* 
** I wanted to delete records in the owners_to_ethnicities table where there's both an 11
** and a 10; in this case, we don't need the 10 because the 11 supercedes it.
*/

/* 
** First check what we have. There were 97 records before deletion.
*/

SELECT * FROM `owners_to_ethnicities` AS ote1 WHERE ote1.ote_ethnicity_id_fk = "10" AND EXISTS (SELECT * FROM `owners_to_ethnicities` AS ote2 WHERE ote1.ote_owner_id_fk = ote2.ote_owner_id_fk AND ote2.ote_ethnicity_id_fk = "11");

/* 
** We can't actually delete records because you can't query the same table in a subquery as you're deleting from in the main clause; this is a bit of a showstopper. So we'll have to take a different approach. We first set the value for the ethnicity 10 (Private institution), then override it where there's an 11. Finally we set the value for 9.
*/

SELECT * FROM `owners` WHERE EXISTS (SELECT * FROM `owners_to_ethnicities` AS ote1 WHERE ote1.ote_ethnicity_id_fk = "10" AND owners.own_owner_id = ote1.ote_owner_id_fk);

/* 266 results. */

UPDATE `owners` SET `own_insttype_id_fk` = "3" WHERE EXISTS (SELECT * FROM `owners_to_ethnicities` AS ote1 WHERE ote1.ote_ethnicity_id_fk = "10" AND owners.own_owner_id = ote1.ote_owner_id_fk);

SELECT * FROM `owners` WHERE EXISTS (SELECT * FROM `owners_to_ethnicities` AS ote1 WHERE ote1.ote_ethnicity_id_fk = "11" AND owners.own_owner_id = ote1.ote_owner_id_fk);

/* 97 results */

UPDATE `owners` SET `own_insttype_id_fk` = "2" WHERE EXISTS (SELECT * FROM `owners_to_ethnicities` AS ote1 WHERE ote1.ote_ethnicity_id_fk = "11" AND owners.own_owner_id = ote1.ote_owner_id_fk);

SELECT * FROM `owners` WHERE EXISTS (SELECT * FROM `owners_to_ethnicities` AS ote1 WHERE ote1.ote_ethnicity_id_fk = "9" AND owners.own_owner_id = ote1.ote_owner_id_fk);

/* 10 results */

UPDATE `owners` SET `own_insttype_id_fk` = "4" WHERE EXISTS (SELECT * FROM `owners_to_ethnicities` AS ote1 WHERE ote1.ote_ethnicity_id_fk = "9" AND owners.own_owner_id = ote1.ote_owner_id_fk);

/* 
** Now we can delete the three values from the ethnicities table, which will result in the deletion of the linked records.
*/

SELECT * FROM ethnicities where eth_ethnicity_id in (9,10,11); 

/* 3 results */

SELECT * FROM owners_to_ethnicities where ote_ethnicity_id_fk in (9,10,11);

/* 373 results: 266 + 97 + 10 */

DELETE FROM ethnicities where eth_ethnicity_id in (9,10,11); 

/* Check that linking records were deleted. */

SELECT * FROM owners_to_ethnicities where ote_ethnicity_id_fk in (9,10,11);

/* 0 results */

29/06/11

Permalink 03:50:03 pm, by mholmes, 843 words, 95 views   English (CA)
Categories: Activity log; Mins. worked: 150

Initial changes made; some remain

From the task list in the previous post, this is what I've accomplished this afternoon (tested in dev, and migrated to live):

  • DONE: Rename the Titles ttl_price field to ttl_consideration.
  • DONE: Modify the local_classes.php file to reflect this change, and move the field to just before the mortgages.
  • DONE: Add two more integer fields right after it: ttl_declaredvalue and ttl_marketvalue.
  • DONE: Add these two fields to the local_classes.php file.
  • DONE: Add a new date field, ttl_transfer_date, to the Titles table, right after the existing date field.
  • DONE: Add this field to the local_classes.php file.
  • DONE: Add a text field for Head Office to the Owners table.
  • DONE: Add this field to the local_classes.php file.
  • DONE: Create a sellers_to_titles table, linked to the owners and the titles.
  • DONE: Create a lawyers_to_titles table, linked ditto.
  • DONE: Create one-to-many fields in the local_classes.php file for each of the above.
  • DONE: Create a new table called institution_type, with id, name and description.
  • DONE: Populate the table with 1. "" (the default), 2. Ethnic institution, 3. Private institution, 4. Public institution.
  • DONE: Add a lookup field to the owners table which points at the new table.
  • DONE: Update scripts such as copy_live_to_dev_db.sql to reflect changes in structure.

The following remain to be completed:

  • The ethnicities table values have been abused to store information about institutions; we need to separate out that data. Copy data from the owners_to_ethnicities table to the owners table as follows:
    • If a record exists for a given owner with the ethnicity value 11 (ethnic institution), and there's also one with the value 10 (private institution), delete the latter, since the former supercedes it (for now).
    • For each record where the ethnicity value is 11, set the corresponding owner field to 2.
    • For each record where the ethnicity value is 10, set the corresponding owner field to 3.
    • For each record where the ethnicity value is 9, set the corresponding owner field to 4.
    • Delete all records in the owners_to_ethnicities table where the ethnicities values are 9, 10 or 11.
  • NEW: Integrate recent changes in the trunk of the AdaptiveDB codebase into Properties. This may give a small increase in performance, and a couple of useful improvements.
  • NEW: Make changes to the AdaptiveDB code for this and other projects to split the tabs into separate files. That will improve page loading dramatically.

This is the SQL for the db side of the changes:

/*
** This is a list of changes in db structure and massages to data
** which were done starting June 29 2011. See the blog posting of
** the same date for more details and explanation.
*/


ALTER TABLE titles CHANGE COLUMN ttl_price ttl_consideration int(11) default NULL;

ALTER TABLE titles ADD ttl_declaredvalue int(11) default NULL;
ALTER TABLE titles ADD ttl_marketvalue int(11) default NULL;

ALTER TABLE titles ADD ttl_transferdate date not NULL AFTER ttl_date;

ALTER TABLE owners ADD own_head_office varchar(128);


CREATE TABLE IF NOT EXISTS `sellers_to_titles` (
  `stt_stt_id` int(11) NOT NULL auto_increment,
  `stt_owner_id_fk` int(11) NOT NULL,
  `stt_title_id_fk` int(11) NOT NULL,
  PRIMARY KEY  (`stt_stt_id`),
  KEY `stt_owner_id_fk` (`stt_owner_id_fk`),
  KEY `stt_title_id_fk` (`stt_title_id_fk`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


ALTER TABLE `sellers_to_titles`
  ADD CONSTRAINT `sellers_to_titles_ibfk_1` FOREIGN KEY (`stt_owner_id_fk`) REFERENCES `owners` (`own_owner_id`) ON DELETE CASCADE ON UPDATE CASCADE,
  ADD CONSTRAINT `sellers_to_titles_ibfk_2` FOREIGN KEY (`stt_title_id_fk`) REFERENCES `titles` (`ttl_title_id`) ON DELETE CASCADE ON UPDATE CASCADE;
  
  
  
CREATE TABLE IF NOT EXISTS `lawyers_to_titles` (
  `ltt_ltt_id` int(11) NOT NULL auto_increment,
  `ltt_owner_id_fk` int(11) NOT NULL,
  `ltt_title_id_fk` int(11) NOT NULL,
  PRIMARY KEY  (`ltt_ltt_id`),
  KEY `ltt_owner_id_fk` (`ltt_owner_id_fk`),
  KEY `ltt_title_id_fk` (`ltt_title_id_fk`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


ALTER TABLE `lawyers_to_titles`
  ADD CONSTRAINT `lawyers_to_titles_ibfk_1` FOREIGN KEY (`ltt_owner_id_fk`) REFERENCES `owners` (`own_owner_id`) ON DELETE CASCADE ON UPDATE CASCADE,
  ADD CONSTRAINT `lawyers_to_titles_ibfk_2` FOREIGN KEY (`ltt_title_id_fk`) REFERENCES `titles` (`ttl_title_id`) ON DELETE CASCADE ON UPDATE CASCADE;


CREATE TABLE IF NOT EXISTS `institution_types` (
  `ins_ins_id` int(11) NOT NULL auto_increment,
  `ins_name` varchar(45) collate utf8_unicode_ci NOT NULL,
  `ins_desc` varchar(256) collate utf8_unicode_ci NOT NULL,
  PRIMARY KEY  (`ins_ins_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


INSERT INTO `institution_types` (ins_ins_id, ins_name, ins_desc) VALUES (1, "", "Not an institution."), (2, "Ethnic institution", "Insert description of what this means."), (3, "Private institution", "Insert description of what this means."), (4, "Public institution", "Insert description of what this means.") ;


ALTER TABLE `owners` ADD `own_insttype_id_fk` int(11) default 1;
ALTER TABLE `owners`
  ADD CONSTRAINT `owners_ibfk_1` FOREIGN KEY (`own_insttype_id_fk`) REFERENCES `institution_types` (`ins_ins_id`) ON DELETE CASCADE ON UPDATE CASCADE;
Permalink 02:32:54 pm, by mholmes, 491 words, 109 views   English (CA)
Categories: Activity log; Mins. worked: 150

Meeting notes and latest task list

Met with JS-R and VG to discuss the next phase of data entry. These are the changes we need to make:

  • Rename the Titles ttl_price field to ttl_consideration.
  • Modify the local_classes.php file to reflect this change, and move the field to just before the mortgages.
  • Add two more integer fields right after it: ttl_declaredvalue and ttl_marketvalue.
  • Add these two fields to the local_classes.php file.
  • Add a new date field, ttl_transfer_date, to the Titles table, right after the existing date field.
  • Add this field to the local_classes.php file.
  • Add a text field for Head Office to the Owners table.
  • Add this field to the local_classes.php file.
  • Create a sellers_to_titles table, linked to the owners and the titles.
  • Create a lawyers_to_titles table, linked ditto.
  • Create one-to-many fields in the local_classes.php file for each of the above.
  • Create a new table called institution_type, with id, name and description.
  • Populate the table with 1. n/a (the default), 2. Ethnic institution, 3. Private institution, 4. Public institution.
  • Add a lookup field to the owners table which points at the new table.
  • The ethnicities table values have been abused to store information about institutions; we need to separate out that data. Copy data from the owners_to_ethnicities table to the owners table as follows:
    • If a record exists for a given owner with the ethnicity value 11 (ethnic institution), and there's also one with the value 10 (private institution), delete the latter, since the former supercedes it (for now).
    • For each record where the ethnicity value is 11, set the corresponding owner field to 2.
    • For each record where the ethnicity value is 10, set the corresponding owner field to 3.
    • For each record where the ethnicity value is 9, set the corresponding owner field to 4.
    • Delete all records in the owners_to_ethnicities table where the ethnicities values are 9, 10 or 11.
  • Update scripts such as copy_live_to_dev_db.sql to reflect changes in structure.

Coming down the pipe, there's also a requirement to parse a complex spreadsheet containing many hundreds of title records into the db somehow. I'm thinking this:

  • CSV to XML via a save from LibreOffice.
  • XML to more rational XML using XSLT; this would remove empty cells, expand spans, and generally clean up the structure.
  • XML to SQL using XSLT. This would generate a series of inserts for each row. The main problem is detecting whether (for instance) owners already exist in the db, and linking owners with titles. It would be necessary to insert the title first, perhaps using a special temporary field in which an XSLT-generated unique id could be inserted; then when the owner records are being constructed, we can first check for the existence of that owner, then insert if necessary; and then we can use the unique id for the current row to construct the links between titles and owners. Ditto with other relational data.

02/06/11

Permalink 10:26:49 am, by mholmes, 66 words, 1024 views   English (CA)
Categories: Activity log; Mins. worked: 5

Splitting out some ethnicities

Currently we're using ethnicity to store info which is not really suited to it (Private_institution and Public_institution). This info belongs in the institution field, but that's currently a boolean, so it needs to be turned into a lookup to another table. Then the data from the ethnicities field needs to be migrated to it, and deleted from the ethnicities tables, simplifying ethnicity and munging.

Permalink 10:20:10 am, by mholmes, 144 words, 85 views   English (CA)
Categories: Activity log; Mins. worked: 15

Rewriting ethnicity munging

Right now we have a system which munges ethnicities based on this algorithm:

(SELECT CASE
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" THEN "c"
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "b"
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "a"
WHEN (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "d"
ELSE SUBSTRING_INDEX(group_concat(ethnicities.eth_name separator ', '), ',', 1)
END) AS munged_ethnicity

However, there are now more ethnicities, including Private_institution, which need their own specific handling during the munge process. That algorithm needs to be rewritten, and in the process we should replace the anonymous variables with something more useful.

Permalink 09:48:45 am, by mholmes, 88 words, 93 views   English (CA)
Categories: Activity log; Mins. worked: 20

Tweak to property data view used in main table

On JS-R's instructions, simplified the view which creates the abbreviated pseudo-field in the titles table which contains property information:

RENAME TABLE `props_abbr` TO `props_abbr_old`;

CREATE VIEW `props_abbr` AS select `props`.`prp_property_id` AS `prpabbr_property_id`,concat(_utf8' B:',LPAD(`props`.`prp_block`, 3, '0'),_utf8' L:',`props`.`prp_lot`) AS `prpabbr_property_info` from `props`;

This also pads the block data with leading zeroes. We can't do that with the lot info because lot identifiers are not numeric.

26/05/11

Permalink 09:02:55 am, by mholmes, 123 words, 96 views   English (CA)
Categories: Activity log; Mins. worked: 40

Fix for prectitles table

It appears that we should not have based the titles_to_prectitles table on the traces data; this did include some links to previous titles, but also some title codes which were simply written on the record and were not necessarily intended to point to previous titles. Therefore this is what I've done:

  • Emptied titles_to_titles and restored it from a backup from two days ago (removing 214 entries generated by reference to the titles_to_prectitles data which is now suspect).
  • Emptied titles_to_prectitles and regenerated it from titles_to_titles using the SQL from the previous post. They are now exact mirrors of each other, and all the data comes from titles_to_titles, which is known to be good.

25/05/11

Permalink 09:23:17 am, by mholmes, 188 words, 99 views   English (CA)
Categories: Activity log; Mins. worked: 60

SQL to complete partial tables

Following my previous post, this is the SQL to complete the titles_to_prectitles table based on the content of the titles_to_titles table:

INSERT INTO titles_to_prectitles (ttp_title_id_fk, ttp_prectitle_id_fk) 
SELECT ttt_newer_title_id_fk, ttt_older_title_id_fk FROM titles_to_titles as ttt
WHERE NOT
EXISTS (
SELECT *
FROM titles_to_prectitles AS ttp
WHERE (
ttp.ttp_title_id_fk = ttt.ttt_newer_title_id_fk
AND ttp.ttp_prectitle_id_fk = ttt.ttt_older_title_id_fk
)
)

This results in the insertion of 647 new records into titles_to_prectitles.

The same operation can be done in reverse, to add 214 more records to titles_to_titles:

INSERT INTO titles_to_titles (ttt_newer_title_id_fk, ttt_older_title_id_fk) 
SELECT ttp_title_id_fk, ttp_prectitle_id_fk FROM titles_to_prectitles as ttp
WHERE NOT
EXISTS (
SELECT *
FROM titles_to_titles AS ttt
WHERE (
ttp.ttp_title_id_fk = ttt.ttt_newer_title_id_fk
AND ttp.ttp_prectitle_id_fk = ttt.ttt_older_title_id_fk
)
)

This results in both tables having 2193 records.

24/05/11

Permalink 08:41:45 am, by mholmes, 252 words, 91 views   English (CA)
Categories: Activity log; Mins. worked: 60

Completing data based on titles_to_titles

It appears that many forward links were created (titles_to_titles) from titles to subsequent titles which were not mirrored by corresponding backward links (titles_to_prectitles). Since we only used titles_to_prectitles in our views, many title progressions are not yet reflected accurately.

The solution is to create titles_to_prectitles links directly from the titles_to_titles table -- they should essentially be mirrors of each other. This should be possible fairly simply using SQL, and changes would immediately be reflected in the views. We should also do the reverse, ensuring that both tables are in sync.

This SQL gives us (if I'm correct in my assumptions) an overview of the situation we're in with regard to records missing from titles_to_prectitles:

SELECT ttt.*,
old_titles.ttl_title_code AS older_title_code,
new_titles.ttl_title_code AS newer_title_code
FROM `titles_to_titles` AS ttt
LEFT JOIN titles AS old_titles on ttt.ttt_older_title_id_fk = old_titles.ttl_title_id
LEFT JOIN titles AS new_titles on ttt.ttt_newer_title_id_fk = new_titles.ttl_title_id
WHERE NOT
EXISTS (
SELECT *
FROM titles_to_prectitles AS ttp
WHERE (
ttp.ttp_title_id_fk = ttt.ttt_newer_title_id_fk
AND ttp.ttp_prectitle_id_fk = ttt.ttt_older_title_id_fk
)
)
LIMIT 0 , 700

This can be extended to create tehe missing records in titles_to_prectitles, and then re-engineered to do the reverse and create any missing records in titles_to_titles.

28/04/11

Permalink 05:54:09 pm, by mholmes, 84 words, 92 views   English (CA)
Categories: Activity log; Mins. worked: 60

Tried temporary tables instead of views

In breaks between workshop sessions, I tried to address the problem of the length of time queries are taking by using temporary tables constructed directly in the query instead of joining to views. The approach I took (joining to a temporary table created in the join clause) had no effect -- the query took exactly the same time. But there may be a way to create a temporary table once earlier in the query, and then join to it later. Still working on this.

27/04/11

Permalink 04:40:29 pm, by mholmes, 136 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 45

Cleaned up owners_full and examined mega-view data

Reworked VW_owners_full so that it includes more records and data (a regular join changed to a left join):

DROP VIEW IF EXISTS owners_full;

CREATE VIEW VW_owners_full AS 
(
SELECT owners.*,
owners_to_ethnicities.ote_ethnicity_id_fk,
ethnicities.eth_name,
owners_to_titles.ott_title_id_fk AS title_id,
titles.ttl_title_id,
titles.ttl_title_code,
titles.ttl_date
FROM owners_to_titles 
LEFT JOIN owners ON owners.own_owner_id = owners_to_titles.ott_owner_id_fk
LEFT JOIN titles ON owners_to_titles.ott_title_id_fk = titles.ttl_title_id
LEFT JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk 
LEFT JOIN ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = ethnicities.eth_ethnicity_id
order by owners.own_surname
)
Permalink 02:43:55 pm, by mholmes, 268 words, 63 views   English (CA)
Categories: Activity log; Mins. worked: 15

Updated the mega-view to include total institutional buyers and sellers

DROP VIEW IF EXISTS VW_trans_composite_eth_prop;

CREATE VIEW VW_trans_composite_eth_prop AS

(
SELECT 

seller_titles.ttl_title_id AS seller_title_id, seller_titles.ttl_date AS seller_title_date, seller_titles.ttl_title_code AS seller_title_code,
buyer_titles.ttl_title_id AS buyer_title_id, buyer_titles.ttl_date AS buyer_title_date, buyer_titles.ttl_title_code AS buyer_title_code,
DATEDIFF(buyer_titles.ttl_date, seller_titles.ttl_date) as seller_duration_days,
census_tracts.census_tract_code,
props.*,
sellers.concat_owners AS concat_sellers, sellers.concat_ethnicities AS seller_ethnicities, sellers.total_owners AS total_sellers,
sellers.munged_ethnicity AS seller_munged_eth,
sellers.total_institutional AS institutional_sellers,
buyers.concat_owners AS concat_buyers, buyers.concat_ethnicities AS buyer_ethnicities, buyers.total_owners AS total_buyers,
buyers.munged_ethnicity AS buyer_munged_eth,
buyers.total_institutional AS institutional_buyers

FROM titles AS buyer_titles
LEFT JOIN titles_to_prectitles ON buyer_titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk
LEFT JOIN titles AS seller_titles ON seller_titles.ttl_title_id = titles_to_prectitles.ttp_prectitle_id_fk
LEFT JOIN VW_titles_composite_eth AS sellers ON seller_titles.ttl_title_id = sellers.ttl_title_id
LEFT JOIN VW_titles_composite_eth AS buyers ON buyer_titles.ttl_title_id = buyers.ttl_title_id

LEFT JOIN props AS props ON seller_titles.ttl_property_id_fk = props.prp_property_id
LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id

ORDER BY seller_titles.ttl_title_id
)
Permalink 02:41:41 pm, by mholmes, 282 words, 57 views   English (CA)
Categories: Activity log; Mins. worked: 30

Revised one view

Revised the VW_titles_composite_eth view so that it includes even owners who have no ethnicity (are institutional):

DROP VIEW IF EXISTS VW_titles_composite_eth;

CREATE VIEW VW_titles_composite_eth AS
(
SELECT 
titles.ttl_title_id,
titles.ttl_title_code,
titles.ttl_date,
CAST(group_concat(owners.own_owner_id separator ', ') AS CHAR) AS concat_owners,
SUM(owners.own_institutional) AS total_institutional,
CAST(group_concat(owners_to_ethnicities.ote_ethnicity_id_fk separator ', ') AS CHAR) AS concat_eth_ids,
group_concat(ethnicities.eth_name separator ', ') AS concat_ethnicities,
(SELECT IF (COUNT(owners.own_owner_id) > 1, 'multiple', 'single')) AS multi_owner,
(COUNT(owners.own_owner_id)) AS total_owners,
(SELECT CASE
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" THEN "c"
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "b"
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "a"
WHEN (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "d"
ELSE SUBSTRING_INDEX(group_concat(ethnicities.eth_name separator ', '), ',', 1)
END) AS munged_ethnicity
FROM titles
LEFT JOIN owners_to_titles ON owners_to_titles.ott_title_id_fk = titles.ttl_title_id
LEFT JOIN owners ON owners_to_titles.ott_owner_id_fk = owners.own_owner_id
LEFT JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk
LEFT JOIN ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = ethnicities.eth_ethnicity_id
GROUP BY titles.ttl_title_id
)
Permalink 10:07:55 am, by mholmes, 19 words, 74 views   English (CA)
Categories: Activity log; Mins. worked: 30

More info needed for VW_trans_composite_eth_prop

The latest requirement is a field containing a comma-separated list of the owner names where the owners are institutional.

26/04/11

Permalink 09:41:38 am, by mholmes, 284 words, 75 views   English (CA)
Categories: Activity log; Mins. worked: 15

Expanded the large view

JS-R needed to add the full property information to the transactions view, so I've added a new view:

DROP VIEW IF EXISTS VW_trans_composite_eth_prop;

CREATE VIEW VW_trans_composite_eth_prop AS

(
SELECT 

seller_titles.ttl_title_id AS seller_title_id, seller_titles.ttl_date AS seller_title_date, seller_titles.ttl_title_code AS seller_title_code,
buyer_titles.ttl_title_id AS buyer_title_id, buyer_titles.ttl_date AS buyer_title_date, buyer_titles.ttl_title_code AS buyer_title_code,
DATEDIFF(buyer_titles.ttl_date, seller_titles.ttl_date) as seller_duration_days,
census_tracts.census_tract_code,
props.*,
sellers.concat_owners AS concat_sellers, sellers.concat_ethnicities AS seller_ethnicities, sellers.total_owners AS total_sellers,
sellers.munged_ethnicity AS seller_munged_eth,
buyers.concat_owners AS concat_buyers, buyers.concat_ethnicities AS buyer_ethnicities, buyers.total_owners AS total_buyers,
buyers.munged_ethnicity AS buyer_munged_eth

FROM titles AS buyer_titles
LEFT JOIN titles_to_prectitles ON buyer_titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk
LEFT JOIN titles AS seller_titles ON seller_titles.ttl_title_id = titles_to_prectitles.ttp_prectitle_id_fk
LEFT JOIN VW_titles_composite_eth AS sellers ON seller_titles.ttl_title_id = sellers.ttl_title_id
LEFT JOIN VW_titles_composite_eth AS buyers ON buyer_titles.ttl_title_id = buyers.ttl_title_id

LEFT JOIN props AS props ON seller_titles.ttl_property_id_fk = props.prp_property_id
LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id

ORDER BY seller_titles.ttl_title_id
)

This takes even longer to generate than the original, unfortunately.

20/04/11

Permalink 03:52:52 pm, by mholmes, 268 words, 72 views   English (CA)
Categories: Activity log; Mins. worked: 45

Figured out the showstopper...

Had to use a separate view to aggregate owners against titles. It's working, but it's mighty slow!

DROP VIEW IF EXISTS VW_trans_composite_eth;

CREATE VIEW VW_trans_composite_eth AS

(
SELECT 

seller_titles.ttl_title_id AS seller_title_id, seller_titles.ttl_date AS seller_title_date, seller_titles.ttl_title_code AS seller_title_code,
buyer_titles.ttl_title_id AS buyer_title_id, buyer_titles.ttl_date AS buyer_title_date, buyer_titles.ttl_title_code AS buyer_title_code,
DATEDIFF(buyer_titles.ttl_date, seller_titles.ttl_date) as seller_duration_days,
census_tracts.census_tract_code,
sellers.concat_owners AS concat_sellers, sellers.concat_ethnicities AS seller_ethnicities, sellers.total_owners AS total_sellers,
sellers.munged_ethnicity AS seller_munged_eth,
buyers.concat_owners AS concat_buyers, buyers.concat_ethnicities AS buyer_ethnicities, buyers.total_owners AS total_buyers,
buyers.munged_ethnicity AS buyer_munged_eth

FROM titles AS buyer_titles
LEFT JOIN titles_to_prectitles ON buyer_titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk
LEFT JOIN titles AS seller_titles ON seller_titles.ttl_title_id = titles_to_prectitles.ttp_prectitle_id_fk
LEFT JOIN VW_titles_composite_eth AS sellers ON seller_titles.ttl_title_id = sellers.ttl_title_id
LEFT JOIN VW_titles_composite_eth AS buyers ON buyer_titles.ttl_title_id = buyers.ttl_title_id

LEFT JOIN props ON seller_titles.ttl_property_id_fk = props.prp_property_id
LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id

ORDER BY seller_titles.ttl_title_id
)
Permalink 03:11:41 pm, by mholmes, 482 words, 65 views   English (CA)
Categories: Activity log; Mins. worked: 120

New views: some progress, one showstopper

I've completed two of the three tasks from this morning: the VW_sellers_duration view has been modified to rename its fields, and the VW_mortgate_ethnicity view has been created.

DROP VIEW IF EXISTS VW_mortgage_ethnicity;
CREATE VIEW VW_mortgage_ethnicity AS
(SELECT mortgages.*,
titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date,
census_tracts.census_tract_code,
CAST(group_concat(owners.own_owner_id separator ', ') AS CHAR) AS concat_owners,
CAST(group_concat(owners_to_ethnicities.ote_ethnicity_id_fk separator ', ') AS CHAR) AS concat_eth_ids,
group_concat(ethnicities.eth_name separator ', ') AS concat_ethnicities,
(SELECT IF (COUNT(owners.own_owner_id) > 1, 'multiple', 'single')) AS multi_owner,
(COUNT(owners.own_owner_id)) AS total_owners,
(SELECT CASE
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" THEN "c"
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "b"
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "a"
WHEN (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "d"
ELSE SUBSTRING_INDEX(group_concat(ethnicities.eth_name separator ', '), ',', 1)
END) AS munged_ethnicity

FROM mortgages
LEFT JOIN mortgages_to_titles ON mortgages.mgg_mortgage_id = mortgages_to_titles.mtt_mortgage_id_fk
LEFT JOIN titles on mortgages_to_titles.mtt_title_id_fk = titles.ttl_title_id
LEFT JOIN owners_to_titles ON owners_to_titles.ott_title_id_fk = titles.ttl_title_id
LEFT JOIN owners ON owners_to_titles.ott_owner_id_fk = owners.own_owner_id
LEFT JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk
LEFT JOIN ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = ethnicities.eth_ethnicity_id
LEFT JOIN props ON titles.ttl_property_id_fk = props.prp_property_id
LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id
GROUP BY mortgages.mgg_mortgage_id)

I'm now working on the transaction view, but I've run into an issue variously known as the "Cartesian product" or "Cascading aggregates" problem: in simple terms, when I aggregate both the buyers and the sellers in a single row, I end up with multiple instances. For example, if there are two buyers and three sellers, then each buyer appears three times and each seller twice, for spurious totals of six buyers and six sellers.

What this means is that my strategy for munging buyers OR sellers works only when we're only looking at one of them; when we need to munge both of them in the same row, a different strategy has to be adopted. Right now, I don't know what that strategy might be, but I'm doing some research.

Permalink 09:31:57 am, by mholmes, 108 words, 71 views   English (CA)
Categories: Activity log; Mins. worked: 60

Plans for new views

Met with JS-R and worked through requirements for new views of the data:

  • Existing view should not be discarded; they're all potentially useful.
  • VW_sellers_duration needs some renamed fields: anything beginning with "old_" should be renamed "seller_", "new_" s/b "buyer_", and "duration_days" s/b "seller_duration_days".
  • A new view needs to be created like VW_sellers_duration, but rooted on the title instead; it should included munged seller ethnicity and munged buyer ethnicity.
  • A new view of mortgages is required, with 1 row per mortgage, including all the mortgage info, and the core title info, followed by the munged owners, and also incorporating property info.

18/04/11

Permalink 04:44:24 pm, by mholmes, 358 words, 73 views   English (CA)
Categories: Activity log; Mins. worked: 120

Working on a new view of titles with munged ethnicity

Started experimenting with building a view of the titles table which will include a sort of composite ethnicity column, as detailed in the preceding post. I now have the key owner and ethnicity data munged together in a view created like this:

CREATE VIEW VW_titles_composite_eth AS
(SELECT 
titles.ttl_title_id,
titles.ttl_title_code,
titles.ttl_date,
CAST(group_concat(owners.own_owner_id separator ', ') AS CHAR) AS concat_owners,
CAST(group_concat(owners_to_ethnicities.ote_ethnicity_id_fk separator ', ') AS CHAR) AS concat_eth_ids,
group_concat(ethnicities.eth_name separator ', ') AS concat_ethnicities,
(SELECT IF (COUNT(owners.own_owner_id) > 1, 'multiple', 'single')) AS multi_owner,
(COUNT(owners.own_owner_id)) AS total_owners,
(SELECT CASE
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" THEN "c"
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "b"
WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "a"
WHEN (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "d"
ELSE SUBSTRING_INDEX(group_concat(ethnicities.eth_name separator ', '), ',', 1)
END) AS munged_ethnicity
FROM titles
JOIN owners_to_titles ON owners_to_titles.ott_title_id_fk = titles.ttl_title_id
JOIN owners ON owners_to_titles.ott_owner_id_fk = owners.own_owner_id
JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk
JOIN ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = ethnicities.eth_ethnicity_id
GROUP BY titles.ttl_title_id)

This has the following values, as previously requested:

  • a = Mixed ethnicity with Chinese
  • b = Mixed Ethnicity with Japanese (include Japanese provisional)
  • c = Mixed ethnicity with Chinese and Japanese
  • d = Mixed ethnicity--ALL Others (all not included above, including mixes that include the Asia: other category but neither Japanese or Chinese, if they include both Asia other and Japanese or Chinese, categorize as indicated above)
  • ELSE--> copy the single ethnicity (or shared ethnicity) here

06/04/11

Permalink 03:43:40 pm, by mholmes, 139 words, 57 views   English (CA)
Categories: Activity log; Mins. worked: 20

New derived column for titles table

JS-R has made the following request for a new column in the titles table, which would handle a kind of munged-together value for all the owners' ethnicities. It would be a string field, presumably (unless we want to give these numbers to keep them short).

Maybe we need two new variables as characteristics of titles:

 1. Multiple or single owner:
      a. multiple
      b. single

 2. Ownership ethnicity (don't worry here whether they are individuals or groups):
     a. Mixed ethnicity with Chinese
     b. Mixed Ethnicity with Japanese (include Japanese provisional)
     c. Mixed ethnicity with Chinese and Japanese
     d. Mixed ethnicity--ALL Others (all not included above, including mixes that include the Asia: other category but neither Japanese or Chinese, if they include both Asia other and Japanese or Chinese, categorize as indicated above)
     e. ELSE-->  copy the single ethnicity (or shared ethnicity) here

01/04/11

Permalink 09:33:34 am, by mholmes, 402 words, 68 views   English (CA)
Categories: Activity log; Mins. worked: 90

Change to one view requirement

It turns out that this one should be buyers instead of vendors: Vendor ethnicity by mortgage institutional or not (we will fill in this data as we go) (with analysis possible by date and by sample). So here we go with:

Buyer ethnicity by mortgage institutional or not (we will fill in this data as we go) (with analysis possible by date and by sample)

Here's the SQL:

CREATE VIEW VW_buyers_and_lenders AS

(SELECT 
titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, 
owners.own_owner_id, owners.own_surname, owners.own_forenames,
buyer_ethnicities.eth_name AS buyer_ethnicity,
mortgages.mgg_mortgage_id, mortgages.mgg_doc_num, mortgages.mgg_value, mortgages.mgg_interest,
lenders.lnd_display_name, lenders.lnd_individual,
lender_ethnicities.eth_name AS lender_ethnicity

FROM titles
JOIN owners_to_titles ON titles.ttl_title_id = owners_to_titles.ott_title_id_fk
LEFT JOIN owners ON owners_to_titles.ott_owner_id_fk = owners.own_owner_id
LEFT JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk
LEFT JOIN ethnicities AS buyer_ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = buyer_ethnicities.eth_ethnicity_id
JOIN mortgages_to_titles ON titles.ttl_title_id = mortgages_to_titles.mtt_title_id_fk
LEFT JOIN mortgages ON mortgages_to_titles.mtt_mortgage_id_fk = mortgages.mgg_mortgage_id
LEFT JOIN lenders_to_mortgages ON mortgages.mgg_mortgage_id = lenders_to_mortgages.ltm_mortgage_id_fk
LEFT JOIN lenders ON lenders_to_mortgages.ltm_lender_id_fk = lenders.lnd_lender_id
LEFT JOIN lenders_to_ethnicities ON lenders.lnd_lender_id = lenders_to_ethnicities.lte_lender_id_fk
LEFT JOIN ethnicities AS lender_ethnicities ON lenders_to_ethnicities.lte_ethnicity_id_fk = lender_ethnicities.eth_ethnicity_id)

I've included lender ethnicities and lender "individual", even though no data has been entered for those yet.

We now have six views, which I've renamed to make them more obviously views in the phpMyAdmin interface:

  • VW_buyers_and_lenders
  • VW_buyers_and_sellers
  • VW_owners_full
  • VW_sellers_duration
  • VW_trans_by_date
  • VW_trans_by_property

I think these cover all the current needs. VW_owners_full is just a view of all the core data about owners; I was intending to use it as part of other views, but didn't need to. I'll leave it there in case it's useful.

31/03/11

Permalink 02:12:07 pm, by mholmes, 266 words, 64 views   English (CA)
Categories: Activity log; Mins. worked: 60

Sellers by ownship duration

Created a new view for sellers with full details of the transaction, but also including ownership duration:

CREATE VIEW VW_sellers_duration AS

(SELECT 

old_titles.ttl_title_id AS old_title_id, old_titles.ttl_date AS old_title_date, old_titles.ttl_title_code AS old_title_code,
new_titles.ttl_title_id AS new_title_id, new_titles.ttl_date AS new_title_date, new_titles.ttl_title_code AS new_title_code,
DATEDIFF(new_titles.ttl_date, old_titles.ttl_date) as duration_days,
census_tracts.census_tract_code,
sellers.own_owner_id AS seller_id, sellers.own_surname AS seller_surname, sellers.own_forenames AS seller_forenames,
seller_eth.eth_name as seller_ethnicity

FROM titles AS new_titles
JOIN titles_to_prectitles ON new_titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk
JOIN titles AS old_titles ON old_titles.ttl_title_id = titles_to_prectitles.ttp_prectitle_id_fk
JOIN owners_to_titles AS sellers_to_titles on old_titles.ttl_title_id = sellers_to_titles.ott_title_id_fk
JOIN owners AS sellers ON sellers_to_titles.ott_owner_id_fk = sellers.own_owner_id
LEFT JOIN owners_to_ethnicities AS sellers_to_ethnicities ON sellers.own_owner_id = sellers_to_ethnicities.ote_owner_id_fk
LEFT JOIN ethnicities AS seller_eth ON sellers_to_ethnicities.ote_ethnicity_id_fk = seller_eth.eth_ethnicity_id 
LEFT JOIN props ON old_titles.ttl_property_id_fk = props.prp_property_id
LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id)

30/03/11

Permalink 02:34:42 pm, by mholmes, 291 words, 61 views   English (CA)
Categories: Activity log; Mins. worked: 15

Buyers to sellers again...

This slightly amended version gets all buyers linked to a title even where other info is missing:

SELECT 
buyers.own_owner_id AS buyer_id, buyers.own_surname AS buyer_surname, buyers.own_forenames AS buyer_forenames,
buyer_eth.eth_name AS buyer_ethnicity,
census_tracts.census_tract_code,
titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, 
titles_to_prectitles.ttp_prectitle_id_fk,
sellers.own_owner_id AS seller_id, sellers.own_display_name AS seller_name,
seller_eth.eth_name AS seller_ethnicity

FROM owners AS buyers
LEFT JOIN owners_to_titles on buyers.own_owner_id = owners_to_titles.ott_owner_id_fk
LEFT JOIN titles on owners_to_titles.ott_title_id_fk = titles.ttl_title_id
LEFT JOIN titles_to_prectitles ON titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk
LEFT JOIN owners_to_ethnicities ON buyers.own_owner_id = owners_to_ethnicities.ote_owner_id_fk
LEFT JOIN ethnicities AS buyer_eth ON owners_to_ethnicities.ote_ethnicity_id_fk = buyer_eth.eth_ethnicity_id
JOIN owners_to_titles AS sellers_to_titles ON titles_to_prectitles.ttp_prectitle_id_fk = sellers_to_titles.ott_title_id_fk
LEFT JOIN owners AS sellers ON sellers_to_titles.ott_owner_id_fk = sellers.own_owner_id
LEFT JOIN owners_to_ethnicities AS sellers_to_ethnicities ON sellers.own_owner_id = sellers_to_ethnicities.ote_owner_id_fk
LEFT JOIN ethnicities AS seller_eth ON sellers_to_ethnicities.ote_ethnicity_id_fk = seller_eth.eth_ethnicity_id 
LEFT JOIN props ON titles.ttl_property_id_fk = props.prp_property_id
LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id
ORDER BY buyers.own_surname
Permalink 02:24:18 pm, by mholmes, 43 words, 86 views   English (CA)
Categories: Activity log; Mins. worked: 10

New views required

Here are the latest requests:

  • ethnicities of all sellers (with info for date, census tract etc)
  • ethnicities of all buyers (with date etc)

I have code elsewhere on the blog which is close to the second of these. The first is more convoluted.

Permalink 11:52:19 am, by mholmes, 546 words, 65 views   English (CA)
Categories: Activity log; Mins. worked: 120

And the next view...

The next view is this one:

Vendor ethnicity by buyer ethnicity (with analysis possible by date and by sample) 

For this we need an individual "transfer" record which includes the following columns:

  • title id
  • title code
  • title date
  • preceding title code
  • preceding title id
  • census tract
  • vendor info (name, ethnicity)

I could start by creating a comprehensive view of owners, though; I could then join to this view in the creation of other views, making the process a bit more modular. Here's the sql:

DROP VIEW IF EXISTS owners_detail

CREATE VIEW owners_full AS (SELECT owners.*, owners_to_ethnicities.ote_ethnicity_id_fk, ethnicities.eth_name, owners_to_titles.ott_title_id_fk AS title_id, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date FROM (((owners RIGHT JOIN owners_to_titles ON owners.own_owner_id = owners_to_titles.ott_owner_id_fk) JOIN titles on owners_to_titles.ott_title_id_fk = titles.ttl_title_id) JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk) JOIN ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = ethnicities.eth_ethnicity_id ORDER BY owners.own_surname)

However, I ended up building the entire thing as a single view, like this:

DROP VIEW IF EXISTS buyers_and_sellers

CREATE VIEW buyers_and_sellers AS
(SELECT 
buyers.own_owner_id AS buyer_id, buyers.own_surname AS buyer_surname, buyers.own_forenames AS buyer_forenames,
buyer_eth.eth_name AS buyer_ethnicity,
census_tracts.census_tract_code,
titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, 
titles_to_prectitles.ttp_prectitle_id_fk,
sellers.own_owner_id AS seller_id, sellers.own_display_name AS seller_name,
seller_eth.eth_name AS seller_ethnicity

FROM owners AS buyers
JOIN owners_to_titles on buyers.own_owner_id = owners_to_titles.ott_owner_id_fk
JOIN titles on owners_to_titles.ott_title_id_fk = titles.ttl_title_id
JOIN titles_to_prectitles ON titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk
JOIN owners_to_ethnicities ON buyers.own_owner_id = owners_to_ethnicities.ote_owner_id_fk
JOIN ethnicities AS buyer_eth ON owners_to_ethnicities.ote_ethnicity_id_fk = buyer_eth.eth_ethnicity_id
JOIN owners_to_titles AS sellers_to_titles ON titles_to_prectitles.ttp_prectitle_id_fk = sellers_to_titles.ott_title_id_fk
JOIN owners AS sellers ON sellers_to_titles.ott_owner_id_fk = sellers.own_owner_id
JOIN owners_to_ethnicities AS sellers_to_ethnicities ON sellers.own_owner_id = sellers_to_ethnicities.ote_owner_id_fk
JOIN ethnicities AS seller_eth ON sellers_to_ethnicities.ote_ethnicity_id_fk = seller_eth.eth_ethnicity_id 
JOIN props ON titles.ttl_property_id_fk = props.prp_property_id
JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id
ORDER BY buyers.own_surname)

Right now this brings back 500 records, so I assume there are only 500 instances where we have a transfer from one person to another in which we know both the buyer and the seller. However, I need to check this carefully; I'll also probably have to loosen up the query to get us transfers for which less info is available.

Permalink 09:21:31 am, by mholmes, 428 words, 68 views   English (CA)
Categories: Activity log; Mins. worked: 90

Building the next view

The next view I've completed is the following:

Total transfers by property (sortable by decade and by sample)

because we have all the data for it.

  • We need a view with the following columns:
    • Property id
    • Some other property data -- columns from the props table
    • Census tract
    • Title id
    • Title code
    • Title date
    The first block comes from the properties table (and the census tracts table), and the second from the titles table. We want a row for each title (presumably), since we'll be including dates, and those only exist in the titles table; so we need a right join.
  • This code gets us a complete table of all the titles with property and census tract information included:
    SELECT props.*, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, census_tracts.census_tract_code FROM (`props` RIGHT JOIN titles on titles.ttl_property_id_fk = props.prp_property_id) JOIN census_tracts on props.prp_census_tract_id_fk = census_tracts.census_tract_id ORDER BY titles.ttl_date
    
  • We then need to limit the results to transfers; that means limiting it to titles which have a preceding title; we can do this with a further right join, like this:
    SELECT props.*, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, census_tracts.census_tract_code, titles_to_prectitles.ttp_prectitle_id_fk AS `prectitle_id` FROM ((`props` RIGHT JOIN titles on titles.ttl_property_id_fk = props.prp_property_id) JOIN census_tracts on props.prp_census_tract_id_fk = census_tracts.census_tract_id) RIGHT JOIN titles_to_prectitles on titles_to_prectitles.ttp_title_id_fk = titles.ttl_title_id ORDER BY titles.ttl_date
    
  • Turning that into a view gives us this:
    DROP VIEW IF EXISTS trans_by_property
    
    CREATE VIEW trans_by_property AS (SELECT props.prp_property_id, props.prp_property_code, props.prp_district, props.prp_plan, props.prp_block, props.prp_lot, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, census_tracts.census_tract_code, titles_to_prectitles.ttp_prectitle_id_fk AS `prectitle_id` FROM ((`props` RIGHT JOIN titles on titles.ttl_property_id_fk = props.prp_property_id) JOIN census_tracts on props.prp_census_tract_id_fk = census_tracts.census_tract_id) RIGHT JOIN titles_to_prectitles on titles_to_prectitles.ttp_title_id_fk = titles.ttl_title_id ORDER BY props.prp_property_id)
    

Sent an XLS dump of this view to JS-R, and also created a user for him so he can read and export views himself.

28/03/11

Permalink 10:24:33 am, by mholmes, 214 words, 74 views   English (CA)
Categories: Activity log; Mins. worked: 60

Creating views for data manipulation

We're beginning the process of creating specific views which can be used for statistical analysis of the data. This is the set of views requested by JS-R:

  • Total transfers by decade by sample
  • Vendor ethnicity by buyer ethnicity (with analysis possible by date and by sample)
  • Vendor ethnicity by duration of ownership (with analysis possible by date and by sample)
  • Vendor ethnicity by mortgage institutional or not (we will fill in this data as we go) (with analysis possible by date and by sample)
  • Total transfers by property (sortable by decade and by sample)

Now we have to formalize those in terms of the various tables and fields that need to be involved. This is my first attempt at the first view:

CREATE VIEW trans_by_date AS SELECT titles_to_prectitles.ttp_ttp_id, titles.ttl_title_id, titles_to_prectitles.ttp_prectitle_id_fk, titles.ttl_title_code, titles.ttl_date, props.prp_census_tract_id_fk, census_tracts.census_tract_code FROM ((titles_to_prectitles JOIN titles on titles_to_prectitles.ttp_title_id_fk = titles.ttl_title_id) JOIN props on titles.ttl_property_id_fk = props.prp_property_id) JOIN census_tracts on props.prp_census_tract_id_fk = census_tracts.census_tract_id ORDER BY titles.ttl_date

24/03/11

Permalink 10:08:13 am, by mholmes, 325 words, 65 views   English (CA)
Categories: Activity log; Mins. worked: 90

Creating preceding title links

With help from JN, this is the SQL that creates links where there is a match:

INSERT INTO `titles_to_prectitles` (`ttp_title_id_fk`, `ttp_prectitle_id_fk`)  SELECT `tt2`.`ttl_title_id` AS `id_1` , `tt1`.`ttl_title_id` AS `id2`
FROM `titles` AS `tt1`
INNER JOIN `titles` AS `tt2` ON tt1.ttl_title_code = tt2.ttl_traces

This checks that there are no duplicate records in the linking table (in case the manual data entry had already created one of the ones we've just created):

SELECT ttp1.ttp_ttp_id, ttp2.ttp_ttp_id FROM titles_to_prectitles ttp1, titles_to_prectitles ttp2
WHERE ttp1.ttp_title_id_fk = ttp2.ttp_title_id_fk and ttp1.ttp_prectitle_id_fk = ttp2.ttp_prectitle_id_fk and ttp1.ttp_ttp_id < ttp2.ttp_ttp_id

This clears out the Traces field wherever these links have been created:

UPDATE titles ttl1 SET ttl_traces = "" WHERE ttl_traces IN (SELECT * FROM (SELECT ttl_title_code FROM titles ttl2) as codes)

Note that the nested subquery is necessary because MySQL won't allow you to use the table you're updating in a WHERE clause.

These are the results:

From a total of 2274 titles:

  • 1315 titles have at least one link to a preceding title.
  • 423 titles have more than one link to a preceding title.
  • 1526 links between a title and a preceding title exist.

1115 titles still have data in their Traces field. Some of these items might be linkable by a human. For instance:

  • There are two titles with the code 40821I.
  • Both have 40616I in their Traces field.
  • There is no title 40616I, but there are two candidates:
    • 40616I-24 (Property 169 B:16 L:30)
    • 40616I-30 (Property 059 B:11 L:24)
    • #17 Title 40821I is Property 167 B:16 L:32, so it doesn't look like it could be matched to anything.
    • #19 Title 40821I is Property 169 B:16 L:30 so it could be linked to 40616I-24.

Obviously these sorts of decisions can only be made by a human.

23/03/11

Permalink 04:51:33 pm, by mholmes, 70 words, 66 views   English (CA)
Categories: Activity log; Mins. worked: 20

More potential hits on preceding titles

This simpler query shows up more potential hits:

SELECT *
FROM `titles` AS `tt1`
WHERE EXISTS (

SELECT *
FROM `titles`
WHERE tt1.ttl_title_code = titles.ttl_traces
)

This query brings back the candidate ids:

SELECT `tt1`.`ttl_title_id` AS `id_1`, `tt2`.`ttl_title_id` AS `id2`
FROM `titles` AS `tt1` LEFT JOIN `titles` AS `tt2`
ON tt1.ttl_title_code = tt2.ttl_traces

We'll put more work into this tomorrow.

Permalink 03:55:22 pm, by mholmes, 224 words, 72 views   English (CA)
Categories: Activity log; Mins. worked: 150

Preceding title links: automation only partially practical

Came to automate the linking of titles to preceding titles as specified in the previous post, but found that with this data, manual intervention was essential. I initially isolated 38 candidate records like this:

SELECT `ttl_title_id` FROM `titles` AS `ts1` WHERE NOT EXISTS (SELECT * FROM `titles_to_prectitles` WHERE `ttp_title_id_fk` = `ts1`.`ttl_title_id`) AND EXISTS (SELECT `ttl_title_id` FROM `titles` AS `ts2` WHERE TRIM(`ts1`.`ttl_traces`) = `ts2`.`ttl_title_id`) ORDER BY `ts1`.`ttl_title_id`

But then I looked at each candidate individually, because in many cases, there was only a partial entry in the Traces field. For instance, looking at a record with "2144K" in the Traces field, we find there are two possible candidates for the precedint title, 2144K-19 and 2144K-21. We can select the correct one by checking the Property field, so (for instance) we can tell that if the Property field for the candidate record has 254 B:39 L:19, and 2144K-19 also has 254 B:39 L:19, then that's the correct preceding title.

With this method I was able to match nearly all the 38 candidates to the correct preceding record(s).

We end up with the following:

  • 491 links between titles and preceding titles
  • 288 titles which are linked to preceding titles
  • 301 titles which are preceding titles for other titles

Don't know how useful this will be.

17/03/11

Permalink 11:09:08 am, by mholmes, 124 words, 65 views   English (CA)
Categories: Activity log; Mins. worked: 10

Task: automated creation of preceding titles links

The manual work on preceding titles has been done, so the automated component can now be undertaken. This is the task:

  • Where there is only one entry in the ttl_traces field (no returns)...
  • ...and there is no entry in the Preceding titles field (meaning there are no records in the titles_to_prectitles table for this ttp_title_id_fk...
  • ...check to see if the entry in the ttl_traces field corresponds with the ttl_title_code of any existing record...
  • ...and if it does, then create a record in the titles_to_prectitles table linking the two.

Caveat: there are some items where there are brackets around one or more characters; we'll probably have to ignore those (and lookups will fail anyway).

25/02/11

Permalink 02:22:30 pm, by mholmes, 251 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 90

Splitting out surnames and forenames

Meeting with JS-R and his RA today, preparing for the next phase of data entry. Mostly this will be manual fixes to the existing records, but there are a couple of things I need to do.

I had to programmatically split out the forenames and surnames into separate fields from the display name field, in both the owners and lenders table. We agreed to split on the last space, and then fix the results manually. Here's how to get the surname:

SELECT RIGHT( `own_display_name` , LOCATE( ' ', REVERSE( `own_display_name` ) ) - 1 )
 FROM `owners`

and this is the bit preceding the surname:

SELECT LEFT( `own_display_name` , CHAR_LENGTH( `own_display_name`) - LOCATE( ' ', REVERSE( `own_display_name` ) ) )
 FROM `owners`

This is the update command for surnames:

UPDATE `owners` SET `own_surname` = (SELECT RIGHT( `own_display_name` , LOCATE( ' ', REVERSE( `own_display_name` ) ) - 1 )) WHERE CHAR_LENGTH(`own_surname`) = 0

and this is for forenames:

UPDATE `owners` SET `own_forenames` = (SELECT LEFT( `own_display_name` , CHAR_LENGTH( `own_display_name`) - LOCATE( ' ', REVERSE( `own_display_name` ) ) )) WHERE CHAR_LENGTH(`own_forenames`) = 0

These are the commands for lenders:

UPDATE `lenders` SET `lnd_surname` = (SELECT RIGHT( `lnd_display_name` , LOCATE( ' ', REVERSE( `lnd_display_name` ) ) - 1 )) WHERE CHAR_LENGTH(`lnd_surname`) = 0

UPDATE `lenders` SET `lnd_forenames` = (SELECT LEFT( `lnd_display_name` , CHAR_LENGTH( `lnd_display_name`) - LOCATE( ' ', REVERSE( `lnd_display_name` ) ) )) WHERE CHAR_LENGTH(`lnd_forenames`) = 0

Tested on dev, and implemented on live.

21/02/11

Permalink 01:03:34 pm, by mholmes, 694 words, 68 views   English (CA)
Categories: Activity log; Mins. worked: 120

Updates to db structure

Updates mentioned in the preceding post:

ALTER TABLE `props` ADD `prp_street_num` varchar(45) collate utf8_unicode_ci default NULL;
ALTER TABLE `props` ADD `prp_street` varchar(45) collate utf8_unicode_ci default NULL;

In class PropProperty:

      $this->addField(new MdhStrField('prp_street_num', 'Street number', '', array(), 45, true));
      $this->addField(new MdhStrField('prp_street', 'Street name', '', array(), 45, true));

New table for ethnicities for lenders:

--
-- Table structure for table `lenders_to_ethnicities`
--

CREATE TABLE IF NOT EXISTS `lenders_to_ethnicities` (
  `lte_lte_id` int(11) NOT NULL auto_increment,
  `lte_lender_id_fk` int(11) NOT NULL,
  `lte_ethnicity_id_fk` int(11) NOT NULL,
  PRIMARY KEY  (`lte_lte_id`),
  KEY `lte_ethnicity_id_fk_idx` (`lte_ethnicity_id_fk`),
  KEY `lte_lender_id_fk_idx` (`lte_lender_id_fk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;

--
-- Constraints for table `lenders_to_ethnicities`
--
ALTER TABLE `lenders_to_ethnicities`
  ADD CONSTRAINT `lenders_to_ethnicities_ibfk_1` FOREIGN KEY (`lte_lender_id_fk`) REFERENCES `lenders` (`lnd_lender_id`) ON DELETE CASCADE ON UPDATE CASCADE,
  ADD CONSTRAINT `lenders_to_ethnicities_ibfk_2` FOREIGN KEY (`lte_ethnicity_id_fk`) REFERENCES `ethnicities` (`eth_ethnicity_id`) ON DELETE CASCADE ON UPDATE CASCADE;

In class PropLender:

$this->addField(new MdhOneToManyField('lnd_ethnicities', 'Ethnicities',
                                'lnd_lender_id', 
                                'ethnicities',
                                'eth_ethnicity_id',
                                'eth_name',
                                'lenders_to_ethnicities',
                                'lte_lte_id',
                                'lte_lender_id_fk',
                                'lte_ethnicity_id_fk',
                                true,
                                'eth_name',
                                true));

Linking titles to preceding titles:

--
-- Table structure for table `titles_to_prectitles`
--

CREATE TABLE IF NOT EXISTS `titles_to_prectitles` (
  `ttp_ttp_id` int(11) NOT NULL auto_increment,
  `ttp_title_id_fk` int(11) NOT NULL,
  `ttp_prectitle_id_fk` int(11) NOT NULL,
  PRIMARY KEY  (`ttp_ttp_id`),
  KEY `ttp_title_id` (`ttp_title_id_fk`),
  KEY `ttp_prectitle_id` (`ttp_prectitle_id_fk`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=3157 ;

--
-- Constraints for table `titles_to_prectitles`
--
ALTER TABLE `titles_to_prectitles`
  ADD CONSTRAINT `titles_to_prectitles_ibfk_1` FOREIGN KEY (`ttp_title_id_fk`) REFERENCES `titles` (`ttl_title_id`) ON DELETE CASCADE ON UPDATE CASCADE,
  ADD CONSTRAINT `titles_to_prectitles_ibfk_2` FOREIGN KEY (`ttp_prectitle_id_fk`) REFERENCES `titles` (`ttl_title_id`) ON DELETE CASCADE ON UPDATE CASCADE;

In class PropTitles:

$this->addField(new MdhOneToManyField('ttl_prec_titles', 'Preceding titles',
                                'ttl_title_id', 
                                'titles',
                                'ttl_title_id',
                                'ttl_title_code',
                                'titles_to_prectitles',
                                'ttp_ttp_id',
                                'ttp_title_id_fk',
                                'ttp_prectitle_id_fk',
                                true, 'ttl_title_code', true));

My original intention was to populate this last "Preceding titles" field programmatically by parsing the data in the current Traces field, which should also be pointing back at previous title data. However, after looking at some of that data, I think it's going to have to be done by a human. Here's an example of why:

If you go to the Titles table, then choose the third property in the drop-down Property list (005 B:103 L:B), and filter by it, you'll see five results. These should, in theory, represent a succession of titles relating to the same property, by date; the Traces field for each one should point back to the preceding one. However:

  • The latest title is 561978L, from 1967. Its Traces field points back to 561975L, but there is no such title in the db.
  • The title preceding that by date, 403327L from 1959, points back to two separate titles, 272233L and 256807L. The former does not exist in the database.
  • The latter does, though, and this relationship seems logical, pointing back to a 1951 title, which in turn points back to another 1951 entry, which then points back to yet another entry, A.F.B. 26-148-4041E; but this last one has no date. We can (as humans) guess from this that the last one is the earliest, but its absence of date will make coming to that conclusion programmatically a bit of a stretch.

In short, writing code that does a reasonable job of gleaning the preceding title data from what exists in the Traces field is not going to be practical or reliable; a human should go through the data and work this out, I think.

Permalink 10:16:08 am, by mholmes, 555 words, 70 views   English (CA)
Categories: Activity log; Mins. worked: 150

Properties: questions to be answered through queries

Points arising out of a meeting with JS-R this morning:

The main realization that emerged for me was that the core of the research focus relates to transactions (events whereby one property is transferred from one owner to another); but transactions do not constitute datapoints in the records themselves. Transactions are primarily represented by titles. Each title has a date (which is the date ownership changed), and information about the new owners; but it doesn't have any information about the previous title or owners. Titles are related only in that they relate to the same property.

What we need to do is to generate a Transactions view which brings together all the data about each transaction: property, date, old title, new title, old owner(s), new owner(s), and ethnicities of both sets of owners. This is more complicated than it seems. I think the first stage is probably to create a field in the existing Titles table which hard-links to the previous title on the same property. This can be computed based on Property and date, but there is also the Traces field which should already contain a list of older titles on the same property. The first thing to do is to calculate the previous title based on property and date, and then compare it against the data in the text field to see if we have an acceptable degree of consistency. There are also issues regarding the unification of titles; one title may have multiple precursors rather than just one.

The following is a set of questions that we are going to attempt to answer through queries against the current database, using specially-constructed views, to test the difficulty of addressing research questions through a programmer writing SQL; these will then act as a testbed for an attempt to imagine ways in which an interface could be coded which would allow the researcher to frame such questions without the intervention of a programmer, and get tabular results.

  • How did the ethnic composition of owners shift over time (by sample)? (This probably needs a Transactions view which can be organized by date, owner ethnicity and census tract.)
  • Did owners sell to co-ethnics (by time, by sample, by ethnicity)? (Ditto -- Transactions view again.)
  • Did the Powell Street properties (CT 58.P) shift to Chinese owners in the 40s and 50s? (Ditto.)
  • Did absentee ownership increase over time? (This requires a definition of absentee ownership, which will have to be derived by examination of owner's address compared with property location; this is likely to be a boolean field (absentee or not), or a distance field (how far away the owner lives); the extra field will have to be added to the Titles table once this has been decided, and then completed by a human.)
  • Did sources of mortgage funding increase over time (by sample, by ethnicity)? This will require that mortgages be added to the Transactions view.)

Immediate tasks for me:

  1. Add two fields to the Properties table: Street # and Street Name (as in Owners table).
  2. Add an ethnicity field to lenders.
  3. Add a "previous title" field to Titles, and populate it programmatically.
  4. Create a Transactions View.

Meanwhile, JS-R and his RA will be adding ethnicity data to Owners and Lenders, and splitting out their names, as well as adding the new address info to Titles.

Properties

A database project to collect historical data on properties and titles.

Reports

Categories

2011
 << Current>>
Jan Feb Mar Apr
May Jun Jul Aug
Sep Oct Nov Dec

XML Feeds