Archives for: August 2011, 23

23/08/11

Permalink 12:16:00 pm, by mholmes, 65 words, 137 views   English (CA)
Categories: Activity log; Mins. worked: 30

Finding duplicate title codes

There are 117. Here's how I found them, for future reference:

SELECT a.`ttl_title_id`, b.`ttl_title_id`, a.`ttl_title_code`
FROM `titles` a, `titles` b
WHERE a.`ttl_title_code` = b.`ttl_title_code`
AND a.`ttl_title_id` <> b.`ttl_title_id`
AND a.`ttl_title_id` < b.`ttl_title_id`
ORDER BY a.`ttl_title_id`
LIMIT 0, 1000
Permalink 11:58:50 am, by mholmes, 199 words, 135 views   English (CA)
Categories: Activity log; Mins. worked: 60

Catching data entry errors using automation

Negative duration values, in the case of 11 records in the VW_trans_composite_eth_prop view, have revealed some data entry errors in the "preceding titles" and "related titles (newer)" fields. This suggests that there may be other such errors which are not apparent, because their dates happen to be in the correct sequence. Therefore we propose creating an automated check on the data, to locate such inconsistencies. This is an outline of how the check would be done:

  • For each distinct property:
    • List all the titles for that property in date order. For each title:
      • Check that its preceding-title links (there can be more than one) point to titles which precede it in date order.
      • Check that its related-title-newer links (again, multiple) point to titles which follow it in date order.

However, I see a problem with this. If a title is created as a result of a merger of two or more preceding titles, each of which has a different date, it will be impossible to tell whether there is an error or not. Ditto in the case of subsequent titles, in the case of a subdivision. Waiting for word from JS-R on how to handle this.

Permalink 08:21:53 am, by mholmes, 539 words, 147 views   English (CA)
Categories: Activity log; Mins. worked: 45

Spreadsheet data now inserted

I have now carried out the insertion, and constructed a detailed audit trail to pass on to JS-R and VG. Here's the opening section of the audit document (the rest is very long, and consists of precise records of every operation):


____AUDIT TRAIL FOR THE AUTOMATED INSERTION OF SPREADSHEET DATA____

Use this document to check the validity of inserted records and trace the source of any errors that may be found.


WHAT HAS BEEN DONE:

 - A new title record has been inserted for each record in the spreadsheet.

 - The title has been linked to its property. All properties referred to in the spreadsheet appear to have already existed in the database. so no new property records were created.

 - All owners recorded in the spreadsheet have either been:
      - discovered as already existing in the database, or 
      - inserted as new records into the database
   and each owner has been linked to their title(s).

 - A new mortgage record has been inserted for each mortgage in the spreadsheet.

 - All lenders recorded in the spreadsheet have either been:
      - discovered as already existing in the database, or 
      - inserted as new records into the database
   and each lender has been linked to their mortgage(s).

 - A simple algorithm has been used to determine whether owners and lenders are individual or institutional, and their records have been configured accordingly. I have strong confidence in this algorithm, but there may be occasional errors.

 - All "other documents" recorded in the spreadsheet have either been:
      - discovered as already existing in the database, or 
      - inserted as new records into the database
   and each document has been linked to its title(s).


WHAT HAS NOT BEEN DONE:

 - No attempt has been made to set values for the ethnicity of lenders or owners. This cannot be done in an automated fashion.

 - No attempt has been made to discover seller information for the new records. This is also best done by humans, but it's possible we might be able to automate it if time permits.



BACKUPS

I have backups of the database taken immediately before and immediately after the process.


AUTO_INCREMENT VALUES PRIOR TO INSERTION OF NEW RECORDS FROM SPREADSHEET

These values represent the FIRST id and LAST id of a record inserted as part of this process. Within this range, data in the table has been inserted by the automated process.

    TABLE               FIRST ID          LAST ID

titles                    2300             2738
owners                    1863             2292
lenders                    553              691
mortgages                  929             1306
other_docs                 388              518
owners_to_titles          9678            10363
mortgages_to_titles       3246             3623
docs_to_titles            1086             1328
lenders_to_mortgages      1130             1564

No new records were inserted in the props table; presumably all the new titles relate to properties already in the database.


FULL INSERTION RECORD

This is the complete record of record creation through the process. Use this to figure out exactly what happened in the case of any given record.

Operation # 1

Processing record with id # d1e1150
Found 1 property records.
Using property id #109
Inserted title record # 2300
Processing owners
Inserted owner id # 1863
Inserted owner-to-title link id # 9678
Inserted mortgage id # 929
Inserted mortgage-to-title link id # 3246
Processing lenders for this mortgage.
Inserted lender id # 553
Inserted lender_to_mortgage link id # 1130
Processing other documents.
Inserted other_doc id # 388
Inserted doc-to-title link id # 1086
Finished processing record d1e1150.

[...]

Properties

A database project to collect historical data on properties and titles.

Reports

Categories

August 2011
Sun Mon Tue Wed Thu Fri Sat
 << < Current> >>
  1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31      

XML Feeds