Archives for: April 2012

30/04/12

Permalink 05:04:44 pm, by mholmes, 143 words, 47 views   English (CA)
Categories: Activity log; Mins. worked: 360

Working on expanded transaction view

I've spent most of the day working on building the new transaction view in the form of a spreadsheet, and I think it's about 2/3 done. I have ethnicity-munging working in the XSLT, as well as munging of institution types for multiple institutions. In the process, I've found and fixed some data-entry issues (as always).

I now have a two-stage XSLT process. The first stage, working on the XML output from the db, takes several minutes; this generates a view of the database which is centred specifically on transactions, importing all relevant data for each transaction into the transaction itself. Once this is done, the second-stage process then processes that output to create a spreadsheet; this is much faster (a few seconds), and it's easier to work on and test the second process without having to run the first process over and over again.

27/04/12

Permalink 02:37:50 pm, by mholmes, 386 words, 72 views   English (CA)
Categories: Activity log; Mins. worked: 180

Generating candidate duplicate owners

I have a script currently generating a list of candidate duplicate owners. This is how it was done:

  • Export the owners table as XML.
  • Process with XSLT to sort by surname and forename fields.
  • Edit so each each record constitutes a single line.
  • Remove bracketing XML container, and add five blank lines.
  • Run the following script:
#!/bin/bash

#This script is designed to run a series of comparison tests of xml-encoded owner
#records in an attempt to discover possible duplicates, which are then to be investigated
#by the PI manually.


#Threshold below which to consider a possible dupe
MINSIM=0.1

#First, paths to files.
USM_JAR=/home/mholmes/WorkData/netbeans/uniSimMetric/dist/uniSimMetric.jar
NCD_COMMAND="ncd -l "


INPUTFILE=/home/mholmes/WorkData/history/stanger-ross/properties/xml/owners_12_04_27_flattened.txt
OUTFILE="/home/mholmes/WorkData/history/stanger-ross/properties/xml/owner_dupe_candidates_`date +%Y%m%d`.txt"

#Echo the start out to the output file.
echo "Possible duplicate owners found by string comparison using USM">$OUTFILE
echo "">$OUTFILE

#Initialize a counter
C=0
#Read in the inputs line by line
cat $INPUTFILE | while read line; 
do 
#Ignore empty lines. This ensures we can read five lines forward (there are five empty lines at the end of the file).
	let "C=$C+1"
	LEN=${#line}
	if [ $LEN -gt "3" ]; 
		then
		for ((N=$C+1; N<$C+6; N++))
		do
			STR2=`awk NR==${N} $INPUTFILE`; 
#Call the USM to compare them.
			USM=`java -jar $USM_JAR -compare -str1="$line" -str2="$STR2"`
#Call NCD to compare them
#			NCD=`$NCD_COMMAND "$line" "$STR2"`
#NCD outputs the second string on the command line before the score; we need to remove it.
#			NCD=${NCD/$STR2}
#If the threshold similarity is greater than the specified value, output info to the output file.
			if [[ "$USM" < "$MINSIM" ]];
			then
				echo "Found similarity"
				echo $line | sed -n 's/.*<owners><own_owner_id>\(.*\)<\/own_owner_id>.*/\1/p'>>$OUTFILE
				echo $STR2 | sed -n 's/.*<owners><own_owner_id>\(.*\)<\/own_owner_id>.*/\1/p'>>$OUTFILE
				echo "">>$OUTFILE
			fi
		done
	fi
		 
done

#Display the output file.
`gedit $OUTFILE`

echo "Done!"
exit

This is successfully producing a list of candidate matches right now, outputting the ids of the two candidates followed by a blank line, for each candidate match.

26/04/12

Permalink 03:21:44 pm, by mholmes, 28 words, 73 views   English (CA)
Categories: Activity log; Mins. worked: 180

More progress on XSLT

Isolated seven more problematic transactions, and fixed two of them (bad dates); started creating xsl:keys to speed up processing, and mapped tract and property information into transactions.

Permalink 09:31:56 am, by mholmes, 42 words, 62 views   English (CA)
Categories: Activity log; Mins. worked: 15

New request for fields in transaction view

Suggestion from JS-R:

Maybe three variables.

1. A "true"/"false" variable--is at least one of the buyers an institution?

2. If true, what is the institution type:
	(1) Ethnic
	(2) Private
	(3) Public
	(4) Multiple

3. If there is only one institution among the buyers, what is its name?

23/04/12

Permalink 09:12:27 am, by mholmes, 109 words, 58 views   English (CA)
Categories: Activity log; Mins. worked: 25

New field in Owners table

At JS-R's request, dealt with the problem of owner names which were prefixed with "Transfer: " in the following way:

Created a new column for this info:

ALTER TABLE `owners` ADD COLUMN `own_diff_trans_name` BOOLEAN DEFAULT False NOT NULL AFTER `own_display_name`;

Checked how many rows would be affected (139):

SELECT * FROM `owners` WHERE LEFT(`own_display_name` , 10 ) = "Transfer: ";

Moved the info from the display name field to the new field:

UPDATE `owners` SET `own_diff_trans_name` = True WHERE LEFT( `own_display_name` , 10 ) = "Transfer: ";

UPDATE `owners` SET `own_display_name` = REPLACE(`own_display_name`, "Transfer: ", "");

Checked the results:

SELECT * FROM `owners` WHERE `own_diff_trans_name` = True;

19/04/12

Permalink 11:29:55 am, by mholmes, 347 words, 70 views   English (CA)
Categories: Activity log; Mins. worked: 45

First new task done: enhanced transaction view

One task from the meeting the other day was to create a new view of transactions which adds a number of financial fields. I've now done that, using the following SQL:

DROP VIEW IF EXISTS VW_trans_composite_eth_prop_2;

CREATE VIEW VW_trans_composite_eth_prop_2 AS

(
SELECT 

seller_titles.ttl_title_id AS seller_title_id, 
seller_titles.ttl_date AS seller_title_date, 
seller_titles.ttl_title_code AS seller_title_code,
seller_titles.ttl_consideration AS seller_title_consideration,
seller_titles.ttl_declaredvalue AS seller_title_declaredvalue,
seller_titles.ttl_marketvalue AS seller_title_marketvalue,
buyer_titles.ttl_title_id AS buyer_title_id, 
buyer_titles.ttl_date AS buyer_title_date, 
buyer_titles.ttl_title_code AS buyer_title_code,
DATEDIFF(buyer_titles.ttl_date, seller_titles.ttl_date) as seller_duration_days,
buyer_titles.ttl_consideration AS buyer_title_consideration,
buyer_titles.ttl_declaredvalue AS buyer_title_declaredvalue,
buyer_titles.ttl_marketvalue AS buyer_title_marketvalue,
census_tracts.census_tract_code,
props.*,
sellers.concat_owners AS concat_sellers, 
sellers.concat_ethnicities AS seller_ethnicities, 
sellers.total_owners AS total_sellers,
sellers.munged_ethnicity AS seller_munged_eth,
sellers.total_institutional AS institutional_sellers,
buyers.concat_owners AS concat_buyers, buyers.concat_ethnicities AS buyer_ethnicities, buyers.total_owners AS total_buyers,
buyers.munged_ethnicity AS buyer_munged_eth,
buyers.total_institutional AS institutional_buyers

FROM titles AS buyer_titles
LEFT JOIN titles_to_prectitles ON buyer_titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk
LEFT JOIN titles AS seller_titles ON seller_titles.ttl_title_id = titles_to_prectitles.ttp_prectitle_id_fk
LEFT JOIN VW_titles_composite_eth AS sellers ON seller_titles.ttl_title_id = sellers.ttl_title_id
LEFT JOIN VW_titles_composite_eth AS buyers ON buyer_titles.ttl_title_id = buyers.ttl_title_id

LEFT JOIN props AS props ON seller_titles.ttl_property_id_fk = props.prp_property_id
LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id

ORDER BY seller_titles.ttl_title_id
)

11/04/12

Permalink 02:05:24 pm, by mholmes, 145 words, 139 views   English (CA)
Categories: Activity log, Tasks; Mins. worked: 120

Meeting and new tasks

Met with JS-R and had long discussions about the next phase of the project, and immediate needs. Arising out of this, two short-term tasks for me:

  1. Find a way to generate candidates for identity between different owner records. Use USM to generate data from concatenation of all text fields in the owners table. Start by sorting the table by surname, and test each record against the following five records.
  2. Duplicate VW_trans_composite_eth_prop, but add titles.ttl_consideration, ttl_declaredvalue and ttl_marketvalue, and include only instances where ttle_consideration != NULL. Also, add a field that is TRUE where there's the same individual is one of the buyers and one of the sellers. Next, add a field which is TRUE where one of the buyers and one of the sellers share a surname, and neither of the two is own_institutional (family_transaction).

Properties

A database project to collect historical data on properties and titles.

Reports

Categories

April 2012
Sun Mon Tue Wed Thu Fri Sat
 << < Current> >>
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30          

XML Feeds