Archives for: July 2012

05/07/12

Permalink 12:27:51 pm, by mholmes, 61 words, 113 views   English (CA)
Categories: Activity log; Mins. worked: 5

Transaction-chain processing script

Just for the record:

The script properties/xml/process_properties_201-07.sh takes one parameter at the command line, the XML dump of the db, and runs two XSLT transformations (so far) on it. The result is "complete, enhanced" output in the form of another XML file, which includes transaction chains and lots of other stuff.
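For illustration only, here is a Python sketch of how a two-step chain like this could be assembled, with each transform reading the previous one's output. The stylesheet names, the output directory and the jar name are all hypothetical; the actual script is a shell script whose contents aren't reproduced here. The `-s:`, `-xsl:` and `-o:` options are the standard Saxon command-line options for source, stylesheet and output.

```python
from pathlib import Path


def build_pipeline(dump, stylesheets, out_dir, saxon_jar="saxon9he.jar"):
    """Return one Saxon command (as an argv list) per stylesheet.

    Each step takes the previous step's output file as its source.
    The jar name and the step file names are assumptions, not the real ones.
    """
    commands = []
    source = Path(dump)
    for step, xsl in enumerate(stylesheets, start=1):
        target = Path(out_dir) / f"step{step}.xml"
        commands.append([
            "java", "-jar", saxon_jar,
            f"-s:{source}",    # source document
            f"-xsl:{xsl}",     # stylesheet to apply
            f"-o:{target}",    # result document
        ])
        source = target
    return commands


# Hypothetical file names, purely to show the shape of the result.
for cmd in build_pipeline("dump.xml", ["first.xsl", "second.xsl"], "out"):
    print(" ".join(cmd))
```

Each argv list could be handed to `subprocess.run(cmd, check=True)` so that a failed transform stops the chain.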

Permalink 11:52:02 am, by mholmes, 612 words, 114 views   English (CA)
Categories: Activity log; Mins. worked: 180

Early results from transaction-chain processing

Here is an example of what the XSLT transaction-chain generation is pulling out so far, and of the sorts of oddities that are being revealed:

<transaction-chain>
  <title key="206" property-id="101" property-name="B:103 L:003"/>
  <transaction-chain>
     <title key="249" property-id="101" property-name="B:103 L:003"/>
     <title key="204" property-id="101" property-name="B:103 L:003"/>
     <title key="157" property-id="101" property-name="B:103 L:003"/>
     <title key="25" property-id="71" property-name="B:011 L:026"/>
  </transaction-chain>
  <transaction-chain>
     <title key="157" property-id="101" property-name="B:103 L:003"/>
     <title key="25" property-id="71" property-name="B:011 L:026"/>
  </transaction-chain>
</transaction-chain>

This shows nesting chains. Title 206 is the start of the initial chain; 249 is then split from it (while presumably 206 continues?). 249 becomes 204, then the split is re-joined: 157 has both 206 and 204 as preceding-titles.

I don't know if this makes sense -- can a title be split into itself and another title, as seems to be the case here with 206? There do seem to be lots of examples of this in the database.

My system currently captures splits like this well, but it doesn't yet unify chains which come back together again (so the two interior chains in the above example both have 157 -> 25). A subsequent transformation could easily detect such merges, although it's not clear how best to represent them. If we don't do that, then you would end up with two distinct chains:

  • 206 -> 249 -> 204 -> 157 -> 25
  • 206 -> 157 -> 25
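As a rough sketch of how the nested output could be un-nested into these two distinct chains, the Python below walks the sample XML and emits one flat chain per leaf. It assumes that within each transaction-chain element the title elements come before any nested chains, as they do in the sample; the function name `flatten` is just for illustration.

```python
import xml.etree.ElementTree as ET

NESTED = """
<transaction-chain>
  <title key="206" property-id="101" property-name="B:103 L:003"/>
  <transaction-chain>
    <title key="249" property-id="101" property-name="B:103 L:003"/>
    <title key="204" property-id="101" property-name="B:103 L:003"/>
    <title key="157" property-id="101" property-name="B:103 L:003"/>
    <title key="25" property-id="71" property-name="B:011 L:026"/>
  </transaction-chain>
  <transaction-chain>
    <title key="157" property-id="101" property-name="B:103 L:003"/>
    <title key="25" property-id="71" property-name="B:011 L:026"/>
  </transaction-chain>
</transaction-chain>
"""


def flatten(chain, prefix=()):
    """Return every root-to-leaf sequence of title keys as a tuple."""
    # Direct child titles extend the path inherited from the parent chain.
    titles = prefix + tuple(t.get("key") for t in chain.findall("title"))
    sub_chains = chain.findall("transaction-chain")
    if not sub_chains:
        return [titles]
    paths = []
    for sub in sub_chains:
        paths.extend(flatten(sub, titles))
    return paths


paths = flatten(ET.fromstring(NESTED))
for path in paths:
    print(" -> ".join(path))
# 206 -> 249 -> 204 -> 157 -> 25
# 206 -> 157 -> 25
```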

This would be problematic if you were doing stats which depend on the number of transactions. We could, alternatively, collapse any chain which is a reduced subset of another (all of its titles appear, in the same order, in a longer chain), so you would end up with just one here:

  • 206 -> 249 -> 204 -> 157 -> 25
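A minimal sketch of that collapsing step, assuming the chains have already been flattened into plain lists of title keys. "Reduced subset" is read here as "ordered subsequence"; that reading, and the function names, are assumptions.

```python
def is_subsequence(short, long):
    """True if every item of `short` appears in `long` in the same order."""
    remaining = iter(long)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items can only match later positions.
    return all(item in remaining for item in short)


def collapse(chains):
    """Drop every chain that is a proper subsequence of another chain."""
    kept = []
    for chain in chains:
        absorbed = any(
            chain != other and is_subsequence(chain, other)
            for other in chains
        )
        if not absorbed and chain not in kept:
            kept.append(chain)
    return kept


rejoining = [[206, 249, 204, 157, 25], [206, 157, 25]]
print(collapse(rejoining))
# [[206, 249, 204, 157, 25]]
```

Chains which diverge without re-uniting are left alone by this approach, since neither is a subsequence of the other.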

However, this would ignore the fact that 157 has 206 as a preceding title. It's also not clear what should happen with chains which diverge but never re-unite, such as this:

<transaction-chain>
  <title key="606" property-id="211" property-name="B:039 L:005"/>
  <transaction-chain>
     <title key="507" property-id="211" property-name="B:039 L:005"/>
     <title key="421" property-id="211" property-name="B:039 L:005"/>
  </transaction-chain>
  <transaction-chain>
     <title key="510" property-id="214" property-name="B:039 L:008"/>
     <title key="422" property-id="214" property-name="B:039 L:008"/>
  </transaction-chain>
</transaction-chain>

Here you would conceivably have two distinct chains:

  • 606 -> 507 -> 421
  • 606 -> 510 -> 422

and any stats based on these would end up counting the sale of 606 twice (which might well be legitimate, because it is split, so there are arguably two transactions).
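One possible way to keep transaction counts stable, sketched below, is to count distinct title-to-title links rather than summing over flattened chains. Treating each preceding-title link as exactly one transaction is an assumption here, not something the data confirms.

```python
def transactions(chains):
    """Collect the distinct (earlier title, later title) links across chains."""
    links = set()
    for chain in chains:
        links.update(zip(chain, chain[1:]))
    return links


rejoining = [[206, 249, 204, 157, 25], [206, 157, 25]]
diverging = [[606, 507, 421], [606, 510, 422]]

for name, chains in (("rejoining", rejoining), ("diverging", diverging)):
    per_chain = sum(len(chain) - 1 for chain in chains)
    distinct = len(transactions(chains))
    print(name, per_chain, distinct)
# rejoining 6 5
# diverging 4 4
```

In the rejoining case the shared 157 -> 25 link is counted once instead of twice, while the split of 606 still counts as two links, which matches the "arguably two transactions" reading.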

It's worth noting that in most of the complex chains I'm seeing, an initial split into two or more titles is then followed by their being re-united very quickly.

Some quick stats:

  • 713 primary chains exist (meaning that there are 713 chains which start from a title which has no preceding-title).
  • 411 of these primary chains go nowhere (in other words, there is no subsequent title, so no transactions take place other than the primary title purchase).
  • Therefore there are 302 instances of actual usable chains involving one or more sales.
  • 253 of those chains are simple, in that there are no splits. (There could be unions, though, because I'm not detecting those yet.)
  • 53 of the chains split into sub-chains.
  • 28 of the chains involve more than one property.
  • 186 titles appear in more than one root transaction chain (suggesting there may be up to 100 merges between root chains, something distinct from the examples above where a root chain splits and then merges again).
  • 40 root chains feature the same title more than once (meaning that the chain splits, then merges again at some point).
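Counts of this kind could be reproduced along the following lines once each root chain is available as a list of the title keys it contains (repeats included). This is only a sketch: the first two entries of the toy data come from the examples above, while the X-prefixed keys are invented purely to exercise the other cases.

```python
from collections import Counter


def chain_stats(root_chains):
    """Summarise root chains given as {root key: [title keys, with repeats]}."""
    roots_per_title = Counter()
    for titles in root_chains.values():
        # Use a set so a title repeated within one root counts only once here.
        roots_per_title.update(set(titles))
    go_nowhere = sum(1 for titles in root_chains.values() if len(titles) == 1)
    return {
        "primary": len(root_chains),
        "go_nowhere": go_nowhere,
        "usable": len(root_chains) - go_nowhere,
        "titles_in_several_roots": sum(
            1 for count in roots_per_title.values() if count > 1
        ),
        "roots_with_repeated_title": sum(
            1 for titles in root_chains.values() if len(set(titles)) < len(titles)
        ),
    }


toy = {
    "206": ["206", "249", "204", "157", "25", "157", "25"],  # splits, then re-joins
    "606": ["606", "507", "421", "510", "422"],              # splits, never re-joins
    "X1": ["X1"],                                            # invented: goes nowhere
    "X2": ["X2", "X9"],                                      # invented
    "X3": ["X3", "X9"],                                      # invented: shares X9 with X2
}
print(chain_stats(toy))
```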

04/07/12

Permalink 12:59:21 pm, by mholmes, 160 words, 103 views   English (CA)
Categories: Activity log; Mins. worked: 180

Processing chain and transaction-chain-building progress

Today I have:

  • Written a basic script to run the two Saxon transforms on my original source data. This script will have more transformations added to it eventually, forming a full process from db output XML to CSV file for JS-R.
  • Added detection of liquidated properties and liquidated property controls (although I'm still working on data that doesn't have the required identification of purchasers to allow detection of actual liquidations -- still waiting on JS-R to add that to the db).
  • Implemented basic transaction-chain detection. This is remarkably slow, but does appear to be working. So far it's listing all titles in a single-title transaction chain. Next I need to do something when I reach a fork in the chain (perhaps generate a nested chain, which could be un-nested in the next transformation).
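The "nested chain at a fork" idea could look something like the Python sketch below. It is not the XSLT actually in use: the `successors` mapping (title key to the keys of the titles that follow it) is an assumed input shape, and the code assumes the data contains no cycles.

```python
def build_chain(title, successors):
    """Follow titles until the chain ends or forks; nest a sub-chain per branch."""
    chain = {"titles": [], "sub_chains": []}
    current = title
    while True:
        chain["titles"].append(current)
        following = successors.get(current, [])
        if len(following) == 1:
            # No fork: keep extending the same chain.
            current = following[0]
        else:
            # End of chain (no followers) or a fork (several followers).
            chain["sub_chains"] = [build_chain(f, successors) for f in following]
            return chain


# A small successor map with one fork that later re-joins.
successors = {"206": ["249", "157"], "249": ["204"], "204": ["157"], "157": ["25"]}
nested = build_chain("206", successors)
print(nested["titles"])
print([sub["titles"] for sub in nested["sub_chains"]])
# ['206']
# [['249', '204', '157', '25'], ['157', '25']]
```

Because merges are not detected, a title reached by two branches is simply repeated in each of them.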

Moving forward. Tomorrow I should be able to finish transaction chains, and presumably get some idea from JS-R of what kind of output format he would like.

03/07/12

Permalink 02:29:44 pm, by mholmes, 89 words, 110 views   English (CA)
Categories: Activity log; Mins. worked: 120

Enhancing the complete data XML

I'm now working on a second transform to be applied to the result of the first. This one already detects sale-to-self situations (although it doesn't find any -- waiting for some known data from JS-R to see why) and possible family sales. I'm now working on building the transaction chains, but I'm not sure whether this can actually be done with XSLT or not, because you need to keep a tally of which items have already been processed, and I can't yet figure out a way to do that.
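In languages without updatable variables, a tally like this is commonly carried as a parameter of a recursive call rather than stored anywhere, and XSLT does allow recursive templates and functions with parameters. The Python sketch below shows only the general pattern; the `successors` mapping and the function name are assumptions, and whether this suits the real stylesheets is untested.

```python
def walk(pending, successors, seen=()):
    """Visit titles depth-first, passing the 'already processed' tally along.

    Nothing is mutated: each call receives a new `pending` and a new `seen`.
    """
    if not pending:
        return seen
    head, rest = pending[0], pending[1:]
    if head in seen:
        # Already processed via another branch, so skip it.
        return walk(rest, successors, seen)
    following = tuple(successors.get(head, ()))
    return walk(following + rest, successors, seen + (head,))


successors = {"206": ("249", "157"), "249": ("204",), "204": ("157",), "157": ("25",)}
print(walk(("206",), successors))
# ('206', '249', '204', '157', '25')
```

Title 157 is reachable from both 206 and 204, but it is processed only once because the second encounter finds it in `seen`.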

Properties

A database project to collect historical data on properties and titles.
