Where there's no title date, the CSV generation process (the first one, which creates the first CSV) now substitutes the transfer date if there is one. We agreed on this at the meeting, and it's now implemented. The only remaining item arising out of this week's discussion is the requirement to elide transactions with nominal values. That's tricky; I'm waiting for feedback from JSR on an outline of the algorithm for that.
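A minimal sketch of the title-date fallback, for illustration only. The function name and the treatment of `0000-00-00` as "no date" are assumptions made for this example, not a description of the actual build code.

```python
from typing import Optional

# Assumption: a zero date is treated the same as a missing date.
ZERO_DATE = "0000-00-00"


def effective_date(title_date: Optional[str],
                   transfer_date: Optional[str]) -> Optional[str]:
    """Return the title date, falling back to the transfer date if needed."""

    def usable(d: Optional[str]) -> bool:
        # Empty strings, None and zero dates all count as "no date".
        return bool(d) and d != ZERO_DATE

    if usable(title_date):
        return title_date
    if usable(transfer_date):
        return transfer_date
    return None  # neither date is usable; leave the gap visible
```

Returning `None` when neither date is usable keeps the gap visible in the output rather than hiding it behind an invented value.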
The spreadsheet using 2014-normalized dollars is now being produced on the build server.
The calculation is not actually very complicated, but parsing the spreadsheet turned out to be a bit tricky. I think that code could be tightened up to remove some risky assumptions which hold for our own spreadsheets but might not generalize very well. Things to think about:
- This of course also normalizes $1 transactions, so a nominal $1 no longer shows up as $1; we'll have to take account of that when we decide which transactions are for nominal amounts in later processing.
- Some transactions take place before 1914, the earliest year for which we have a rate, so I've done those all at the 1914 rate.
- There are 268 instances of zero dates (0000-00-00) which have consideration values. I think rather than fudging those, we should fill them in by revisiting the records. We could make a principled decision to use the transfer date instead, assuming there is one.
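The points above can be sketched roughly as follows. The rate table here holds made-up placeholder multipliers (2014 dollars per dollar of the given year), not the real lookup values, and the function name is hypothetical. Returning `None` for zero dates or missing rates is one possible policy, chosen here so that nothing gets silently fudged.

```python
from decimal import Decimal
from typing import Dict, Optional

# PLACEHOLDER multipliers for illustration only; the real table differs.
RATES: Dict[int, Decimal] = {
    1914: Decimal("21.0"),
    1942: Decimal("14.0"),
    1943: Decimal("13.5"),
}


def to_2014_dollars(amount: Decimal, date: str,
                    rates: Dict[int, Decimal]) -> Optional[Decimal]:
    """Convert an amount to 2014 dollars using a per-year multiplier."""
    if date.startswith("0000"):
        return None  # zero date: no basis for choosing a rate
    year = int(date[:4])
    # Dates earlier than the first year in the table use that first rate.
    year = max(year, min(rates))
    rate = rates.get(year)
    if rate is None:
        return None  # year missing from the table
    return (amount * rate).quantize(Decimal("0.01"))
```

With these placeholder rates, a $1 transaction dated 1942 comes out as 14.00, which is why the nominal-amount test cannot simply look for $1 in the normalized spreadsheet.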
Meeting with MA and JS-R, with WA and TW joining by Skype. We discussed a lot of issues around AtoM, none of which requires immediate work from HCMC. We also talked about the build process and diagnostics, as a result of which I have today:
- Integrated generation of the transaction-based spreadsheet into the build process. This is based on the work we did earlier to pre-generate data for SPSS use.
- Integrated three new diagnostic tests:
  - A check for out-of-range dates, which throws up a few obvious errors (1066 etc.)
  - A check for cases in which the Custodian acquires a property for a significant consideration (more than a dollar). This finds three cases, ids 1785, 2946 and 5813.
  - A check for cases in which the Custodian sells a property for a nominal fee ($1 or less). This throws up five cases, 1029, 1171, 1639, 1966 and 3728.
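The three diagnostic checks could be expressed along these lines. This is only a sketch: the record fields, the `"Custodian"` label and the acceptable year range are all assumptions made for the example, and the real diagnostics run inside the build process rather than over a Python list.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class Transaction:
    # Hypothetical record shape, for illustration only.
    id: int
    date: str            # "YYYY-MM-DD"
    buyer: str
    seller: str
    consideration: Decimal


CUSTODIAN = "Custodian"        # assumed party label
NOMINAL_LIMIT = Decimal("1")   # "$1 or less" counts as nominal
YEAR_RANGE = (1850, 1990)      # assumed plausible range


def out_of_range_dates(txs: List[Transaction]) -> List[int]:
    """Ids of transactions whose year falls outside YEAR_RANGE."""
    lo, hi = YEAR_RANGE
    return [t.id for t in txs if not lo <= int(t.date[:4]) <= hi]


def custodian_paid_acquisitions(txs: List[Transaction]) -> List[int]:
    """Ids where the Custodian buys for more than a nominal amount."""
    return [t.id for t in txs
            if t.buyer == CUSTODIAN and t.consideration > NOMINAL_LIMIT]


def custodian_nominal_sales(txs: List[Transaction]) -> List[int]:
    """Ids where the Custodian sells for a nominal amount."""
    return [t.id for t in txs
            if t.seller == CUSTODIAN and t.consideration <= NOMINAL_LIMIT]
```

Note that under this range check a zero date (0000-00-00) would also be reported as out of range.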
We also planned to extend the spreadsheet generation so that it produces two more versions from the current one:
- One in which all amounts are normalized to their modern-day values, based on a lookup table to be supplied by JS-R.
- One in which rows involving nominal-sum transactions are elided, to provide only a chain of significant transactions.
These would be generated in sequence, so I can repurpose the CSV-parsing code I wrote the other day for CCAP to use XSLT to create them. An alternative would be Python, if I feel like making life difficult for myself.
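If the Python route were taken, the simplest possible version of the elision step might look like the sketch below. It assumes a column named `consideration` (a hypothetical header) and treats anything of $1 or less as nominal; the real algorithm may well need to be more subtle than a per-row filter. It would have to run on the original amounts rather than normalized ones.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

NOMINAL_LIMIT = Decimal("1")  # assumed threshold: $1 or less is nominal


def is_nominal(raw: str) -> bool:
    """True if the raw cell value parses to an amount at or below the limit."""
    try:
        return Decimal(raw or "") <= NOMINAL_LIMIT
    except InvalidOperation:
        return False  # unparseable or empty amounts are kept, not dropped


def elide_nominal(csv_text: str, column: str = "consideration") -> str:
    """Return the CSV text with nominal-sum rows removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if not is_nominal(row[column]):
            writer.writerow(row)
    return out.getvalue()
```

Rows with empty or unparseable amounts are deliberately kept, so that data problems stay visible for the diagnostics instead of vanishing from the output.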
For some reason the Python script suddenly started complaining about indentation, and I discovered there were some tabs where there should have been spaces. Corrected that, and also had to rewrite a bit so that directories could be created recursively, another point of failure which hadn't appeared before. Ran the script again after removing all my existing mp3s, so the whole process took a long time. Hopefully this is now fixed and it'll keep working as intended...
That Python script worked perfectly the third time out. I'm now confident in it.
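For reference, one standard way to create nested directories in a single step in Python is `Path.mkdir` with `parents=True`. This is a generic sketch with a hypothetical helper name, not the code from the script itself.

```python
from pathlib import Path


def ensure_dir(path) -> Path:
    """Create a directory and any missing parents; do nothing if it exists."""
    p = Path(path)
    p.mkdir(parents=True, exist_ok=True)
    return p
```

`exist_ok=True` makes repeated runs safe, which matters when the script is re-run over a partly populated output tree.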
Got the name indexes working and linked. Had a meeting with JSR, SA and MK, and there are five action items arising out of that.
Began work on building name indexes. First I split out the Streets tables into separate files, because the single file is much too big. Next I figured out how to assign unique ids to personal names in the HTML output, while incorporating info about the source document. Finally, I've built a very simple index of linked names from the HTML output, linking to the instances of names in the HTML documents. This is not quite working yet, but it's well on its way.
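The id-and-index idea can be illustrated with a small sketch. The id pattern (document id plus a running number), the `.html` link format and the function names are all assumptions made for the example; the actual ids are generated during the HTML build.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_name_ids(doc_id: str, names: List[str]) -> List[Tuple[str, str]]:
    """Pair each name occurrence with an id that embeds its source document."""
    return [(f"{doc_id}_name_{n}", name)
            for n, name in enumerate(names, start=1)]


def build_index(docs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each name to links to every occurrence, sorted by name."""
    index = defaultdict(list)
    for doc_id, names in docs.items():
        for name_id, name in assign_name_ids(doc_id, names):
            index[name].append(f"{doc_id}.html#{name_id}")
    return dict(sorted(index.items()))
```

Because the document id is part of each name id, the ids stay unique across the split-out Streets files, and each one says where the name came from.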
Dates were missing due to a bug; started some minor prettying-up.
I've basically finished all but the cosmetics of creating usable, readable output from the fishing boat ledger spreadsheet; there are now XML and HTML documents for each of the spreadsheet rows, and a sortable tabular index, and the original page-images are linked. Lots more work to do now, looking at linking similar names together.
Met with AC and SA about the fishing boat ledger data. This is currently in an Excel spreadsheet, which is OK, but we want to make it more accessible, so I've now added a bunch of processing to the build which:
- Uses headless LibreOffice to turn it into a FODS file (I had to install libreoffice-common, libreoffice-writer and libreoffice-calc on the server).
- Cleans up and expands the FODS file a bit.
- Generates an XML document from each record/row.
- Generates an HTML doc from each record.
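The first two steps can be sketched in Python as below. The command uses LibreOffice's `--headless`, `--convert-to` and `--outdir` options; the executable name `soffice` is an assumption (on some systems it is `libreoffice`). The row function shows one kind of "expansion" a FODS file often needs, unpacking cells collapsed with `table:number-columns-repeated`; whether this matches the clean-up the build actually performs is a guess.

```python
import xml.etree.ElementTree as ET
from typing import List

# OpenDocument table namespace, as used in FODS files.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def conversion_command(xlsx_path: str, out_dir: str) -> List[str]:
    """Build the headless LibreOffice command that writes a .fods file."""
    return ["soffice", "--headless", "--convert-to", "fods",
            "--outdir", out_dir, xlsx_path]


def row_values(row: ET.Element) -> List[str]:
    """Return one string per column, expanding repeated cells."""
    values: List[str] = []
    for cell in row.findall(f"{{{TABLE_NS}}}table-cell"):
        repeat = int(cell.get(f"{{{TABLE_NS}}}number-columns-repeated", "1"))
        text = "".join(cell.itertext()).strip()
        values.extend([text] * repeat)
    # Trailing empty cells are often a single cell with a large repeat count.
    while values and values[-1] == "":
        values.pop()
    return values
```

The command list can be handed to `subprocess.run(cmd, check=True)`; each expanded row can then be written out as its own XML record.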
Tweaked the XML CSS to make the XML readable; the HTML is very rudimentary and needs work. After that, I'll generate a bunch of indexes of various kinds (by name, by vessel, by date, by amount, etc.) or maybe a single table that contains columns for these, which is sortable.