Meeting with AC, JSR, and SA to discuss plans. More details to be worked out in the next few days.
Moved the oral history material to the new home1h filesystem, replacing the original folder with a symlink, so things should look identical from the uploaders' point of view. I've set the default group with setgid, and added a few ACLs for individual users and groups, to ensure that HCMC staff cannot be locked out of any material in this structure. We'll see how this works out over the next little while. There turned out to be a largish subset of the original files that were unreadable by us; these were reset or deleted by HR, enabling the move, but it's indicative of the permission problems that arise in this sort of situation, and hopefully we'll be free of them now.
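For the record, the setup amounts to something like the following minimal sketch; the paths and the group name here are placeholders, not the real ones:

    import os
    import shutil
    import stat
    import subprocess

    OLD = '/home/oralhistory'      # hypothetical path of the original folder
    NEW = '/home1h/oralhistory'    # hypothetical new home on home1h
    GROUP = 'hcmc'                 # placeholder group name

    # Move the material, leaving a symlink behind so uploaders see no change.
    shutil.move(OLD, NEW)
    os.symlink(NEW, OLD)

    # Set the default group and the setgid bit on every directory so that
    # newly created files inherit the group.
    for dirpath, dirnames, filenames in os.walk(NEW):
        shutil.chown(dirpath, group=GROUP)
        os.chmod(dirpath, os.stat(dirpath).st_mode | stat.S_ISGID)

    # Add ACLs (and matching default ACLs) so HCMC staff can't be locked out.
    subprocess.run(['setfacl', '-R', '-m', 'g:' + GROUP + ':rwX', NEW], check=True)
    subprocess.run(['setfacl', '-R', '-d', '-m', 'g:' + GROUP + ':rwX', NEW], check=True)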
I went to the SDI conference this week and heard about a project that geocodes BC addresses; it's one of the Data BC projects. I thought it might be useful to you folks. See it here.
The people_in_prose.xml file was linked to the wrong schemas (tei_all), so validation errors weren't showing up locally; on the build, though, it was validated against the correct schemas and failed. Fixed the schema links and the attribute error that was breaking the build.
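A local check against the same schema the build uses would have caught this earlier. A minimal sketch, assuming lxml and a RELAX NG schema; the schema filename is a placeholder:

    from lxml import etree

    # Validate against the project schema (not tei_all) the way the build does,
    # so mismatches show up locally before they break the build.
    schema = etree.RelaxNG(etree.parse('project_schema.rng'))   # placeholder path
    doc = etree.parse('people_in_prose.xml')
    if not schema.validate(doc):
        for error in schema.error_log:
            print(error)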
Where there's no title date, the CSV generation process (the first stage, which creates the initial CSV) now substitutes the transfer date if there is one. This is something we agreed on at the meeting, and it's now implemented. The only remaining item arising out of this week's discussion is the requirement to elide transactions with nominal values. That's tricky; I'm waiting for feedback from JSR on an outline of the algorithm for that.
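The fallback itself is simple; a minimal sketch of the logic, with hypothetical field names since the real generation works from our own records:

    def best_date(record):
        """Prefer the title date; fall back to the transfer date if there is one."""
        if record.get('title_date'):
            return record['title_date']
        if record.get('transfer_date'):
            return record['transfer_date']
        return ''   # no usable date: leave the cell empty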
The spreadsheet using 2014-normalized dollars is now being produced on the build server.
The calculation itself is not very complicated, but parsing the spreadsheet turned out to be a bit tricky, and I think that code could be tightened up to reduce risky assumptions that hold for our own spreadsheets but might not generalize well. Things to think about (see the sketch after this list):
- This of course normalizes $1 transactions, so we'll have to take account of that when we decide which transactions are for nominal amounts in later processing.
- Some transactions take place before 1914, the earliest year for which we have a rate, so I've done those all at the 1914 rate.
- There are 268 instances of zero dates (0000-00-00) which have consideration values. I think rather than fudging those, we should fill them in by revisiting the records. We could make a principled decision to use the transfer date instead, assuming there is one.
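A minimal sketch of the rate lookup as it currently behaves, assuming a {year: rate} table like the one JS-R supplied; the names are hypothetical:

    def normalized_value(amount, year, rates):
        """Convert a dollar amount to 2014 dollars via a year -> rate table."""
        # No rates exist before 1914, so earlier transactions use the 1914 rate.
        year = max(year, min(rates))
        # Note that nominal $1 transactions get normalized like any other, and
        # a zero date would clamp to 1914 here, which is exactly the fudging
        # we'd rather avoid by revisiting those records.
        return amount * rates[year]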
Meeting with MA and JS-R, with Skype to WA and TW. Discussed a lot of issues around AtoM, with no immediate work required from HCMC as a result; also talked about the build process and diagnostics, as a result of which I have today:
- Integrated generation of the transaction-based spreadsheet into the build process. This is based on the work we did earlier to pre-generate data for SPSS use.
- Integrated three new diagnostic tests (sketched below):
- A check for out-of-range dates, which throws up a few obvious errors (1066 etc.)
- A check for cases in which the Custodian acquires a property for a significant consideration (more than a dollar). This finds three cases, ids 1785, 2946 and 5813.
- A check for cases in which the Custodian sells a property for a nominal fee ($1 or less). This throws up five cases, 1029, 1171, 1639, 1966 and 3728.
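The checks reduce to simple filters over the transactions. A rough sketch, with hypothetical field names and placeholder date bounds standing in for however the diagnostics actually query the data:

    NOMINAL = 1.00   # dollar threshold separating nominal from significant sums

    def out_of_range_dates(transactions, earliest=1850, latest=2020):
        """Flag obviously impossible transaction dates (1066 and the like)."""
        return [t for t in transactions if not earliest <= t['year'] <= latest]

    def custodian_acquisitions_over_nominal(transactions):
        """Flag the Custodian acquiring a property for a significant consideration."""
        return [t for t in transactions
                if t['buyer'] == 'Custodian' and t['consideration'] > NOMINAL]

    def custodian_sales_at_nominal(transactions):
        """Flag the Custodian selling a property for a nominal fee ($1 or less)."""
        return [t for t in transactions
                if t['seller'] == 'Custodian' and t['consideration'] <= NOMINAL]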
We also planned out extending the spreadsheet generation to derive two more versions from the current one:
- One in which all amounts are normalized to their modern-day values, based on a lookup table to be supplied by JS-R.
- One in which rows involving nominal-sum transactions are elided, to provide only a chain of significant transactions.
These would be generated in sequence, so I can repurpose the CSV-parsing code I wrote the other day for CCAP and use XSLT to create them. An alternative would be Python, if I feel like making life difficult for myself.
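Whichever language it ends up in, the shape is a two-stage pipeline over the base CSV. A sketch in Python terms, with hypothetical column names and output filenames; the elision rule is still to be settled, so the filter here is a placeholder:

    import csv

    NOMINAL = 1.00   # threshold in original (un-normalized) dollars

    def write_csv(path, rows, fields):
        with open(path, 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            writer.writerows(rows)

    def derive_spreadsheets(base_path, rates):
        """Generate the two further versions from the base CSV, in sequence."""
        with open(base_path, newline='') as f:
            rows = list(csv.DictReader(f))
        fields = list(rows[0])

        # Decide which rows are nominal *before* normalizing, since a $1
        # transaction no longer looks nominal once it has been converted.
        nominal = [float(r['consideration']) <= NOMINAL for r in rows]

        # Version 1: amounts normalized to modern-day values, clamping
        # pre-1914 transactions to the 1914 rate.
        for r in rows:
            year = max(int(r['year']), min(rates))
            r['consideration'] = float(r['consideration']) * rates[year]
        write_csv('normalized.csv', rows, fields)

        # Version 2: nominal-sum rows elided, leaving only the chain of
        # significant transactions (the precise rule is still to be settled).
        significant = [r for r, n in zip(rows, nominal) if not n]
        write_csv('elided.csv', significant, fields)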
For some reason the Python script suddenly started complaining about indentation errors, and I discovered there were some tabs where there should have been spaces. Corrected that, and also had to rewrite a bit so that directory creation is done recursively, another point of failure that hadn't appeared before. Ran the script again after removing all my existing mp3s, so the whole process took a long time. Hopefully this is now fixed and it'll keep working as intended...
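The directory fix is the standard one; for the record, a one-liner, assuming Python 3 and a made-up output path:

    import os

    output_dir = 'mp3/feeds/episode_42'   # hypothetical nested output path

    # Create the whole path in one call instead of assuming the parent exists.
    os.makedirs(output_dir, exist_ok=True)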
That Python script worked perfectly the third time out. I'm now confident in it.
Got the name indexes working and linked. Had a meeting with JSR, SA and MK, and there are five action items arising out of that.