HCMC Journal: Sencoten 2024-11-07 to 2024-11-09

07 November 2024 to 09 November 2024: Martin Holmes
Minutes: 315

On Wednesday, started looking at the latest feature requests for the PDF. One thing I was able to do quite quickly was to sort the items in the root-based index alphabetically under each root. I wasn’t able to move the root itself up to the top, unfortunately, because the processing and merging of near-duplicate entries is so complicated, but I’m not sure that’s a great idea anyway; waiting for SK’s input on that. Then I moved on to the request to add placenames into the Sencoten/English section of the dictionary. This proved to be remarkably complicated, because the placename spreadsheet is so different from the regular entry spreadsheet, and the entries themselves have to be processed in such an awkward manner. I did manage to cobble together a working mechanism for this, but it’s quite fragile, and as more requests are coming down the pipe, we may need to bite the bullet and rewrite quite a lot of the processing stages to get a more robust and maintainable build process.
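
As a rough illustration of the alphabetical sequencing under each root described above, here is a minimal Python sketch. It is not the project's actual build code: the field names `root` and `headword` and the function `build_root_index` are hypothetical, and the plain case-insensitive sort is only a stand-in for whatever collation the dictionary really needs.

```python
from collections import defaultdict


def build_root_index(entries):
    """Group headwords by root and sort them alphabetically under each root.

    `entries` is an iterable of dicts with hypothetical keys "root" and
    "headword"; the real data will be shaped differently.
    """
    grouped = defaultdict(set)  # a set drops exact duplicate headwords
    for entry in entries:
        grouped[entry["root"]].add(entry["headword"])

    # Assumption: a case-insensitive Unicode sort is good enough here.
    # A language-specific alphabet would need a custom sort key instead.
    return {
        root: sorted(grouped[root], key=str.casefold)
        for root in sorted(grouped, key=str.casefold)
    }


if __name__ == "__main__":
    sample = [
        {"root": "b", "headword": "beta"},
        {"root": "a", "headword": "Zed"},
        {"root": "a", "headword": "apple"},
    ]
    print(build_root_index(sample))
```

Moving the root itself to the top of its group would be a separate step, and this sketch does not attempt it.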

On Friday, had a good shot at handling the multiple-root problem that we’re struggling with (more than one root encoded in the same column, but also in more than one instance of the same entry). After a good while, concluded that it’s simply not practical to work with the data in the format we have, so I started building a second conversion process that works from the original spreadsheet to generate a sane, coherent, and de-duplicated dataset which will be properly usable.
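
To make the multiple-root problem concrete, here is a small Python sketch of the kind of normalization involved: splitting a multi-valued root column and merging repeated instances of the same entry. Everything specific in it is an assumption made for illustration (the column names `headword`, `gloss`, and `roots`, the `;` separator, and the rule that headword plus gloss identifies an entry); it is not the conversion process actually being built.

```python
def normalize_rows(rows, sep=";"):
    """Merge duplicate entries and collect each entry's roots into one list.

    `rows` is an iterable of dicts with hypothetical keys "headword",
    "gloss", and "roots"; "roots" may hold several roots joined by `sep`.
    """
    merged = {}  # (headword, gloss) -> roots, in first-seen order
    for row in rows:
        # Assumption: headword plus gloss is enough to identify an entry.
        key = (row["headword"].strip(), row["gloss"].strip())
        roots = merged.setdefault(key, [])
        for root in row["roots"].split(sep):
            root = root.strip()
            if root and root not in roots:
                roots.append(root)

    return [
        {"headword": headword, "gloss": gloss, "roots": roots}
        for (headword, gloss), roots in merged.items()
    ]


if __name__ == "__main__":
    sample = [
        {"headword": "x", "gloss": "g", "roots": "r1; r2"},
        {"headword": "x", "gloss": "g", "roots": "r2;r3"},
    ]
    print(normalize_rows(sample))
```

With one record per entry and one list of roots per record, later stages such as the root-based index no longer have to untangle the spreadsheet layout themselves.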