HCMC Journal

CGWP: Discussions on staticizing

: Martin Holmes, Greg Newton
Minutes: 75

Over the past few days, GN and MH have been discussing options and approaches for staticizing CGWP. The data can be dumped as XML, but the resulting file might be too large to process; a TSV dump is also possible, and might be more practical. From there, we could write a process to generate TEI, which could then be the basis for a range of diagnostics identifying fixes to be integrated back into the original DB. We could then process the TEI into a website, where each person would get an individual page.
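As a rough illustration of the TSV-to-TEI step, here is a minimal Python sketch. Everything specific in it is an assumption: the column names (`id`, `surname`, `forename`), the `person_` ID prefix, and the bare-bones `<person>` structure are invented for the example and do not reflect the actual CGWP schema. The point is only that a TSV dump can be read one row at a time, so the whole dataset never has to be held in memory.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
# Serialize TEI elements in the default namespace rather than with a prefix.
ET.register_namespace("", TEI_NS)


def row_to_person(row):
    """Build a minimal TEI <person> element from one TSV row.

    The column names used here are hypothetical.
    """
    person = ET.Element(f"{{{TEI_NS}}}person")
    person.set(f"{{{XML_NS}}}id", f"person_{row['id']}")
    pers_name = ET.SubElement(person, f"{{{TEI_NS}}}persName")
    ET.SubElement(pers_name, f"{{{TEI_NS}}}surname").text = row["surname"]
    ET.SubElement(pers_name, f"{{{TEI_NS}}}forename").text = row["forename"]
    return person


def persons_from_tsv(stream):
    """Yield one <person> element per row, reading the dump a row at a time."""
    # QUOTE_NONE: treat quote characters in the dump as ordinary text.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row_to_person(row)


# Tiny in-memory stand-in for the real dump file.
sample = "id\tsurname\tforename\n1\tDoe\tJane\n2\tRoe\tRichard\n"
people = list(persons_from_tsv(io.StringIO(sample)))
for p in people:
    print(ET.tostring(p, encoding="unicode"))
```

In a real run the generator would be consumed lazily, with each element written out to its own file or appended to a larger TEI document, rather than collected into a list as in this demonstration.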

The scale of the site is probably too great for the current incarnation of staticSearch to manage (Saxon would almost certainly run out of memory generating the JSON files). This might therefore be an opportunity to create a second processing chain for large sites in staticSearch, breaking up the JSON generation into phases so that large numbers of files don’t have to be in memory at the same time. We would also probably turn the titles JSON file into a much smaller numerical lookup, which could then be used to retrieve smaller blocks of titles as needed for hit display.
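The numerical-lookup idea for titles could work along the following lines. This is a sketch in Python under stated assumptions, not staticSearch's actual format or code: the block size, the `titles_N.json` file naming, and both function names are hypothetical. Each document gets a sequential integer ID, titles are written out in fixed-size blocks, and the block holding a given title is found by integer division, so only one small file needs to be fetched per hit.

```python
import json
import tempfile
from pathlib import Path

BLOCK_SIZE = 1000  # hypothetical; would be tuned to the real data


def write_title_blocks(titles, out_dir, block_size=BLOCK_SIZE):
    """Write titles to numbered JSON files, holding one block in memory at a time.

    The title for document N ends up in block N // block_size,
    at offset N % block_size. Returns the number of titles written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    block, block_no, count = [], 0, 0

    def flush():
        path = out_dir / f"titles_{block_no}.json"
        path.write_text(json.dumps(block), encoding="utf-8")

    for title in titles:
        block.append(title)
        count += 1
        if len(block) == block_size:
            flush()
            block, block_no = [], block_no + 1
    if block:  # final, partially filled block
        flush()
    return count


def lookup_title(doc_id, out_dir, block_size=BLOCK_SIZE):
    """Load only the block that contains doc_id and return its title."""
    block_no, offset = divmod(doc_id, block_size)
    path = Path(out_dir) / f"titles_{block_no}.json"
    return json.loads(path.read_text(encoding="utf-8"))[offset]


# Demonstration with generated placeholder titles.
demo_dir = tempfile.mkdtemp()
total = write_title_blocks((f"Person {i}" for i in range(2500)), demo_dir)
print(total, lookup_title(2345, demo_dir))
```

The same flush-as-you-go pattern is the general shape a phased JSON generation process might take: accumulate a bounded amount of data, write it to disk, and release it before moving on to the next batch.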

The timeline is fluid, but at the very least we should aim to be close to a working process before 2030.