I now have an Ant build file that grabs the XML dump from the db, processes it into the simpler XML my existing code is used to working with, and builds the Streets HTML file. It also validates all the existing XML before starting, which threw up some horrible invalidities. A fair amount of the encoding done on the Haney prose data had abandoned our schema entirely in favour of tei_all, which predictably caused a lot of markup drift; I've now incorporated the needs of that encoding into our schema. The existing directory files also had lots of problems, so I've tightened up the schema and fixed those; everything now validates. I still need to build a lot more Schematron into the ODD file.
But we're now ready to start building things like diagnostic checks and generational suggestions into the build process.
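For anyone wanting the shape of the validate-first build order described above, here is a minimal sketch in Python rather than Ant. It is not the actual build file: the function names (`find_malformed`, `run_build`) and the step list are hypothetical, and it only checks that each file is well-formed XML. Real validation against the schema would need a RELAX NG and Schematron validator, which this sketch does not attempt.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed(xml_dir):
    """Return (file name, error message) pairs for XML files that fail to parse.

    Well-formedness only; this is a stand-in for real schema validation.
    """
    problems = []
    for path in sorted(Path(xml_dir).glob("*.xml")):
        try:
            ET.parse(path)
        except ET.ParseError as err:
            problems.append((path.name, str(err)))
    return problems


def run_build(xml_dir, steps):
    """Run the build steps in order, but only if every XML file parses.

    `steps` is a list of zero-argument callables (hypothetical placeholders
    for things like "fetch dump", "simplify XML", "build HTML").
    Returns (True, []) on success or (False, problems) if the check failed.
    """
    problems = find_malformed(xml_dir)
    if problems:
        return False, problems  # stop before any build step runs
    for step in steps:
        step()
    return True, []


if __name__ == "__main__":
    # Tiny demonstration with one good and one broken file.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "good.xml").write_text("<list><item/></list>", encoding="utf-8")
        Path(tmp, "bad.xml").write_text("<list><item></list>", encoding="utf-8")
        ok, problems = run_build(tmp, [lambda: print("building...")])
        print("build ran:", ok)
        for name, message in problems:
            print(name, "->", message)
```

The point of the design is simply that the check runs over every file and reports all the failures at once, and that no build step starts until the list of problems is empty.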