Parallel approach to generating match data
Posted by mholmes on 12 Dec 2018 in Activity log
The existing process was taking far too long (1652 minutes for 25,600 records), so I've devised a revised approach that allows it to be parallelized. The driver Ant file runs the XSLT with a special parameter that causes the XSLT to write a new temporary Ant file, which acts as a parallel driver; the first Ant file then calls the temporary one, and several processes are kicked off simultaneously. We're both running this overnight to see how fast it goes. There's obviously a lot of tuning we could do in how the tasks are divided, so we'll definitely be coming back to this, but since the default process would take up to two weeks, cutting it down is essential.
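For anyone trying to picture the two-stage setup, here is a rough sketch of what the two Ant files might look like. This is an illustration and not the actual project build: the file names (match.xsl, records.xml, tmp/parallel-build.xml), the parameter names (writeParallelDriver, chunk, chunkCount) and the four-way split are all assumptions made up for the example. The first file is the hand-written driver:

```xml
<!-- build.xml: hypothetical driver; all file and parameter names are assumed -->
<project name="match-driver" default="all">
  <target name="all">
    <mkdir dir="tmp"/>
    <!-- Stage 1: run the XSLT in "write a driver" mode. Here the
         transformation's output is itself the temporary Ant file. -->
    <xslt in="records.xml" out="tmp/parallel-build.xml"
          style="match.xsl" force="true">
      <param name="writeParallelDriver" expression="true"/>
    </xslt>
    <!-- Stage 2: hand over to the generated file. -->
    <ant antfile="tmp/parallel-build.xml"/>
  </target>
</project>
```

The second is the kind of file the XSLT could then generate, with one task per slice of the records wrapped in Ant's parallel task:

```xml
<!-- tmp/parallel-build.xml: what the generated file might contain (assumed) -->
<project name="match-parallel" default="run">
  <target name="run">
    <!-- threadCount caps how many nested tasks run at the same time. -->
    <parallel threadCount="4">
      <xslt in="records.xml" out="tmp/matches-1.xml" style="match.xsl" force="true">
        <param name="chunk" expression="1"/>
        <param name="chunkCount" expression="4"/>
      </xslt>
      <xslt in="records.xml" out="tmp/matches-2.xml" style="match.xsl" force="true">
        <param name="chunk" expression="2"/>
        <param name="chunkCount" expression="4"/>
      </xslt>
      <xslt in="records.xml" out="tmp/matches-3.xml" style="match.xsl" force="true">
        <param name="chunk" expression="3"/>
        <param name="chunkCount" expression="4"/>
      </xslt>
      <xslt in="records.xml" out="tmp/matches-4.xml" style="match.xsl" force="true">
        <param name="chunk" expression="4"/>
        <param name="chunkCount" expression="4"/>
      </xslt>
    </parallel>
  </target>
</project>
```

In a layout like this, the number of chunks and the threadCount value are the obvious knobs for tuning how the work is divided.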