Collapsing of duplicate segs
A common situation in unprocessed files was that the same transcription appeared in multiple segs within a single pron, each with a different bibl. I've written XSLT to detect and collapse these instances, and after some testing and tweaking we ran it on the whole collection today. This puts us in a better position to phonemicize the segs, because that process no longer has to allow for duplicates.
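To make the collapsing step concrete, here is a minimal Python sketch of the idea. It is not the XSLT we actually ran. It assumes that each seg is a direct child of its pron, that the transcription is the seg's leading text, and that the bibls are child elements of the seg; it also ignores namespaces. The function name and the sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET


def collapse_duplicate_segs(pron):
    """Merge segs within one pron that carry the same transcription.

    Assumed (hypothetical) structure: <pron><seg>text<bibl/>...</seg>...</pron>
    The first seg with a given transcription is kept, and the bibls of any
    later duplicates are moved into it.
    """
    first_seen = {}  # transcription text -> first seg that carries it
    for seg in list(pron.findall("seg")):  # copy, since we remove while looping
        key = (seg.text or "").strip()
        if key not in first_seen:
            first_seen[key] = seg
            continue
        keeper = first_seen[key]
        for bibl in seg.findall("bibl"):
            keeper.append(bibl)  # carry the duplicate's source over to the keeper
        pron.remove(seg)  # drop the now-redundant duplicate
    return pron


# Invented sample: two segs share a transcription but cite different sources.
sample = ET.fromstring(
    "<pron>"
    "<seg>tomahto<bibl>SourceA</bibl></seg>"
    "<seg>tomahto<bibl>SourceB</bibl></seg>"
    "<seg>tomayto<bibl>SourceC</bibl></seg>"
    "</pron>"
)
collapse_duplicate_segs(sample)
print(ET.tostring(sample, encoding="unicode"))
```

After the call, the sample pron holds one seg per distinct transcription, and the first seg carries both of the bibls that were previously split across the duplicates.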
Phonemicization has also been extended to deal with R. We've also realized that we can add a phonemic seg based on the text content of the existing hyph, so the code that already works for phr should be reusable there. I still have to finish that bit, and also rewrite the phr stuff so that it does some collapsing of its own at the end of the process.