Problems with eXist, leading to serious instability
Starting about ten days ago, we began to experience mysterious errors when uploading XML data into the eXist db in the main Cocoon on Lettuce (this is the "cocoon" running in jakarta-tomcat, which hosts several major projects including Mariage, ScanCan, EMLS and the Devonshire MS). This is very worrying.
The problems appear to be caused by structural index corruption. This is a known issue with earlier eXist versions, but it has never caused us any problems before. The first symptom was that uploads to the db would fail with an error relating to proxy ids. The first couple of times this happened, restarting Tomcat solved the problem, but it is becoming more frequent. We now also have a new problem, this one specifically affecting Mariage: the search page fails to load because of this error: "Internal error: failed to store temporary doc fragment". Judging by posts on the eXist list, this is also related to corrupted indexes, and so far restarts haven't solved it.
This has sent Greg and me scrambling to work out a strategy for migrating these projects to a more stable version of eXist. Unfortunately, I ran into a couple of stumbling blocks while trying to build a new portable Mariage today. First, the current Cocoon+eXist distribution (with Cocoon 2.1.11 and eXist 1.2.6) contains some unsigned jar files, so the admin client will not run, which makes it difficult to manage the database. Second, an earlier version I used for ColDesp turns out to have a bug that would cripple the Mariage search engine: when the ancestor axis is used in a search query, it fails to tag text matches with <exist:match> tags, so searches return only links to the documents, not the list of hits within them.
We're now pursuing another option: building our own Cocoon 2.1.11 + eXist 1.4 package following the instructions here. If this is successful, we'll document any departures from the procedure described there and start building a new portable Mariage with it. We'll then try running it in the development Tomcat; if that works, we'll move it into the main Tomcat and point mariage.uvic.ca at it. That could be a strategy for the highest-profile projects with their own domains (Mariage and ScanCan), and moving those off the old, struggling Cocoon may help it recover some stability.
I've spent all day fighting with this. I did, in the process, fix two small bugs in Mariage (find.xq needed to be updated to take account of the # (hash) characters preceding values of the @facs and @corresp attributes, and some of the style settings were pointing at old CSS files in one of Greg's own directories). But that's not much consolation for a completely broken search page.
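The find.xq fix comes down to the fact that @facs and @corresp values are written as pointers ("#p12"), while the elements they point to carry a bare id ("p12"), so a direct string comparison never matches. The sketch below illustrates that idea in Python rather than XQuery. It assumes the attributes hold same-document pointers to @xml:id values, as is common in TEI; the function names and the sample document are hypothetical and are not taken from find.xq.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def strip_hash(pointer: str) -> str:
    """Hypothetical helper: turn a same-document pointer like '#p12' into 'p12'."""
    return pointer[1:] if pointer.startswith("#") else pointer


def find_by_pointer(root: ET.Element, pointer: str) -> list:
    """Hypothetical helper: return every element whose xml:id matches the
    target of a #-prefixed @corresp or @facs value."""
    target = strip_hash(pointer)
    return [el for el in root.iter() if el.get(XML_ID) == target]


# Usage: a note pointing at a page break by its xml:id.
doc = ET.fromstring('<text><pb xml:id="p12"/><note corresp="#p12">x</note></text>')
hits = find_by_pointer(doc, doc.find("note").get("corresp"))
```

Comparing the raw attribute value "#p12" against the id "p12" would find nothing, which is the kind of silent mismatch that made the old query miss its targets.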