I've built the generated Solr index into a zip file as part of the build, and BJ has the URL for when he's ready to try ingesting it. I've spent a lot of the day writing documentation in the ODD file, and adding a build process for the ODD file that gives us half-decent HTML documentation. There's a fair bit more to do by way of documentation -- there's nothing on tagging practices, for example -- but what's there is good and helpful. I've included it in the site; it's only linked from one place, but it can be viewed by anyone who has the URL. I will probably have to focus on other stuff starting Monday, but when I get free moments I can come back and add to the docs.
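On the build side, packaging the index as a zip amounts to something like the sketch below, assuming an Ant-based build; the target names, property names, and paths are placeholders, not our actual build file.

```xml
<!-- Hypothetical Ant target: zip up the generated Solr documents so they
     can be downloaded and ingested elsewhere. Names and paths are assumptions. -->
<target name="zipSolrIndex" depends="createSolrDocs">
  <zip destfile="${dist.dir}/graves_solr.zip"
       basedir="${solr.docs.dir}"
       includes="**/*.xml"/>
</target>
```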
I've got the Solr search working and tested using my local install of Solr. The results are pretty effective, and I'm pleased, although this is really only a proof-of-concept. The next stage is to document this thoroughly, which I'm going to do in a rough draft of a paper I'll give somewhere at some point.
Over the weekend I set up a Google Custom Search for Graves, and integrated it into the site. I did some final bugfixing this morning and that now seems to be working.
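For the record, the integration itself is just the standard Google embed snippet, roughly as below; the cx value is a placeholder, not the real engine ID for Graves.

```html
<!-- Standard Google Custom Search embed; cx is a placeholder engine ID. -->
<script async src="https://cse.google.com/cse.js?cx=PLACEHOLDER_ENGINE_ID"></script>
<div class="gcse-search"></div>
```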
Today I got back into working on the Solr search, and after a quick addition to the Solr index (which will have to be propagated to the Library's Solr servers if they agree to host the index), I've built a large "Advanced search" page which has lots of options for filtering by named entities and so on. I've got the form completed, and I've started on the JS class which will handle the actual search. I'm developing against my local install of Solr, which is working well.
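To give a sense of where the JS class is heading, here's a minimal sketch of the sort of thing I mean: it turns the form values into a Solr select query, with each named-entity filter becoming an fq parameter. The class name, field names, and endpoint URL are placeholders, not the actual implementation.

```javascript
/* Sketch only: builds a Solr select URL from a query string and a set of
   field filters. Endpoint and field names are placeholders. */
class SolrSearch {
  constructor(baseUrl) {
    // e.g. 'http://localhost:8983/solr/graves/select' during local development
    this.baseUrl = baseUrl;
  }
  buildQuery(terms, filters) {
    const params = new URLSearchParams();
    params.set('q', terms.trim() === '' ? '*:*' : terms);
    // Each named-entity filter (person, place, etc.) becomes an fq parameter.
    for (const [field, value] of Object.entries(filters)) {
      if (value !== '') {
        params.append('fq', `${field}:"${value}"`);
      }
    }
    params.set('wt', 'json');
    params.set('hl', 'on'); // ask Solr to highlight matches in the results
    return `${this.baseUrl}?${params.toString()}`;
  }
  async search(terms, filters) {
    const response = await fetch(this.buildQuery(terms, filters));
    return response.json();
  }
}
```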
After consulting with BJ, I've now got the Graves build process creating a collection of XML files optimized for Solr indexing with the default schema, and I've tested them with a local Solr instance. I'm confident I can build a nicely-faceted search page which uses a remote Solr backend to support rich queries with hit highlighting. This was a little easier than I thought it was going to be. I think I like Solr quite a lot.
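For anyone wondering what "optimized for Solr indexing with the default schema" means in practice: each output file is in Solr's XML update format, along these lines (the field names here are illustrative, not the project's actual field list).

```xml
<add>
  <doc>
    <!-- Illustrative fields only; the real documents carry the project's own field set. -->
    <field name="id">doc_0001</field>
    <field name="title">Document title</field>
    <field name="text">Plain-text content extracted from the source XML...</field>
  </doc>
</add>
```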
I now have the Graves site creating a Google-style sitemap, which is referenced in robots.txt. That's a prerequisite for setting up a Google search page, which should be fairly straightforward. I've also got a working Solr installation on my computer, and I've been testing it and learning about it once again. I think the logical approach is going to be to create JSON for ingestion into Solr, which will give us the ultimate in flexibility and enable us to create a faceted search interface. The Solr examples have a single file for each collection, but I think we're going to want to break ours up; I'm not quite sure how to handle that yet.
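Going back to the sitemap for a moment: it just follows the standard protocol, along the lines below (URLs here are placeholders), and robots.txt gains a Sitemap: line pointing at it.

```xml
<!-- Skeleton sitemap; URLs are placeholders for the real site pages.
     robots.txt points to it with a line like:
     Sitemap: https://example.org/sitemap.xml -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.org/index.html</loc>
  </url>
  <url>
    <loc>https://example.org/search.html</loc>
  </url>
</urlset>
```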
A number of additional tweaks and feature updates to complete the local search part of this project:
- Search functionality tweaked to strip out accented characters in the search tokens (see the sketch after this list).
- Search results now include a list of the tokens actually searched (i.e. those not stopped or too short).
- eXist can now serve the simple local search at the search.html URL.
- Some nifty flexbox tweaks make the home page wrapping friendlier, keeping the search form together with its results.
- Everything is tested under https.
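The accent-stripping is just the usual Unicode-normalization trick, roughly as sketched below; this is the general approach rather than necessarily the exact code in the site's search script.

```javascript
// Decompose accented characters, then drop the combining diacritical marks.
function stripAccents(token) {
  return token.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

stripAccents('café'); // "cafe"
```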
The local search functionality is now done, and it also incorporates document type and date range filtering. I've added an override so that when the site is running on eXist, the eXist-based search page is shown instead, and that works fine too. The JSON files are 58MB, which is not a significant extra load. The only slight wrinkle I might go back and address is the handling of accented characters; there are some (because of Spanish etc.), but the Porter stemmer is not expecting any, so they seem to have their accents stripped. I've removed the accented-character entry buttons from the local search page in view of this, but they're easy to put back if we figure out how not to lose them.
Ported the code over from the Keats site to generate the search JSON. The Graves site has about the same number of terms, but appears to generate an index only about two-thirds the size of the Keats one. I have everything working in the build process, but I still have to figure out what happens on the front end to distinguish between the local and the eXist-based search.
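For context, the search JSON amounts to a token-to-document lookup (the tokens being Porter-stemmed, per the notes above). The shape below is purely hypothetical, just to illustrate the idea; the actual structure ported from the Keats site may differ.

```json
{
  "token": "example",
  "instances": [
    { "doc": "doc_0001.html", "count": 3 },
    { "doc": "doc_0042.html", "count": 1 }
  ]
}
```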
The OAC folks wrote to say that they had deleted our site without backup (and apparently without warning); this turned out not to be the case, but to be on the safe side I went into the admin interface to get myself a backup, and discovered that's not an option. So I've curled the whole site just in case.
Got EC set up with the Endings SVN project for encoding of interview transcriptions. The schema and encoding strategies seem to be working well so far.