Character encoding issues on Tomcat-dev through Apache
Investigation and testing related to the container-encoding setting in the new Cocoon build process led me to discover a bug that's currently affecting sites on Pear's Tomcat-dev when accessed through Apache. Here's an illustration of the problem:
If you go to the Mariage site search page on Pear, accessed on its Tomcat port, and search for "mariée", you'll get correct results. However, if you access the site through Apache and the virtual domain and do the same search, you'll get garbled results.
The problem seems to be this:
We build our recent Cocoon stacks as all-UTF-8, and have set up Tomcat to use UTF-8 as well, but the last stage in the process, where Apache talks to Tomcat, does not appear to be working in UTF-8. We've done a bit of research, and based on this page:
http://confluence.atlassian.com/display/DOC/Using+Apache+with+mod_jk
Two things may need to be changed:
- The AJP connector in Tomcat's conf/server.xml file may need to be tweaked to add a URIEncoding="UTF-8" parameter:
<!-- Define an AJP 1.3 Connector on port 8009 -->
<Connector port="8019" protocol="AJP/1.3" redirectPort="8081" />
changed to:
<!-- Define an AJP 1.3 Connector on port 8009 -->
<Connector port="8019" protocol="AJP/1.3" redirectPort="8081" URIEncoding="UTF-8" />
- This needs to be added to the Apache configuration:
JkOptions +ForwardURICompatUnparsed
For the moment, this only applies to Tomcat-dev; Pear's Tomcat-stable is running legacy projects which operate in 8859-1 encoding, and they're working fine.
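To make the suspected mismatch concrete, here is a minimal Java sketch of what we think may be happening to the search term. It assumes that the browser sends "mariée" percent-encoded as UTF-8 and that the AJP connector, without URIEncoding="UTF-8", decodes it as ISO-8859-1; that is our working hypothesis, not something we have confirmed on Pear. The class and method names are made up for illustration and are not part of Tomcat or Cocoon.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: simulates a query parameter being encoded by the
// client as UTF-8 and then decoded by the server with a given charset.
class EncodingMismatchDemo {

    // Percent-encode the term as UTF-8, then decode it using decodeCharset.
    static String roundTrip(String term, String decodeCharset) {
        try {
            String encoded = URLEncoder.encode(term, "UTF-8");
            return URLDecoder.decode(encoded, decodeCharset);
        } catch (UnsupportedEncodingException e) {
            // Both charsets used here are required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        String term = "mari\u00e9e"; // "mariée", written with an escape to avoid source-encoding issues

        // What goes over the wire: é becomes the two UTF-8 bytes C3 A9.
        System.out.println(URLEncoder.encode(term, "UTF-8")); // mari%C3%A9e

        // Matching charsets: the term survives intact.
        System.out.println(roundTrip(term, "UTF-8"));

        // Mismatched charsets: each UTF-8 byte is read as a separate
        // ISO-8859-1 character, giving "mari" + U+00C3 + U+00A9 + "e".
        System.out.println(roundTrip(term, "ISO-8859-1"));
    }
}
```

If the garbled term in the Apache-fronted search looks like the last line of output (an "Ã©" where the "é" should be), that would be consistent with the connector decoding in ISO-8859-1, and with the URIEncoding change above being the relevant fix.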
I've written to sysadmin to ask that they look at this and see whether it makes sense.
