ISE reviews, SIP, bots etc.
Martin and I had a bit of a back-and-forth with Michael Best over the weekend about the new reviews section of the site, which relies on b2evolution. Certain style elements were not showing up, and at the time we were at a loss to understand why.
The answer was that the proxy itself applies a kind of HTML tidy-up function automatically, presumably because cleaner code is easier for it to work with. The proxy writes a DOCTYPE and removes or alters non-conformant code so that the resulting document conforms (at least loosely) to the declared DOCTYPE. In practice the rewritten code is not conformant at all, and it is baffling when you don't know why the code you wrote isn't the code you see rendered.
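One way to make this kind of silent rewriting visible is to diff the markup you wrote against the markup the proxy actually serves. What follows is only a minimal Python sketch, assuming you already have both versions as strings; the function name and the sample markup are made up for illustration and are not taken from the reviews site.

```python
import difflib


def proxy_changes(origin_html, proxied_html):
    """Return only the lines that differ between the two versions.

    Lines prefixed "- " exist only in the origin markup;
    lines prefixed "+ " exist only in the proxied markup.
    """
    diff = difflib.ndiff(origin_html.splitlines(), proxied_html.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical sample: what we wrote versus what came back through the proxy.
origin = '<p><font color="red">Review</font></p>'
proxied = '<!DOCTYPE html>\n<p>Review</p>'

for line in proxy_changes(origin, proxied):
    print(line)
```

If the function returns an empty list, the proxy passed the markup through untouched.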
This morning we spent some time figuring out what was going on and got sysadmin to reconfigure the proxy so that it no longer rewrites our HTML. Here's the meat of the proxy configuration:
============================================================================
<VirtualHost 142.104.xx.xxx:80>
SetEnv UVPHP_VERSION 4
DocumentRoot /path/to/www/
ServerName virtual.host.uvic.ca
<Location /resulting/proxied/URL>
ProxyPass http://URL/to/actual/location/of/your/code
ProxyPassReverse http://URL/to/actual/location/of/your/code
#SetOutputFilter proxy-html
ProxyHTMLDoctype XHTML Legacy
ProxyHTMLURLMap http://URL/to/actual/location/of/your/code(.*) \
http://URL/to/your/proxied/directory$1 R
</Location>
</VirtualHost>
============================================================================
The significant part is "ProxyHTMLDoctype XHTML Legacy": the Legacy keyword declares a transitional DOCTYPE, which is what tells the proxy to leave our older-style markup alone instead of stripping it.
In other ISE news, we discussed the hammering that SIP takes from bots and how we need to curb it. We asked the ISE to implement a robots.txt that would limit access to SIP to a single bot (such as Google's). The plan was acceptable and they are working on it now. We'll need to have sysadmin keep us up to date on its impact.
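For reference, here is a rough sketch of what a one-bot robots.txt could look like, checked with Python's standard urllib.robotparser. The rules, the choice of Googlebot, and the example.org URL are all assumptions for illustration; the file the ISE actually deploys may well differ, for instance by restricting only the SIP path rather than the whole host.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything,
# every other bot is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL, not the real SIP address.
print(allowed("Googlebot", "http://example.org/sip/page"))
print(allowed("SomeOtherBot", "http://example.org/sip/page"))
```

Bear in mind that robots.txt is only a request: well-behaved crawlers honour it, but it will not stop a bot that chooses to ignore it, so the sysadmin's traffic reports are still the real measure of whether it works.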