Analytics rabbit hole
While GoAccess provides a shed load of detail about a site's monthly performance, it's frequently the case that we need more cumulative stats: how many hits did the site get last year? How about compared to the year before? How many hits so far this year?
So, I've been working on an 'executive overview' kind of representation of site data using XSLT and I've been having an interesting problem. Saxon errors out on some files with:
Failed to read input file file:/path/to/file.html (java.nio.charset.MalformedInputException): Input length = 1
Looking at such a file using file --mime-encoding /path/to/file.html
produces something like 'file.html: us-ascii'
I suspect that the output is more a function of the bytes that the 'file' application happens to find in a given report than of any deliberate choice by GoAccess. It's unlikely that GoAccess is intentionally encoding the files this way (reports usually come up 'us-ascii', but sometimes it's 'utf-8', sometimes it's 'iso-8859-1', and sometimes it's the literally unidentifiable 'unknown-8bit').
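To make that suspicion concrete, here's a rough Python sketch of how a byte-sniffing heuristic could land on each of those four labels. This is my own approximation for illustration, not the actual algorithm 'file' uses, and guess_encoding is a made-up name.

```python
def guess_encoding(data: bytes) -> str:
    """Guess an encoding label purely from the bytes present.

    A rough approximation of byte sniffing, NOT the real logic of 'file'.
    """
    # Nothing above 0x7F: plain ASCII is the simplest explanation.
    if data.isascii():
        return "us-ascii"
    # High bytes that form valid UTF-8 sequences.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # High bytes but none in the C1 control range (0x80-0x9F):
    # plausibly Latin-1 text.
    if not any(0x80 <= b <= 0x9F for b in data):
        return "iso-8859-1"
    # Anything else is just "some 8-bit bytes".
    return "unknown-8bit"


if __name__ == "__main__":
    import sys

    # Usage: python guess_encoding.py /path/to/file.html
    with open(sys.argv[1], "rb") as fh:
        print(guess_encoding(fh.read()))
```

The point is that a single stray byte from a logged request is enough to tip a report from one label to another, without GoAccess doing anything different.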
Looking at the specific point in the file that causes grief, it looks like it's generally the result of intentionally malformed requests. GoAccess blithely includes the funky chars, and XSLT coughs up a hairball.
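For anyone wanting to find those points without eyeballing the file, something like the following Python sketch should do it. It assumes the report is meant to be read as UTF-8 (my assumption, not something GoAccess promised me), and find_invalid_utf8 is a hypothetical helper name.

```python
def find_invalid_utf8(data: bytes, context: int = 20):
    """Yield (offset, bad_bytes, surrounding_bytes) for each undecodable run.

    Assumes the data is supposed to be UTF-8.
    """
    pos = 0
    while pos < len(data):
        try:
            # If the remainder decodes cleanly, there is nothing left to report.
            data[pos:].decode("utf-8")
            return
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice, so shift by pos.
            start = pos + err.start
            end = pos + err.end
            snippet = data[max(0, start - context):end + context]
            yield start, data[start:end], snippet
            # Resume scanning just past the bad bytes.
            pos = end


if __name__ == "__main__":
    import sys

    # Usage: python find_invalid_utf8.py /path/to/file.html
    with open(sys.argv[1], "rb") as fh:
        for offset, bad, snippet in find_invalid_utf8(fh.read()):
            print(f"offset {offset}: {bad!r} in {snippet!r}")
```

It re-slices the data after every bad byte, so it's not fast on a report riddled with junk, but it's enough to see what the offending request looked like.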
So it looks like XSLT is out and I'll need to find another way of processing these files.