Maintenance

Analytics rabbit hole

Posted by gregster on 05 Oct 2021 in Servers, Activity log, Documentation

While GoAccess provides a shed load of detail about a site's monthly performance, it's frequently the case that we need more cumulative stats: how many hits did the site get last year? How does that compare to the year before? How many hits so far this year?

So, I've been working on an 'executive overview' kind of representation of site data using XSLT, and I've run into an interesting problem. Saxon errors out on some files with:
Failed to read input file file:/path/to/file.html (java.nio.charset.MalformedInputException): Input length = 1

Looking at such a file using file --mime-encoding /path/to/file.html produces something like 'file.html: us-ascii'

I suspect that the reported encoding is more a function of the characters that the 'file' application happens to find in a file than of anything GoAccess does deliberately. It's unlikely that GoAccess is intentionally encoding the files this way: reports usually come up 'us-ascii', but sometimes it's 'utf-8', sometimes 'iso-8859-1', and sometimes the literally unidentifiable 'unknown-8bit'.
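To illustrate that suspicion, here is a rough Python sketch of how a content-based guess could land on each of those four labels purely from the bytes present. This is a simplification written for illustration, not the actual logic of the 'file' utility, and the function name guess_encoding is made up for this example.

```python
def guess_encoding(data: bytes) -> str:
    """Guess an encoding label from the bytes alone (simplified heuristic)."""
    # Nothing above 0x7F: plain ASCII.
    if all(b < 0x80 for b in data):
        return "us-ascii"
    # High bytes that form valid UTF-8 sequences.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # ISO-8859-1 reserves 0x80-0x9F for control codes, so their absence
    # makes Latin-1 a plausible guess (an assumption of this sketch).
    if not any(0x80 <= b <= 0x9F for b in data):
        return "iso-8859-1"
    # High bytes that fit none of the above.
    return "unknown-8bit"
```

Under this heuristic a single stray byte in one logged request is enough to flip the label for the whole report, which would fit the pattern of reports mostly coming up 'us-ascii' with occasional outliers.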

Looking at the specific point in the file that causes grief, it generally appears to be the result of intentionally malformed requests. GoAccess blithely includes the funky chars and XSLT coughs up a hairball.
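For finding those points without eyeballing the file, something like the following Python sketch could list every byte offset where strict UTF-8 decoding fails. It assumes the parser is expecting UTF-8, which the error message does not actually confirm, and find_invalid_utf8 is a hypothetical helper name.

```python
def find_invalid_utf8(data: bytes):
    """Return a list of (offset, offending_bytes) for each undecodable spot."""
    hits = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder strictly; success means no more bad bytes.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice, so shift by pos.
            start, end = pos + err.start, pos + err.end
            hits.append((start, data[start:end]))
            pos = end  # resume just past the bad byte(s)
    return hits
```

Reading a report with open("/path/to/file.html", "rb").read() and passing the bytes in gives the offsets to inspect. Re-slicing after every hit is wasteful on a file with many bad bytes, but it keeps the sketch short.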

So it looks like XSLT is out and I'll need to find another way of processing these files.
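Whatever ends up replacing XSLT will presumably meet the same stray bytes, so a tolerant decode step is likely to be needed somewhere. The Python sketch below is one possible shape for that step, not a settled approach: it assumes the reports are meant to be UTF-8 and that swapping undecodable bytes for U+FFFD is acceptable for summary statistics. The name sanitize_to_utf8 is hypothetical.

```python
from pathlib import Path


def sanitize_to_utf8(src: Path, dst: Path) -> int:
    """Copy src to dst as valid UTF-8 and return the number of U+FFFD chars."""
    raw = src.read_bytes()
    # errors="replace" substitutes U+FFFD for anything that is not valid UTF-8.
    text = raw.decode("utf-8", errors="replace")
    # Write bytes directly so line endings pass through untouched.
    dst.write_bytes(text.encode("utf-8"))
    # Note: this count also includes any U+FFFD already present in the source.
    return text.count("\ufffd")
```

The return value gives a quick sense of how many junk characters a given report contained.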

