ISE static version: some progress, lots to do

Posted by Martin on 24 Dec 2018 in Activity log

I've now also pulled the content from the DRE and QME sites. All three sets of data share the same issues, the most serious being a preponderance of hard-coded absolute links to the domain (http://internetshakespeare.uvic.ca, for example) rather than relative links. The wget tool I used to pull the content should have been able to make those links relative, but in most cases it didn't, possibly because the content is not valid HTML and so couldn't easily be parsed.

So I'm going to have to fix all those links using a script, which I'm writing in Python now. It's doubly tricky because of the mobile vs desktop issue: some links pass through a frommobile.html redirect with the real target percent-encoded in the query string, so they look like this:

<a href="../../../frommobile.html%3Fto=http:%252F%252Fdigitalrenaissance.uvic.ca%252FFoyer%252Fcopyright.html">
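For illustration only, here is a minimal sketch of how such a link might be unwrapped and made relative. It is not the actual script; the function names, the SITE_HOSTS set and the page_path argument (the path of the page containing the link, relative to the site root) are all assumptions made for the example. It assumes the target is percent-encoded twice, as in the link above.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit

# Assumption: only the two hosts named in this post; others would need adding.
SITE_HOSTS = {"internetshakespeare.uvic.ca", "digitalrenaissance.uvic.ca"}

# Matches the redirect wrapper once the href has been fully decoded.
MOBILE_REDIRECT = re.compile(r"frommobile\.html\?to=(.+)$")


def unwrap_mobile(href):
    """Return the real target of a frommobile.html link, or href unchanged."""
    # Decode twice: %3F -> ?, and %252F -> %2F -> /
    decoded = unquote(unquote(href))
    match = MOBILE_REDIRECT.search(decoded)
    return match.group(1) if match else href


def relativize(href, page_path):
    """Rewrite a link to one of our own hosts as a path relative to page_path."""
    target = unwrap_mobile(href)
    parts = urlsplit(target)
    if parts.hostname not in SITE_HOSTS:
        # External or already-relative link: leave it alone.
        return target
    # Work with rooted paths so the result does not depend on the
    # current working directory.
    start = "/" + posixpath.dirname(page_path)
    rel = posixpath.relpath(parts.path or "/", start)
    if parts.fragment:
        rel += "#" + parts.fragment
    return rel


if __name__ == "__main__":
    example = ("../../../frommobile.html%3Fto=http:%252F%252F"
               "digitalrenaissance.uvic.ca%252FFoyer%252Fcopyright.html")
    print(relativize(example, "a/b/c/page.html"))
```

A real version would also have to find the href attributes in the first place, and since the HTML is not valid that probably means a lenient parser or regular expressions rather than a strict XML parse.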

So it will take a while before I can get this all normalized and working. Meanwhile, I don't yet have the list of AJAX files that need to be pulled from the server, so I won't be able to get those until MT sends it to me.

This entry was posted by Martin and filed under Activity log.