etcl : write code to scrape DevMS wikibook and generate circleMagic output

26/04/12

11:45:42 am, by sarneil
Categories: Activity log; Mins. worked: 360

Wrote a scraper PHP page which, when it is opened in a browser:
- queries the number of records in the Devonshire Manuscript wikibooks project to be processed and displays that number
- scrapes each of the records in the Devonshire Manuscript project on wikibooks
- generates an XML file for that record, constructed to work with the circleMagic player
- generates an htm file that includes a call to the player with the appropriate XML data file

The XML is idiosyncratic and based on examples provided with the circleMagic code. CircleMagic can't handle an XML data file with more than 7 "source" elements (which in this implementation are used to identify contributors for that page). I included code in the PHP page which comments out all source elements after the 7th in any XML file, and displays a warning both to the user and on the generated HTML page that displays the circleMagic player. CircleMagic's processing from the XML structure to the circular GUI is also idiosyncratic, but I've posted on that previously.

Other potential constraints that may eventually be imposed by the wikibooks API:
- it returns a maximum of 500 hits to the query asking for all the pages in the DevMS collection
- it returns a maximum of 500 hits to the query asking for the number of revisions to a page. The most revisions on any page so far is about 200, so it will be a while before that limit is reached.
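For anyone wanting to reproduce the 7-source workaround, here is a minimal sketch of the idea. It is written in Python and is not the PHP code from the scraper page. The function name `cap_sources`, the regular expression, and the assumption that "source" elements are either self-closing or non-nested are all illustrative guesses, since the real circleMagic XML layout is not shown in this post.

```python
import re

MAX_SOURCES = 7  # limit reported above for circleMagic data files

# Assumption: a "source" element is either <source ... /> or a
# non-nested <source ...>...</source> pair.
SOURCE_RE = re.compile(
    r"<source\b[^>]*?/>|<source\b[^>]*>.*?</source>", re.DOTALL
)


def cap_sources(xml_text, limit=MAX_SOURCES):
    """Comment out every source element after the first `limit`.

    Returns (new_xml_text, number_of_elements_commented_out).
    Assumes no source element contains "--", which is not
    allowed inside an XML comment.
    """
    seen = 0
    dropped = 0

    def replace(match):
        nonlocal seen, dropped
        seen += 1
        if seen <= limit:
            return match.group(0)  # keep the first `limit` untouched
        dropped += 1
        return "<!-- " + match.group(0) + " -->"

    return SOURCE_RE.sub(replace, xml_text), dropped
```

A caller could check the second return value and, when it is greater than zero, print the warning and add it to the generated page, along the lines described above.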
