CZ analysis completed for Douglas and Newcastle
I'm documenting this process in detail because it took me a while to figure out; it's a long time since I first did it under DH's guidance in the workshop, and his spreadsheets have changed a little in the meantime:
- Created a combined text set for Douglas and Newcastle in the Intelligent Archive, and generated a word list of the 4,000 most frequent words, using 2,000-word blocks.
- With another text set consisting only of the Douglas texts, generated a word-frequency table with the same block size, using the same 4,000 words generated in the first step. Copied that into a spreadsheet.
- With a third text set consisting only of the Newcastle texts, did the same as in the step above, and copied that into another spreadsheet. This is the core data we need for the operation.
- Moved to Windows (Arugula), because that's where we happen to have a copy of Office 2007. Started Excel and turned on macros (they're off by default).
- Opened the CraigZeta spreadsheet and saved it with a new name.
- Deleted the Author 1, Author 2, Author 1 Ind, and Author 2 Ind data (we don't need the Ind sheets for this calculation, since we have no unattributed texts).
- Created a couple of new sheets, Graph 1 and Graph 2, and moved the visible chart from the first sheet to Graph 1. Graph 2 will hold the generated chart at the end of the process, because we'll also want to move that away from the front sheet.
- Copy/pasted the Douglas data from step 2 into the Author 1 sheet.
- Copy/pasted the Newcastle data from step 3 into the Author 2 sheet.
- Ran the CraigZeta macro (View / Macros / View macros, then select it).
- Moved the generated chart from the front sheet onto the Graph 2 sheet.
- The results are now all there, but I wanted the top 200 Douglas words with their scores and the top 200 Newcastle words with their scores, so I added another sheet for that and set it up. The Douglas (Author 1) words are the first 200 in the list on the left; the Newcastle ones are those from around 3000 to 3200 (the macro re-orders the last 1,000 words so you can get them in order). Copy/paste the two sets of words into the new sheet.
- To copy/paste the scores, select and copy them, then click on the Paste down arrow and select "Paste Values"; otherwise you'll just be pasting the formulas.
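As a cross-check outside Excel, the steps above can be roughed out in Python. This is only a minimal sketch, not the CraigZeta macro itself: it assumes the commonly described form of the Zeta score (the proportion of Author 1 blocks that contain a word plus the proportion of Author 2 blocks that lack it), it works from tokenised text rather than from the Intelligent Archive frequency tables, and it assumes a short final block is simply dropped. All the function names are hypothetical.

```python
from collections import Counter


def split_into_blocks(words, block_size=2000):
    """Split a token list into consecutive full blocks.

    Assumption: a trailing remainder shorter than block_size is dropped.
    """
    last_start = len(words) - block_size + 1
    return [words[i:i + block_size] for i in range(0, last_start, block_size)]


def block_presence(blocks, vocabulary):
    """Proportion of blocks in which each vocabulary word occurs at least once."""
    if not blocks:
        raise ValueError("need at least one full block")
    counts = Counter()
    for block in blocks:
        counts.update(set(block) & vocabulary)
    return {word: counts[word] / len(blocks) for word in vocabulary}


def zeta_scores(blocks_a, blocks_b, vocabulary):
    """Assumed Zeta form: presence in author A plus absence in author B (0 to 2)."""
    vocab = set(vocabulary)
    present_a = block_presence(blocks_a, vocab)
    present_b = block_presence(blocks_b, vocab)
    return {w: present_a[w] + (1.0 - present_b[w]) for w in vocab}


def top_markers(scores, n=200):
    """Return (author A markers, author B markers) as lists of (word, score)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n], ranked[-n:][::-1]
```

With `douglas_words` and `newcastle_words` as token lists and `vocab` as the shared 4,000-word list, `top_markers(zeta_scores(split_into_blocks(douglas_words), split_into_blocks(newcastle_words), vocab))` would give the two lists of 200. Scores near 2 mark words favoured by the first author, scores near 0 words favoured by the second.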
I saved the result as an Excel 2007 macro-enabled workbook; I also wanted to make it portable, but that seems almost impossible. If you try to save as an older Excel format, it warns you that the sheets have too much data, so you'll lose some. You can export to ODS, but if you then try to open that in OpenOffice, much of the first sheet is borked (full of data elements in the msoxl namespace!). The graphs don't survive either. So I'll need to go back and print off the graphs, or save them as images, to work with them outside MS Office 2007.
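The word lists and their pasted values, at least, could be carried around as plain CSV, which both Excel and OpenOffice open without complaint. A minimal sketch, assuming the two top-200 lists have already been got out of the workbook as (word, score) pairs; the function name and column headings are hypothetical, and this does nothing for the graphs:

```python
import csv


def write_marker_csv(path, author1_rows, author2_rows,
                     author1="Douglas", author2="Newcastle"):
    """Write two lists of (word, score) pairs side by side in one CSV file.

    Rows are paired by rank; if the lists differ in length, the extra
    rows of the longer one are left out.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([f"{author1} word", f"{author1} score",
                         f"{author2} word", f"{author2} score"])
        for (word1, score1), (word2, score2) in zip(author1_rows, author2_rows):
            writer.writerow([word1, score1, word2, score2])
```

Only the static values survive this way, which is the same thing "Paste Values" gives you inside Excel.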
As far as the results are concerned, they look as intriguing as the original Douglas/Lytton results, and remarkably similar. I might now try some comparative stuff to see whether Lytton and Newcastle can be distinguished, and I'll see how much correspondence we have between Lytton or Newcastle and other people, to see if we might find ways of contrasting the way they write to Douglas with the way they write to others.