How do I generate the time graphs I’ve seen you display?
: Deniz Aydin
Minutes: 60
I’ve written a little Python script that pulls in sample data contained in the Narratives repository. You need to clone the whole repo and navigate to `data_viz/time_graphs`.
The data come in two separate files: `TheSoundandtheFuryTimeGraphJuly2024.xlsx` and `TheSoundandtheFuryTellingTimesJuly2024.xlsx`. The first file has two tabs, called `Events` and `Pages` respectively. `Events` includes data about the time period over which each event occurs, its description, and the people involved in the event (as Boolean values). The second tab, `Pages`, also contains the description of the event (although the relevant column is titled `Event` in one tab and `Description` in the other), the narrator, and the pages that span the event. There is also a column that describes the telling time, but its values are free-text descriptions rather than timestamps, so they were not easy to plot. Instead, I asked for telling-time timestamps to be included for the events, and Corinne sent me a new spreadsheet, `TheSoundandtheFuryTellingTimesJuly2024.xlsx`, which maps each telling-time description to a timestamp.
As for the Python script that does the plotting, the sequence of operations is as follows:
- Import the required modules/libraries at the top of the file (TODO: these need to be installed if not already present on the host system, especially `openpyxl`);
- Explicitly change the OS parameter to Mac (TODO: this needs to be changed or commented out depending on which platform is being used);
- Load the data from the spreadsheets into three Pandas dataframes, one for each of the two tabs in the TimeGraph file and one for the TellingTimes data set;
- Get the relevant column headers using `if "word" in header` syntax;
- Rename the relevant column from `Description` to `Event`, so that matching rows across the tables is easier;
- Join the `Events` and `Pages` data by merging on the `Event` column to create a single dataframe;
- For each row in the dataframe, plot a scatter point for an instantaneous event and a line for an event with a longer duration;
- N.B. the axes are duplicated using a `twinx` statement so that two separate legends can be placed on a single plot;
- Set the legends, axis labels, and the like;
- Create another plot overlaid on the first by using the same axes handles;
- Mark the telling time for events with `x` markers.
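The header-lookup and merge steps in the list can be sketched as follows. `find_columns` and `merge_events_and_pages` are illustrative names rather than the script's own, and the sketch assumes both tabs already share an `Event` column after the rename.

```python
import pandas as pd


def find_columns(df: pd.DataFrame, word: str) -> list:
    """Return every header that contains `word` (the `if "word" in header` idiom)."""
    # str() guards against non-string headers, such as numbers read from Excel.
    return [header for header in df.columns if word in str(header)]


def merge_events_and_pages(events: pd.DataFrame, pages: pd.DataFrame) -> pd.DataFrame:
    """Join the two tabs on their shared `Event` column.

    how="inner" (pandas' default, spelled out here) keeps only events that
    appear in both tabs; how="outer" would instead expose descriptions that
    differ between the tabs.
    """
    return events.merge(pages, on="Event", how="inner")
```

For instance, `find_columns(events, "time")` would return every header containing "time", such as start and end columns if the tab happens to name them that way. Because the join key is the free-text description, any small wording difference between the two tabs drops that event from an inner merge, which is worth checking when rows go missing.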
Note also that the line/marker colours for the scatter points and the line plots indicate which narrator is speaking. So now, if you press Play (i.e. run the script), you should get the final output with all the data in a pop-up matplotlib window. As you move your cursor over the plot, the x and y values under it are displayed live at the bottom of the window.
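Below is a sketch of the plotting steps and the narrator colouring. The column names `Start`, `End`, `Page`, `Narrator` and `TellingTime` are hypothetical placeholders (the real spreadsheets may differ), and the y axis is assumed, for illustration only, to be a numeric page position. Instantaneous events become scatter points, longer events become line segments, and a `twinx` axes carries the telling-time `x` markers and a second legend. The function returns the figure rather than calling `plt.show()`; calling `plt.show()` afterwards opens the interactive window.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_time_graph(df: pd.DataFrame):
    """Draw events coloured by narrator plus an overlay of telling times.

    HYPOTHETICAL column names (replace with the real headers):
      Start, End  - story-time start/end of the event
      Page        - numeric y position, e.g. the first page of the event
      Narrator    - who narrates the event
      TellingTime - telling-time value (may be missing)
    """
    fig, ax = plt.subplots()

    # One colour per narrator, drawn from matplotlib's default colour cycle.
    cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    colours = {n: cycle[i % len(cycle)] for i, n in enumerate(df["Narrator"].unique())}

    seen = set()
    for row in df.itertuples(index=False):
        colour = colours[row.Narrator]
        # Label only the first artist per narrator so the legend has no repeats.
        label = None if row.Narrator in seen else row.Narrator
        seen.add(row.Narrator)
        if row.Start == row.End:  # instantaneous event -> single point
            ax.scatter([row.Start], [row.Page], color=colour, label=label)
        else:  # event with a duration -> horizontal line segment
            ax.plot([row.Start, row.End], [row.Page, row.Page], color=colour, label=label)

    ax.set_xlabel("Story time")
    ax.set_ylabel("Page")
    ax.legend(title="Narrator", loc="upper left")

    # twinx() adds a second axes sharing the x axis, so it can hold its own legend.
    ax2 = ax.twinx()
    told = df.dropna(subset=["TellingTime"])
    ax2.scatter(told["TellingTime"], told["Page"], marker="x", color="black",
                label="Telling time")
    ax2.set_ylim(ax.get_ylim())  # twin y axes scale independently; align them
    ax2.legend(loc="upper right")
    return fig, ax, ax2
```

Because twin axes have independent y scales, the sketch copies the first axes' y limits onto the second; without that, the `x` markers could drift relative to the events they belong to. Matplotlib can also show two legends on a single axes by re-adding the first legend with `ax.add_artist`; the `twinx` route mirrors what the original script does.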