State of the Narratives Project
Deniz Aydin
Minutes: 10
The narratives project includes information spread across multiple repositories. These repositories are:
- GitLab repo for the data-science scripts (under the `export_scripts` subfolder), the Gephi headless research (`narratives_gephi/gephi_headless`), and the time-graph processing scripts (`time_graph`). The `export_scripts` subfolder has further subfolder structure: it has `data` and `test_data` subfolders, for the live DB data and the testing data respectively. The root of `export_scripts` also has the three Python scripts required for postprocessing DB dumps into Gephi-compatible CSV files. These scripts are called `story_space_edges.py`, `text_space_edges.py`, and `final_postprocessing_story_space.py`, named after the functions they perform. Note that `story_space_edges.py` and `text_space_edges.py` create BOTH a Gephi-readable file (i.e., one including only the bare minimum set of columns required by Gephi's processing) and a human-readable file with character names included. The human-readable file will only be used by the RAs and CB to proofread the relationships for correctness, completeness, and redundancy. `final_postprocessing_story_space.py` deduplicates some relationships as follows: if there are multiple types of relationships between characters A and B (e.g., A has an exchange with B, A knows B, and A knows of B), only the most specific type of relationship is kept. To do this, I make use of pandas' Categorical feature: I rank a certain column in order of relationship type, then keep only the category with the highest relative importance. I have also added comments to the code explicitly describing the operations performed there.
- The Gephi headless research subfolder probably has minimal working examples, and may not need to be kept long-term if the project does not move in this direction.
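The dual-output pattern of the two edge scripts described above might look roughly like this (a sketch only: the file names, column names, and character names here are hypothetical stand-ins, not the real scripts' values). Gephi's spreadsheet import only needs `Source` and `Target` columns for an edge list, which is why the Gephi-readable file can stay minimal:

```python
import pandas as pd

# Hypothetical edge list keyed by character IDs; the real scripts'
# columns and ID scheme may differ.
edges = pd.DataFrame(
    {
        "Source": [1, 2],
        "Target": [2, 3],
        "Type": ["Undirected", "Undirected"],
    }
)
names = {1: "Quentin", 2: "Caddy", 3: "Benjy"}  # illustrative only

# Gephi-readable file: only the bare minimum columns Gephi needs.
edges.to_csv("story_space_gephi.csv", index=False)

# Human-readable file: the same edges with character names spliced in,
# for the RAs and CB to proofread.
readable = edges.assign(
    SourceName=edges["Source"].map(names),
    TargetName=edges["Target"].map(names),
)
readable.to_csv("story_space_readable.csv", index=False)
```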
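The Categorical-based deduplication described above could look roughly like this (a minimal sketch; the actual column names and relationship ranking in `final_postprocessing_story_space.py` may differ):

```python
import pandas as pd

# Hypothetical ranking, least to most specific; the real script's
# ordering and column names may differ.
RELATION_RANK = ["knows of", "knows", "exchange"]

def dedupe_edges(edges: pd.DataFrame) -> pd.DataFrame:
    """Keep only the most specific relationship per character pair."""
    edges = edges.copy()
    # An ordered Categorical lets pandas compare relationship types by rank.
    edges["relation"] = pd.Categorical(
        edges["relation"], categories=RELATION_RANK, ordered=True
    )
    # Sort so the most specific relation comes last, then keep only it.
    edges = edges.sort_values("relation")
    return edges.drop_duplicates(subset=["source", "target"], keep="last")

edges = pd.DataFrame(
    {
        "source": ["A", "A", "A", "B"],
        "target": ["B", "B", "B", "C"],
        "relation": ["exchange", "knows", "knows of", "knows"],
    }
)
print(dedupe_edges(edges))  # A-B collapses to the single "exchange" row
```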
- Time graphs and the projection graphs (a total of 3 plots) are produced by `time_graph_working.py`. The code has a lot of specificity for the sample novel we did this for, which was The Sound and the Fury. In the future, this code needs to be reworked for whatever novel and characters need to be plotted.
- There are 9 databases on our server that service the narratives project. Their code names are: DE (Disappearing Earth), GS (A Visit from the Goon Squad), HP (Harry Potter), original (this has a mess of information from multiple books and sources, but it is to be kept for posterity), PnP (Pride and Prejudice), POD (Plague of Doves), TKAM (To Kill a Mockingbird), TO (Tropic of Orange), TT (There There). As of writing, they are in various states of completion. There is also a CRON job that backs up the dumps nightly on our server, in both XML and SQL forms. I would definitely keep the SQL dumps too, in case the DB needs to be reconstructed. There is also a need for CSV dumps of the `characters`, `narrContainers`, and `menExs` tables, which will feed into the story-spaces export script.
- The `narratives-adaptive-db-tests` repository on GitLab has the preliminary changes I made to work towards a functioning global-sort functionality.
- `narratives-data-dumps-and-exports` has an old version of the Narratives Final Scripts repo, and is kind of borked. I am keeping it in case there is non-transferred information there; once Narratives Final Scripts has been thoroughly tested, this repo can safely be nuked.
- `narratives_pod` is a repository with a working copy of the Adaptive DB code, but since the only changes made to it are the local_classes file and the creds, it might be unnecessary to keep long-term.
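As a rough illustration of how the `characters`, `narrContainers`, and `menExs` CSV dumps could feed the story-spaces export, a sketch follows. Every column name and value here is a hypothetical stand-in (the actual schemas are whatever the DB defines); the in-memory CSV strings simulate the dumped files:

```python
import io

import pandas as pd

# Stand-ins for the nightly CSV dumps; all columns are hypothetical.
characters_csv = io.StringIO("char_id,name\n1,Benjy\n2,Caddy\n")
men_exs_csv = io.StringIO("men_ex_id,container_id,char_id\n10,100,1\n11,100,2\n")
narr_containers_csv = io.StringIO("container_id,label\n100,Section 1\n")

characters = pd.read_csv(characters_csv)
men_exs = pd.read_csv(men_exs_csv)
containers = pd.read_csv(narr_containers_csv)

# Attach character names and container labels to each mention/exchange:
# the kind of join a story-spaces export would likely start from.
mentions = (
    men_exs
    .merge(characters, on="char_id")
    .merge(containers, on="container_id")
)
print(mentions[["name", "label"]])
```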