State of the Narratives Project
Deniz Aydin
Minutes: 10
The narratives project includes information spread across multiple repositories. These repositories are:
- GitLab repo for the data-science scripts (under the `export_scripts` subfolder), the Gephi headless research (`narratives_gephi/gephi_headless`), and the time-graph processing scripts (`time_graph`). The `export_scripts` subfolder has further subfolder structure: it has `data` and `test_data` subfolders, for the live DB data and the testing data respectively. The root of `export_scripts` also has the three Python scripts required for postprocessing DB dumps into Gephi-compatible CSV files. These scripts are called `story_space_edges.py`, `text_space_edges.py`, and `final_postprocessing_story_space.py`, named after the functions they perform. Note that `story_space_edges.py` and `text_space_edges.py` create BOTH a Gephi-readable file (i.e., one including only the bare minimum set of columns required by Gephi's processing) and a human-readable file with character names included. The human-readable file will only be used by the RAs and CB to proofread the relationships for correctness, completeness, and redundancy. `final_postprocessing_story_space.py` deduplicates some relationships as follows: if there are multiple types of relationships between characters A and B (e.g., A has an exchange with B, A knows B, and A knows of B), only the most specific type of relationship is kept. To do this, I make use of pandas' Categorical feature: I rank a certain column in order of relationship type, then keep only the category with the highest relative importance. I have also added comments to the code explicitly describing the operations performed there.
- The Gephi headless research subfolder probably has minimal working examples, and may not need to be kept long-term if the project does not move in this direction.
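The dual-output pattern of the two edge scripts described above might look roughly like this (a sketch only: the file names, column names, and character names here are hypothetical stand-ins, not the real scripts' values). Gephi's spreadsheet import only needs `Source` and `Target` columns for an edge list, which is why the Gephi-readable file can stay minimal:

```python
import pandas as pd

# Hypothetical edge list keyed by character IDs; the real scripts'
# columns and ID scheme may differ.
edges = pd.DataFrame(
    {
        "Source": [1, 2],
        "Target": [2, 3],
        "Type": ["Undirected", "Undirected"],
    }
)
names = {1: "Quentin", 2: "Caddy", 3: "Benjy"}  # illustrative only

# Gephi-readable file: only the bare minimum columns Gephi needs.
edges.to_csv("story_space_gephi.csv", index=False)

# Human-readable file: the same edges with character names spliced in,
# for the RAs and CB to proofread.
readable = edges.assign(
    SourceName=edges["Source"].map(names),
    TargetName=edges["Target"].map(names),
)
readable.to_csv("story_space_readable.csv", index=False)
```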
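The Categorical-based deduplication described above could look roughly like this (a minimal sketch; the actual column names and relationship ranking in `final_postprocessing_story_space.py` may differ):

```python
import pandas as pd

# Hypothetical ranking, least to most specific; the real script's
# ordering and column names may differ.
RELATION_RANK = ["knows of", "knows", "exchange"]

def dedupe_edges(edges: pd.DataFrame) -> pd.DataFrame:
    """Keep only the most specific relationship per character pair."""
    edges = edges.copy()
    # An ordered Categorical lets pandas compare relationship types by rank.
    edges["relation"] = pd.Categorical(
        edges["relation"], categories=RELATION_RANK, ordered=True
    )
    # Sort so the most specific relation comes last, then keep only it.
    edges = edges.sort_values("relation")
    return edges.drop_duplicates(subset=["source", "target"], keep="last")

edges = pd.DataFrame(
    {
        "source": ["A", "A", "A", "B"],
        "target": ["B", "B", "B", "C"],
        "relation": ["exchange", "knows", "knows of", "knows"],
    }
)
print(dedupe_edges(edges))  # A-B collapses to the single "exchange" row
```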
- Time graphs and the projection graphs (a total of 3 plots) are produced by `time_graph_working.py`. The code has a lot of specificity for the sample novel we did this for, which was The Sound and the Fury. In the future, this code needs to be reworked for whatever novel and characters need to be plotted.
- There are 9 databases on our server that service the narratives project. Their code names are: DE (Disappearing Earth), GS (A Visit from the Goon Squad), HP (Harry Potter), original (this has a mess of information from multiple books and sources, but it is to be kept for posterity), PnP (Pride and Prejudice), POD (Plague of Doves), TKAM (To Kill a Mockingbird), TO (Tropic of Orange), TT (There There). As of writing, they are in various states of completion. There is also a CRON job that backs up the dumps nightly on our server, in both XML and SQL forms. I would definitely keep the SQL dumps too, in case the DB needs to be reconstructed. There is also a need for CSV dumps of the `characters`, `narrContainers`, and `menExs` tables, which will feed into the story-spaces export script.
- The `narratives-adaptive-db-tests` repository on GitLab has the preliminary changes I made to work towards a functioning global-sort functionality.
- `narratives-data-dumps-and-exports` has an old version of the Narratives Final Scripts repo, and is kind of borked. I am keeping it in case there is non-transferred information there; once Narratives Final Scripts has been thoroughly tested, this repo can safely be nuked.
- `narratives_pod` is a repository with a working copy of the Adaptive DB code, but since the only changes made to it are the local_classes file and the creds, it might be unnecessary to keep long-term.
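As a rough illustration of how the `characters`, `narrContainers`, and `menExs` CSV dumps could feed the story-spaces export, a sketch follows. Every column name and value here is a hypothetical stand-in (the actual schemas are whatever the DB defines); the in-memory CSV strings simulate the dumped files:

```python
import io

import pandas as pd

# Stand-ins for the nightly CSV dumps; all columns are hypothetical.
characters_csv = io.StringIO("char_id,name\n1,Benjy\n2,Caddy\n")
men_exs_csv = io.StringIO("men_ex_id,container_id,char_id\n10,100,1\n11,100,2\n")
narr_containers_csv = io.StringIO("container_id,label\n100,Section 1\n")

characters = pd.read_csv(characters_csv)
men_exs = pd.read_csv(men_exs_csv)
containers = pd.read_csv(narr_containers_csv)

# Attach character names and container labels to each mention/exchange:
# the kind of join a story-spaces export would likely start from.
mentions = (
    men_exs
    .merge(characters, on="char_id")
    .merge(containers, on="container_id")
)
print(mentions[["name", "label"]])
```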