#85 VVI
Chapter 15 Facts
Number of paragraphs: 180
Number of sentences: 500
Number of tokens: 3,527
Number of unique tokens: 1,011
Number of speakers: 4
Grace : 467 tokens
Rocky : 439 tokens
Dr. Lamai : 322 tokens
Stratt : 218 tokens
Direct speech: 40.94% of tokens
Space: 3 sections; 74.40% of tokens
Earth: 1 section; 25.60% of tokens
Words unusually frequent for Earth sections:
Lamai, blood, monkey, medical, develop.
Words unusually infrequent or lacking for Earth sections:
the, Astrophage, ’s, about, and.
Words unusually frequent for Space sections:
he, magnet, tunnel, science, fail.
Words unusually infrequent or lacking for Space sections:
if, Taumoeba, way, air, screen.
For the sentence count, segmentation was performed with spaCy. Tokenization splits only on whitespace, em-dashes, en-dashes, and ellipses. Unique tokens are counted case-insensitively.
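The token rule above can be sketched in a few lines. This is a minimal illustration, not the script that produced these counts. Two things are assumptions: that "ellipsis" covers both the single character (U+2026) and three ASCII dots, and the helper names tokenize and unique_token_count, which are hypothetical. Punctuation other than these delimiters stays attached to the token, so "Rocky," and "Rocky" count as different tokens.

```python
import re

# Assumed delimiter set: runs of whitespace, em-dash (U+2014),
# en-dash (U+2013), ellipsis character (U+2026), or three ASCII dots.
_DELIMITERS = re.compile(r"(?:\s|\u2014|\u2013|\u2026|\.\.\.)+")


def tokenize(text: str) -> list[str]:
    """Split text on the assumed delimiters and drop empty strings."""
    return [tok for tok in _DELIMITERS.split(text) if tok]


def unique_token_count(tokens: list[str]) -> int:
    """Count distinct tokens, ignoring case."""
    return len({tok.lower() for tok in tokens})


if __name__ == "__main__":
    toks = tokenize("Amaze\u2014amaze\u2026AMAZE")
    print(toks)                      # ['Amaze', 'amaze', 'AMAZE']
    print(unique_token_count(toks))  # 1
```

Splitting on a single compiled pattern keeps the rule in one place, so adding or removing a delimiter only touches the regex.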
Speaker identification was done manually.
Unusually frequent or infrequent words are identified by log-likelihood scores computed on lemmas (lemmatization by spaCy).
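The log-likelihood ranking can be illustrated with Dunning's G² statistic, a common formulation of keyness. The source does not say which formula or reference corpus was used. This sketch assumes that each section type (Earth or Space) is compared against the remaining sections, and that lemmas have already been produced, for example by spaCy. The function names are hypothetical. A positive score marks a lemma that is more frequent in the target sections, and a negative score marks one that is less frequent or missing.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G2 for a lemma seen a times in a target of c tokens
    and b times in a reference of d tokens (assumes a + b > 0)."""
    e1 = c * (a + b) / (c + d)  # expected count in target
    e2 = d * (a + b) / (c + d)  # expected count in reference
    g2 = 0.0
    if a > 0:  # treat 0 * ln(0) as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def keyness(target: list[str], reference: list[str]) -> list[tuple[str, float]]:
    """Signed G2 for every lemma, sorted from most over- to most under-used
    in the target relative to the reference."""
    t, r = Counter(target), Counter(reference)
    c, d = len(target), len(reference)
    scores = []
    for lemma in t.keys() | r.keys():
        a, b = t[lemma], r[lemma]
        sign = 1.0 if a / c >= b / d else -1.0
        scores.append((lemma, sign * log_likelihood(a, b, c, d)))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores


if __name__ == "__main__":
    earth = ["monkey", "monkey", "the"]
    space = ["the", "the", "tunnel"]
    for lemma, score in keyness(earth, space):
        print(f"{lemma}: {score:+.3f}")
```

In this toy run, "monkey" scores highest because it appears only in the target. "tunnel" scores lowest because it appears only in the reference. The top and bottom of such a ranking would correspond to the "unusually frequent" and "unusually infrequent or lacking" lists above.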