#14 VV
Chapter 3 Facts
Number of paragraphs: 272
Number of sentences: 722
Number of tokens: 6,009
Number of unique tokens: 1,549
Number of speakers: 9
Grace : 1413 tokens
Stratt : 1019 tokens
Abby : 42 tokens
Computer : 12 tokens
Larry : 8 tokens
Trang : 5 tokens
Grace’s Students : 2 tokens
Jeff : 1 token
Regina : 1 token
Direct speech: 41.64% of tokens
Earth: 2 sections; 71.61% of tokens
Space: 2 sections; 28.39% of tokens
Words unusually frequent for Earth sections:
dot, beanbag, Abby, cylinder, form.
Words unusually infrequent or lacking for Earth sections:
we, Astrophage, he, would, crew.
Words unusually frequent for Space sections:
sun, sunspot, image, velocity, kps.
Words unusually infrequent or lacking for Space sections:
he, we, Rocky, his, you.
For the sentence count, segmentation was performed using spaCy. Tokenization is based only on whitespace, em-dash, en-dash, and ellipsis delimiters, so punctuation stays attached to the adjacent word. Unique tokens are counted case-insensitively.
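The tokenization rule described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used for these counts. The function names `tokenize` and `unique_count` are hypothetical, and treating three consecutive periods as an ellipsis (in addition to the single ellipsis character) is an assumption.

```python
import re

# Delimiters: whitespace, em-dash (U+2014), en-dash (U+2013),
# the ellipsis character (U+2026), or three periods in a row.
# Counting "..." as an ellipsis is an assumption of this sketch.
_DELIMS = re.compile(r"(?:\s|\u2014|\u2013|\u2026|\.{3})+")


def tokenize(text):
    """Split text on the delimiters above and drop empty pieces."""
    return [piece for piece in _DELIMS.split(text) if piece]


def unique_count(tokens):
    """Count distinct tokens, ignoring case."""
    return len({token.lower() for token in tokens})


sample = "Wait\u2014what? I\u2026 I see. The the"
tokens = tokenize(sample)
```

Because only these delimiters are used, a token such as `what?` keeps its trailing punctuation, which is one reason token counts from this scheme can differ from those of a linguistic tokenizer.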
Speaker identification was done manually.
Unusually frequent or infrequent words are identified by log-likelihood scores computed over lemmas (lemmatization by spaCy).
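One common way to compute such a score is the two-corpus log-likelihood (G2) statistic often used for keyword analysis. The sketch below assumes that variant; the notes above do not say which exact formulation was used. The function names `log_likelihood` and `keyness` are hypothetical, and the inputs are assumed to be lemma counts that have already been produced (for example by spaCy).

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 for a lemma seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    total = 0.0
    # A zero count contributes nothing (0 * ln 0 is treated as 0).
    if a > 0:
        total += a * math.log(a / expected_target)
    if b > 0:
        total += b * math.log(b / expected_reference)
    return 2 * total


def keyness(target, reference):
    """Signed G2 per lemma: positive when the lemma is relatively more
    frequent in the target counts, negative when it is less frequent."""
    c = sum(target.values())
    d = sum(reference.values())
    scores = {}
    for lemma in set(target) | set(reference):
        a, b = target[lemma], reference[lemma]
        g2 = log_likelihood(a, b, c, d)
        scores[lemma] = g2 if a / c >= b / d else -g2
    return scores


# Toy counts for illustration only, not data from the chapter.
space = Counter({"sun": 5, "he": 1})
earth = Counter({"sun": 1, "he": 5})
scores = keyness(space, earth)
```

Sorting the resulting scores from highest to lowest gives candidates for "unusually frequent" lemmas at the top and "unusually infrequent or lacking" lemmas at the bottom.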