Beanbag #31 | Project Amaze!

#31 V̶I

Chapter 6 Facts

Number of paragraphs: 232
Number of sentences: 742
Number of tokens: 7,016
Number of unique tokens: 1,610

Number of speakers: 3
    Stratt : 778 tokens
    Grace : 478 tokens
    Computer : 27 tokens
Direct speech: 18.30% of tokens

Space: 2 sections; 75.77% of tokens
Earth: 1 section; 24.23% of tokens

Words unusually frequent for Earth sections:
coma, primate, gin, patient, gene.
Words unusually infrequent or lacking for Earth sections:
he, light, am, Astrophage, I.

Words unusually frequent for Space sections:
Ceti, Tau, line, Petrova, star.
Words unusually infrequent or lacking for Space sections:
he, Rocky, his, Taumoeba, question.

For the sentence count, segmentation was performed using spaCy. Tokenization splits only on whitespace, em-dashes, en-dashes, and ellipses. Unique tokens are counted case-insensitively.
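A minimal Python sketch of the token counting described above, assuming the chapter is available as a plain Unicode string. The function names `tokenize` and `count_tokens` are hypothetical. Treating both the single-character ellipsis (…) and three ASCII periods as ellipses is an assumption, and all other punctuation is left attached to its word because the note names only these delimiters.

```python
import re

# Delimiters named in the note: whitespace, em-dash (U+2014), en-dash (U+2013),
# and ellipsis. Accepting both U+2026 and "..." as an ellipsis is an assumption.
_DELIMS = re.compile(r"(?:\s|\u2014|\u2013|\u2026|\.\.\.)+")


def tokenize(text: str) -> list[str]:
    """Split text on the delimiters above; other punctuation stays attached."""
    return [tok for tok in _DELIMS.split(text) if tok]


def count_tokens(text: str) -> tuple[int, int]:
    """Return (total tokens, case-insensitive unique tokens)."""
    tokens = tokenize(text)
    unique = {tok.casefold() for tok in tokens}
    return len(tokens), len(unique)


if __name__ == "__main__":
    sample = "Rocky is here\u2014rocky\u2026 Rocky is HERE"
    print(count_tokens(sample))  # (7, 3)
```

Casefolding before building the set is what makes "Rocky", "rocky", and "ROCKY" count as one unique token while still contributing three to the total.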

Speaker identification was done manually.

Unusually frequent or infrequent words are identified using the log-likelihood of lemmas (lemmatized with spaCy).
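The log-likelihood ranking can be sketched with Dunning's G² statistic, a common keyness measure for comparing a word's frequency in a target corpus against a reference corpus. Whether this exact formula was used here is an assumption, the helper names `log_likelihood` and `keyness` are hypothetical, and the lemma counts are taken as given (for example, spaCy lemma counts for the Earth sections versus the Space sections).

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G2 for a lemma seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    total = a + b
    e1 = c * total / (c + d)  # expected count in the target corpus
    e2 = d * total / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    # Terms with an observed count of zero contribute nothing (0 * ln 0 = 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keyness(target: Counter, reference: Counter, top: int = 5):
    """Return (over-used, under-used) lemmas in the target, highest G2 first."""
    c = sum(target.values())
    d = sum(reference.values())
    over, under = [], []
    for lemma in set(target) | set(reference):
        a, b = target[lemma], reference[lemma]  # Counter yields 0 if missing
        g2 = log_likelihood(a, b, c, d)
        # Compare relative frequencies a/c and b/d without dividing.
        if a * d > b * c:
            over.append((g2, lemma))
        elif a * d < b * c:
            under.append((g2, lemma))

    def rank(items):
        # Descending score, then alphabetical order for deterministic ties.
        ordered = sorted(items, key=lambda s: (-s[0], s[1]))
        return [lemma for _, lemma in ordered][:top]

    return rank(over), rank(under)


if __name__ == "__main__":
    target = Counter({"coma": 5, "the": 10})
    reference = Counter({"the": 30, "star": 5})
    print(keyness(target, reference))  # (['coma'], ['star', 'the'])
```

A lemma that never appears in the target (like "star" in the usage line) still gets a positive score, which is how a word can be flagged as "lacking" rather than merely infrequent.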
