Beanbag #20 | Project Amaze!

#20 λV

Chapter 4 Facts

Number of paragraphs: 260
Number of sentences: 752
Number of tokens: 6,441
Number of unique tokens: 1,571

Number of speakers: 11
    Grace : 1,187 tokens
    Stratt : 580 tokens
    Michael : 40 tokens
    Abby : 40 tokens
    Harrison : 17 tokens
    Tamora : 13 tokens
    Luther : 12 tokens
    Computer : 11 tokens
    Trang : 10 tokens
    U.S. Army Soldier : 5 tokens
    Theresa : 4 tokens
Direct speech: 29.79% of tokens
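
The direct-speech share can be re-derived from the figures above. This is a minimal check, assuming the percentage is the sum of the per-speaker token counts divided by the chapter's total token count; the variable names are illustrative only.

```python
# Per-speaker token counts as listed above.
speaker_tokens = {
    "Grace": 1187,
    "Stratt": 580,
    "Michael": 40,
    "Abby": 40,
    "Harrison": 17,
    "Tamora": 13,
    "Luther": 12,
    "Computer": 11,
    "Trang": 10,
    "U.S. Army Soldier": 5,
    "Theresa": 4,
}
total_tokens = 6441  # "Number of tokens" for the chapter

# Assumption: direct speech share = spoken tokens / all tokens.
spoken = sum(speaker_tokens.values())
share = 100 * spoken / total_tokens
print(f"{spoken} of {total_tokens} tokens = {share:.2f}%")
# prints: 1919 of 6441 tokens = 29.79%
```

Under that assumption the result agrees with the reported 29.79%.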

Space: 4 sections; 49.56% of tokens
Earth: 2 sections; 50.44% of tokens

Words unusually frequent for Earth sections:
needle, water, Michael, lab, poke.
Words unusually infrequent or lacking for Earth sections:
he, Venus, crew, we, Mary.

Words unusually frequent for Space sections:
label, diagram, fuel, uniform, mission.
Words unusually infrequent or lacking for Space sections:
Rocky, he, we, question, yes.

For the sentence count, segmentation was performed using spaCy. Tokenization splits only on whitespace, em-dash, en-dash, and ellipsis delimiters, so punctuation attached to a word stays part of its token. The unique-token count is case-insensitive.
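
A rough sketch of the token counting described above, not the actual script. The function names are hypothetical, and treating both the single-character ellipsis and three ASCII dots as an ellipsis is an assumption.

```python
import re

# Delimiters: whitespace, em-dash (U+2014), en-dash (U+2013), ellipsis.
# Assumption: "ellipsis" covers both U+2026 and three ASCII dots.
_DELIMS = re.compile(r"\s+|\u2014|\u2013|\u2026|\.\.\.")

def tokenize(text):
    """Split on the delimiters and drop the empty strings left between them."""
    return [tok for tok in _DELIMS.split(text) if tok]

def count_tokens(text):
    """Return (number of tokens, number of case-insensitive unique tokens)."""
    tokens = tokenize(text)
    unique = {tok.lower() for tok in tokens}
    return len(tokens), len(unique)
```

For example, `count_tokens("Wait\u2014what? The the... end")` gives `(5, 4)`: "The" and "the" count once, and "what?" keeps its question mark because punctuation is not a delimiter.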

Speaker identification was done manually.

Unusually frequent or infrequent words are identified by the log-likelihood of their lemmas (lemmatization by spaCy).
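
The note above does not say which log-likelihood formulation was used. The sketch below assumes the common two-corpus G2 statistic, which compares a lemma's count in one set of sections against its count in the rest. The function names, the sign convention, and the choice of reference corpus are all assumptions.

```python
import math

def log_likelihood(a, b, c, d):
    """G2 for one lemma.

    a, b: lemma counts in the target and reference sections.
    c, d: total token counts of the target and reference sections.
    """
    # Expected counts if the lemma were equally frequent in both parts.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:  # a zero count contributes nothing (0 * ln 0 is taken as 0)
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

def signed_log_likelihood(a, b, c, d):
    """Positive when the lemma is relatively more frequent in the target."""
    g2 = log_likelihood(a, b, c, d)
    return g2 if a / c >= b / d else -g2
```

Ranking lemmas by the signed score would put the most over-represented words at one end of the list and the most under-represented at the other.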
