#22 λ+
The definite article “the” occurs 8,131 times in Project Hail Mary.
That’s 4.24% of tokens in the novel overall.
The definite article makes up 5.0% of the tokens from the narrator on Earth and 5.07% in Space.
Among the speakers who say “the” more than 20 times, it makes up 4.5% of Bob Redell’s spoken tokens, 4.0% of Dr. Lokken’s, 3.7% of DuBois’s, 4.0% of LeClerc’s, and 3.9% of Steve Hatch’s.
In Stratt’s speech it makes up only 3.2%, and in Grace’s it is 2.4% on Earth and 2.2% in Space.
Only 0.1% of Rocky’s tokens are the definite article, a measure of his broken English on which we will have more to say in a later beanbag.
Here’s a visualization of the relative frequencies across all sections and chapters of the book.
Each coloured bar represents a section and is colour-coded for Earth sections (green) and Space sections (purple).
The grey bars represent the chapters, which are marked with ticks and numbers.
The y-axis is the relative frequency, i.e. the proportion of tokens in that section (or chapter) that are “the”. A bar can therefore be taller even with fewer occurrences, if the section or chapter is shorter.
Tokenization is from spaCy, which treats punctuation marks as separate tokens.
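To make the relative-frequency measure concrete, here is a minimal sketch of the calculation. It is not the code behind the figures above: the `tokenize` function is a hypothetical regex stand-in for spaCy's tokenizer (it also splits punctuation into separate tokens, but it will not match spaCy on contractions and other edge cases), and the two sample sentences are invented for illustration.

```python
import re

def tokenize(text):
    # Assumption: a rough stand-in for spaCy's tokenizer.
    # Words (with internal apostrophes) are tokens, and each
    # punctuation mark is its own token.
    return re.findall(r"\w+(?:['’]\w+)*|[^\w\s]", text)

def relative_frequency(text, target="the"):
    # Proportion of tokens (case-insensitive) equal to `target`.
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() == target)
    return hits / len(tokens)

# Invented sample text, not taken from the novel.
short_section = "The ship is near the star."
long_section = ("The crew slept while the ship drifted on, "
                "and the long night passed without any alarm.")

for name, text in [("short", short_section), ("long", long_section)]:
    n_tokens = len(tokenize(text))
    print(f"{name}: {n_tokens} tokens, "
          f"relative frequency {relative_frequency(text):.3f}")
```

The short sample has two occurrences in 7 tokens and the long one has three in 18, so the shorter text gets the higher relative frequency despite fewer occurrences, which is the effect described for the bars in the chart.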