#210 V̶V̶ℓ
How complex is each character’s vocabulary in Project Hail Mary?
Using the VXGL word list, which maps English words to the US grade level at which they’d be expected to be known, we can score every word each character speaks and compare their vocabulary profiles.
The raw numbers are dominated by function words (the, I, is, to) that every speaker uses at similar rates, so we focus on content words: nouns, verbs, adjectives, and adverbs.
Mean VXGL grade level (content words only):
| Speaker | Mean | Median | Content words |
|---|---|---|---|
| Dr. Lamai | 4.17 | 3.96 | 146 |
| Dr. Lokken | 3.59 | 2.81 | 840 |
| Computer | 3.55 | 2.78 | 205 |
| Leclerc | 3.52 | 2.81 | 475 |
| DuBois | 3.52 | 3.00 | 260 |
| Dr. Browne | 3.26 | 2.90 | 155 |
| Stratt | 3.18 | 2.60 | 3,287 |
| Bob Redell | 3.13 | 2.53 | 444 |
| Narration | 2.99 | 2.28 | 50,037 |
| Dimitri | 2.92 | 2.25 | 405 |
| Steve Hatch | 2.84 | 2.14 | 445 |
| Grace | 2.73 | 2.00 | 6,951 |
| Marissa | 2.52 | 1.92 | 127 |
| Rocky | 2.47 | 1.87 | 3,029 |
The scientists (and Computer) sit at the top. Dr. Lamai (mean 4.17) uses the most complex vocabulary of any character, about 0.6 of a grade level above the next cluster, though her sample is small at 146 content words. Dr. Lokken, Leclerc, and DuBois form a tight scientific group around 3.5.
Stratt (3.18) lands between the scientists and Grace. She’s a bureaucrat, not a researcher, and her vocabulary reflects it: more formal than Grace, less technical than the scientists.
Grace’s narration (2.99) is notably higher than his speech (2.73). He thinks in bigger words than he speaks. This gap is consistent whether he’s on the Hail Mary or in flashback.
Grace in Space vs Grace on Earth:
| Speaker | Mean | Median |
|---|---|---|
| Grace (Earth) | 2.77 | 2.00 |
| Grace (Hail Mary) | 2.69 | 2.00 |
The difference is small (0.08 of a grade level in the mean, with identical medians): Grace’s vocabulary simplifies slightly aboard the Hail Mary, although we haven’t yet tested for statistical significance.
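One way to run that significance check would be a permutation test on the per-word grade levels. The sketch below is only an illustration: `permutation_test` is a hypothetical helper, and the two input lists would be the content-word scores for Grace (Earth) and Grace (Hail Mary), which are not reproduced in this post. It also treats each word as an independent observation, which is a simplification for running dialogue.

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    n_a = len(sample_a)
    pooled = list(sample_a) + list(sample_b)
    n_b = len(pooled) - n_a

    observed = abs(sum(sample_a) / n_a - sum(sample_b) / n_b)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Small tolerance so floating-point ties count as "at least as large".
        if diff >= observed - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

With the real data, a call like `permutation_test(earth_scores, hail_mary_scores)` would return a p-value for the 0.08 gap; a large value would suggest the shift is within what chance relabeling produces.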
Rocky (2.47) has the simplest content vocabulary of any character with substantial dialogue. His English is limited, not just syntactically (as we’ve seen in his grammar) but lexically. Yet his mean is only 0.26 below Grace’s speech. An interesting further investigation would be whether his higher-grade vocabulary items tend to be technical terms (which seems likely).
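That follow-up could start by simply listing a speaker's highest-grade words and reading through them. The sketch below assumes the word list is available as a plain dict from lowercase word to grade level; `high_grade_words`, the cutoff of grade 8, and the toy grade values are all invented for illustration and are not taken from the VXGL list.

```python
from collections import Counter


def high_grade_words(tokens, grade_levels, min_grade=8.0, top_n=20):
    """List a speaker's hardest words as (word, grade, count) tuples.

    Words missing from grade_levels are skipped. Results are sorted by
    grade (highest first), then alphabetically.
    """
    counts = Counter(t.lower() for t in tokens if t.lower() in grade_levels)
    hard = [
        (word, grade_levels[word], count)
        for word, count in counts.items()
        if grade_levels[word] >= min_grade
    ]
    return sorted(hard, key=lambda item: (-item[1], item[0]))[:top_n]


# Toy data: these grade levels are made up, not real VXGL values.
toy_grades = {"radiation": 8.5, "bad": 0.5, "amaze": 4.0, "spectrum": 9.0}
toy_tokens = ["Radiation", "bad", "bad", "radiation", "amaze", "spectrum"]
toy_result = high_grade_words(toy_tokens, toy_grades)
```

If most entries in Rocky's list turn out to be science and engineering terms, that would support the guess that his harder vocabulary is mostly technical.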
Vocabulary levels are from the VXGL v1.4 list (Florea, 2024), which maps 126,413 English words to expected US grade levels (0 = kindergarten, 16 = fourth year of college). Coverage was 93–99% of tokens across all speakers.
“Content words” excludes ~120 common function words (pronouns, determiners, prepositions, conjunctions, auxiliaries, and common adverbs).
Speaker identification was done manually.
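For readers who want to try something similar, the scoring step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exact pipeline used here: `vocabulary_profile` is a hypothetical helper, `FUNCTION_WORDS` is a tiny stand-in for the ~120-word exclusion list, coverage is assumed to mean the share of all tokens found in the word list, and the grade values in the toy example are invented.

```python
from statistics import mean, median

# Stand-in for the ~120-word function-word list (assumption: the real list is longer).
FUNCTION_WORDS = {"the", "i", "is", "to", "a", "and", "of", "it", "you", "not"}


def vocabulary_profile(tokens, grade_levels, function_words=FUNCTION_WORDS):
    """Score one speaker's tokens against a word -> grade-level mapping.

    Returns a dict with mean, median, content-word count, and coverage,
    or None if the speaker has no scorable content words.
    """
    words = [t.lower() for t in tokens]
    known = [w for w in words if w in grade_levels]
    # Assumed definition: coverage = tokens found in the list / all tokens.
    coverage = len(known) / len(words) if words else 0.0

    # Content words = known words that are not on the function-word list.
    scores = [grade_levels[w] for w in known if w not in function_words]
    if not scores:
        return None

    return {
        "mean": mean(scores),
        "median": median(scores),
        "content_words": len(scores),
        "coverage": coverage,
    }


# Toy example with invented grade levels; "astrophage" is deliberately missing.
example_grades = {"the": 0.0, "is": 0.0, "spectrometer": 9.0, "broken": 2.0}
example_tokens = ["The", "spectrometer", "is", "broken", "astrophage"]
profile = vocabulary_profile(example_tokens, example_grades)
```

In this toy case, four of the five tokens are in the list and two of them count as content words, so the profile is built from `spectrometer` and `broken` only.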
I learned your language. Give me some credit.
11.055