#191 V̶IV̶
How do Grace and Rocky differ in their key vocabulary?
Words key to Grace (overused compared to Rocky):
| Word | Grace | Rocky | LL |
|---|---|---|---|
| the | 568 | 9 | 251.8 |
| a | 351 | 7 | 148.0 |
| it | 358 | 25 | 86.8 |
| do | 308 | 30 | 55.0 |
| 're | 109 | 1 | 52.7 |
| 'm | 96 | 1 | 45.6 |
| 's | 85 | 1 | 39.7 |
| that | 254 | 28 | 39.1 |
| yeah | 92 | 3 | 33.5 |
Words key to Rocky (overused compared to Grace):
| Word | Grace | Rocky | LL |
|---|---|---|---|
| question | 6 | 208 | 533.7 |
| good | 60 | 76 | 60.9 |
| understand | 18 | 43 | 57.1 |
| no | 85 | 84 | 49.8 |
| ship | 43 | 56 | 46.2 |
| many | 10 | 30 | 45.0 |
| happy | 4 | 23 | 44.3 |
| amaze | 0 | 15 | 42.2 |
| save | 5 | 20 | 34.1 |
Rocky’s most distinctive word is question. It reflects his construction for asking questions: he ends the sentence with the word “question” rather than using English question syntax.
As is typical for English speakers, Grace’s keywords are almost entirely function words: articles, contracted forms of “to be”, pronouns, etc. Rocky’s broken English does not make as much use of these.
Rocky’s keywords reveal his thoughts and emotions: good, happy, sad, understand, and amaze. He is, of course, also the only one who uses the musical note symbols ♩, ♪, and ♫.
Log-likelihood (LL) measures how statistically surprising the difference in frequency is between two corpora. Higher LL = more distinctive. All values shown are significant at p < 0.001 (LL > 10.83).
Analysis is based on lemmas (lemmatization by spaCy) with punctuation excluded.
Grace has 16,951 tokens of direct speech; Rocky has 5,501.
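For readers who want to check the LL column, here is a minimal Python sketch. It assumes the standard two-corpus log-likelihood formula (observed counts compared against the counts expected if the word were equally frequent in both corpora) and uses the token totals above as corpus sizes. The function name `log_likelihood` and the constants are illustrative, not taken from the original analysis.

```python
import math

def log_likelihood(count_a, count_b, total_a, total_b):
    """Two-corpus log-likelihood keyness score for a single word."""
    combined = count_a + count_b
    # Expected counts if the word were equally frequent in both corpora.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    ll = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Corpus sizes as stated in the notes (assumed to be the totals used).
GRACE_TOKENS, ROCKY_TOKENS = 16_951, 5_501

print(round(log_likelihood(568, 9, GRACE_TOKENS, ROCKY_TOKENS), 1))  # "the"
print(round(log_likelihood(6, 208, GRACE_TOKENS, ROCKY_TOKENS), 1))  # "question"
```

Under these assumptions the results land very close to the table values for “the” and “question”, which suggests this is the formula behind the LL column. The score alone does not say which speaker overuses the word; that comes from comparing each observed count with its expected count.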
I spend an hour every day studying Eridian vocabulary
12.144