Monday 7 February 2022

Wordle Statistics

In my Pedagogical Posturing blog, I recently posted about Wordle, but in this post I want to focus on the statistics of letter frequencies. Of primary interest, of course, is how frequently the various letters of the alphabet occur within the Wordle "universe". In A Mathematician's Guide to Wordle, it's explained that:

Wordle has both a ‘source’ word list and a ‘target’ word list. You can guess anything from the source word list (which is the 5 letter words from CSW19) ... The target word list is a small hand-curated subset of less than 2500 of these words.

CSW stands for Collins Scrabble Words, a list of 279,496 words. The article just quoted goes on to say that:

It is well known that when you order letters by frequency of appearance in English words, you get ETAOIN SHRDLU. However ... if you take all 5 letter words in CSW19, you get SEAORI LTNUDY.
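An ordering like this can be reproduced by counting every letter across the word list and sorting by count. The following is only a minimal sketch: `sample_words` is a hypothetical five-word stand-in, not the real CSW19 list, and the function name is my own.

```python
from collections import Counter

def letters_by_frequency(words):
    """Return the letters ordered from most to least frequent across all words."""
    counts = Counter("".join(words).upper())
    # Sort by descending count; ties are broken alphabetically so the result is stable.
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))

# Hypothetical stand-in for the five-letter words of CSW19.
sample_words = ["arose", "eases", "soare", "tires", "lanes"]
print(letters_by_frequency(sample_words))
```

Run against the full list of five-letter words, the same function should give the complete ordering of all 26 letters.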

Figure 1 shows the full sequence of letters and their relative frequencies. The commentary on this chart includes:

  • In fact, S and E are essentially tied, S appearing only three more times than E in the 64,860 letters in the word list. That's a 0.0046 percentage point difference.

  • After those two, the distribution doesn't fall off that much. A is 9.2%, O is 6.8%, R is 6.4%.

  • The top 10 most frequent letters in the list make up 67% of the occurrences. That's all five vowels, plus the Wheel of Fortune consonants, R S T L N.

  • For those of you wondering about that "other" vowel, Y is 12th at 3.2%.

  • The least common 5 letters, Q X J Z V, combine for a mere 2.8% of the occurrences, about the same as the letter H, the 16th ranked letter.

  • The most common letter, S, is overwhelmingly more likely to be in fifth position (59%). Otherwise, it's more likely that it appears first (23%) than in all of the middle positions combined.

  • Vowels occur most commonly in the second position. That is unless you're E, which is more common in the fourth spot (35%) than second (24%). Again, "-ed" words are a plausible explanation.

  • For you Y fans: The pseudo-vowel shows up in the last spot 63% of the time when it appears.

  • Of the most common letters, L has the most level distribution, appearing most in the third position (25%) and least in the last position (14%).

  • Words in which letters appear twice make up a little more than 35.8% of the accepted word list.
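The figures quoted in this commentary come down to simple counting. As a rough sketch of how they could be checked, the code below computes a letter's share of occurrences at each position and the share of words with a repeated letter. It assumes five-letter words; `sample_words` is a hypothetical stand-in for the real list and the function names are my own.

```python
from collections import Counter

def position_shares(words, letter):
    """For one letter, return the share of its occurrences at each of the 5 positions."""
    letter = letter.lower()
    counts = Counter(
        i for w in words for i, ch in enumerate(w.lower()) if ch == letter
    )
    total = sum(counts.values())
    return [counts[i] / total if total else 0.0 for i in range(5)]

def repeated_letter_share(words):
    """Return the fraction of words in which at least one letter appears more than once."""
    repeated = sum(1 for w in words if len(set(w)) < len(w))
    return repeated / len(words)

# Hypothetical stand-in for the real word list.
sample_words = ["arose", "eases", "soare", "tires", "lanes"]
print(position_shares(sample_words, "s"))
print(repeated_letter_share(sample_words))

# Check of the quoted S versus E gap: 3 occurrences out of 64,860 letters, as a percentage.
s_e_gap_pct = round(3 / 64_860 * 100, 4)
print(s_e_gap_pct)
```

The 64,860 letters also imply 64,860 / 5 = 12,972 five-letter words in the list.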

Figure 1: source

Looking at Figure 1, it would seem that AROSE would be a good starting word. Figure 2 shows the positions within a word of the most common letters.

Figure 2: source

Figure 2 shows that S occurs most frequently in the final position, so the previous choice of AROSE is perhaps not as good as it first seemed. The only permutation of this word that has S in the final position is the obscure AEROS. Finally, Figure 3 shows the frequency of letters occurring twice in a word compared with their overall frequency.

Figure 3: source

Given that both E and S are more likely to occur twice in a word, it would seem that EASES might be a good choice of starting word. Some other suggested starting words are:

There are a lot of factors to consider, but this post is at least a start and will definitely improve my Wordle prowess.
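One way to weigh a couple of these factors against each other is to score candidate starting words directly. The sketch below is my own illustration rather than a method from the articles quoted above: it counts how many distinct letters a guess tests, and the average number of exact position matches it would score against a word list. `sample_words` is again a hypothetical stand-in for the real list.

```python
def distinct_letters(guess):
    """Number of different letters a guess tests (repeats test nothing new)."""
    return len(set(guess.lower()))

def expected_greens(guess, words):
    """Average number of positions where the guess matches a word exactly."""
    guess = guess.lower()
    total = sum(
        sum(1 for g, t in zip(guess, w.lower()) if g == t) for w in words
    )
    return total / len(words)

# Hypothetical stand-in for the real word list.
sample_words = ["arose", "eases", "soare", "tires", "lanes"]
for candidate in ("arose", "eases"):
    print(candidate, distinct_letters(candidate), expected_greens(candidate, sample_words))
```

The trade-off shows up straight away: EASES puts its letters in likely positions but tests only three distinct letters, while AROSE tests five.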

