Tuesday, 8 February 2022

More Wordle Statistics

My last post, titled Wordle Statistics, was already long enough, so rather than add newly found information to it I'm making a fresh post. 3Blue1Brown has just created a YouTube video that involves a statistical analysis of Wordle.


There's a lot to digest in this video but my main takeaway after first viewing it was that CRANE was a good starting word! I clearly need to watch it again and again to fully absorb what he's saying. However, for today's Wordle I started with CRANE and the results were almost disastrous as can be seen in Figure 1.


Figure 1

Looking at Figure 1, it can be seen that I had a spectacular start with three letters in the correct positions. There were only two remaining letters to guess. However, I nearly failed because there were just so many possible words that could be made from *RA*E. 

Referring to a dataset of English word frequencies hosted on Kaggle, we can see that TRADE was a good second choice because it has by far the highest frequency among the words listed below. Had I known about word frequencies, my third choice would have been FRAME and I would have solved the puzzle in a mere three attempts.

CRANE:    4,888,961  FIFTH
TRADE:  110,086,585  FIRST
ERASE:    3,086,642  SIXTH
GRACE:   17,642,126  THIRD
BRAKE:    9,321,885  FOURTH
FRAME:   46,079,991  SECOND
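The ranking above can be reproduced with a few lines of Python. This is only a minimal sketch: the frequencies are copied from the list above, while the function name `rank_candidates` and the regular expression `.ra.e` (standing in for the pattern *RA*E) are my own illustrative choices.

```python
import re

# Frequencies copied from the list above (Kaggle word-frequency counts).
FREQUENCIES = {
    "crane": 4_888_961,
    "trade": 110_086_585,
    "erase": 3_086_642,
    "grace": 17_642_126,
    "brake": 9_321_885,
    "frame": 46_079_991,
}


def rank_candidates(pattern, frequencies):
    """Return (word, count) pairs that match the pattern, most frequent first."""
    regex = re.compile(pattern)
    matches = [(word, count) for word, count in frequencies.items()
               if regex.fullmatch(word)]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# "." stands for an unknown letter, so ".ra.e" corresponds to *RA*E.
ranked = rank_candidates(r".ra.e", FREQUENCIES)
for position, (word, count) in enumerate(ranked, start=1):
    print(f"{position}. {word.upper()}: {count:,}")
```

With a full word list in place of the six-word dictionary, the same function would rank every word that fits the known letters.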

Searching Google for each word in quotes (e.g. "trade") yields the following result counts:

CRANE:    166,000,000  SIXTH
TRADE:  1,930,000,000  SECOND
ERASE:    242,000,000  FIFTH
GRACE:    918,000,000  THIRD
BRAKE:    503,000,000  FOURTH
FRAME:  2,350,000,000  FIRST

Interestingly, in the Google results FRAME and TRADE swap places, with FRAME roughly 20% ahead (in search result counts at least). CRANE and ERASE also swap positions in fifth and sixth places.

I downloaded the CSV file of word frequencies from Kaggle (it's only 5MB) and filtered out words that were not five letters in length. Here are the 20 five-letter words with the highest frequencies:

about  1,226,734,006
other    978,481,319
which    810,514,085
their    782,849,411
there    701,170,205
first    578,161,543
would    572,644,147
these    541,003,982
click    536,746,424
price    501,651,226
state    453,104,133
email    443,949,646
world    431,934,249
music    414,028,837
after    372,948,094
video    365,410,017
where    360,468,339
books    347,710,184
links    339,926,541
years    337,841,309

As can be seen, ABOUT comes out clearly on top with a frequency of over 1.2 billion! This might not be a bad starting word. Anyway, more food for thought when tackling Wordle.
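For anyone wanting to repeat the filtering step, here is a rough sketch in Python. It assumes the CSV has a header row with columns named `word` and `count`; I haven't recorded the exact layout here, so adjust the column names to match the file you download. The function name `top_five_letter_words` is simply my own label.

```python
import csv
import io


def top_five_letter_words(csv_file, limit=20):
    """Return the `limit` most frequent five-letter words as (word, count) pairs.

    Assumes a header row with columns "word" and "count" (an assumption
    about the file layout, not something confirmed in this post).
    """
    reader = csv.DictReader(csv_file)
    rows = [
        (row["word"], int(row["count"]))
        for row in reader
        if len(row["word"]) == 5  # keep only five-letter words
    ]
    return sorted(rows, key=lambda pair: pair[1], reverse=True)[:limit]


# Small in-memory sample standing in for the downloaded file.
# The five-letter counts come from the list above; the "wordle" row is a
# made-up placeholder included only to show a non-five-letter word being dropped.
sample = io.StringIO(
    "word,count\n"
    "other,978481319\n"
    "wordle,1\n"
    "about,1226734006\n"
    "which,810514085\n"
)

for word, count in top_five_letter_words(sample, limit=3):
    print(f"{word} {count:,}")
```

To run it against the real download, open the file with `open(path, newline="")` and pass the file object in place of `sample`.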
