(Written 19.03.2022, published 04.08.2022. I can't be bothered to take the screenshots I wanted to put here...)
Have you heard about Wordle? It's a word-guessing game, and it's called that because it was made by Josh Wardle. Apparently it's a rather popular fad game recently - or, at least, it was about two months ago, when I first heard about it from a colleague at work.
Wikipedia explains it better and more succinctly than I could: "Players have six attempts to guess a five-letter word, with feedback given for each guess in the form of colored tiles indicating when letters match or occupy the correct position."
So first things first, this science-themed (?) YouTube channel 3Blue1Brown has made a video on finding an optimal way to play the game. He writes an automated Wordle solver. It's all very impressive, but it's less actionable than what I'm about to write, and he delves into information theory, entropy, etc., and most of that flew straight through my head. Here's the video: https://www.youtube.com/watch?v=v68zYyaEmEA
At least I think that's a science channel. I've only watched one video of his before, explaining how blockchains work for the purpose of jumping on the cryptocurrency fad, many years after it exploded in value.
Alright so, a little bit of backstory. Wordle was familiar to me in gameplay. Before it, I had played a number-guessing game Mastermind. Just a couple times, but I did. Very similar in principle, but you had, I think about 10 guesses to discover a 4-digit (?) code. The tricky bit was, duplicates were allowed (I think - I don't remember us reading the rules so it's possible we just made our own), and you didn't get feedback you could map to individual digits - only the number of digits that matched the code, and the number of digits that were in the code, but have been input in the incorrect position.
Even further before that, I played a lot of the Atarix minigame in the Czech shoot-'em-up video game Jets'n'Guns. It was even simpler and very similar to Mastermind: you had a 4-digit code to guess. No duplicates were allowed. You got coloured box feedback for every digit you input: red if the digit was not in the code, yellow if it was in the code but in a different position, and green if the digit was in the code and in the right position. That system was very much like the one in Wordle.
(screenshot of the Atarix)
So, Wordle operates on a few rules. They are quite simply explained, but I'll write up a quick explanation here, too: instead of digit codes, you're guessing 5-letter words. The interesting new limitation is: guesses can only be legitimate words. You can't just type "AAAAA" as a guess, for example. You have six guesses. Because some 5-letter words have repeating characters in them, repetitions are allowed.
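The colouring rule can be sketched in a few lines of Python. This is only an illustration, not the game's actual code: the function name `feedback` and the G/Y/- notation are made up here, and the handling of repeated letters (greens are claimed first, then yellows are handed out only while unmatched copies of that letter remain in the answer) is an assumption based on how the rule is commonly described.

```python
from collections import Counter

def feedback(guess: str, answer: str) -> str:
    """Return one character per letter: G = green, Y = yellow, - = grey."""
    guess, answer = guess.upper(), answer.upper()
    result = ["-"] * len(guess)

    # Answer letters that were NOT matched exactly stay available for yellows.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)

    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"          # right letter, right position
        elif remaining[g] > 0:
            result[i] = "Y"          # right letter, wrong position
            remaining[g] -= 1        # each spare copy can only be used once
    return "".join(result)

print(feedback("GHOST", "THOSE"))  # -GGGY
print(feedback("WHOSE", "THOSE"))  # -GGGG
```

The two calls use guesses against "THOSE" as the answer: the "T" at the end of "GHOST" comes back yellow, meaning it belongs somewhere else in the word.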
I had fun with Wordle when I started out. The catch is, however: you only get one word per day, so you can't play it too much. And the browser stores your guesses, so you only get 6 guesses that day, barring browser cookie and local storage manipulation. I failed my very first attempt at guessing. The word was "THOSE", and I came very close, with my sixth and final guess being "WHOSE".
(screenshot of my first guess)
Immediately I got to analysing what went wrong. Obviously the fifth guess "GHOST" had no chance of being correct, because I knew "E" was the final letter, but it was there to try reconfiguring the "O" and "S", as well as to try some new letters. As you can see, Wordle provides a convenient keyboard minimap that flags which letters have been tried, and with which result.
What really upset me was, I kept ignoring the results of the fifth letter guess, it seems. I had the word on a silver platter. I knew it had letters "S", "E", "O", "H", and "T" in there somewhere, yet somehow ignored the "T".
After that, I started musing to my colleague about what would be the more optimal way to play the game. I had a certain "meta" for cracking Atarix codes in Jets'n'Guns that never failed me and would always let me guess the correct code within the 6 or so allowed guesses. It was all about gathering information. With Atarix, I would open with "1234", then immediately follow up with "5678" if I didn't get all 4 digits in the first guess. That way I would quickly get information about eight out of the nine possible digits. I thought something like that should also exist for Wordle and should be possible to find.
There were some hurdles to overcome, however. For starters, words are not like digit codes, they follow certain patterns. Obviously, the letter distribution in words is not uniform. A word like "AAAAA" has a 0% possibility of being the code word, because it doesn't exist. Words are made out of syllables, and I think every syllable has a vowel sound at its core.
Now, funny thing is, as of writing this article, and using the HN Algolia search, I can see that Wordle only became noticed about four months ago, before it positively exploded with popularity. In the first threads, people were already thinking about ways to optimise their gameplay, and writing bots to make the game play and win itself. While I'm not interested in automatic solving, and formalising my thought processes as algorithms, I too was wondering what would be the best starting word.
I didn't just consider the vowels for my opener word, though. I knew from keyboard layout design, like for example I think the Dvorak keyboard, that there is a frequency to each letter in the English language. And that's what some ergonomic designs are built around, making it so that keys were sorted by frequency rather than... whatever the QWERTY keyboard is supposed to be optimised for. I wanted a word that would exhaust the most frequent letters in English.
I didn't want to do my own research, so I asked the search engine about "most common letters in the English language" (I always get anxious about using the search engine and formulating my query right), and got an ideal answer: https://listafterlist.com/most-common-letters-in-the-alphabet-used-in-the-english-language/ - I'll paste it here too, should the source ever go down before my website does:
E – 57; A – 43; R – 39; I – 38; O – 37; T – 35; N – 34; S – 29; L – 28; C – 23; U – 19; D – 17; P – 16; M – 15+; H – 15; G – 13; B – 11; F – 9+; Y – 9; W – 7; K – 6; V – 5; X – 1+; Z – 1+; J – 1+; Q – 1
These I think are percentages of words in which a given letter appears, rounded down to the nearest percentage. This is "according to a study done by AskOxford, using their Concise English Dictionary".
As you can see from that list, it was the right idea to consider the vowels first and foremost, as they generally have a very high frequency, with the most frequent letter and vowel by far being "E". With that list in mind, the first opener word I could come up with was "EARTH", because it takes care of the first three letters from the top of the list, and "T" is very frequent, too. Then, I thought of "RATIO", which checks letters 2nd through 6th. "IRATE" was a refinement of that idea, replacing "O" with the most common letter, "E".
I showed the game to my partner and he liked it. Together, we started to ponder something I had in mind, but hadn't started work on: it's almost as if you could assign a score value to each letter according to its frequency. What, then, if you wanted to check the most valuable fifteen letters in just three guesses? That would leave you a leisurely three guesses to find the right word, with a lot of valuable information on your hands. You would only have letters like W, V, Q, J, Z, X, etc. left unchecked, which are unlikely to very unlikely to show up in a word. You will notice that this approach is very similar to what I did with the Atarix minigame in JnG, but taking letter frequency into account.
One set of words I could think of was "IRATE", "SOUND", "CLAMP". I basically scanned the list from top to bottom, and then tried to construct words that would take as many unchecked letters from the top as possible. If we convert letters to their frequency values, that would give "IRATE" a score of 212 points, "SOUND" 136 points, and "CLAMP" 82 points, for a total of 430 points. What I disliked was that the A in "CLAMP" is useless, because it's already been checked, therefore I did not count it in the word's score.
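The scoring above is easy to reproduce. Here is a minimal sketch, assuming the frequency values from the list earlier with the "+" suffixes dropped; the names `FREQ` and `opener_scores` are made up for this example. A letter only counts the first time it appears across the opener words, which is why the "A" in "CLAMP" adds nothing.

```python
# Letter values copied from the list above ("+" suffixes dropped, an assumption).
FREQ = {
    "E": 57, "A": 43, "R": 39, "I": 38, "O": 37, "T": 35, "N": 34,
    "S": 29, "L": 28, "C": 23, "U": 19, "D": 17, "P": 16, "M": 15,
    "H": 15, "G": 13, "B": 11, "F": 9, "Y": 9, "W": 7, "K": 6,
    "V": 5, "X": 1, "Z": 1, "J": 1, "Q": 1,
}

def opener_scores(words):
    """Score each word in order, counting only letters not checked before."""
    seen = set()
    scores = []
    for word in words:
        fresh = set(word.upper()) - seen   # letters this word checks for the first time
        scores.append(sum(FREQ[letter] for letter in fresh))
        seen |= fresh
    return scores

scores = opener_scores(["IRATE", "SOUND", "CLAMP"])
print(scores, sum(scores))  # [212, 136, 82] 430
```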
The issue with constructing a good opener word triplet is that, like I said, words are built out of syllables, and they all have a vowel at their core. The vowels in the English language are, as I was told somewhere, A, E, I, O, and U. This is actually untrue, Y is a vowel, too, sometimes. That depends on its position in the word and how you pronounce it, e.g. I believe it's not a vowel in "yankee", but it is in "Clyde". Well, regardless, because of its unreliable status as a vowel, it's a rather uncommon letter and therefore it is justified not to worry about it too much. Anyway, there are only so many vowels you can put in three words, and you want them to draw from the "AEIOU" pool.
The dilemma, therefore, is which vowels do you use first for your word triplet. Out of the top five letters, a staggering four are vowels. I think it would be ideal to cover 1st through 5th letter in the first word, 6th through 10th in the second, and 11th through 15th in the third, but that's sadly simply not possible. There are no English words that consist solely of the letters T, N, S, L, and C - because there are no vowels here to build syllables out of. We are limited by what the language will allow us to construct. So we'll need to compromise. It would be great if we could at least cover the most common consonants in order, but that might be hard, too.
A word triplet we arrived at after some brainstorming was "SATIN" (179), "HORDE" (165), "CLUMP" (101) - 445 points in total, a perfect score. H is the fifteenth letter and pretty awkward to work with because of its relative rarity. I would have preferred to have it in the third word. And likewise, I would have preferred to have "E" in my first word, because it's the most frequent. As you can see, "SATIN" and "HORDE" have pretty similar scores, and not as high as "IRATE"'s 212.
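Why 445 counts as perfect: three five-letter words can check at most fifteen distinct letters, and the best fifteen to check are the first fifteen on the frequency list. A quick sketch of that check follows; the names `TOP_15` and `coverage` are made up for this example, and the letter order is taken from the list quoted earlier.

```python
# The first fifteen letters of the frequency list, E through H.
TOP_15 = set("EARIOTNSLCUDPMH")

def coverage(words):
    """Return (top-15 letters left unchecked, letters used from outside the top 15)."""
    used = set("".join(words).upper())
    return sorted(TOP_15 - used), sorted(used - TOP_15)

print(coverage(["SATIN", "HORDE", "CLUMP"]))  # ([], [])
print(coverage(["IRATE", "SOUND", "CLAMP"]))  # (['H'], [])
```

The first triplet uses each of the top fifteen letters exactly once. The second never checks "H" because one of its fifteen slots goes to a second "A".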
There was also an additional word I prepared in the rare case the first three words would somehow not give us enough information about the code word: "GABBY" only uses three letters out of G, B, F, Y, W, but it tests the two most valuable ones and the final vowel. I, uh, I've also thought of "FAGGY", if swear words are allowed. It's similar, but substitutes F for B. Also not ideal.
What I never took into account was letter position. Someone claimed that testing "S" in the fifth spot is particularly useful, because it's often used there in plural forms. Maybe.
Our word triplet had to be tested. There are now numerous Wordle clones that let you play without the one code word per day limit. Using one of those clones, we played for an extremely long time, guessing 100 words. The results were promising. We would get most words in 4 or 5 guesses, with 3 or 6 guesses being rare. We would never guess a word on our first or second try, but that's okay and to be expected. In that 3Blue1Brown video I linked, the uploader says he'd heard someone describe Wordle performance in golf terms, calling guessing the code word in three attempts the "birdie" (one attempt under par) and four attempts the "par". So I guess 4 is the number of attempts you should aim for. Anyway, I should also mention that there were zero words we did not guess after six attempts, which was nice.
So, there you have it. That's my Wordle meta, lol. It might be a bit naive and not as good as it could be, but it usually gets you the word in 4 or 5 attempts.
By the way, today's (19.03.2022) word had me sweating, lol. I got it on my sixth attempt. It was "ALLOW". I only knew that letters A, O, L were in the word, and had no information about their position whatsoever except for L, because they all fall on the same spot in my word triplet. "GABBY" did not reveal any new letters in the code word. As a last resort, I attempted "FLAKE", just to see if F and K were in the word somewhere, even though I couldn't think of a word that would have those, as well as A and O, and L in the second place. It failed to reveal unused letters, and only told me that "A" was not in the third spot. To be honest, I could only mostly think of "ALOLA", which is not a real word, and is the name of the region in the seventh (I think) generation of Pokemon games, lol. To be fair, I had a huge amount of information about what letters were NOT in the code word, so my guess wasn't very wild at all, and was probably one of the very few remaining possible options.
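The information from those five guesses can be written down as constraints and used to filter candidate words. This is a rough sketch only: the function `consistent` and the small candidate list are made up for illustration, and it treats every grey letter as simply absent, which is a simplification that happens to hold for this particular game.

```python
def consistent(word, greens, yellows, absent):
    """Check a candidate against what the guesses revealed.

    greens:  {position: letter} for letters known to be in that spot
    yellows: {letter: positions where that letter is known NOT to be}
    absent:  letters known not to be in the word at all
    """
    word = word.upper()
    if any(letter in absent for letter in word):
        return False
    if any(word[i] != letter for i, letter in greens.items()):
        return False
    for letter, bad_positions in yellows.items():
        if letter not in word or any(word[i] == letter for i in bad_positions):
            return False
    return True

# Positions are 0-based: "L" is second; "A" is not second or third; "O" is not second.
greens = {1: "L"}
yellows = {"A": {1, 2}, "O": {1}}
absent = set("STINHRDECUMPGBYFK")  # greys from SATIN, HORDE, CLUMP, GABBY, FLAKE

# Hypothetical candidates, not a real dictionary.
candidates = ["ALLOW", "ALOHA", "BLOAT", "FLOAT", "ALOLA"]
print([w for w in candidates if consistent(w, greens, yellows, absent)])
# ['ALLOW', 'ALOLA']
```

With seventeen letters ruled out, only A, L, O, J, Q, V, W, X and Z are left to build a word from, which is why so few options remain.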
SATIN - 179 points
HORDE - 165 points
CLUMP - 101 points
GABBY - covers some less-used letters if you still don't have enough information after the previous three words