Optimal starting guesses? I did an analysis and here's the result!

A topic by FurbyFubar created Jan 26, 2022 Views: 15,171 Replies: 27

The Wordle strategy of starting with "Siren, Octal, Dumpy", which I saw mentioned on a YouTube channel I follow, just nerd-sniped me hard. The TL;DR is that I now think "Crape, Doily, Shunt" are three better opening guesses, especially for Dordle. Read on to find out why.

Since in Dordle you have to find two words using one set of guesses, using the first few guesses to find most of the letters in both words is even more important than in Wordle. So after hearing about starting with "Siren, Octal, Dumpy" I tried it out, and it very much increased my Dordle win rate. But that strategy was based on the frequency of letters in English as a whole, and that got me thinking: wouldn't checking how common letters are in the list of allowed answers be much more relevant? That's not the same list as the list of allowed five-letter words to *guess*, so it's not the same thing as limiting the dictionary to five-letter words. The lists differ because it would be frustrating to play if the answer was too often a word players had never heard of.

So if I assume that I always want to use my first three guesses to find as many letters as possible (unless the first two guesses have already found all five letters), then optimizing the first word to have the most common letters isn't needed. I still optimize primarily for finding as many letters as possible, which means finding three words that use the 15 most common letters. But among those candidates I then want the three words that give the highest average number of green squares. So I set about writing some quick and dirty JavaScript.

The first thing I did was count the number of times each letter appears in the 2315 words allowed to be the randomly picked answer. I counted both for that letter in any position and for that letter in each of the five positions. I found that if you're going to use just two words to guess, then "Siren, Octal" is not the right way to go for this dictionary, even if we ignore finding green letters. The 10 most common letters in allowed answers are ACEILNORST. The word "Orate" contains all of the 5 most common letters, but that leaves CILNS as letters 6-10, and those don't anagram to a word. Hang on, the five most common letters spell "orate", and that doesn't include an S? Aren't plural forms of nouns allowed, so wouldn't S shoot up to near the top? Well, no. The game allows plural forms as guesses, but not as answers! So, for example, the word "Books" can never be the answer in Dordle. In fact, only 36 out of 2315 allowed answers end in an S! (This made me think my code was buggy, but the count tool in my text editor confirmed it.)
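A minimal Python sketch of that counting step. The original JavaScript wasn't posted, so this is a reconstruction, not the actual script. It assumes the answer list is available as lowercase five-letter strings. The four words below are a hypothetical stand-in for the real 2315-word list. Counting a letter once per word for the "any position" tally is also an assumption.

```python
from collections import Counter

def letter_counts(words):
    """Tally letters over a list of five-letter words.

    anywhere[ch]       -> number of words containing ch at least once
    by_position[i][ch] -> number of words with ch at position i (0-based)
    """
    anywhere = Counter()
    by_position = [Counter() for _ in range(5)]
    for word in words:
        anywhere.update(set(word))  # count each letter once per word
        for i, ch in enumerate(word):
            by_position[i][ch] += 1
    return anywhere, by_position

# Hypothetical stand-in for the answer list.
answers = ["crane", "slate", "shunt", "eerie"]
anywhere, by_position = letter_counts(answers)
print(anywhere["e"])        # 3 ("eerie" counts once despite three Es)
print(by_position[0]["s"])  # 2 (slate, shunt)
print(by_position[4]["s"])  # 0 answers end in S
```

Run on the real answer list, `by_position[4]["s"]` is the "how many answers end in S" check, and `anywhere.most_common(15)` lists the top-15 letters.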

But since I was looking for three starting words, what's more relevant for that search is that the fifteen most common letters in possible answers are ACDEHILNOPRSTUY. Wordsmith's Internet Anagram Server (https://new.wordsmith.org/anagram/) gave me a list of 3223 triples of five-letter words that use all of these letters. Yes, it's possible that Wordsmith uses a slightly different dictionary than the list of allowed guesses, but I decided not to worry about it.
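The anagram server did the triple search in the post. As a hedged alternative, here is a small Python sketch that finds the same kind of triples directly from a word list. The word list is a hypothetical stand-in, and `covering_triples` is an illustrative name. It groups words by letter set, so anagrams like CAPER/PACER are handled together. For each pair of disjoint sets, it looks up the one letter set that completes the 15.

```python
from collections import defaultdict
from itertools import combinations

def covering_triples(words, letters):
    """Return sorted triples of words that together use each of the
    15 given letters exactly once."""
    target = frozenset(letters)
    assert len(target) == 15, "expected 15 distinct letters"
    by_letters = defaultdict(list)  # letter set -> words with exactly those letters
    for w in set(words):
        s = frozenset(w)
        if len(w) == 5 and len(s) == 5 and s <= target:
            by_letters[s].append(w)
    triples = set()
    for s1, s2 in combinations(by_letters, 2):
        if s1 & s2:
            continue  # words in a triple must not share letters
        s3 = target - s1 - s2  # the five letters still missing
        for a in by_letters[s1]:
            for b in by_letters[s2]:
                for c in by_letters.get(s3, []):
                    triples.add(tuple(sorted((a, b, c))))
    return sorted(triples)

# Hypothetical stand-in list ("lodge" is dropped because G is not a target letter).
words = ["crape", "caper", "pacer", "doily", "shunt", "hunts", "lodge"]
result = covering_triples(words, "acdehilnoprstuy")
print(len(result))  # 6
```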

I then ran all those triples through my script and counted how many green squares each triple would result in when run against every possible answer. The winning triple was "Crape, Doily, Shunt".
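A hedged Python sketch of the scoring step. For each candidate triple, it adds up the green squares its three words would get against every possible answer, then keeps the highest total. Because the three words share no letters, an answer position can be green for at most one of them, so a plain sum doesn't double count. The answer list is a tiny hypothetical stand-in.

```python
def green_score(triple, answers):
    """Total green squares the words in `triple` earn over all answers."""
    return sum(
        sum(g == a for g, a in zip(guess, answer))
        for answer in answers
        for guess in triple
    )

def best_triple(triples, answers):
    """Triple with the highest total (equivalently, average) green count."""
    return max(triples, key=lambda t: green_score(t, answers))

answers = ["shunt", "crane", "daily"]  # hypothetical stand-in list
triples = [("caper", "doily", "hunts"), ("crape", "doily", "shunt")]
print(green_score(triples[1], answers))  # 14
print(best_triple(triples, answers))     # ('crape', 'doily', 'shunt')
```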

I then checked how many green letters each of those words gives on its own. That said they should be guessed in the order "Crape, Doily, Shunt", the same order I'd randomly got them in at first. But then I realized that since I'll use all three words to search for letters *unless* the first two words find all five letters of the answer, I should order them by that goal instead, so I counted the yellow or green squares each word gives. Somewhat annoyingly, the best order was still "Crape, Doily, Shunt".
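For the ordering step, here is a Python sketch that scores each word on its own, once by greens and once by yellow-or-green squares, and sorts best first. It relies on an assumption: for a guess with five distinct letters (true for all three words here), a square is yellow or green exactly when its letter occurs in the answer, so a set intersection is enough. Guesses with repeated letters would need the full Wordle coloring rules. The answers are a hypothetical stand-in.

```python
def greens(guess, answers):
    """Green squares `guess` earns, summed over all answers."""
    return sum(g == a for answer in answers for g, a in zip(guess, answer))

def found(guess, answers):
    """Yellow-or-green squares, summed over all answers.
    Only valid when `guess` has five distinct letters."""
    assert len(set(guess)) == 5, "shortcut needs five distinct letters"
    return sum(len(set(guess) & set(answer)) for answer in answers)

def order_guesses(words, answers, score):
    """Sort words by score, highest first (ties keep their input order)."""
    return sorted(words, key=lambda w: score(w, answers), reverse=True)

answers = ["shunt", "crane", "daily"]  # hypothetical stand-in list
trio = ("crape", "doily", "shunt")
print(order_guesses(trio, answers, greens))  # ['shunt', 'crape', 'doily']
print(order_guesses(trio, answers, found))   # ['shunt', 'crape', 'doily']
```

With this toy list SHUNT comes first. On the real answer list, the post reports CRAPE, DOILY, SHUNT for both criteria.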

 It's also worth noting that while Crape, Doily, and Shunt are all legal guesses, only Shunt can ever be the correct answer.

Amusingly, I just noticed that Crape, Doily, Shunt is an opening that beats Absurdle, the adversarial Wordle variant, in just 5 guesses, even though it's in no way optimized to be good against it.

Anyone have any feedback? Would you try to have optimized for something else instead or as well? Or do you think I missed something obvious?

Also, if you want to share your daily Dordle or Wordle results with people who follow you online, there's a catch: once someone knows your standard starting guess(es), the colored squares you share become a spoiler for them. So you should pick at most one of these to share with the same set of people: either your daily results OR your standard starting guess(es).
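To see why the shared squares leak the answer, here is a hedged Python sketch. It pairs a Wordle-style coloring function with a filter. The coloring marks greens first, then yellows, limited by how many copies of each letter remain unmatched in the answer. The filter keeps only the answers consistent with a known opener and the pattern someone posted. The answer list and the pattern notation ('G' green, 'Y' yellow, '.' gray) are illustrative assumptions.

```python
from collections import Counter

def feedback(guess, answer):
    """Color a guess: 'G' green, 'Y' yellow, '.' gray (Wordle-style rules)."""
    result = ["."] * len(guess)
    unmatched = Counter()  # answer letters not already used by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def consistent(answers, guess, pattern):
    """Answers that would give `pattern` for `guess`."""
    return [a for a in answers if feedback(guess, a) == pattern]

answers = ["shunt", "crane", "daily", "mound", "raise"]  # hypothetical list
shared = feedback("crape", "crane")          # what a friend posts as squares
print(shared)                                # GGG.G
print(consistent(answers, "crape", shared))  # ['crane']
```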

Tried it and it works. CRAPE, DOILY, SHUNT, THING, THINK beats Absurdle every time.

> Amusingly, I just noticed that Crape, Doily, Shunt is an opening that beats Absurdle ... even though it's in no way optimized to be good against it.

Isn't it, though? Absurdle uses the same legal guess and answer lists as Wordle, and Absurdle is all about overlapping with as many words as possible, so wouldn't the math be the same? It seems like the best openers against Absurdle would be words that contain all the most common letters and sequences, to force the AI to discard words that have them.
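Here is a rough Python sketch of the adversarial idea being discussed. It groups the remaining candidates by the pattern a guess would get and keeps the biggest group. Guesses packed with common letters split the list into many small groups, so even the biggest one stays small. This is a simplification: Absurdle's actual rules (especially tie-breaking) may differ, and ties here are broken by pattern string just to be deterministic.

```python
from collections import Counter, defaultdict

def feedback(guess, answer):
    """Color a guess: 'G' green, 'Y' yellow, '.' gray (Wordle-style rules)."""
    result = ["."] * len(guess)
    unmatched = Counter()  # answer letters not already used by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def absurdle_step(candidates, guess):
    """Return (pattern, survivors) for the largest pattern group."""
    buckets = defaultdict(list)
    for word in candidates:
        buckets[feedback(guess, word)].append(word)
    # max() returns the first maximum, so sorting makes ties deterministic.
    pattern = max(sorted(buckets), key=lambda p: len(buckets[p]))
    return pattern, buckets[pattern]

candidates = ["shunt", "crane", "daily", "mound", "raise"]  # hypothetical
print(absurdle_step(candidates, "crape"))  # ('.....', ['shunt', 'mound'])
```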

>"to force the AI to discard words that have them"

The surprising part for me was that they are so common that the AI starts out by giving me green squares. But yeah, in retrospect it makes sense that this strategy also works well on Absurdle; since that's not what I set out to do, it still amused me that it did.

The words I've been using, which aren't well thought out but were copied from a site that solves Wordles by process of elimination, are "Raise, Could, Nymph". This combination solves Absurdle within 5 guesses, by going to Bunny next and then Funky, or Funky and then Bunny. Either way it's 5. But your explanation of Crape, Doily, Shunt makes perfect sense, so I'm going to try that combo for the next week or so!

Hi, these are all interesting. I started with 2 standard words (adieu and snort, also lymph). These are fine, but I prefer to just start the game with one word, any random word, and then use any letters I get to form a new word, etc. This works even for Dordle.

Also, on the above: this makes you more disciplined. I could usually figure out Wordle in 3 or 4 lines, but I used to mess around accumulating letters until the second-to-last or last line before I got it. If you solve the first puzzle quickly, you have a good chance of getting both puzzles correct.

I just did two puzzles in a row, daily and unlimited, and got both correct. Do the first puzzle and don't worry about the second one. Be strategic on the first, get it in 3 lines, and you will have plenty of time (and accumulated correct letters) to do the second one.

Every "analysis" of the best starting words gives you completely different words. I'm not even sure why they would say that siren or dumpy would be all that great: just two vowels, and the consonants are not all that common. Especially dumpy. Ideally, you want two words that get all the vowels and a lot of the Wheel of Fortune gimme letters (NSTLR). On my own, I came up with ALOUD and TRIES. Simple words, but they get me all the vowels and all of NSTLR except for N. I've yet to fail on Wordle, and have only failed on Dordle when it throws a curveball at me with words that repeat a letter or have uncommon consecutive consonants like GN or FJ.

Did you miss the part about these three words *together* being the best three starting guesses? They *together* use all the 15 most common letters in the list of possible answers, and do so in a way that will result in (on average) more green squares than any other three words using those 15 letters.

I've quite clearly stated my methodology and the assumptions it made, so saying that "You want to start with two words" when the whole post was about finding the best triplet of words is a strange take.

For me, the best opening words are Roate, dings, plumb. These cover the most common letters and all the vowels.

The most common letters in English are not the same as the most common letters in the Wordle possible answers list. The 15 most common letters in the latter are all in crape, doily, shunt. That's the first point of this whole post. (The second point being that I also cared about what three words using all those 15 letters would give the most green squares on average.)

Also, all the vowels? You just make Y go cry in the corner again!

Using frequency order:

E T A O I N S R H L D C U M F P G W Y B V K X J Q Z

 * TENSE & AUDIO (my 2 lead words) cover: E T A O I N S D U
 * SIREN, OCTAL & DUMPY cover: E T A O I N S R L D C U M P Y
 * CRAPE, DOILY & SHUNT cover: E T A O I N S R H L D C U P Y

I've had consistent success with Wordle using TENSE & AUDIO. The third word varies with what the first two uncover.

This is a frequency order for English I'm guessing? So not for the Wordle possible answers list which is what I used?

Right. I started by playing hello wordl (https://hellowordl.net/), whose creator uses a Scrabble dictionary as the word source. Then I found Wardle's site. Only yesterday I downloaded his site and found the list of words hard-coded in the script. So you found that, and counted frequency from there, huh?

Yeah, I just looked at the source code; the list's always been in plain text there. And even if it had been obfuscated somehow, the JavaScript (being run client side) needs the word list to function, so with a browser's dev tools you could just tell the script to list the dictionary it has decoded. Any such obfuscation would be pointless.

Some of them toward the end are scary -- words I never see or use, only found in Scrabble dictionaries.

E is in half the words in the win list. It's a wasteful letter to guess on! Save it for a later guess! This is a hill I will die on.

I play hard mode on Wordle, and faux-hard mode here. I've yet to lose Wordle in 30+ days of playing, and my losses in Dordle are few, given that I've played SO MANY TIMES.

But a binary search is the way to go when it's possible. Sure, this isn't a sorted list, but cutting the search space in half still sounds great to me, as it's the closest we can get to a binary search. Guessing an E will always remove about half the possible answers from the list (using the info from that one E alone), so that must be the most info we can (always) get from a single letter in the first word.
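The "closest to 50/50" intuition can be put in numbers with binary entropy. A yes/no letter test that is "yes" for a fraction p of the remaining answers gives, on average, H(p) bits of information, which peaks at exactly 1 bit when p = 0.5. Here is a minimal Python sketch. Like the argument above, it looks only at whether the letter is present, not where.

```python
import math

def split_information(p):
    """Average bits gained from a yes/no test that is 'yes' for a
    fraction p of the remaining answers (binary entropy H(p))."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # the test always gives the same result: no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.1, 0.3, 0.5):
    print(p, round(split_information(p), 3))
```

A letter found in 10% of answers is worth just under half a bit on average, while a letter found in half of them is worth a full bit.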

So yes, you will get more info if you guess and find a letter other than E. But you will get less info in every case where you don't find that letter. On average, going for a split as close to 50/50 as possible results in the fastest average search. For example, think of teaching a robot to look up a word in a strange dictionary with one word per page. If we tell it to open the dictionary in the middle and look at the word, then it can rule out 50% of the book no matter what. For the next step it looks at the middle of the pages it has left, and repeats this until it's either looking at the word it's searching for, or it tries to go half a page forward (and thus knows that the word wasn't in the dictionary). This can always be done in O(log n) steps, where n is the number of pages in the dictionary.

The analogy here is that guessing a letter that's in half the possible answers is like the robot looking at the center of the dictionary first. If we guess an uncommon letter in Wordle that's in 10% of words, that's like programming the robot to open the dictionary 90% of the way towards the end and hoping that the word it's searching for comes later still, because then we've made the search even faster! Well, if it's lucky and its word is in the last 10%, then we have made that search faster. But we can't ignore that in 90% of cases the random word it's looking for is in the first 90% of the book. (Yes, I know that a human would assume things about where the word is likely to be in a normal dictionary and thus make a better first guess, but this is still how a program searches an ordered list, because it's mathematically proven to be the fastest way in the worst case when you don't have info about what's in the list.)
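The robot analogy can be simulated. In this hedged Python sketch, the robot always looks `ratio` of the way into the pages it has left. The function names and the 1000-page size are illustrative. Averaging the number of looks over every possible target page shows that the 50% split needs fewer looks on average than a lopsided 90% split. The 50% split also never needs more than 10 looks for 1000 pages.

```python
def probes_to_find(n, target, ratio):
    """Looks needed to find page `target` (0-based) among n sorted pages
    when each look lands `ratio` of the way into the remaining range."""
    lo, hi = 0, n  # pages still possible: lo .. hi-1
    probes = 0
    while lo < hi:
        size = hi - lo
        mid = lo + min(size - 1, int(ratio * size))
        probes += 1
        if mid == target:
            return probes
        if target < mid:
            hi = mid  # word comes earlier: discard mid and everything after
        else:
            lo = mid + 1  # word comes later: discard mid and everything before
    raise ValueError("target not in range")

def average_probes(n, ratio):
    """Average looks over all n equally likely target pages."""
    return sum(probes_to_find(n, t, ratio) for t in range(n)) / n

for ratio in (0.5, 0.9):
    print(ratio, round(average_probes(1000, ratio), 2))
```

The lopsided split still wins when the word happens to be near the end, but it loses on average, which is the point of the analogy.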

I'll admit that it's still *possible* that a single word without an E could result in 5 different pseudo "binary lookups" that, when combined, (always or on average) cut the list down more than the 5 lookups of any single word that includes an E would, but that seems unlikely to me.

Of course, this whole argument is all ignoring any info green squares could give.
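One way to test the "single word without an E" question on the real answer list is to treat each guess as five combined presence tests, as the argument above does (ignoring greens). Group the answers by which of the guess's letters they contain. The expected number of answers left is then the sum of squared group sizes divided by the list size. Here is a hedged Python sketch using a hypothetical stand-in list. Comparing `expected_remaining` for E and non-E candidates over the real answers would settle the question.

```python
from collections import Counter

def presence_buckets(guess, answers):
    """Group answers by which of the guess's letters they contain.
    Positions are ignored, so greens and yellows count the same."""
    letters = sorted(set(guess))
    return Counter(tuple(ch in answer for ch in letters) for answer in answers)

def expected_remaining(guess, answers):
    """Expected answers left if the answer is uniformly random:
    a group of size k is hit with probability k/N and leaves k answers."""
    buckets = presence_buckets(guess, answers)
    return sum(k * k for k in buckets.values()) / len(answers)

answers = ["shunt", "crane", "daily", "mound", "raise", "eerie"]  # hypothetical
print(presence_buckets("crape", answers)[(False,) * 5])  # 2 (shunt, mound)
print(expected_remaining("crape", answers))
```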

I'd go to a forum about Absurdle to see which starting guess(es) leave the fewest *possible* answers given the worst possible luck (because someone is bound to have done that analysis already). I couldn't find it while trying to speed-read the game's own page https://qntm.org/absurdle, and the first result I found on Google was some guy on a blog NOT using Wordle/Absurdle's word list in their analysis, so yeah, your mileage may vary.
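Without that forum thread, the worst-luck measure is easy to compute yourself. Here is a hedged Python sketch. For an opening sequence (one or more words), it groups the answers by the full tuple of patterns the opening would get and takes the largest group as the worst case. It then picks the opening whose worst case is smallest (a minimax choice). The word lists are hypothetical stand-ins. With the real Wordle lists, you would loop over every allowed guess.

```python
from collections import Counter

def feedback(guess, answer):
    """Color a guess: 'G' green, 'Y' yellow, '.' gray (Wordle-style rules)."""
    result = ["."] * len(guess)
    unmatched = Counter()  # answer letters not already used by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def worst_case_remaining(opening, answers):
    """Largest number of answers sharing the same patterns for `opening`."""
    groups = Counter(tuple(feedback(g, a) for g in opening) for a in answers)
    return max(groups.values())

def best_worst_case(openings, answers):
    """Opening with the smallest worst case (ties broken alphabetically)."""
    return min(openings, key=lambda o: (worst_case_remaining(o, answers), o))

answers = ["shunt", "crane", "daily", "mound", "raise"]  # hypothetical
openings = [("crape",), ("crape", "doily")]
print(worst_case_remaining(("crape",), answers))  # 2
print(best_worst_case(openings, answers))         # ('crape', 'doily')
```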

I can't back my argument up with big-O notation or such, but I don't feel like this truly compares to a binary search. Is it because I only play hard mode?

(My gut feeling is unscientific, and thus my hypothesis is pretty damn dubious, but let me provide one more useless piece of anecdotal data: have you seen the Wordle solver that lets you put in the final word, and then tries to guess it? I match or beat it 4 out of every 5 times. But maybe it's not the best solver?)

Maybe it's this: In hard mode, you HAVE to use the letters you've found, right?

On my first guess, I find there is an E somewhere.

Now, because I have to use that E IN MY SUBSEQUENT GUESSES, I only have four spaces per guess now, and I still don't know where the E goes.

Blah. I'm not very convincing. Maybe what it comes down to is letter combos. Like if you rule out H, you also rule out CH, SH, TH, PH, WH, and GH. Those are two-letter combos ruled out for the price of one letter. With those combos gone, the likelihood of each companion letter goes down, too! (Or if you find an H, you're statistically way more likely to have one of those combos than not. (I think.))

But E... E is ubiquitous. It can go just about anywhere, with any adjacent letter.

In short, I think this game can't be simplified to binary searches. Words have patterns that shortcut things faster than a 50/50 search on one letter. That's my hypothesis.

concur: without the restricted wordlist and in hard mode, it does feel like locking in a letter hurts a lot; many of the popularised starts can easily get trapped after the first guess:

 * SIREN gets trapped by [B/C/D/F/K/L/M/N/P/S/T/V/W]INES

 * AROSE/SOARE gets trapped by RA[C/G/J/K/L/P/T/V/X/Z]ES (and maybe RARES/RASES) as well as S[C/E/H/N/P/T/W]ARE

 * SALET gets trapped by [D/F/H/M/P/R/V/W/Z]EALS (but not TEALS or SEALS)

 * ORATE/ROATE gets trapped by TA[B/K/L/M/P/S/V/W/X]ER (and maybe TATER)

most, if not all, of the "best strategy" presentations depend on the reduced solution list to avoid corridor traps ([B/D/E/F/H/L/M/N/R/S/T/W]IGHT is reduced to [E/F/L/M/N/R/S/T/W]IGHT, for example; also, almost all S-terminal plurals and present-tense verbs are not on the list). given that, by an exhaustive search posted by Alex Selby, the best word averages 3.42 guesses and the worst word averages 4.10 guesses (in normal mode), it's probably more important in hard mode not to fall into any traps while discovering information, since even the worst average has 1.90 guesses to spare (noting, though, that the worst guess ends up falling back to the 'optimal' guess in half the cases, and fails to guess the word some of the time).
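Corridor traps like ?IGHT can be found mechanically. Replace each position of each word with a wildcard and collect the words that share a pattern. Here is a hedged Python sketch; the word list and the `min_size` threshold are illustrative. In hard mode, a group of k words sharing four greens can cost up to k guesses. Any group bigger than the guesses you have left is therefore a potential loss.

```python
from collections import defaultdict

def corridor_groups(words, min_size=4):
    """Map patterns like '?ight' to the words that fit them, keeping only
    patterns matched by at least `min_size` words."""
    groups = defaultdict(set)
    for w in words:
        for i in range(len(w)):
            groups[w[:i] + "?" + w[i + 1:]].add(w)
    return {p: sorted(ws) for p, ws in groups.items() if len(ws) >= min_size}

words = ["eight", "fight", "light", "might", "night",
         "right", "sight", "tight", "wight", "crane"]  # hypothetical list
print(corridor_groups(words))
```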

from the human side of things, digraphs probably do play a significant role in making guesses; if R, H, T, S, L, and N are all ruled out, for example, you can be fairly certain the word has at least two vowels, without having to guess the vowels and restricting further guesses.
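That digraph intuition can be checked against a word list. Here is a hedged Python sketch: keep the words that avoid every ruled-out letter and tally how many vowel squares each has. Whether Y counts as a vowel matters a lot, so it's a parameter. The word list is a hypothetical stand-in.

```python
from collections import Counter

def vowel_profile(words, excluded, vowels="aeiouy"):
    """Tally words avoiding all `excluded` letters by their number of
    vowel squares (repeated vowels count each time)."""
    banned = set(excluded)
    return Counter(
        sum(ch in vowels for ch in w)
        for w in words
        if not banned & set(w)
    )

words = ["dumpy", "gumbo", "audio", "crane", "shunt", "pygmy", "mucky"]
print(vowel_profile(words, "rhtsln"))                  # Y counted as a vowel
print(vowel_profile(words, "rhtsln", vowels="aeiou"))  # Y not counted
```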

it doesn't feel like a binary search is optimal in hard mode, since confirmation reduces the ability to get more information, unless the search space is sufficiently small (which is true with the restricted word list, but not as much with the full word list).

(personally, CRWTH is my starter of choice, followed with confirmed letters with S+L+P / S+L+N / S+N+D)

I absolutely agree that you have a good point if we're talking about hard mode! In hard mode finding letters is not only the goal of the game, but also something that hinders you in achieving that goal. So finding common letters later can clearly be an advantage then, especially if the strategy you're running is to set aside some number of guesses at the start for letter-hunting.

This is just my gut feeling, but in English it feels like finding vowels is less helpful for narrowing down which letters can go next to them than finding a consonant, especially an uncommon consonant. The only languages I really know are Swedish, English, and JavaScript, but it feels like the irregular spelling of English words vs. their sounds means that vowels can go anywhere they damn please given just one or two other fixed (green) letters around them. Consonants following each other can do weird things on occasion, but if you have a word starting with T, you know that if a vowel isn't coming up next, it's pretty much certain to be H, R or W. T's in the middle of a word add C and another T as other common possibilities. But for vowels, loan words with spellings from languages that treat vowels differently mean that such rules are much fewer and much more frequently broken. Very often when I get stumped trying to figure out what word could possibly fit with just one unknown and four greens, the reason I overlooked the word is that I pronounced it wrong when I checked the possibilities. This is less of an issue (but can still happen) when I play word games in Swedish, and I don't think that's mainly because it's my first language; having had a spelling reform, even if it was in 1906, means that Swedish is a bit more regular in how it uses its vowels.
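The "T is followed by H, R or W" rule of thumb is also easy to check on the answer list. Here is a hedged Python sketch that counts which consonant comes second in words starting with a given letter. The word list is a hypothetical stand-in, and Y is treated as a consonant here, which is an arbitrary choice.

```python
from collections import Counter

def second_consonants(words, first="t", vowels="aeiou"):
    """Count the second letter of words that start with `first`,
    skipping words whose second letter is a vowel."""
    return Counter(
        w[1] for w in words
        if len(w) > 1 and w[0] == first and w[1] not in vowels
    )

words = ["thing", "think", "trial", "twist", "tiger", "table", "crane", "tryst"]
print(second_consonants(words).most_common())
```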

What's your definition of faux-hard?

Personally, any variant does indeed make Dordle much harder (especially if four letters are locked by guess 3).

Anecdotally, I tried "(d)[o]ily" > "c[o]re(d)" > "p[o][u][n][d]" > "s[ound]" > "[mound]"... and was lucky the correct answer wasn't "hound" or "wound", because that would have left me with nothing for the other column. (In this case, "do[i]ly", "co(r)(e)d", "(s)ound" gave me just enough to guess "(ar)[ise]" > "[raise]".)

If any kind of "letter lock" is implemented, an extra guess slot should be added.

I personally like to have a two-word starter -- broad heist -- rather than try to use a 3rd guess (which usually doesn't go according to plan in hard mode).

Ditto, for Wardle's Wordle.  Now that I know his word list letter frequency ≠ English generally, my new two are SNAIL & ROUTE.

I always begin with ARISE, PLUTO. It covers all of the vowels, as well as R, S, L and T.

It also makes me feel like I’m summoning the ruler of the underworld.

haha! i like that!

none of my starting guesses are here lol. i guess that means they're special 🤗 /half-joking