Show HN: Experiments in AI-generation of crosswords

abstractbill | 38 points | 6mon ago | abstractnonsense.com

Hi HN, I've been experimenting on-and-off over the years trying to automatically generate crosswords [1]. Recently I've been feeling like my results are good enough that I want to share them and see what other people think. I'm not trying to claim that these could appear in, say, the NYT in their current state, but honestly the velocity of progress makes me feel like I will inevitably be able to automatically generate NYT-quality crosswords within just a year or so.

A write-up is here: https://abstractnonsense.com/crosswords.html

And you can play the crosswords here: https://crosswordracing.com (They should work well on both desktop and mobile, and there's a leader-board for each crossword if you want to leave your name when you solve one).

[1]: Just in case anyone is interested, my very first attempt at this problem was way back in 2006! I used multiple wordlists (e.g. list of British monarchs, with reign dates), and wrote little functions to generate clues from each list (e.g. "British monarch who ruled from {date1} to {date2}"). Even with randomized synonym substitution and similar tricks, this approach was too labor-intensive, and the results too robotic, for it to work well. Can't complain though, that project led to me getting hired as the first engineer at Justin.TV!
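
For illustration only, the template approach described in [1] might look roughly like the Python sketch below. The data rows, the names (MONARCHS, VERBS, monarch_clues), and the tiny synonym list are assumptions made up for this example, not the 2006 code.

```python
import random

# Hypothetical wordlist rows: (answer, reign start, reign end).
MONARCHS = [
    ("VICTORIA", 1837, 1901),
    ("EDWARDVII", 1901, 1910),
]

# Clue template from the post, with the verb left open so that a
# randomized synonym can be substituted.
TEMPLATE = "British monarch who {verb} from {date1} to {date2}"
VERBS = ["ruled", "reigned"]  # assumed synonym list


def monarch_clues(rows, rng):
    """Return a dict mapping each answer to one generated clue."""
    return {
        answer: TEMPLATE.format(verb=rng.choice(VERBS), date1=start, date2=end)
        for answer, start, end in rows
    }


clues = monarch_clues(MONARCHS, random.Random(0))
for answer, clue in clues.items():
    print(f"{answer}: {clue}")
```

Each new wordlist needs its own template function, which is where the labor-intensive part comes from.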

vunderba|6mon ago

Not bad.

As someone who has dabbled in AI-generated crosswords, I found that providing samples of "good crossword clues" (which I curated from historical NYT Monday puzzles) as part of the LLM context helped tremendously in generating better clues.
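
A minimal sketch of that kind of few-shot prompt, in Python. The function name build_clue_prompt, the prompt wording, and the two example clues are placeholders invented for illustration (they are not the curated NYT clues), and no particular LLM API is assumed; the function only builds the text to send.

```python
def build_clue_prompt(answer, examples):
    """Build a few-shot prompt asking for one clue for `answer`.

    `examples` is a list of (answer, clue) pairs shown to the model first.
    """
    lines = ["Write one crossword clue in the style of these examples.", ""]
    for word, clue in examples:
        lines.append(f"Answer: {word}")
        lines.append(f"Clue: {clue}")
        lines.append("")
    # Leave the final clue open for the model to complete.
    lines.append(f"Answer: {answer.upper()}")
    lines.append("Clue:")
    return "\n".join(lines)


# Placeholder examples; a real run would use a curated set of good clues.
EXAMPLES = [("OREO", "Sandwich cookie"), ("ARIA", "Opera solo")]
prompt = build_clue_prompt("hawking", EXAMPLES)
print(prompt)
```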

There was also a Show HN for a generative AI crossword puzzle system a few months ago so I'll include what I mentioned there:

Part of the deep satisfaction in solving a crossword puzzle is the specificity of the answer. It's far more gratifying to answer a question with something like "Hawking" than with "scientist", or with "Mandelbrot" rather than "shape".

So ideally, you want to lean towards "specificity" wherever possible, and use "generics" as filler.

Link:

https://news.ycombinator.com/item?id=41879754

abstractbill|6mon ago

Thanks. Yes, specificity of solutions seems like a good metric to optimize for.

In some of my crosswords I get clues that are specific in clever ways (e.g. one of these has "Extreme, not camping" which I thought was really strange until I found the answer "intense" and was very impressed by that level of wordplay from an LLM!)

korymath|6mon ago

Great post.

Funny, I just posted this to X

2025 GenAI challenge

Create a 5x5 crossword puzzle with two distinct solutions. Each clue must work for both solutions. Do not use the same word in both solutions. No black squares.

I try with each new model that lands. Still can’t get it.

alberto_balsam|6mon ago

Do you know if there is a solution to this by humans? I'd be interested in seeing it.

korymath|6mon ago

I've not found a solution at any NxN size made by human or machine.

quuxplusone|6mon ago

You might get a little closer by tweaking the prompt: you're asking the LLM to "figure out" that the first step is to create two 5x5 word squares with no repeated words, and then the second step is to solve ten requests of the form, "Give me a crossword-style clue that could be reasonably solved by either the word OPERA or the word TENET" (one for each of the ten word pairs in your pair of squares). However, LLMs are based on tokens, and thus fundamentally don't "understand" that words are made out of letters; that's why we have memes about their inability to count the number of "r"s in "strawberry" and so on. So we shouldn't expect an LLM to be able to perform Step 1 at all, really. And Step 2 requires wordplay and/or lateral thinking, which LLMs are again bad at. (They can easily do "Give me a crossword-style clue that could be solved by the word OPERA," because there are databases of such things on the web which form part of every LLM's dataset. But there's no such database for double-solution clues.)

Generating a 5x5 word square (with different words across and down, so not of the "Sator Arepo" variety) is already really hard for a human. I plugged the Wordle target word list into https://github.com/Quuxplusone/xword/blob/master/src/xword-f... to get a bunch of plausible squares like this:

    SCALD
    POLAR
    ARTSY
    CEASE
    ERROR

But you want two word squares that can plausibly be clued together, which is (not impossible, but) difficult if matching entries aren't the same part of speech. For example, cluing "POLAR" together with "ARTSY" (both adjectives) seems likely more doable than cluing "POLAR" together with "LASSO" (noun or verb).
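
For anyone who wants to try Step 1 in code, here is a rough Python sketch of a backtracking search for word squares with different words across and down. It is not the linked xword tool, just a naive row-by-row fill with prefix pruning; the function names and the ten-word list (the across and down entries of the grid shown earlier in this comment) are assumptions for the example.

```python
def columns(rows):
    """Return the down entries of a grid given as equal-length row strings."""
    return ["".join(col) for col in zip(*rows)]


def is_double_word_square(rows, words):
    """True if every row and column is in `words` and no entry repeats."""
    entries = list(rows) + columns(rows)
    return all(w in words for w in entries) and len(set(entries)) == len(entries)


def find_squares(words, size=5):
    """Yield grids whose rows and columns are all distinct words."""
    pool = sorted(w for w in set(words) if len(w) == size)
    word_set = set(pool)
    # Every prefix of every word, used to prune partial columns early.
    prefixes = {w[:i] for w in pool for i in range(size + 1)}

    def extend(rows):
        if len(rows) == size:
            if is_double_word_square(rows, word_set):
                yield list(rows)
            return
        for w in pool:
            if w in rows:
                continue
            candidate = rows + [w]
            # Each partial column must still be the start of some word.
            if all(p in prefixes for p in columns(candidate)):
                yield from extend(candidate)

    yield from extend([])


# Tiny demo list; a real run would load a full five-letter word list.
WORDS = ["SCALD", "POLAR", "ARTSY", "CEASE", "ERROR",
         "SPACE", "CORER", "ALTAR", "LASSO", "DRYER"]
squares = list(find_squares(WORDS))
for grid in squares:
    print("\n".join(grid), end="\n\n")
```

With a full word list this naive version gets slow quickly; a trie or per-position letter index would be the usual next step. Pairing two squares so that matching entries can share a clue is a separate problem that this sketch doesn't touch.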

Anyway, here's my attempt at a human solution, using the grid above — and another grid, which I'll challenge you to find from these clues. Hint: All but two of the ten pairs match, part-of-speech-wise.

    1A. Remove the outer layer of, perhaps
    2A. Region on a globe
    3A. Like some movie theaters
    4A. Command to a lawbreaker
    5A. Rhyme for Tom Lehrer?
    1D. ____yard (sometime sci-fi setting)
    2D. It goes something like this: Ꮎ
    3D. Feature of liturgy, often
    4D. It's vacuous, in a sense
    5D. Fino, vis-a-vis Pedro Ximénez

echelon|6mon ago

That's algorithmically hard.

Ask the LLM to generate a program to solve the problem.

korymath|6mon ago

I've tried that, as recently as today with latest Gemini, Claude, and o1 ... none have been successful.

abstractbill|6mon ago

Thanks!

That's a wonderfully hard problem; I'd love to see it get solved.