Skip to main content

Indie game storeFree gamesFun gamesHorror games
Game developmentAssetsComics
SalesBundles
Jobs
TagsGame Engines

Better itch.io thumbnails with ChatGPT

Making good thumbnails is hard

Here at WoollySoft we sometimes struggle to produce a decent thumbnail for a project. Take our latest demo Knights of the Triangle. It's a tribute to the Vectrex console so the graphics style attempts to mimic its vector graphics display. The idea is that the minimal graphics leads players to focus on the gameplay itself and immerse themselves in the real-time action, strategy and cunning required to outwit their opponent.

This may or may not work (it's still a demo after all) but one thing is for sure: Screenshots of simple vector graphics make for terrible thumbnails. Even if you start out with a nice high resolution image, by the time it has been scaled down and filtered, it will look more like this:


Pretty uninspiring right? It's not surprising that so far this demo has remained largely unplayed.

With all the hype surrounding text-to-image generative AI, it feels like we should be able to do better without spending too much time designing thumbnails (that's time we could be spending developing games after all, and we'd much rather be working on new levels for Sheep Hug Simulator than making eye-catching thumbnails ourselves).

Low-effort text-to-image thumbnails

We'll be using ChatGPT with its built-in support for DALL-E 3 (other text-to-image AI's reportedly yield better results, e.g. Midjourney, but sometimes when learning a new technology you just need to pick something and stick with it). We can generate a thumbnail in the style of 1980s Vectrex cover-art simply by asking, using the following prompt:

Generate an image which can represent a game as a thumbnail on itch.io. The game is called Kinights of the Triangle. It is a retro game in the style of games for the Vectrex console. The thumbnail should be in the style of cover art for games for the Vectrex console which were released in the first half of the 1980s. The theme of the game is medieval combat and duelling.

The result is not exactly what we had hoped for:


When working with ChatGPT, sometimes it's helpful to have a little back-and-forth to refine its initial suggestions and the same applies for text-to-image. Let's be more specific about what we want:

Try again but with more colour and the title of the game included as text. Again, remember the early 1980s home video console aesthetic.

This is more promising, but not immediately usable:

So, the middle section is great but if we were to crop it we would be left with something far from the required aspect ratio. There's also a small issue with the text (more on that later). Let's just tell ChatGPT what needs to change:

Try again. The middle section of the previous image is perfect. The main problem is that it is cassette case sized and there are additional designs to the left and right. The thumbnail needs to be more of a 16:9 ratio to reflect modern expectations, but the style of the middle of the previous image is perfect.

This does indeed give us something in the 16:9 aspect ratio and in a similar style but if we were really hoping to keep the image content the same we are out of luck:


We're now starting to see a lot of issues with generated text-like characters appearing in the sort of place you might expect but making no sense whatsoever. The title has also mutated again to "KNNIGHTS OF THE TRIANGLE" and there are plenty of clues the image was AI-generated if you look closely at the detail.

Still, feels like we're onto a good thing so let's continue to be specific and really lean-in to the idea of having extra text in the image (note that we got carried away with aspect ratios at this point):

Generate a similar version, this time with a 4:3 aspect ratio and only use text from the following comma separated list: Knights of the Triangle, WoollySoft, VECTOR, Duel to the death!, Unparalleled Realism

Oh no... So close, but yet so far:

To be fair, this has so much immediate appeal we were tempted to change the title of the game just so we could use the art. In the end we couldn't come up with a convincing explanation of what a Kriaanigle might be and had to give up.

So to conclude our low-effort attempt, we got close but we haven't managed to generate anything really usable.

Digging deeper

To make use of any new technology sometimes it helps to know a little about what's going on behind the scenes, or at least reading up a little on the documentation so you can be sure you're using it in the most effective way. Firstly, let's take a look at OpenAI's description of DALL-E 3:

When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life. If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words

This is interesting for two reasons. Firstly, when we ask ChatGPT to generate an image our instruction is not fed to DALL-E 3 exactly as we wrote it - it is first translated by ChatGPT so that it provides sufficient information to produce a compelling image. Secondly, it seems as though we should be able to make minor edits to the images with follow-up requests, though in our initial experiments this didn't appear to be the case.

Let's jump from there to the Limitations & Risk section of the DALL-E 3 research paper:

When building our captioner, we paid special attention to ensuring that it was able to include prominent words found in images in the captions it generated. As a result, DALL-E 3 can generate text when prompted. During testing, we have noticed that this capability is unreliable as words are have missing or extra characters.

This is exactly what we have observed too. The paper goes on to explain this is likely a result of text being encoded as tokens (roughly approximate to a word) which the model must then map back to individual characters.

Given the fact that the text seemed to get less predictable as our design session went on, our working hypothesis is that the volume of tokens in a DALL-E 3 request is somehow proportional to the amount of weirdness that will be observed in generated text.

Next up we'll use our new knowledge to construct better prompts and hopefully get some better results.

Improving our prompts with science guesswork

By combining our initial experience with the knowledge above we can come up with a few general rules for prompts which should generate better images:

  1. Follow general ChatGPT prompting guidelines. Because our prompt is going to be transformed by another ChatGPT prompt before it is fed to DALL-E 3 we can follow documented best practices to ensure the model behaves in a predictable way.
  2. Provide all information required to generate the image in a single chat message and generate only one image per chat session. The idea here is that we want to keep the number of tokens down so that the model doesn't get confused when it is trying to generate text.
  3. Avoid mixing visual, style or contextual requirements with requests for specific text. Along the same lines as the previous rule, we don't want to confuse the model by associating more tokens with the desired text than are absolutely necessary.

Putting it all together

After a little more trial, error and guesswork, we're seeing some pretty good results with the following prompt:

I want you to create an image with a 16:9 aspect ratio. I will first provide the context and imagery requirements. I will then provide details of the text I need in the image. The context and text will be separated by a newline and the context will be provided first.
Context: Cover art for a 1980s home computer game. It is named “Knights of the Triangle”. The themes are knights, medieval combat and duelling.
Text: The title, “Knights of the Triangle”, should be displayed in the top centre. The words “Duel to the death!” should be included as a callout. No other text should be included. The text I have asked for should be spelled exactly as I have written it. If you generate any additional text when pre-processing this prompt disregard it and use my original text exactly as provided within the quotation marks.

As can be imagined, we continued to struggle getting the text we wanted, but we did seem to be getting more consistent results. Here's one of the better ones:

And finally, an image which captures the right feel and contains exactly the text we wanted (with a bonus aeroplane):

Future work

There's a lot more we could do here - the ChatGPT UI has some nice looking tools for editing sections of images, and the transformed prompts can be viewed which gives further clues how to get images that are closer to what we actually want.

But honestly, we'll probably just wait for OpenAI to improve the underlying model - because fun though this is, we'd rather be making games than thumbnails.

Bonus tip - converting from WebP to PNG

Images generated in ChatGPT are currently only downloadable in WebP format which is not supported by itch.io, so they'll need to be converted using your favourite image editing software. We used ImageMagick which allows us to resize the images at the same time using the following command:

convert -resize 848x480 /path/to/input_image.webp kott_thumbnail.png

Support this post

Did you like this post? Tell us

Leave a comment

Log in with your itch.io account to leave a comment.

Mentioned in this post

A medieval combat game for zero to two players.
Action
Play in browser
A fun game for people who all they want in life is to hug sheep!
Simulation
Play in browser