OpenAI is rolling out brand new image generation capabilities for ChatGPT. And guess what? It finally — almost — nails text.
Until now, the chatbot used the company's separate DALL-E model to dream up pictures. With this latest update, users will be able to access a new feature dubbed "Images in ChatGPT," leveraging OpenAI's flagship GPT-4o model, which has underpinned the chatbot for nearly a year. The upgrade is also available in Sora, OpenAI's video generation tool.
"This model is a step change above previous models," research lead Gabriel Goh told The Verge.
The most noticeable change is how the model handles text, something that OpenAI's earlier image generators and their competitors have long struggled with. Words tended to come out looking like gobbledygook, and the text that was legible looked sloppy, filled with formatting errors and misspellings.
Not anymore, according to OpenAI. One example shared by the company shows an employee writing out the pros and cons of the ChatGPT image update on a whiteboard, following to the letter what was specified in the prompt; ditto for a four-panel comic strip about a snail — all with cleanly rendered text.
"4o image generation has arrived," OpenAI (@OpenAI) posted on March 25, 2025. "It's beginning to roll out today in ChatGPT and Sora to all Plus, Pro, Team, and Free users."
"This was just like a process of iteration that took many, many months to get right," Goh told The Verge. "It's been just many months of small improvements." The model still struggles with very small lettering, but overall, the text quality is consistently usable, Goh said.
Unlike image generators like DALL-E, which use a diffusion model, GPT-4o uses an autoregressive approach that produces images from left to right and top to bottom, per The Verge, similar to how text — at least in English — is written.
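To make the "left to right and top to bottom" idea concrete, here is a toy sketch in Python. It is not OpenAI's method, whose details are not described here. The function name, the palette, and the 70 percent "copy a neighbor" rule are all invented for illustration. The only point is the ordering: each cell is produced after, and conditioned on, the cells that came before it in reading order.

```python
import random


def generate_raster(width, height, palette, seed=0):
    """Fill a grid one cell at a time in reading order (toy example).

    Each new cell may only look at cells that already exist: the one to
    its left and the one above it. A real model would use a learned
    network here; this stand-in rule is purely hypothetical.
    """
    rng = random.Random(seed)
    grid = [[None] * width for _ in range(height)]
    order = []  # records the sequence in which cells were produced

    for y in range(height):          # top to bottom
        for x in range(width):       # left to right
            context = []
            if x > 0:
                context.append(grid[y][x - 1])  # left neighbor
            if y > 0:
                context.append(grid[y - 1][x])  # neighbor above
            # Hypothetical rule: usually repeat a neighbor, else pick fresh.
            if context and rng.random() < 0.7:
                value = rng.choice(context)
            else:
                value = rng.choice(palette)
            grid[y][x] = value
            order.append((x, y))
    return grid, order


if __name__ == "__main__":
    grid, order = generate_raster(8, 4, palette=[".", "#", "o"])
    for row in grid:
        print("".join(row))
```

A diffusion model, by contrast, starts from noise over the whole canvas and refines every pixel together across many steps, so there is no comparable "first cell, then the next" order.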
Beyond improved penmanship, OpenAI says the model will now follow instructions better, as a common issue with older iterations was that they'd ignore certain details in lengthier prompts. It's also been fine-tuned to be able to generate more photorealistic images.
There are caveats. For one, it'll take longer to generate the outputs. And like all generative models, it's still prone to making up information, or hallucinating. It also struggles with generating non-Latin scripts, hallucinating characters when trying to write out languages like Korean.
With greater capabilities come greater safety and misinformation concerns. To this end, OpenAI stressed that it has particularly "robust safeguards" in place around nudity, violence, and depictions of real people. Moreover, all images that the AI model generates will be embedded with C2PA metadata identifying that they were made with GPT-4o. But this hidden watermark of sorts can easily be stripped; in fact, many social media platforms automatically remove an image's metadata once it's uploaded.
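How easy is it to lose metadata? The sketch below, using only Python's standard library, builds a one-pixel PNG carrying a text label and then rewrites the file keeping only the chunks needed to display it. The label is a stand-in and not the actual C2PA format, and the helper names are invented for this example. It illustrates the general weakness: anything stored alongside the pixels, rather than in them, disappears when a file is re-saved this way.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_tiny_png(label):
    """Build a 1x1 grayscale PNG with a tEXt chunk holding `label`."""
    # IHDR: width, height, bit depth, color type, compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    pixels = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    text = b"Source\x00" + label.encode("latin-1")  # stand-in provenance label
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"tEXt", text)
        + make_chunk(b"IDAT", pixels)
        + make_chunk(b"IEND", b"")
    )


def iter_chunks(png):
    """Yield (type, data) for every chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        yield chunk_type, data
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def strip_ancillary(png):
    """Keep only critical chunks (type starts with an uppercase letter)."""
    kept = [make_chunk(t, d) for t, d in iter_chunks(png) if t[:1].isupper()]
    return PNG_SIGNATURE + b"".join(kept)


if __name__ == "__main__":
    original = make_tiny_png("made-with-example-model")
    stripped = strip_ancillary(original)
    print([t.decode() for t, _ in iter_chunks(original)])
    print([t.decode() for t, _ in iter_chunks(stripped)])
```

The pixel data is untouched, so the stripped file looks identical to a viewer. Only the label is gone.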
"Ultimately, no system is perfect for this type of thing, but we're continuously improving our safeguards and we think of this as a starting point," ChatGPT multimodal product lead Jackie Shannon told The Verge.
For now, GPT-4o image generation is only available to subscribers of OpenAI's ludicrous $200 per month Pro subscription tier, with plans to roll out the feature to Plus and free users in the near future.