Then what's the point?
Baby Brain
OpenAI has finally debuted "Operator," its very own AI agent, a type of autonomous model designed to do digital tasks on your behalf like shopping for groceries online.
But calling it an AI "toddler" might be more fitting. As a Bloomberg reporter describes in her account of using OpenAI's new toy, the experimental tech needs a "lot of adult supervision," frequently screwing up and asking for help when it gets stuck.
It's also pretty sluggish, a complaint echoed by other users, and slow on the uptake, as a still-developing brain might be.
"For several agonizing moments, I watched as OpenAI's artificially intelligent agent slowly navigated the internet like someone who's had the web described to them in great detail but never actually used it," wrote Bloomberg's Rachel Metz. "I had to monitor it the entire time."
ParentGPT
Experiences like Metz's suggest there's still a long way to go before OpenAI, and the industry at large, realizes its vision of releasing agentic AI models that can serve as virtual employees or personal assistants, supercharging your productivity by doing the work for you.
Typical large language models are limited to words. But AI agents are capable of interacting with their environment, like a user's desktop computer. That potentially enables them to do anything from browsing the web, itself a fount of infinite possibilities, to using installed software. In its announcement, OpenAI highlighted Operator's usefulness in making reservations, booking flights, and creating shopping lists. The AI model is only available to subscribers of the ChatGPT Pro plan, which costs $200 per month.
If it could do any of those things without help quickly and reliably, it might be a huge time saver. The tech is in a very early stage, however, and isn't as hands-off as one might hope.
As OpenAI warned, Operator has to ask you for confirmation before actually completing any important tasks — betraying that the tech isn't yet trustworthy enough to be left alone.
Helicopter Parent
Bloomberg's Metz says that Operator successfully handled tasks like getting ice cream delivered, although it required some "guidance and permission," like providing payment info and approving the purchase.
With more serious applications like creating a spreadsheet for a schedule, it frequently messed up the details, she said. OpenAI admitted that Operator still struggles with "complex interfaces" like creating slideshows and managing calendars.
If it can Instacart you some food and do some shopping, cool. But is it worth shelling out $200 a month for?
"Given it was just a test, I was ready and willing to keep a close eye on the product," Metz concluded. "But if OpenAI and its peers want agents to take off, they'll need to convince people that they can trust these services to act autonomously on their behalf. Otherwise, we may decide if we want a job done right, we should just do it ourselves."