 
Wrong Dot Com
Despite lofty claims from artificial intelligence soothsayers, the world’s top chatbots are still strikingly bad at giving financial advice.
AI researchers Gary Smith, Valentina Liberman, and Isaac Warshaw of the Walter Bradley Center for Natural and Artificial Intelligence posed a series of 12 finance questions to four leading large language models (LLMs) — OpenAI’s ChatGPT-4o, DeepSeek-V2, Elon Musk’s Grok 3 Beta, and Google’s Gemini 2 — to test out their financial prowess.
As the experts explained in a new study from Mind Matters, each chatbot proved to be “consistently verbose but often incorrect.”
That finding was, notably, almost identical to Smith’s assessment last year for the Journal of Financial Planning. In that study, which posed 11 finance questions to ChatGPT 3.5, Microsoft’s Bing with OpenAI’s GPT-4, and Google’s Bard chatbot, the LLMs spat out responses that were “consistently grammatically correct and seemingly authoritative but riddled with arithmetic and critical-thinking mistakes.”
The researchers graded each answer on a simple scale: a “0” for a completely incorrect financial analysis, a “0.5” for a correct financial analysis with mathematical errors, and a “1” for an answer that was correct on both the math and the financial analysis. No chatbot earned more than five of a possible 12 points. ChatGPT led the pack with a 5.0, followed by DeepSeek’s 4.0, Grok’s 3.0, and Gemini’s abysmal 1.5.
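To put those totals in perspective, here is a rough sketch of the grading scale in Python. The function and variable names are made up for illustration and are not from the study. The only data used are the four totals reported above, and the sketch assumes an answer with a wrong financial analysis scores zero even if its arithmetic happens to be right.

```python
MAX_POINTS = 12  # 12 questions, at most 1 point each


def score_answer(analysis_correct: bool, math_correct: bool) -> float:
    """Grade one answer on the 0 / 0.5 / 1 scale described in the article.

    Assumption: a wrong financial analysis scores 0 regardless of the math.
    """
    if not analysis_correct:
        return 0.0
    return 1.0 if math_correct else 0.5


def share_of_max(total: float) -> float:
    """Return a chatbot's total as a fraction of the 12-point maximum."""
    return total / MAX_POINTS


# Totals as reported in the article; per-question scores are not given.
reported_totals = {"ChatGPT": 5.0, "DeepSeek": 4.0, "Grok": 3.0, "Gemini": 1.5}

for name, total in reported_totals.items():
    print(f"{name}: {total} of {MAX_POINTS} points ({share_of_max(total):.1%})")
```

Read this way, even the top scorer landed well under half of the available points.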
Spend Thrift
Some of the chatbot responses were so bad that they defied the Walter Bradley experts’ expectations. When Grok, for example, was asked to add up a single month’s worth of expenses for a Caribbean rental property whose rent was $3,700 and whose utilities ran $200 per month, the chatbot claimed that those numbers together added up to $4,900.
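For the record, the sum Grok botched takes one line to check. The snippet below is only an illustration using the two figures quoted above. The variable names are invented here, and the actual question may have included details this article does not mention.

```python
# Figures as quoted in the article (monthly amounts, in dollars).
rent = 3_700
utilities = 200
claimed_total = 4_900  # the total Grok reportedly gave

actual_total = rent + utilities          # what the two numbers really add up to
overstatement = claimed_total - actual_total  # how far off the claimed total is

print(f"Actual monthly total: ${actual_total:,}")
print(f"Claimed monthly total: ${claimed_total:,}")
print(f"Overstated by: ${overstatement:,}")
```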
Along with spitting out a bunch of strange typographical errors, the chatbots also failed, per the study, to generate any intelligent analyses for the relatively basic financial questions the researchers posed. Even the chatbots’ most compelling answers seemed to be gleaned from various online sources, and those came only when the chatbots were asked to explain relatively simple concepts like how Roth IRAs work.
Throughout it all, the chatbots were dangerously glib. The researchers noted that all of the LLMs they tested present a “reassuring illusion of human-like intelligence, along with a breezy conversational style enhanced by friendly exclamation points” that could come off to the average user as confidence and correctness.
“It is still the case that the real danger is not that computers are smarter than us,” they concluded, “but that we think computers are smarter than us and consequently trust them to make decisions they should not be trusted to make.”
