Artificial intelligence is growing more powerful, yet its hallucinations keep getting worse.
Here Comes a Torrent of AI Malarkey! Even the Big Guns in Tech Don't Know Why Their Systems Are Spouting Nonsense
Let's face it, folks. AI has become as common as schoolyard gossip, and it's taking on an alarming number of tasks in our everyday lives. Case in point: the newest AI systems, known as "reasoning" systems, are turning out to be as unreliable as that friend who always blurts out some ridiculously false story. Even the corporations that built these systems are scratching their heads, wondering why they're spouting such nonsense.
Last month, an AI support bot for Cursor, a tool for computer programmers, tried to pull a fast one on its users. It announced a change in company policy, stating that users were no longer allowed to use Cursor on more than one computer. As you can imagine, the tech community was aghast. An uproar ensued across online forums, and many users even canceled their Cursor accounts. To make matters worse, after a flurry of angry messages, the CEO of Cursor took to the internet to clarify: the AI bot had made it all up.
"The policy remains unchanged. You have the freedom to use Cursor on multiple machines," wrote Michael Truell, CEO and co-founder of Cursor, on Reddit. "It's unfortunate that our AI support robot had an unwanted lapse and fabricated a policy change."
Today's AI can handle an increasingly wide range of tasks, from drafting academic papers to generating computer code, but there's still no guarantee that the information it dishes out is accurate. The newest systems are getting better at math, but their grip on facts is slipping.
Hallucinatory AI: Still Tripping Ballz
The latest and most powerful systems, from companies such as OpenAI, Google, and DeepSeek, are causing the most trouble. Their error rates have risen significantly. These systems are built on mathematical models that learn by analyzing enormous amounts of digital data, but they have no way of telling what's true from what's false. They can even invent things, a phenomenon referred to as "hallucination."
In one test, the hallucination rate of the newest AI systems reached as high as 79%. These systems choose their responses based on mathematical probabilities, not on a strict set of human-defined rules. As a result, they make mistakes.
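To make that concrete, here is a minimal, hypothetical sketch in Python of what "choosing a response by probability" means. The function name, the prompt, and the token probabilities are all invented for illustration; no vendor's actual system works this simply, but the basic move, weighing options by likelihood rather than checking facts, is the same.

```python
# Illustrative sketch (not any vendor's actual code): a language model assigns
# probabilities to candidate next words and samples one, rather than consulting
# a database of verified facts or a set of hand-written rules.
import random

def sample_next_token(probabilities: dict[str, float]) -> str:
    """Pick the next word at random, weighted by the model's confidence."""
    tokens = list(probabilities)
    weights = list(probabilities.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Hypothetical distribution for the prompt "Cursor's license allows use on ..."
next_token_probs = {
    "multiple": 0.55,   # matches the real policy
    "one": 0.30,        # plausible-sounding but false: a hallucination
    "unlimited": 0.15,
}

print(sample_next_token(next_token_probs))
# Roughly 3 times in 10 this prints "one": fluent, confident, and wrong.
```

Nothing in that loop asks whether the answer is true; it only asks which answer looks most likely.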
"Despite all our efforts, hallucinations will always be there, they won't just vanish," estimates Amr Awadallah, a former Google executive who founded Vectara, a business AI startup.
The unreliability of these systems has long been a concern, and it matters more as their use extends to critical areas like law, medicine, and commerce. Without solid fact-checking, they can lead to costly and even dangerous mistakes.
Stuck in La-La Land
"We spend a lot of time sorting out what's true and what's false," explains Pratik Verma, co-founder and CEO of Okahu, a company that helps businesses manage the pesky problem of hallucinations. "If we don't address these errors, AI systems would become firmly planted in their fantastical worlds and forget their purpose of automating tasks."
Cursor and Michael Truell were not available for comment for this article.
Since 2023, OpenAI, Google, and their peers have made strides in improving their AI systems and reducing error rates. With the arrival of reasoning systems, however, errors are on the rise: according to OpenAI's own tests, its latest systems hallucinate more than its older models.
OpenAI's state-of-the-art system, o3, hallucinates 33% of the time on the PersonQA benchmark, a series of questions about public figures. That is roughly double the hallucination rate of o1, OpenAI's previous reasoning system. The new o4-mini system has an even higher hallucination rate: 48%.
On SimpleQA, a benchmark of more general questions, o3 and o4-mini had hallucination rates of 51% and 79%, respectively; o1 performed better, at 44%. Independent tests suggest that hallucination rates are also rising for reasoning models from Google, DeepSeek, and other AI companies.
The High Stakes of Being a Loose Cannon
"Reinforcement learning" is the new method companies are relying on to improve AI. This technique allows systems to learn behavior through trial and error, which is effective in certain domains. However, it doesn't cut it in others.
"These systems focus too much on one task, forgetting the others," explains Laura Perez-Beltrachini, a researcher at the University of Edinburgh who is studying the hallucination problem closely.
Given that these AI systems process amounts of data our minds can't even grasp, it's challenging for engineers to pin down the cause of their faulty behavior.
In light of these developments, the practical implications of AI malarkey are becoming increasingly significant. Even as tech corporations invest heavily in improving their AI systems, hallucination remains an ongoing concern, particularly with the latest reasoning systems like OpenAI's o3 and o4-mini, which post alarmingly high rates.
As AI systems continue to master a wide range of tasks, from academic research to business management, their unreliability casts doubt on their ability to deliver accurate and trustworthy results. The potential consequences, particularly in critical areas such as law, medicine, and commerce, are profound, highlighting the urgent need for improved fact-checking mechanisms in AI technology.