Researchers have spotted an apparent downside of smarter chatbots. Although AI models predictably become more accurate as they advance, they’re also more likely to (wrongly) answer questions beyond their capabilities rather than saying, “I don’t know.” And the humans prompting them are more likely to take their confident hallucinations at face value, creating a trickle-down effect of confident misinformation.
“They are answering almost everything these days,” José Hernández-Orallo, professor at the Universitat Politecnica de Valencia, Spain, told Nature. “And that means more correct, but also more incorrect.” Hernández-Orallo, the project lead, worked on the study with his colleagues at the Valencian Research Institute for Artificial Intelligence in Spain.
The team studied three LLM families: OpenAI’s GPT series, Meta’s LLaMA and the open-source BLOOM. They tested early versions of each model and moved to larger, more advanced ones, though not today’s most advanced. For example, the team began with OpenAI’s relatively primitive GPT-3 ada model and tested iterations leading up to GPT-4, which arrived in March 2023. The four-month-old GPT-4o wasn’t included in the study, nor was the newer o1-preview. I’d be curious if the trend still holds with the latest models.
The researchers tested each model on thousands of questions about “arithmetic, anagrams, geography and science.” They also quizzed the AI models on their ability to transform information, such as alphabetizing a list. The team ranked their prompts by perceived difficulty.
The data showed that, as the models grew, they increasingly gave wrong answers where earlier versions would have avoided the question altogether. So, the AI is a bit like a professor who, as he masters more subjects, increasingly believes he has the golden answers on all of them.
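To make that measurement concrete, here is a minimal Python sketch of how responses might be tallied. It assumes each response has already been labeled as correct, incorrect or avoidant. The function name `response_shares` and the toy labels are hypothetical, made up for illustration, and are not the study's code or data.

```python
from collections import Counter

# Assumed label set for illustration: each model response is tagged as one of these.
RESPONSE_TYPES = ("correct", "incorrect", "avoidant")


def response_shares(labels):
    """Return the fraction of responses that fall under each response type."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {kind: counts.get(kind, 0) / total for kind in RESPONSE_TYPES}


if __name__ == "__main__":
    # Toy, made-up labels (NOT data from the study).
    smaller_model = ["correct", "avoidant", "avoidant", "incorrect"]
    larger_model = ["correct", "correct", "incorrect", "incorrect"]

    for name, labels in (("smaller", smaller_model), ("larger", larger_model)):
        print(name, response_shares(labels))
```

In this toy case, the larger model is right more often, yet its share of wrong answers also goes up because it no longer avoids any question. That is the pattern the researchers describe.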
Further complicating things are the humans prompting the chatbots and reading their answers. The researchers tasked volunteers with rating the accuracy of the AI bots’ answers and found that the volunteers “incorrectly classified inaccurate answers as being accurate surprisingly often.” The share of wrong answers that volunteers falsely perceived as right typically fell between 10 and 40 percent.
“Humans are not able to supervise these models,” concluded Hernández-Orallo.
The research team recommends that AI developers boost performance on easy questions and program their chatbots to decline complex ones. “We need humans to understand: ‘I can use it in this area, and I shouldn’t use it in that area,’” Hernández-Orallo told Nature.
It’s a well-intended suggestion that could make sense in an ideal world. But fat chance AI companies oblige. Chatbots that more often say “I don’t know” would likely be perceived as less advanced or valuable, leading to less use — and less money for the companies making and selling them. So, instead, we get fine-print warnings that “ChatGPT can make mistakes” and “Gemini may display inaccurate info.”
That leaves it up to us to avoid believing and spreading hallucinated misinformation that could hurt ourselves or others. For accuracy, fact-check your damn chatbot’s answers, for crying out loud.
You can read the team’s full study in Nature.