Microsoft Bing went off the rails, and Google’s Bard made an error in its first public demo—yet companies are still racing to integrate AI into the world’s most popular search engines

If you ask a chatbot a question, there’s a good chance its answer will be right. But it could just as easily be made up, a phenomenon called “hallucination,” in which AIs confidently present error-riddled responses as fact. And, thanks to Google’s recent integration of AI, that may also be true of the search engine’s results.

One user discovered this by googling Animal Collective, only to find that Google instead surfaced results for Anal Cunt, a grindcore band known for its outrageous and offensive music, to which the search engine attributed Animal Collective’s numerous albums. While the exact origin of the glitch is not known, such errors suggest that Google’s transition to AI-powered search, which the company announced last month, will be less seamless than its executives might hope.

In fact, Google’s own chatbot, Bard, made a factual error in its very first public demo, claiming that the James Webb Space Telescope took the “very first pictures” of a planet outside our solar system, which, as several astronomers subsequently pointed out, is not true. The public blunder wiped some $100 billion off the market value of Google’s parent company, Alphabet. It also wasn’t so surprising, given the behavior of counterparts like Microsoft’s Bing chatbot, originally positioned as a search engine replacement before it developed a reputation for spewing incorrect information and even berating its users, prompting Microsoft to impose new limits on the length and content of its chats and to walk back some of its earlier claims about the reliability of the information it supplies.

It can be hard to prevent such behavior in AI systems, in part because even the developers behind generative models like Bing, Bard, and ChatGPT don’t know exactly how they come up with their answers. This is because, unlike chatbots of the past, which offered responses based on pre-programmed, carefully defined parameters, today’s large language models are designed to learn skills on their own, synthesizing patterns from online datasets and applying these insights to create plausible-sounding answers.

Such answers are often correct—but because these chatbots all work by leveraging billions of data points to predict the next word in a string of text, they don’t actually know if the information they’re providing is true or false. So as machine learning researchers scramble to fix the hallucination issue, it’s important for users of AI-assisted search to remember that these programs essentially function like a hyper-intelligent version of autocomplete—which, as any iPhone user can attest, doesn’t know what the duck it’s talking about.
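To make that “autocomplete” point concrete, here is a minimal sketch, not the actual system behind Bard, Bing, or ChatGPT, using the small, openly available GPT-2 model from the Hugging Face transformers library. The prompt, model choice, and top-five cutoff are illustrative assumptions; the point is that the model’s core operation is ranking candidate next words by probability, with no step that checks whether any candidate is factually true.

```python
# A minimal sketch of next-word prediction with GPT-2 (illustrative only;
# not the model behind Bard, Bing, or ChatGPT).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Hypothetical prompt chosen for illustration.
prompt = "The first telescope to photograph a planet outside our solar system was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one score per vocabulary token, per position

# Look only at the scores for the token that would come next,
# and convert them to probabilities.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

# Print the five most likely continuations. Nothing here verifies
# whether any of them is true; the ranking reflects statistical
# plausibility learned from the training text.
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}  p={prob:.3f}")
```

Whatever words such a model prints are simply the statistically most plausible continuations of the prompt; whether any of them names the right telescope sits entirely outside the machinery, which is the gap that hallucinations fall through.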
