Study finds major AI chatbots struggle with accuracy and bias on US midterm election topics

Four leading AI chatbots (ChatGPT, Google Gemini, Anthropic's Claude, and xAI's Grok) were tested on more than 3,100 news-related questions covering politics, healthcare, and foreign affairs ^{[1, 2]}. Taken together, their answers about US elections failed on accuracy, bias, or source selection 90% of the time, and nearly 36% of election answers contained at least one factual error ^{[1, 2]}.

Among the four, Grok had the highest error rate, with factual inaccuracies in nearly 52% of its election-related responses ^{[1, 2]}. When bias appeared, ChatGPT, Claude, and Gemini tended to lean left politically, while Grok’s answers skewed right ^{[1, 2]}.

All four chatbots frequently cited foreign state-owned media outlets, such as China’s Global Times and CGTN and Russia’s RT, as reliable sources. State-controlled media were referenced in 35% of responses to foreign policy questions ^{[1, 2]}. ChatGPT cited state-owned media in 51% of its foreign policy answers, while Grok did so 44% of the time ^{[1, 2]}.

The study highlighted that chatbots often presented biased or inaccurate information with misleading confidence, making such errors harder for users to detect. Forum AI noted, “The most professional-looking answers, backed by strongest-looking citations, were also the most likely to contain buried factual errors” ^[1]. The AI models powering these chatbots are trained on massive open web datasets, where unreliable and biased content is common ^{[1, 2]}.

Although few people currently rely on chatbots for news, usage is expected to rise, especially during election cycles ^{[1, 2]}. Campbell Brown, CEO of Forum AI, said, “I am particularly concerned about the study’s results given the looming midterm election cycle” ^[1].

Forum AI publicly released these findings on May 20, 2024, ahead of the US midterm elections ^[1]. The study underscores the challenges AI chatbots face in providing accurate, unbiased information on politically sensitive topics.


Sources