A Forum AI study tested four leading AI chatbots (ChatGPT, Google Gemini, Anthropic's Claude, and xAI's Grok) on more than 3,100 news-related questions covering politics, healthcare, and foreign affairs [1, 2]. Taken together, the chatbots' answers about US elections failed on accuracy, bias, or source selection 90% of the time, and nearly 36% of election answers contained at least one factual error [1, 2].
Among the four, Grok had the highest error rate, with factual inaccuracies in nearly 52% of its election-related responses [1, 2]. When bias appeared, ChatGPT, Claude, and Gemini tended to lean left politically, while Grok’s answers skewed right [1, 2].
All four chatbots frequently cited foreign state-owned media outlets, such as China’s Global Times and CGTN and Russia’s RT, as reliable sources. Overall, 35% of responses to foreign policy questions referenced state-controlled media [1, 2]. ChatGPT cited state-owned media in 51% of foreign policy answers, and Grok did so in 44% [1, 2].
The study highlighted that chatbots often presented biased or inaccurate information with misleading confidence, making such errors harder for users to detect. Forum AI noted, “The most professional-looking answers, backed by strongest-looking citations, were also the most likely to contain buried factual errors” [1]. The AI models powering these chatbots are trained on massive open web datasets, where unreliable and biased content is common [1, 2].
Although few people currently rely on chatbots for news, usage is expected to rise, especially during election cycles [1, 2]. Campbell Brown, CEO of Forum AI, said, “I am particularly concerned about the study’s results given the looming midterm election cycle” [1].
Forum AI publicly released these findings on May 20, 2024, ahead of the US midterm elections [1]. The study underscores how difficult it is for AI chatbots to provide accurate, unbiased information on politically sensitive topics.