ChatGPT Scores Worst in Workplace Reliability Study

ChatGPT ranks as the least reliable artificial intelligence (AI) chatbot for work tasks despite commanding 81 percent of the market, according to a December 2025 study examining how major AI platforms perform in workplace situations. The research by casino games aggregator Relum reveals the popular chatbot fabricates information 35 percent of the time.

The study evaluated 10 major AI chatbots on four key factors: hallucination rate, customer ratings, response consistency, and downtime frequency. Each platform received a reliability risk score from 0 to 99, with higher numbers indicating greater problems. The hallucination rate, which measures how often chatbots generate false information, accounted for 50 percent of the total scoring.
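A composite score like this can be sketched in a few lines. The sketch below is an assumption-laden illustration, not Relum's actual formula: the study only states that hallucination rate carries 50 percent of the weight, so the remaining weights, the normalization of each factor, and the function name `risk_score` are all hypothetical.

```python
def risk_score(hallucination_rate, customer_rating, consistency, downtime_pct,
               weights=(0.50, 0.20, 0.20, 0.10)):
    """Return a 0-99 reliability risk score; higher means less reliable.

    hallucination_rate: fraction of answers with false info (0-1)
    customer_rating, consistency: 0-5 scales (higher is better)
    downtime_pct: percent of time unavailable (e.g. 0.81)
    weights: assumed split -- only the 50% hallucination weight is
             stated in the study.
    """
    # Convert every factor to a 0-1 "risk" value (higher = worse).
    risks = (
        hallucination_rate,           # already a 0-1 risk
        1 - customer_rating / 5,      # invert: high rating = low risk
        1 - consistency / 5,          # invert: high consistency = low risk
        min(downtime_pct, 1.0),       # cap downtime contribution at 1%
    )
    total = sum(w * r for w, r in zip(weights, risks))
    return round(total * 99)
```

Under these assumed weights, a platform with zero hallucinations, perfect ratings, and no downtime scores 0, and every factor pushes the score upward as it worsens; the real study's normalization may differ.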

ChatGPT received a reliability risk score of 99, the highest among all tested platforms. The chatbot earned a customer rating of 4.7 out of 5 and scored 4.0 for response quality and consistency, yet experienced downtime 0.81 percent of the time over a two-month testing period. The findings contrast sharply with ChatGPT’s dominant market position and strong customer satisfaction ratings.

Claude emerged as the second most unreliable option with a reliability risk score of 75. The AI assistant hallucinates 17 percent of the time, less than half of ChatGPT’s error rate, but still presents enough accuracy concerns for professional environments. Claude holds just 0.99 percent of the chatbot market and records a 4.4 customer satisfaction rating. The platform experiences downtime 0.77 percent of the time, nearly matching ChatGPT’s availability issues.

Meta AI secured third position with a risk score of 70. While its hallucination rate sits at 15 percent, the platform struggles with overall performance quality. Customers assign Meta AI the lowest rating among major tested chatbots at 3.4 out of 5 stars. Response consistency also scores low at 3.4, meaning users receive answers of varying quality depending on their questions. Meta AI maintains a 20 percent market share despite these challenges and rarely experiences service interruptions.

Botpress ranked fourth with a risk score of 41, hallucinating 15 percent of the time. The platform maintains 4.5 ratings for both customer satisfaction and response consistency. Botpress charges 89 dollars monthly, making it the most expensive option among business-focused chatbots. The service records 0.37 percent downtime, translating to several hours of unavailability over the testing period.

Perplexity AI claimed fifth place with a risk score of 31. The chatbot demonstrates a 13 percent hallucination rate and earns a 4.6 customer rating. Response quality measures 3.5 out of 5, showing opportunities for consistency improvements. Perplexity AI captures more than 10 percent of market share while charging 40 dollars monthly, double ChatGPT’s pricing. The service shows only 0.18 percent downtime during the testing period.

The most reliable chatbots according to the study include Kimi with a risk score of 3, DeepSeek at 4, and Grok at 6. These platforms recorded almost no downtime during the evaluation period. Kimi achieved a 13 percent hallucination rate with a 4.5 customer rating and 4.3 response consistency score. DeepSeek demonstrated a 14 percent hallucination rate alongside a 4.7 customer rating and zero downtime. Grok registered an 8 percent hallucination rate with a 4.5 customer rating.

Google Gemini presented an unusual case, recording the highest hallucination rate at 38 percent yet earning a low reliability risk score of 13. The platform benefits from minimal downtime at 0.05 percent and maintains strong customer ratings at 4.4. Microsoft Copilot followed with a 27 percent hallucination rate and risk score of 11.

Razvan Lucian Haiduc, Chief Product Officer at Relum, emphasized the importance of selecting appropriate AI tools for specific business needs. He noted that approximately 65 percent of United States companies now use AI chatbots in daily operations, while nearly 45 percent of employees admit sharing sensitive company information with these platforms. The growing dependence on AI tools makes reliability assessment crucial for organizations.

The research methodology involved comprehensive testing across multiple workplace scenarios to measure how chatbots handle real-world tasks. Hallucination rates were determined by comparing chatbot responses against verified information sources. Downtime measurements tracked service availability over the two-month evaluation period. Customer ratings reflect aggregated user feedback from various review platforms.
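The response-checking step described above can be illustrated with a minimal sketch. This is not the study's grading method: the exact-match comparison after whitespace and case normalization is an assumption (a real evaluation would use human or model-based grading of free-form answers), and the names `hallucination_rate`, `answers`, and `references` are hypothetical.

```python
def hallucination_rate(answers, references):
    """Fraction of answers that fail to match a verified reference fact.

    Assumes one short reference answer per question; matching is a
    simplified exact comparison after lowercasing and collapsing
    whitespace.
    """
    assert len(answers) == len(references), "one reference per answer"

    def norm(text):
        # Normalize case and whitespace so trivial formatting
        # differences do not count as hallucinations.
        return " ".join(text.lower().split())

    wrong = sum(1 for a, r in zip(answers, references) if norm(a) != norm(r))
    return wrong / len(answers)
```

For example, grading three capital-city answers where one is wrong yields a rate of one in three; the studied platforms' 8 to 38 percent rates would come from much larger, professionally graded question sets.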

Relum operates as a casino games aggregator offering over 20,000 games from more than 180 providers through a single application programming interface (API) platform. The company provides game aggregation services and engagement tools including jackpots, free spins, and tournaments for online casino operators. The firm maintains operations focused on the interactive gaming industry while conducting research on AI technology applications.

The study’s findings raise questions about the correlation between market dominance and actual reliability in AI platforms. ChatGPT’s massive user base and high satisfaction ratings coexist with the poorest performance in factual accuracy among tested chatbots. This disconnect suggests users may prioritize factors like ease of use, feature richness, or ecosystem integration over pure accuracy when selecting AI tools.

Industry experts continue debating appropriate use cases for different AI chatbots. While some platforms excel at creative tasks or general conversation, others demonstrate superior performance for fact checking or professional applications requiring high accuracy. The study recommends companies evaluate chatbots based on their specific industry requirements rather than defaulting to the most popular option.

Complete research findings are available through Relum’s published report. The company plans to update its reliability assessments periodically as AI platforms release new versions and improvements. Organizations considering AI chatbot adoption can reference these benchmarks when making technology decisions for their workplace environments.
