Did you know Which AI Models Are Leading the Way in Reducing Hallucinations and Improving Accuracy?


AI models help us in many areas, but they also tend to hallucinate and give inaccurate information. IBM defines hallucinations in AI chatbots and computer vision tools as outputs that are inaccurate because the model detects patterns that do not exist. Vectara had each LLM summarize 1,000 short documents, checked the summaries for hallucinations, and identified the 15 large language models with the lowest hallucination rates. According to the data, Zhipu AI’s GLM-4-9B-Chat has the lowest hallucination rate at 1.3%. Google’s Gemini-2.0-Flash-Exp ranks second, also at 1.3%.

OpenAI’s o1-mini ranks third with a hallucination rate of 1.4%, and GPT-4o is fourth at 1.5%. GPT-4o-mini and GPT-4-Turbo both have hallucination rates of 1.7%. The data suggests that some smaller, more specialized models are among those with the lowest hallucination rates. OpenAI’s GPT-4 has a hallucination rate of 1.8%, while GPT-3.5-Turbo has a rate of 1.9%.

AI systems need low hallucination rates to work reliably, especially in high-stakes applications in healthcare, finance, and law. Smaller models are also gradually reducing hallucinations, including Mistral’s 8x7B models in their generated text.
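To make the columns in the table below more concrete, here is a minimal Python sketch of how such rates could be derived from per-document judgments. This is not Vectara's actual pipeline: the `Judgment` record, the `summarize_rates` helper, the toy counts, and the choice to compute the hallucination rate only over answered documents are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One document's outcome (hypothetical record, not Vectara's schema)."""
    answered: bool      # did the model produce a summary at all?
    hallucinated: bool  # verdict of some detector; ignored if not answered


def summarize_rates(judgments):
    """Return percentage rates rounded to one decimal place.

    Assumption: the hallucination rate is taken over answered documents
    only, and factual consistency is its complement.
    """
    total = len(judgments)
    answered = [j for j in judgments if j.answered]
    if total == 0 or not answered:
        raise ValueError("need at least one answered document")

    hallucinated = sum(j.hallucinated for j in answered)
    hallucination_rate = 100.0 * hallucinated / len(answered)
    return {
        "hallucination_rate": round(hallucination_rate, 1),
        "factual_consistency_rate": round(100.0 - hallucination_rate, 1),
        "answer_rate": round(100.0 * len(answered) / total, 1),
    }


# Toy data (made up): 1,000 documents, 10 unanswered, 13 hallucinated summaries.
judgments = (
    [Judgment(True, True)] * 13
    + [Judgment(True, False)] * 977
    + [Judgment(False, False)] * 10
)
rates = summarize_rates(judgments)
print(rates)
```

With these made-up counts the sketch reports a hallucination rate of 1.3%, a factual consistency rate of 98.7%, and an answer rate of 99.0%.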


 

| Model | Hallucination Rate | Factual Consistency Rate | Answer Rate | Average Summary Length (Words) |
|---|---|---|---|---|
| Zhipu AI GLM-4-9B-Chat | 1.3 % | 98.7 % | 100.0 % | 58.1 |
| Google Gemini-2.0-Flash-Exp | 1.3 % | 98.7 % | 99.9 % | 60 |
| OpenAI-o1-mini | 1.4 % | 98.6 % | 100.0 % | 78.3 |
| GPT-4o | 1.5 % | 98.5 % | 100.0 % | 77.8 |
| GPT-4o-mini | 1.7 % | 98.3 % | 100.0 % | 76.3 |
| GPT-4-Turbo | 1.7 % | 98.3 % | 100.0 % | 86.2 |
| GPT-4 | 1.8 % | 98.2 % | 100.0 % | 81.1 |
| GPT-3.5-Turbo | 1.9 % | 98.1 % | 99.6 % | 84.1 |
| DeepSeek-V2.5 | 2.4 % | 97.6 % | 100.0 % | 83.2 |
| Microsoft Orca-2-13b | 2.5 % | 97.5 % | 100.0 % | 66.2 |
| Microsoft Phi-3.5-MoE-instruct | 2.5 % | 97.5 % | 96.3 % | 69.7 |
| Intel Neural-Chat-7B-v3-3 | 2.6 % | 97.4 % | 100.0 % | 60.7 |
| Qwen2.5-7B-Instruct | 2.8 % | 97.2 % | 100.0 % | 71 |
| AI21 Jamba-1.5-Mini | 2.9 % | 97.1 % | 95.6 % | 74.5 |
| Snowflake-Arctic-Instruct | 3.0 % | 97.0 % | 100.0 % | 68.7 |
| Qwen2.5-32B-Instruct | 3.0 % | 97.0 % | 100.0 % | 67.9 |
| Microsoft Phi-3-mini-128k-instruct | 3.1 % | 96.9 % | 100.0 % | 60.1 |
| OpenAI-o1-preview | 3.3 % | 96.7 % | 100.0 % | 119.3 |
| Google Gemini-1.5-Flash-002 | 3.4 % | 96.6 % | 99.9 % | 59.4 |
| 01-AI Yi-1.5-34B-Chat | 3.7 % | 96.3 % | 100.0 % | 83.7 |
| Llama-3.1-405B-Instruct | 3.9 % | 96.1 % | 99.6 % | 85.7 |
| Microsoft Phi-3-mini-4k-instruct | 4.0 % | 96.0 % | 100.0 % | 86.8 |
| Llama-3.3-70B-Instruct | 4.0 % | 96.0 % | 100.0 % | 85.3 |
| Microsoft Phi-3.5-mini-instruct | 4.1 % | 95.9 % | 100.0 % | 75 |
| Mistral-Large2 | 4.1 % | 95.9 % | 100.0 % | 77.4 |
| Llama-3-70B-Chat-hf | 4.1 % | 95.9 % | 99.2 % | 68.5 |
| Qwen2-VL-7B-Instruct | 4.2 % | 95.8 % | 100.0 % | 73.9 |
| Qwen2.5-14B-Instruct | 4.2 % | 95.8 % | 100.0 % | 74.8 |
| Qwen2.5-72B-Instruct | 4.3 % | 95.7 % | 100.0 % | 80 |
| Llama-3.2-90B-Vision-Instruct | 4.3 % | 95.7 % | 100.0 % | 79.8 |
| XAI Grok | 4.6 % | 95.4 % | 100.0 % | 91 |
| Anthropic Claude-3-5-sonnet | 4.6 % | 95.4 % | 100.0 % | 95.9 |
| Qwen2-72B-Instruct | 4.7 % | 95.3 % | 100.0 % | 100.1 |
| Mixtral-8x22B-Instruct-v0.1 | 4.7 % | 95.3 % | 99.9 % | 92 |
| Anthropic Claude-3-5-haiku | 4.9 % | 95.1 % | 100.0 % | 92.9 |
| 01-AI Yi-1.5-9B-Chat | 4.9 % | 95.1 % | 100.0 % | 85.7 |
| Cohere Command-R | 4.9 % | 95.1 % | 100.0 % | 68.7 |
| Llama-3.1-70B-Instruct | 5.0 % | 95.0 % | 100.0 % | 79.6 |
| Llama-3.1-8B-Instruct | 5.4 % | 94.6 % | 100.0 % | 71 |
| Cohere Command-R-Plus | 5.4 % | 94.6 % | 100.0 % | 68.4 |
| Llama-3.2-11B-Vision-Instruct | 5.5 % | 94.5 % | 100.0 % | 67.3 |
| Llama-2-70B-Chat-hf | 5.9 % | 94.1 % | 99.9 % | 84.9 |
| IBM Granite-3.0-8B-Instruct | 6.5 % | 93.5 % | 100.0 % | 74.2 |
| Google Gemini-1.5-Pro-002 | 6.6 % | 93.7 % | 99.9 % | 62 |
| Google Gemini-1.5-Flash | 6.6 % | 93.4 % | 99.9 % | 63.3 |
| Microsoft phi-2 | 6.7 % | 93.3 % | 91.5 % | 80.8 |
| Google Gemma-2-2B-it | 7.0 % | 93.0 % | 100.0 % | 62.2 |
| Qwen2.5-3B-Instruct | 7.0 % | 93.0 % | 100.0 % | 70.4 |
| Llama-3-8B-Chat-hf | 7.4 % | 92.6 % | 99.8 % | 79.7 |
| Google Gemini-Pro | 7.7 % | 92.3 % | 98.4 % | 89.5 |
| 01-AI Yi-1.5-6B-Chat | 7.9 % | 92.1 % | 100.0 % | 98.9 |
| Llama-3.2-3B-Instruct | 7.9 % | 92.1 % | 100.0 % | 72.2 |
| databricks dbrx-instruct | 8.3 % | 91.7 % | 100.0 % | 85.9 |
| Qwen2-VL-2B-Instruct | 8.3 % | 91.7 % | 100.0 % | 81.8 |
| Cohere Aya Expanse 32B | 8.5 % | 91.5 % | 99.9 % | 81.9 |
| IBM Granite-3.0-2B-Instruct | 8.8 % | 91.2 % | 100.0 % | 81.6 |
| Mistral-7B-Instruct-v0.3 | 9.5 % | 90.5 % | 100.0 % | 98.4 |
| Google Gemini-1.5-Pro | 9.1 % | 90.9 % | 99.8 % | 61.6 |
| Anthropic Claude-3-opus | 10.1 % | 89.9 % | 95.5 % | 92.1 |
| Google Gemma-2-9B-it | 10.1 % | 89.9 % | 100.0 % | 70.2 |
| Llama-2-13B-Chat-hf | 10.5 % | 89.5 % | 99.8 % | 82.1 |
| AllenAI-OLMo-2-13B-Instruct | 10.8 % | 89.2 % | 100.0 % | 82 |
| AllenAI-OLMo-2-7B-Instruct | 11.1 % | 88.9 % | 100.0 % | 112.6 |
| Mistral-Nemo-Instruct | 11.2 % | 88.8 % | 100.0 % | 69.9 |
| Llama-2-7B-Chat-hf | 11.3 % | 88.7 % | 99.6 % | 119.9 |
| Microsoft WizardLM-2-8x22B | 11.7 % | 88.3 % | 99.9 % | 140.8 |
| Cohere Aya Expanse 8B | 12.2 % | 87.8 % | 99.9 % | 83.9 |
| Amazon Titan-Express | 13.5 % | 86.5 % | 99.5 % | 98.4 |
| Google PaLM-2 | 14.1 % | 85.9 % | 99.8 % | 86.6 |
| Google Gemma-7B-it | 14.8 % | 85.2 % | 100.0 % | 113 |
| Qwen2.5-1.5B-Instruct | 15.8 % | 84.2 % | 100.0 % | 70.7 |
| Qwen-QwQ-32B-Preview | 16.1 % | 83.9 % | 100.0 % | 201.5 |
| Anthropic Claude-3-sonnet | 16.3 % | 83.7 % | 100.0 % | 108.5 |
| Google Gemma-1.1-7B-it | 17.0 % | 83.0 % | 100.0 % | 64.3 |
| Anthropic Claude-2 | 17.4 % | 82.6 % | 99.3 % | 87.5 |
| Google Flan-T5-large | 18.3 % | 81.7 % | 99.3 % | 20.9 |
| Mixtral-8x7B-Instruct-v0.1 | 20.1 % | 79.9 % | 99.9 % | 90.7 |
| Llama-3.2-1B-Instruct | 20.7 % | 79.3 % | 100.0 % | 71.5 |
| Apple OpenELM-3B-Instruct | 24.8 % | 75.2 % | 99.3 % | 47.2 |
| Qwen2.5-0.5B-Instruct | 25.2 % | 74.8 % | 100.0 % | 72.6 |
| Google Gemma-1.1-2B-it | 27.8 % | 72.2 % | 100.0 % | 66.8 |
| TII falcon-7B-instruct | 29.9 % | 70.1 % | 90.0 % | 75.5 |
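In nearly every row above, the factual consistency rate appears to be simply 100 minus the hallucination rate. The short Python sketch below checks that relationship on a handful of rows copied from the table. The `inconsistent_rows` helper and its tolerance are illustrative choices, not part of the original data.

```python
# (model, hallucination rate %, factual consistency rate %) copied from the table.
ROWS = [
    ("Zhipu AI GLM-4-9B-Chat", 1.3, 98.7),
    ("GPT-4o", 1.5, 98.5),
    ("Google Gemini-1.5-Pro-002", 6.6, 93.7),
    ("Google Gemini-1.5-Flash", 6.6, 93.4),
]


def inconsistent_rows(rows, tolerance=0.05):
    """Return model names whose two rates do not sum to roughly 100 %.

    The tolerance (an arbitrary choice) absorbs floating-point noise
    without hiding a 0.1-point discrepancy.
    """
    return [
        name
        for name, hallucination, consistency in rows
        if abs(hallucination + consistency - 100.0) > tolerance
    ]


flagged = inconsistent_rows(ROWS)
print(flagged)  # ['Google Gemini-1.5-Pro-002']
```

The Gemini-1.5-Pro-002 row is flagged because 6.6 and 93.7 add up to 100.3, so one of its two figures is probably off in the published data. The table does not say which one.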

Mohamed Elarby
