---
library_name: transformers
license: cc-by-nc-4.0
datasets:
  - oumi-ai/oumi-c2d-d2c-subset
  - oumi-ai/oumi-synthetic-claims
  - oumi-ai/oumi-synthetic-document-claims
language:
  - en
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
---

Made with Oumi

Documentation | Blog | Discord

oumi-ai/HallOumi-8B-classifier

Introducing HallOumi-8B-classifier, a fast SOTA hallucination detection model, outperforming DeepSeek R1, OpenAI o1, Google Gemini 1.5 Pro, and Anthropic Sonnet 3.5 at only 8 billion parameters!

| Model                  | Balanced Accuracy | Macro F1 Score | Open Source? | Model Size |
|------------------------|-------------------|----------------|--------------|------------|
| HallOumi-8B-classifier | 76.8% ± 2.0%      | 78.5% ± 2.1%   | ✔️           | 8B         |
| Anthropic Sonnet 3.5   | 67.3% ± 2.7%      | 69.6% ± 2.8%   | ❌           | ?          |
| OpenAI o1-preview      | 64.5% ± 2.0%      | 65.9% ± 2.3%   | ❌           | ?          |
| DeepSeek R1            | 60.7% ± 2.1%      | 61.6% ± 2.5%   | ✔️           | 671B       |
| Llama 3.1 405B         | 58.7% ± 1.7%      | 58.8% ± 2.4%   | ✔️           | 405B       |
| Google Gemini 1.5 Pro  | 52.9% ± 1.0%      | 48.2% ± 1.8%   | ❌           | ?          |

Demo GIF: TODO

HallOumi-8B-classifier, the hallucination classification model built with Oumi, is an end-to-end binary classification system that enables fast and accurate assessment of the hallucination probability of any written content (AI or human-generated).

  • ✔️ Fast inference with high accuracy
  • ✔️ Per-claim support (the model scores one claim per call; see the sketch below)
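
As a rough illustration of the per-claim workflow, the sketch below loads the classifier with the Hugging Face transformers pipeline and scores each claim against a source document, one call per claim. The `<context>`/`<claims>` input format, the dtype/device settings, and the label names are illustrative assumptions rather than the documented prompt template; check the Oumi documentation and `model.config.id2label` before relying on the outputs.

```python
# Minimal per-claim sketch (assumptions: a standard text-classification head and a
# simple "<context> ... <claims> ..." input format -- confirm the exact prompt
# template in the Oumi documentation before using the scores).
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="oumi-ai/HallOumi-8B-classifier",
    torch_dtype="bfloat16",
    device_map="auto",
)

context = "The policy takes effect on June 1, 2025 and applies only to new customers."
claims = [
    "The policy takes effect on June 1, 2025.",
    "The policy applies to all existing customers.",
]

# The classifier scores a single claim at a time, so it is called once per claim.
for claim in claims:
    text = f"<context>\n{context}\n</context>\n\n<claims>\n{claim}\n</claims>"
    result = classifier(text)[0]  # e.g. {"label": "...", "score": ...}
    print(f"{claim} -> {result['label']} ({result['score']:.3f})")
```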

Hallucinations

Hallucinations are often cited as the most important obstacle to deploying generative models in many commercial and personal applications, and for good reason:

It ultimately comes down to an issue of trust — generative models are trained to produce outputs which are probabilistically likely, but not necessarily true. While such tools are certainly useful in the right hands, being unable to trust them prevents AI from being adopted more broadly, where it can be utilized safely and responsibly.

Building Trust with Verifiability

To begin trusting AI systems, we have to be able to verify their outputs. By "verify," we specifically mean that we need to:

  • Understand the truthfulness of a particular statement produced by any model.
  • Understand what information supports that statement's truth (or lack thereof).
  • Have full traceability connecting the statement to that information.

Missing any one of these aspects results in a system that cannot be verified and therefore cannot be trusted. Verification alone is not enough, however: we also have to be able to do these things in a way that is meticulous, scalable, and human-readable.


Uses

Use this model to verify claims and detect hallucinations in scenarios where a known source of truth is available.
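
For finer control over the output (e.g., thresholding the hallucination probability yourself), the lower-level sketch below mirrors the pipeline example above but exposes the raw per-class probabilities. The same caveats apply: the input format and label ordering are assumptions to verify against `model.config.id2label` and the Oumi documentation.

```python
# Lower-level sketch using AutoModelForSequenceClassification so the per-class
# probabilities can be inspected directly. Input format and label mapping are
# assumptions -- verify against model.config.id2label and the Oumi docs.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "oumi-ai/HallOumi-8B-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

context = "The Eiffel Tower was completed in 1889 and stands in Paris, France."
claim = "The Eiffel Tower was completed in 1925."

text = f"<context>\n{context}\n</context>\n\n<claims>\n{claim}\n</claims>"
inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits.float(), dim=-1)[0]

# Map each class index to its configured label name and print its probability.
for idx, prob in enumerate(probs):
    print(f"{model.config.id2label[idx]}: {prob.item():.3f}")
```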

Out-of-Scope Use

Smaller LLMs have limited capabilities and should be used with caution. Avoid using this model for purposes outside of claim verification.

Bias, Risks, and Limitations

This model was fine-tuned on data generated with Llama-3.1-405B-Instruct, starting from the Llama-3.1-8B-Instruct base model, so any biases or risks associated with those models may also be present.

Training Details

Training Data

Training data:

  • oumi-ai/oumi-c2d-d2c-subset
  • oumi-ai/oumi-synthetic-claims
  • oumi-ai/oumi-synthetic-document-claims

Training Procedure

Training notebook: Coming Soon

Evaluation

Eval notebook: Coming Soon

Environmental Impact

  • Hardware Type: A100-80GB
  • Hours used: 1.5 (4 × 8 GPUs)
  • Cloud Provider: Google Cloud Platform
  • Compute Region: us-east5
  • Carbon Emitted: 0.15 kg

Citation

@misc{oumiHalloumi8BClassifier,
  author = {Panos Achlioptas and Jeremiah Greer and Kostas Aisopos and Michael A. Schuler and Oussama Elachqar and Emmanouil Koukoumidis},
  title = {HallOumi-8B-classifier},
  month = {March},
  year = {2025},
  url = {https://huggingface.co/oumi-ai/HallOumi-8B-classifier}
}

@software{oumi2025,
  author = {Oumi Community},
  title = {Oumi: an Open, End-to-end Platform for Building Large Foundation Models},
  month = {January},
  year = {2025},
  url = {https://github.com/oumi-ai/oumi}
}