Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


# Llama-3-Patronus-Lynx-8B-Instruct-v1.1 - GGUF
- Model creator: https://huggingface.co/PatronusAI/
- Original model: https://huggingface.co/PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-v1.1/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q2_K.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q2_K.gguf) | Q2_K | 2.96GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ3_XS.gguf) | IQ3_XS | 3.28GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ3_S.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ3_S.gguf) | IQ3_S | 3.43GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K_S.gguf) | Q3_K_S | 3.41GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ3_M.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ3_M.gguf) | IQ3_M | 3.52GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K.gguf) | Q3_K | 3.74GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K_M.gguf) | Q3_K_M | 3.74GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K_L.gguf) | Q3_K_L | 4.03GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ4_XS.gguf) | IQ4_XS | 4.18GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_0.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_0.gguf) | Q4_0 | 4.34GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ4_NL.gguf) | IQ4_NL | 4.38GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_K_S.gguf) | Q4_K_S | 4.37GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_K.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_K.gguf) | Q4_K | 4.58GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_K_M.gguf) | Q4_K_M | 4.58GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_1.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_1.gguf) | Q4_1 | 4.78GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_0.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_0.gguf) | Q5_0 | 5.21GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_K_S.gguf) | Q5_K_S | 3.92GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_K.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_K.gguf) | Q5_K | 3.82GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_K_M.gguf) | Q5_K_M | 5.34GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_1.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_1.gguf) | Q5_1 | 5.34GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q6_K.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q6_K.gguf) | Q6_K | 5.92GB |
| [Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q8_0.gguf](https://huggingface.co/RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf/blob/main/Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q8_0.gguf) | Q8_0 | 5.93GB |
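
These GGUF files work with any llama.cpp-based runtime. As a minimal sketch (not part of the original card), the following downloads one of the quants listed above with `huggingface_hub` and runs it through `llama-cpp-python`; the Q4_K_M file, the context size, and the GPU offload setting are illustrative choices, not recommendations from the model authors.

```
# Assumes `pip install llama-cpp-python huggingface_hub`; any file from the table above works.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="RichardErkhov/PatronusAI_-_Llama-3-Patronus-Lynx-8B-Instruct-v1.1-gguf",
    filename="Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_K_M.gguf",
)

# n_gpu_layers=-1 offloads all layers when a GPU build of llama.cpp is installed; use 0 for CPU only.
llm = Llama(model_path=gguf_path, n_ctx=8192, n_gpu_layers=-1)

# The model is chat-tuned, so pass the evaluation prompt (see below) as a user message.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your filled-in Lynx prompt goes here."}],
    max_tokens=600,
)
print(out["choices"][0]["message"]["content"])
```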


Original model description:
---
library_name: transformers
tags:
- text-generation
- pytorch
- Lynx
- Patronus AI
- evaluation
- hallucination-detection
license: cc-by-nc-4.0
language:
- en
---

# Model Card for Model ID

Lynx is an open-source hallucination evaluation model. Patronus-Lynx-8B-Instruct-v1.1 was trained on a mix of datasets including CovidQA, PubmedQA, DROP, and RAGTruth.
The datasets contain a mix of hand-annotated and synthetic data. The maximum sequence length is 128,000 tokens.


## Model Details

- **Model Type:** Patronus-Lynx-8B-Instruct-v1.1 is a fine-tuned version of the meta-llama/Meta-Llama-3.1-8B-Instruct model.
- **Language:** Primarily English
- **Developed by:** Patronus AI
- **Paper:** [https://arxiv.org/abs/2407.08488](https://arxiv.org/abs/2407.08488)
- **License:** [https://creativecommons.org/licenses/by-nc/4.0/](https://creativecommons.org/licenses/by-nc/4.0/)

### Model Sources

- **Repository:** [https://github.com/patronus-ai/Lynx-hallucination-detection](https://github.com/patronus-ai/Lynx-hallucination-detection)


## How to Get Started with the Model
Lynx is trained to detect hallucinations in RAG settings. Given a document, question, and answer, the model evaluates whether the answer is faithful to the document.

To use the model, we recommend the following prompt:

```
PROMPT = """
Given the following QUESTION, DOCUMENT and ANSWER you must analyze the provided answer and determine whether it is faithful to the contents of the DOCUMENT. The ANSWER must not offer new information beyond the context provided in the DOCUMENT. The ANSWER also must not contradict information provided in the DOCUMENT. Output your final verdict by strictly following this format: "PASS" if the answer is faithful to the DOCUMENT and "FAIL" if the answer is not faithful to the DOCUMENT. Show your reasoning.

--
QUESTION (THIS DOES NOT COUNT AS BACKGROUND INFORMATION):
{question}

--
DOCUMENT:
{context}

--
ANSWER:
{answer}

--

Your output should be in JSON FORMAT with the keys "REASONING" and "SCORE":
{{"REASONING": <your reasoning as bullet points>, "SCORE": <your final score>}}
"""
```

The model will output the score as "PASS" if the answer is faithful to the document and "FAIL" if it is not.
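
As a concrete illustration (not from the original card), the template's placeholders can be filled with `str.format` and the JSON verdict parsed back out; the question, document, answer, and model output below are made-up examples:

```
import json

# Hypothetical inputs; any RAG question/document/answer triple works.
question = "What color is the sky during the day?"
context = "On a clear day the sky appears blue because of Rayleigh scattering."
answer = "The sky is blue."

# PROMPT is the template above; the double braces in it escape the literal JSON example.
filled_prompt = PROMPT.format(question=question, context=context, answer=answer)

# Suppose the model (see the Inference section below) returned this string:
raw_output = '{"REASONING": ["The answer restates the document."], "SCORE": "PASS"}'
verdict = json.loads(raw_output)
print(verdict["SCORE"])  # "PASS" or "FAIL"
```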

## Inference

To run inference, you can use the Hugging Face `pipeline` API:

```
from transformers import pipeline

model_name = 'PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-v1.1'
pipe = pipeline(
    "text-generation",
    model=model_name,
    max_new_tokens=600,
    device="cuda",
    return_full_text=False,
)

# `prompt` is the PROMPT template above with {question}, {context} and {answer} filled in.
messages = [
    {"role": "user", "content": prompt},
]

result = pipe(messages)
print(result[0]['generated_text'])
```

Since the model is trained in a chat format, ensure that you pass the prompt as a user message.
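
If you drive the model with `model.generate` instead of the pipeline, you can apply the same chat formatting explicitly with the tokenizer. A minimal sketch, assuming `prompt` holds the filled-in template from above:

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-v1.1")

# Render the Llama-3 chat template around a single user turn containing the Lynx prompt.
chat_text = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=False,
    add_generation_prompt=True,
)
print(chat_text[:200])  # inspect the formatted input before generation
```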

For more information on training details, refer to our [arXiv paper](https://arxiv.org/abs/2407.08488).

## Evaluation

The model was evaluated on [PatronusAI/HaluBench](https://huggingface.co/datasets/PatronusAI/HaluBench).

| Model | HaluEval | RAGTruth | FinanceBench | DROP | CovidQA | PubmedQA | Overall |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| GPT-4o | <ins>87.9%</ins> | 84.3% | <ins>85.3%</ins> | 84.3% | 95.0% | 82.1% | <ins>86.5%</ins> |
| GPT-4-Turbo | 86.0% | <ins>85.0%</ins> | 82.2% | <ins>84.8%</ins> | 90.6% | 83.5% | 85.0% |
| GPT-3.5-Turbo | 62.2% | 50.7% | 60.9% | 57.2% | 56.7% | 62.8% | 58.7% |
| Claude-3.5-Sonnet | 84.5% | 79.1% | 69.3% | 69.7% | 70.8% | 84.8% | 83.7% |
| RAGAS Faithfulness | 70.6% | 75.8% | 59.5% | 59.6% | 75.0% | 67.7% | 66.9% |
| Mistral-Instruct-7B | 78.3% | 77.7% | 56.3% | 56.3% | 71.7% | 77.9% | 69.4% |
| Llama-3-Instruct-8B | 83.1% | 80.0% | 55.0% | 58.2% | 75.2% | 70.7% | 70.4% |
| Llama-3-Instruct-70B | 87.0% | **83.8%** | 72.7% | 69.4% | 85.0% | 82.6% | 80.1% |
| Lynx (8B) | 85.7% | 80.0% | 72.5% | **77.8%** | 96.3% | 85.2% | 82.9% |
| Lynx v1.1 (8B) | **87.3%** | 79.9% | **75.6%** | 77.5% | <ins>**96.9%**</ins> | <ins>**88.9%**</ins> | **84.3%** |

## Citation
If you are using the model, please cite:

```
@article{ravi2024lynx,
  title={Lynx: An Open Source Hallucination Evaluation Model},
  author={Ravi, Selvan Sunitha and Mielczarek, Bartosz and Kannappan, Anand and Kiela, Douwe and Qian, Rebecca},
  journal={arXiv preprint arXiv:2407.08488},
  year={2024}
}
```

## Model Card Contact
- [@sunitha-ravi](https://huggingface.co/sunitha-ravi)
- [@RebeccaQian1](https://huggingface.co/RebeccaQian1)
- [@presidev](https://huggingface.co/presidev)