[Large Action Models (LAMs)](https://blog.salesforceairesearch.com/large-action-models/) are advanced language models designed to enhance decision-making by translating user intentions into executable actions. As the **brains of AI agents**, LAMs autonomously plan and execute tasks to achieve specific goals, making them invaluable for automating workflows across diverse domains.

**This model release is for research purposes only.**

The new **xLAM-2** series, built on our most advanced data synthesis, processing, and training pipelines, marks a significant leap in **multi-turn conversation** and **tool usage**. Trained with our novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions, our models achieve state-of-the-art performance on the [**BFCL**](https://gorilla.cs.berkeley.edu/leaderboard.html) and **τ-bench** benchmarks, outperforming frontier models like GPT-4o and Claude 3.5. Notably, even our smaller models demonstrate superior capabilities in multi-turn scenarios while maintaining exceptional consistency across trials.

We've also refined the **chat template** and **vLLM integration**, making it easier to build advanced AI agents. Compared to previous xLAM models, xLAM-2 offers superior performance and seamless deployment across applications.

## Table of Contents

- [Usage](#usage)
  - [Basic Usage with Huggingface Chat Template](#basic-usage-with-huggingface-chat-template)
  - [Using vLLM for Inference](#using-vllm-for-inference)
    - [Setup and Serving](#setup-and-serving)
    - [Testing with OpenAI API](#testing-with-openai-api)
- [Benchmark Results](#benchmark-results)
- [Citation](#citation)

For the same number of parameters, the models have been fine-tuned across a wide range of agent tasks and scenarios, all while preserving the capabilities of the original model.

| Model                 | # Total Params | Context Length  | Category                                  | Download Model                                                     | Download GGUF files                                                    |
|-----------------------|----------------|-----------------|-------------------------------------------|--------------------------------------------------------------------|------------------------------------------------------------------------|
| Llama-xLAM-2-70b-fc-r | 70B            | 128k            | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-70b-fc-r) | NA                                                                     |
| Llama-xLAM-2-8b-fc-r  | 8B             | 128k            | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-8b-fc-r)  | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-8b-fc-r-gguf) |
| xLAM-2-32b-fc-r       | 32B            | 32k (max 128k)* | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-32b-fc-r)       | NA                                                                     |
| xLAM-2-3b-fc-r        | 3B             | 32k (max 128k)* | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-3b-fc-r)        | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-3b-fc-r-gguf)       |
| xLAM-2-1b-fc-r        | 1B             | 32k (max 128k)* | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-1b-fc-r)        | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-1b-fc-r-gguf)       |

***Note:** The default context length for Qwen-2.5-based models is 32k, but you can use techniques like YaRN (Yet another RoPE extensioN) to extend the context length up to 128k. Please refer to the [Qwen2.5 instructions on processing long texts](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct#processing-long-texts) for more details.

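As an illustration of that note, the sketch below shows one way to turn on YaRN scaling when loading a Qwen-2.5-based xLAM-2 checkpoint with `transformers`. The checkpoint name and the scaling factor of 4.0 (32k × 4 ≈ 128k) are assumptions; the linked Qwen2.5 page documents the equivalent `config.json` change and should be treated as the authoritative reference.

```python
from transformers import AutoModelForCausalLM

# Sketch: override rope_scaling at load time to extend the usable context window.
# Verify the exact keys and values against the Qwen2.5 long-text instructions.
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/xLAM-2-3b-fc-r",
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={
        "type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)
```
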
You can also explore our previous xLAM series [here](https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4).

The `-fc` suffix indicates that the models are fine-tuned for **function calling** tasks, while the `-r` suffix signifies a **research** release.

✅ All models are fully compatible with vLLM and Transformers-based inference frameworks.

## Usage

### Basic Usage with Huggingface Chat Template

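A minimal sketch of this usage pattern is shown below: load the model, apply the chat template, generate, and decode only the newly generated tokens. The checkpoint name, example message, and generation settings are illustrative assumptions rather than fixed requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/xLAM-2-1b-fc-r"  # any xLAM-2 checkpoint can be substituted here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "What can you help me with today?"},
]

# Build the prompt with the model's chat template; recent transformers versions
# also accept a `tools` argument here for function-calling prompts.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

input_ids_len = input_ids.shape[-1]
outputs = model.generate(input_ids, max_new_tokens=256)

generated_tokens = outputs[:, input_ids_len:]  # Slice the output to get only the newly generated tokens
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
```
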
### Using vLLM for Inference

The xLAM models can also be efficiently served using vLLM for high-throughput inference. Please use `vllm>=0.6.5` since earlier versions will cause degraded performance for Qwen-based models.

#### Setup and Serving

1. Install vLLM with the required version:

   ```bash
   pip install "vllm>=0.6.5"
   ```

2. Download the tool parser plugin to your local path:

   ```bash
   wget https://huggingface.co/Salesforce/xLAM-2-1b-fc-r/raw/main/xlam_tool_call_parser.py
   ```

3. Start the OpenAI API-compatible endpoint:

   ```bash
   vllm serve Salesforce/xLAM-2-1b-fc-r \
       --enable-auto-tool-choice \
       --tool-parser-plugin ./xlam_tool_call_parser.py \
       --tool-call-parser xlam \
       --tensor-parallel-size 1
   ```

Note: Ensure that the tool parser plugin file is downloaded and that the path specified in `--tool-parser-plugin` correctly points to your local copy of the file. The xLAM series models all utilize the **same** tool call parser, so you only need to download it **once** for all models.

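Once the server is running, a quick sanity check (assuming the default host and port used above) is to list the models the endpoint is serving:

```bash
# Should return a JSON listing that includes Salesforce/xLAM-2-1b-fc-r
curl http://localhost:8000/v1/models
```
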

#### Testing with OpenAI API

Here's a minimal example to test tool usage with the served endpoint:

```python
import openai
import json

# Configure the client to use your local vLLM endpoint
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",  # Default vLLM server URL
    api_key="empty"  # Can be any string
)

# Define a tool/function
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit of temperature to return"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Create a chat completion
response = client.chat.completions.create(
    model="Salesforce/xLAM-2-1b-fc-r",  # Model name doesn't matter, vLLM uses the served model
    messages=[
        {"role": "system", "content": "You are a helpful assistant that can use tools."},
        {"role": "user", "content": "What's the weather like in San Francisco?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Print the response
print("Assistant's response:")
print(json.dumps(response.model_dump(), indent=2))
```
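
Continuing the example above, any tool calls the model proposes can be read directly from the response object. This brief sketch uses the standard fields of the OpenAI v1 Python client:

```python
# Inspect tool calls proposed by the model (continues from `response` above)
tool_calls = response.choices[0].message.tool_calls or []
for call in tool_calls:
    arguments = json.loads(call.function.arguments)  # arguments arrive as a JSON string
    print(f"Tool requested: {call.function.name}({arguments})")
```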

For more advanced configurations and deployment options, please refer to the [vLLM documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).

## Benchmark Results

<p align="center">
<img width="80%" alt="BFCL Results" src="https://github.com/apigen-mt/apigen-mt.github.io/blob/main/img/bfcl-result.png?raw=true">
<br>
<small><i>Performance comparison of different models on the <a href="https://gorilla.cs.berkeley.edu/leaderboard.html">BFCL leaderboard</a>. The rank is based on the overall accuracy, which is a weighted average of different evaluation categories. "FC" stands for function-calling mode, in contrast to using a customized "prompt" to extract the function calls.</i></small>
</p>

### τ-bench Benchmark

## Citation

If you use our model or dataset in your work, please cite our paper. Additionally, please check out our other related works on the xLAM series and consider citing them as well:

```bibtex
@article{zhang2025actionstudio,
  title={ActionStudio: A Lightweight Framework for Data and Training of Action Models},
}
```

```bibtex
@article{liu2024apigen,