zuxin-llm committed
Commit 53b0eff · verified
1 Parent(s): 46d09fd

Update README.md

Files changed (1): README.md (+95, −27)

README.md CHANGED
@@ -34,7 +34,7 @@ library_name: transformers
  [Large Action Models (LAMs)](https://blog.salesforceairesearch.com/large-action-models/) are advanced language models designed to enhance decision-making by translating user intentions into executable actions. As the **brains of AI agents**, LAMs autonomously plan and execute tasks to achieve specific goals, making them invaluable for automating workflows across diverse domains.
  **This model release is for research purposes only.**
 
- The new **xLAM-2** series, built on our most advanced data synthesis, processing, and training pipelines, marks a significant leap in **multi-turn conversation** and **tool usage**. Trained using our novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions. Our models achieve state-of-the-art performance on **BFCL** and **τ-bench** benchmarks, outperforming frontier models like GPT-4o and Claude 3.5. Notably, even our smaller models demonstrate superior capabilities in multi-turn scenarios while maintaining exceptional consistency across trials.
+ The new **xLAM-2** series, built on our most advanced data synthesis, processing, and training pipelines, marks a significant leap in **multi-turn conversation** and **tool usage**. Trained with our novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions, our models achieve state-of-the-art performance on the [**BFCL**](https://gorilla.cs.berkeley.edu/leaderboard.html) and **τ-bench** benchmarks, outperforming frontier models like GPT-4o and Claude 3.5. Notably, even our smaller models demonstrate superior capabilities in multi-turn scenarios while maintaining exceptional consistency across trials.
 
  We've also refined the **chat template** and **vLLM integration**, making it easier to build advanced AI agents. Compared to previous xLAM models, xLAM-2 offers superior performance and seamless deployment across applications.
 
@@ -46,9 +46,11 @@ We've also refined the **chat template** and **vLLM integration**, making it eas
 
 
  ## Table of Contents
- - [Model Series](#model-series)
  - [Usage](#usage)
    - [Basic Usage with Huggingface Chat Template](#basic-usage-with-huggingface-chat-template)
+ - [Using vLLM for Inference](#using-vllm-for-inference)
+   - [Setup and Serving](#setup-and-serving)
+   - [Testing with OpenAI API](#testing-with-openai-api)
  - [Benchmark Results](#benchmark-results)
  - [Citation](#citation)
 
@@ -60,29 +62,21 @@ We've also refined the **chat template** and **vLLM integration**, making it eas
  For the same number of parameters, the models have been fine-tuned across a wide range of agent tasks and scenarios, all while preserving the capabilities of the original model.
 
 
- | Model | # Total Params | Context Length | Release Date | Category | Download Model | Download GGUF files |
- |------------------------|----------------|------------|-------------|-------|----------------|----------|
- | Llama-xLAM-2-70b-fc-r | 70B | 128k | Mar. 26, 2025 | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-70b-fc-r) | NA |
- | Llama-xLAM-2-8b-fc-r | 8B | 128k | Mar. 26, 2025 | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-8b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-8b-fc-r-gguf) |
- | xLAM-2-32b-fc-r | 32B | 32k (max 128k)* | Mar. 26, 2025 | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-32b-fc-r) | NA |
- | xLAM-2-3b-fc-r | 3B | 32k (max 128k)* | Mar. 26, 2025 | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-3b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-3b-fc-r-gguf) |
- | xLAM-2-1b-fc-r | 1B | 32k (max 128k)* | Mar. 26, 2025 | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-1b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-1b-fc-r-gguf) |
- | xLAM-7b-r | 7.24B | 32k | Sep. 5, 2024 | General, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-7b-r) | -- |
- | xLAM-8x7b-r | 46.7B | 32k | Sep. 5, 2024 | General, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-8x7b-r) | -- |
- | xLAM-8x22b-r | 141B | 64k | Sep. 5, 2024 | General, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-8x22b-r) | -- |
- | xLAM-1b-fc-r | 1.35B | 16k | July 17, 2024 | Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-1b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/xLAM-1b-fc-r-gguf) |
- | xLAM-7b-fc-r | 6.91B | 4k | July 17, 2024 | Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-7b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/xLAM-7b-fc-r-gguf) |
- | xLAM-v0.1-r | 46.7B | 32k | Mar. 18, 2024 | General, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-v0.1-r) | -- |
+ | Model | # Total Params | Context Length | Category | Download Model | Download GGUF files |
+ |------------------------|----------------|-----------------|-------|----------------|----------|
+ | Llama-xLAM-2-70b-fc-r | 70B | 128k | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-70b-fc-r) | NA |
+ | Llama-xLAM-2-8b-fc-r | 8B | 128k | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-8b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-8b-fc-r-gguf) |
+ | xLAM-2-32b-fc-r | 32B | 32k (max 128k)* | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-32b-fc-r) | NA |
+ | xLAM-2-3b-fc-r | 3B | 32k (max 128k)* | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-3b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-3b-fc-r-gguf) |
+ | xLAM-2-1b-fc-r | 1B | 32k (max 128k)* | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-1b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-1b-fc-r-gguf) |
 
  ***Note:** The default context length for Qwen-2.5-based models is 32k, but you can use techniques like YaRN (Yet another RoPE extensioN) to achieve a maximum context length of 128k. Please refer to [here](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct#processing-long-texts) for more details.
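+ 
+ For example, here is a minimal sketch of enabling static YaRN scaling when loading one of the Qwen-2.5-based checkpoints with Transformers. The `rope_scaling` values follow the linked Qwen2.5 guidance; treat the scaling factor as an assumption to adjust for your target context length:
+ 
+ ```python
+ from transformers import AutoConfig, AutoModelForCausalLM
+ 
+ # Extend the default 32k context toward 128k via static YaRN scaling.
+ config = AutoConfig.from_pretrained("Salesforce/xLAM-2-3b-fc-r")
+ config.rope_scaling = {
+     "type": "yarn",
+     "factor": 4.0,  # 32k x 4 = 128k
+     "original_max_position_embeddings": 32768,
+ }
+ model = AutoModelForCausalLM.from_pretrained("Salesforce/xLAM-2-3b-fc-r", config=config)
+ ```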
 
+ You can also explore our previous xLAM series [here](https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4).
+ 
+ The `-fc` suffix indicates that the models are fine-tuned for **function calling** tasks, while the `-r` suffix signifies a **research** release.
 
- ### 📦 Model Naming Conventions
- - `xLAM-7b-r`: A general-purpose v1.0 or v2.0 release of the **Large Action Model**, fine-tuned for broad agentic capabilities. The `-r` suffix indicates it is a **research** release.
- - `xLAM-7b-fc-r`: A specialized variant where `-fc` denotes fine-tuning for **function calling** tasks, also marked for **research** use.
- - ✅ All models are fully compatible with VLLM, FastChat, and Transformers-based inference frameworks.
+ ✅ All models are fully compatible with vLLM and Transformers-based inference frameworks.
 
- ---
 
 
  ## Usage
@@ -139,17 +133,90 @@ generated_tokens = outputs[:, input_ids_len:] # Slice the output to get only the
  print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
  ```
 
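+ When the model decides to call tools, the decoded text contains the calls as JSON. Continuing the snippet above, here is a minimal parsing sketch (assuming the output is a JSON list of `{"name", "arguments"}` objects, the format the `xlam` tool-call parser expects; real outputs may instead be plain prose):
+ 
+ ```python
+ import json
+ 
+ decoded = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
+ try:
+     # e.g. '[{"name": "get_weather", "arguments": {"location": "Tokyo"}}]'
+     tool_calls = json.loads(decoded)
+ except json.JSONDecodeError:
+     tool_calls = []  # The model answered in plain text instead.
+ for call in tool_calls:
+     print(call["name"], call["arguments"])
+ ```
+ 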
- <!-- ### Using vLLM for Inference
+ ### Using vLLM for Inference
 
- The xLAM models can also be efficiently served using vLLM for high-throughput inference. Please refer to the vLLM documentation for detailed instructions on how to deploy and use these models. You can typically start the vLLM service with the model name:
+ The xLAM models can also be served efficiently with vLLM for high-throughput inference. Please use `vllm>=0.6.5`, since earlier versions cause degraded performance for Qwen-based models.
+ 
+ #### Setup and Serving
+ 
+ 1. Install vLLM with the required version:
+ ```bash
+ pip install "vllm>=0.6.5"
+ ```
+ 
+ 2. Download the tool parser plugin to your local path:
+ ```bash
+ wget https://huggingface.co/Salesforce/xLAM-2-1b-fc-r/raw/main/xlam_tool_call_parser.py
+ ```
 
+ 3. Start the OpenAI API-compatible endpoint:
  ```bash
- vllm serve Salesforce/xLAM-2-3b-fc-r
+ vllm serve Salesforce/xLAM-2-1b-fc-r \
+   --enable-auto-tool-choice \
+   --tool-parser-plugin ./xlam_tool_call_parser.py \
+   --tool-call-parser xlam \
+   --tensor-parallel-size 1
  ```
 
- And then interact with the model using your preferred method for querying a vLLM endpoint. -->
+ Note: Ensure that the tool parser plugin file is downloaded and that the path specified in `--tool-parser-plugin` correctly points to your local copy of the file. All xLAM series models use the **same** tool-call parser, so you only need to download it **once**.
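+ 
+ As a quick sanity check that the endpoint is up (assuming the default port 8000), you can list the served models:
+ 
+ ```python
+ import openai
+ 
+ client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="empty")
+ print([m.id for m in client.models.list().data])  # Should include the served xLAM model
+ ```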
 
 
+ #### Testing with OpenAI API
+ 
+ Here's a minimal example to test tool usage with the served endpoint:
+ 
+ ```python
+ import openai
+ import json
+ 
+ # Configure the client to use your local vLLM endpoint
+ client = openai.OpenAI(
+     base_url="http://localhost:8000/v1",  # Default vLLM server URL
+     api_key="empty"  # Can be any string
+ )
+ 
+ # Define a tool/function
+ tools = [
+     {
+         "type": "function",
+         "function": {
+             "name": "get_weather",
+             "description": "Get the current weather for a location",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "location": {
+                         "type": "string",
+                         "description": "The city and state, e.g. San Francisco, CA"
+                     },
+                     "unit": {
+                         "type": "string",
+                         "enum": ["celsius", "fahrenheit"],
+                         "description": "The unit of temperature to return"
+                     }
+                 },
+                 "required": ["location"]
+             }
+         }
+     }
+ ]
+ 
+ # Create a chat completion
+ response = client.chat.completions.create(
+     model="Salesforce/xLAM-2-1b-fc-r",  # Must match the model name being served
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant that can use tools."},
+         {"role": "user", "content": "What's the weather like in San Francisco?"}
+     ],
+     tools=tools,
+     tool_choice="auto"
+ )
+ 
+ # Print the response
+ print("Assistant's response:")
+ print(json.dumps(response.model_dump(), indent=2))
+ ```
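+ 
+ Beyond printing the raw response, a typical agent loop executes the returned call and feeds the result back to the model. Here is a minimal sketch continuing the example above (`get_weather` is a hypothetical stub standing in for a real weather API):
+ 
+ ```python
+ # Hypothetical tool implementation, for illustration only.
+ def get_weather(location, unit="fahrenheit"):
+     return {"location": location, "temperature": 68, "unit": unit}
+ 
+ message = response.choices[0].message
+ if message.tool_calls:
+     call = message.tool_calls[0]
+     result = get_weather(**json.loads(call.function.arguments))
+     # Return the tool result so the model can produce a final answer.
+     follow_up = client.chat.completions.create(
+         model="Salesforce/xLAM-2-1b-fc-r",
+         messages=[
+             {"role": "user", "content": "What's the weather like in San Francisco?"},
+             message,
+             {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)},
+         ],
+         tools=tools,
+     )
+     print(follow_up.choices[0].message.content)
+ ```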
 
+ For more advanced configurations and deployment options, please refer to the [vLLM documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).
 
  ## Benchmark Results
 
@@ -157,7 +224,7 @@ And then interact with the model using your preferred method for querying a vLLM
  <p align="center">
  <img width="80%" alt="BFCL Results" src="https://github.com/apigen-mt/apigen-mt.github.io/blob/main/img/bfcl-result.png?raw=true">
  <br>
- <small><i>Performance comparison of different models on BFCL leaderboard. The rank is based on the overall accuracy, which is a weighted average of different evaluation categories. "FC" stands for function-calling mode in contrast to using a customized "prompt" to extract the function calls.</i></small>
+ <small><i>Performance comparison of different models on the [BFCL leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html). The rank is based on the overall accuracy, which is a weighted average of different evaluation categories. "FC" stands for function-calling mode, in contrast to using a customized "prompt" to extract the function calls.</i></small>
  </p>
 
  ### τ-bench Benchmark
@@ -196,6 +263,9 @@ If you use our model or dataset in your work, please cite our paper:
  }
  ```
 
+ Additionally, please check out our other related works on the xLAM series and consider citing them as well:
+ 
+ 
  ```bibtex
  @article{zhang2025actionstudio,
    title={ActionStudio: A Lightweight Framework for Data and Training of Action Models},
@@ -214,8 +284,6 @@ If you use our model or dataset in your work, please cite our paper:
  }
 
  ```
- Additionally, please check our other related works regarding xLAM and consider citing them as well:
- 
 
  ```bibtex
  @article{liu2024apigen,
 