zuxin-llm committed
Commit 34024b6 · verified · 1 Parent(s): 4a3404e

Update README.md

Files changed (1)
  1. README.md +41 -20
README.md CHANGED
@@ -22,9 +22,10 @@ library_name: transformers


<p align="center">
+ <a href="https://arxiv.org/abs/2504.03601">[Paper]</a> |
  <a href="https://apigen-mt.github.io/">[Homepage]</a> |
- <a href="https://github.com/SalesforceAIResearch/xLAM">[Github]</a> |
- <a href="https://blog.salesforceairesearch.com/large-action-model-ai-agent/">[Blog]</a>
+ <a href="https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k">[Dataset (Coming Soon)]</a> |
+ <a href="https://github.com/SalesforceAIResearch/xLAM">[Github]</a>
</p>
<hr>

@@ -38,7 +39,7 @@ The new **xLAM-2** series, built on our most advanced data synthesis, processing
We've also refined the **chat template** and **vLLM integration**, making it easier to build advanced AI agents. Compared to previous xLAM models, xLAM-2 offers superior performance and seamless deployment across applications.

<p align="center">
- <img width="100%" alt="Model Performance Overview" src="img/model_board.png">
+ <img width="100%" alt="Model Performance Overview" src="https://github.com/apigen-mt/apigen-mt.github.io/blob/main/img/model_board.png?raw=true">
<br>
<small><i>Comparative performance of larger xLAM-2-fc-r models (8B-70B, trained with APIGen-MT data) against state-of-the-art baselines on function-calling (BFCL v3, as of date 04/02/2025) and agentic (τ-bench) capabilities.</i></small>
</p>
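The context line in the hunk above mentions the refined **chat template** and **vLLM integration**. As a minimal sketch of the "Basic Usage with Huggingface Chat Template" section that the Table of Contents points to (not content from this commit), the snippet below shows tool calling through the Hugging Face chat template; the checkpoint id `Salesforce/Llama-xLAM-2-8b-fc-r` and the `get_weather` tool schema are assumptions for illustration only.

```python
# Hedged sketch (not from the commit): basic tool calling via the Hugging Face chat template.
# The checkpoint id and the tool schema below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/Llama-xLAM-2-8b-fc-r"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# A toy OpenAI-style tool definition passed to the chat template.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in San Francisco?"}]

# apply_chat_template renders the conversation (and tools) with the model's own template.
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```

The same messages/tools structure carries over to the vLLM serving path sketched further below.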
@@ -46,10 +47,9 @@ We've also refined the **chat template** and **vLLM integration**, making it eas

## Table of Contents
- [Model Series](#model-series)
- - [Benchmark Results](#benchmark-results)
- [Usage](#usage)
  - [Basic Usage with Huggingface Chat Template](#basic-usage-with-huggingface-chat-template)
- - [License](#license)
+ - [Benchmark Results](#benchmark-results)
- [Citation](#citation)

## Model Series
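The next hunk sits just after the README's vLLM serving instructions (its header quotes "And then interact with the model using your preferred method for querying a vLLM…"). As a hedged sketch only, again not part of the commit, such a query typically goes through vLLM's OpenAI-compatible API; the base URL, model id, and tool schema below are assumptions.

```python
# Hedged sketch (not from the commit): querying a vLLM server that exposes the
# OpenAI-compatible API (for example one started with `vllm serve <model>`).
# The base URL, model id, and tool schema are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # local vLLM server

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Salesforce/Llama-xLAM-2-8b-fc-r",  # assumed checkpoint id
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)
print(response.choices[0].message)  # any tool calls appear in .tool_calls
```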
@@ -132,26 +132,25 @@ And then interact with the model using your preferred method for querying a vLLM



- <!-- ## Benchmark Results
- Note: **Bold** and <u>Underline</u> results denote the best result and the second best result for Success Rate, respectively.
-
- ### Berkeley Function-Calling Leaderboard (BFCL)
- ![xlam-bfcl](media/xlam-bfcl.png)
- *Table 1: Performance comparison on BFCL-v2 leaderboard (cutoff date 09/03/2024). The rank is based on the overall accuracy, which is a weighted average of different evaluation categories. "FC" stands for function-calling mode in contrast to using a customized "prompt" to extract the function calls.* -->
-
-
## Benchmark Results

### Berkeley Function-Calling Leaderboard (BFCL v3)
<p align="center">
- <img width="80%" alt="BFCL Results" src="img/bfcl-result.png">
+ <img width="80%" alt="BFCL Results" src="https://github.com/apigen-mt/apigen-mt.github.io/blob/main/img/bfcl-result.png?raw=true">
<br>
<small><i>Performance comparison of different models on BFCL leaderboard. The rank is based on the overall accuracy, which is a weighted average of different evaluation categories. "FC" stands for function-calling mode in contrast to using a customized "prompt" to extract the function calls.</i></small>
</p>

### τ-bench Benchmark
+
+ <p align="center">
+ <img width="80%" alt="Tau-bench Results" src="https://github.com/apigen-mt/apigen-mt.github.io/blob/main/img/taubench-result.png?raw=true">
+ <br>
+ <small><i>Success Rate (pass@1) on τ-bench benchmark averaged across at least 5 trials. Our xLAM-2-70b-fc-r model achieves an overall success rate of 56.2% on τ-bench, significantly outperforming the base Llama 3.1 70B Instruct model (38.2%) and other open-source models like DeepSeek v3 (40.6%). Notably, our best model even outperforms proprietary models such as GPT-4o (52.9%) and approaches the performance of more recent models like Claude 3.5 Sonnet (new) (60.1%).</i></small>
+ </p>
+
<p align="center">
- <img width="100%" alt="Pass^k curves" src="img/pass_k_curves_retail_airline.png">
+ <img width="80%" alt="Pass^k curves" src="https://github.com/apigen-mt/apigen-mt.github.io/blob/main/img/pass_k_curves_retail_airline.png?raw=true">
<br>
<small><i>Pass^k curves measuring the probability that all 5 independent trials succeed for a given task, averaged across all tasks for τ-retail (left) and τ-airline (right) domains. Higher values indicate better consistency of the models.</i></small>
</p>
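A short note on the Pass^k caption above, not part of the commit diff: with n recorded trials per task and c of them successful, Pass^k is usually estimated per task as the chance that k randomly drawn trials all succeed, then averaged over tasks. The combinatorial estimator below is the commonly used form and is given here as an assumption, not something stated in the diff.

```latex
% Assumed standard Pass^k estimator over tasks \mathcal{T}, with n trials per task and c_t successes on task t
\mathrm{pass}^k \;\approx\; \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} \frac{\binom{c_t}{k}}{\binom{n}{k}}
```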
@@ -165,9 +164,29 @@ This release is for research purposes only in support of an academic paper. Our

For all Llama relevant models, please also follow corresponding Llama license and terms. Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

- <!-- ## Citation
+ ## Citation

- If you find this repo helpful, please consider to cite our papers:
+ If you use our model or dataset in your work, please cite our paper:
+
+ ```bibtex
+ @article{prabhakar2025apigenmt,
+   title={APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay},
+   author={Prabhakar, Akshara and Liu, Zuxin and Yao, Weiran and Zhang, Jianguo and Zhu, Ming and Wang, Shiyu and Liu, Zhiwei and Awalgaonkar, Tulika and Chen, Haolin and Hoang, Thai and Niebles, Juan Carlos and Heinecke, Shelby and Wang, Huan and Savarese, Silvio and Xiong, Caiming},
+   journal={arXiv preprint arXiv:2504.03601},
+   year={2025}
+ }
+ ```
+
+ Additionally, please check our other related works regarding xLAM and consider citing them as well:
+
+ ```bibtex
+ @article{zhang2025actionstudio,
+   title={ActionStudio: A Lightweight Framework for Data and Training of Action Models},
+   author={Zhang, Jianguo and Hoang, Thai and Zhu, Ming and Liu, Zuxin and Wang, Shiyu and Awalgaonkar, Tulika and Prabhakar, Akshara and Chen, Haolin and Yao, Weiran and Liu, Zhiwei and others},
+   journal={arXiv preprint arXiv:2503.22673},
+   year={2025}
+ }
+ ```

```bibtex
@article{zhang2024xlam,
@@ -181,8 +200,10 @@ If you find this repo helpful, please consider to cite our papers:
```bibtex
@article{liu2024apigen,
  title={Apigen: Automated pipeline for generating verifiable and diverse function-calling datasets},
- author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Kokane, Shirley and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and others},
- journal={arXiv preprint arXiv:2406.18518},
+ author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and RN, Rithesh and others},
+ journal={Advances in Neural Information Processing Systems},
+ volume={37},
+ pages={54463--54482},
  year={2024}
}
```
@@ -194,5 +215,5 @@ If you find this repo helpful, please consider to cite our papers:
  journal={arXiv preprint arXiv:2402.15506},
  year={2024}
}
- ``` -->
+ ```
 