Upload 5 files
- README.md +52 -7
- media/xlam-bfcl.png +0 -0
- media/xlam-toolbench.png +0 -0
- media/xlam-unified_toolquery.png +0 -0
- media/xlam-webshop_toolquery.png +0 -0
README.md
CHANGED
@@ -26,9 +26,11 @@ tags:
 <img width="500px" alt="xLAM" src="https://huggingface.co/datasets/jianguozhang/logos/resolve/main/xlam-no-background.png">
 </p>
 <p align="center">
-<a href="">[Homepage]</a> |
-<a href="">[Paper]</a> |
-<a href="https://github.com/SalesforceAIResearch/xLAM">[Github]</a>
+<a href="https://www.salesforceairesearch.com/projects/xlam-large-action-models">[Homepage]</a> |
+<a href="https://arxiv.org/abs/2409.03215">[Paper]</a> |
+<a href="https://github.com/SalesforceAIResearch/xLAM">[Github]</a> |
+<a href="https://blog.salesforceairesearch.com/large-action-model-ai-agent/">[Blog]</a> |
+<a href="https://huggingface.co/spaces/Tonic/Salesforce-Xlam-7b-r">[Community Demo]</a>
 </p>
 <hr>
 
@@ -102,7 +104,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 
 torch.random.manual_seed(0)
 
-model_name = "Salesforce/xLAM-
+model_name = "Salesforce/xLAM-7b-r"
 model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
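For context on this hunk: the new line pins `model_name` to the released `Salesforce/xLAM-7b-r` checkpoint. Below is a minimal sketch of how the loaded model and tokenizer might be driven end to end; the `messages` content, the reliance on the tokenizer's chat template, and the decoding settings are illustrative assumptions, not taken from the README:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.random.manual_seed(0)

model_name = "Salesforce/xLAM-7b-r"
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical single-turn query, used only for illustration.
messages = [{"role": "user", "content": "What's the weather like in New York?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps the run reproducible alongside manual_seed(0).
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```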
@@ -418,12 +420,55 @@ Output:
 {"thought": "", "tool_calls": [{"name": "get_earthquake_info", "arguments": {"location": "California"}}]}
 ````
 
+## Benchmark Results
+Note: **Bold** and <u>Underline</u> results denote the best and the second-best result for Success Rate, respectively.
+
+### Berkeley Function-Calling Leaderboard (BFCL)
+![xlam-bfcl](media/xlam-bfcl.png)
+*Table 1: Performance comparison on the BFCL-v2 leaderboard (cutoff date 09/03/2024). The rank is based on overall accuracy, a weighted average across the evaluation categories. "FC" stands for function-calling mode, in contrast to using a customized "prompt" to extract the function calls.*
+
+### Webshop and ToolQuery
+![xlam-webshop_toolquery](media/xlam-webshop_toolquery.png)
+*Table 2: Testing results on Webshop and ToolQuery.*
+
+### Unified ToolQuery
+![xlam-unified_toolquery](media/xlam-unified_toolquery.png)
+*Table 3: Testing results on ToolQuery-Unified. Values in brackets indicate the corresponding performance on ToolQuery.*
+
+### ToolBench
+![xlam-toolbench](media/xlam-toolbench.png)
+*Table 4: Pass Rate on ToolBench across three distinct scenarios. Bold and Underline results denote the best and the second-best result for each setting, respectively. Results for xLAM-8x22b-r are unavailable because the ToolBench server was down between 07/28/2024 and our evaluation cutoff date of 09/03/2024.*
 
 ## License
 The model is distributed under the CC-BY-NC-4.0 license.
 
-
-If you find this repo helpful, please cite our paper:
+## Citation
+
+If you find this repo helpful, please consider citing our papers:
+
+```bibtex
+@article{zhang2024xlam,
+  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
+  author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and others},
+  journal={arXiv preprint arXiv:2409.03215},
+  year={2024}
+}
+```
+
+```bibtex
+@article{liu2024apigen,
+  title={APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets},
+  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Kokane, Shirley and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and others},
+  journal={arXiv preprint arXiv:2406.18518},
+  year={2024}
+}
+```
 
 ```bibtex
-
+@article{zhang2024agentohana,
+  title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
+  author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
+  journal={arXiv preprint arXiv:2402.15506},
+  year={2024}
+}
+```
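As a usage note on the function-calling format kept as context at the top of the last hunk (`{"thought": ..., "tool_calls": [...]}`): a caller might parse the model's JSON output and dispatch each tool call to a local implementation. A minimal sketch, in which `get_earthquake_info` and the `TOOLS` dispatch table are hypothetical stand-ins:

```python
import json

# Hypothetical local implementation of the tool named in the model output.
def get_earthquake_info(location: str) -> str:
    return f"Recent earthquake activity for {location} (stub data)."

TOOLS = {"get_earthquake_info": get_earthquake_info}

# Output string copied from the example in the diff above.
raw = '{"thought": "", "tool_calls": [{"name": "get_earthquake_info", "arguments": {"location": "California"}}]}'

parsed = json.loads(raw)
for call in parsed.get("tool_calls", []):
    fn = TOOLS.get(call["name"])  # skip tools we have no implementation for
    if fn is not None:
        print(fn(**call["arguments"]))
```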
media/xlam-bfcl.png
ADDED
media/xlam-toolbench.png
ADDED
media/xlam-unified_toolquery.png
ADDED
media/xlam-webshop_toolquery.png
ADDED