---
license: cc-by-nc-4.0
---

xLAM

[Homepage] | [Github] | [Blog]


Model Summary

This repo provides the GGUF format for the Llama-xLAM-2-8b-fc-r model; the original model is available here: Llama-xLAM-2-8b-fc-r. Large Action Models (LAMs) are advanced language models designed to enhance decision-making by translating user intentions into executable actions. As the brains of AI agents, LAMs autonomously plan and execute tasks to achieve specific goals, making them invaluable for automating workflows across diverse domains.

Model Overview

The new xLAM-2 series, built on our most advanced data synthesis, processing, and training pipelines, marks a significant leap in multi-turn reasoning and tool usage. It achieves state-of-the-art performance on function-calling benchmarks like BFCL and tau-bench. We've also refined the chat template and vLLM integration, making it easier to build advanced AI agents. Compared to previous xLAM models, xLAM-2 offers superior performance and seamless deployment across applications.
This model release is for research purposes only.

How to download GGUF files

  1. Install the Hugging Face CLI:
pip install huggingface-hub
  2. Log in to Hugging Face:
huggingface-cli login
  3. Download the GGUF model:
huggingface-cli download Salesforce/Llama-xLAM-2-8b-fc-r-gguf Llama-xLAM-2-8b-fc-r-gguf --local-dir . --local-dir-use-symlinks False
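
Alternatively, a file can be fetched programmatically with the huggingface_hub library installed in step 1. This is a minimal sketch; the .gguf filename below is an assumed example, so check the repository's file list for the exact name of the quantization you want:

from huggingface_hub import hf_hub_download

# Download one GGUF file from the repo into the current directory.
path = hf_hub_download(
    repo_id="Salesforce/Llama-xLAM-2-8b-fc-r-gguf",
    filename="Llama-xLAM-2-8b-fc-r.gguf",  # assumed filename; check the repo file list
    local_dir=".",
)
print(path)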

Prompt template

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{TASK_INSTRUCTION}
You have access to a set of tools. When using tools, make calls in a single JSON array: 

[{"name": "tool_call_name", "arguments": {"arg1": "value1", "arg2": "value2"}}, ... (additional parallel tool calls as needed)]

If no tool is suitable, state that explicitly. If the user's input lacks required parameters, ask for clarification. Do not interpret or respond until tool results are returned. Once they are available, process them or make additional calls if needed. For tasks that don't require tools, such as casual conversation or general advice, respond directly in plain text. The available tools are:

{AVAILABLE_TOOLS}

<|eot_id|><|start_header_id|>user<|end_header_id|>

{USER_QUERY}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{ASSISTANT_QUERY}<|eot_id|><|start_header_id|>user<|end_header_id|>

{USER_QUERY}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
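
When driving the model with a raw prompt (e.g., via llama-cli), the template above has to be filled in by hand. Here is a minimal sketch of assembling it in Python for a single user turn; the task instruction, tool definition, and query are illustrative placeholders, not part of the template itself:

import json

# Illustrative placeholder values for the template fields
task_instruction = "You are a helpful assistant."
available_tools = json.dumps([{
    "name": "get_weather",  # hypothetical example tool
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}])
user_query = "What's the weather in Tokyo?"

# The fixed tool-use rules from the template above
tool_rules = (
    'You have access to a set of tools. When using tools, make calls in a single JSON array: \n\n'
    '[{"name": "tool_call_name", "arguments": {"arg1": "value1", "arg2": "value2"}}, '
    '... (additional parallel tool calls as needed)]\n\n'
    "If no tool is suitable, state that explicitly. If the user's input lacks required "
    "parameters, ask for clarification. Do not interpret or respond until tool results "
    "are returned. Once they are available, process them or make additional calls if "
    "needed. For tasks that don't require tools, such as casual conversation or general "
    "advice, respond directly in plain text. The available tools are:"
)

prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    f"{task_instruction}\n{tool_rules}\n\n{available_tools}\n\n"
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    f"{user_query}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)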

Usage

Command Line

  1. Install the llama.cpp framework from the source here
  2. Run the inference task as below; to configure generation-related parameters, refer to the llama.cpp documentation
llama-cli -m [PATH-TO-LOCAL-GGUF]
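
Generation parameters can be passed as additional flags on the same command line; for example (the values here are illustrative, see llama-cli --help for the full list):

llama-cli -m [PATH-TO-LOCAL-GGUF] -c 4096 -n 256 --temp 0.7 -p "[PROMPT]"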

Python framework

  1. Install llama-cpp-python:
pip install llama-cpp-python
  2. Refer to the llama-cpp-python API; here's an example below
from llama_cpp import Llama

# Load the local GGUF model
llm = Llama(
    model_path="[PATH-TO-MODEL]"
)

# Ask the model to extract structured data by forcing a call to UserDetail
output = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that can use tools. You are developed by Salesforce xLAM team."
        },
        {
            "role": "user",
            "content": "Extract Jason is 25 years old"
        }
    ],
    # JSON schema describing the tool's parameters
    tools=[{
        "type": "function",
        "function": {
            "name": "UserDetail",
            "parameters": {
                "type": "object",
                "title": "UserDetail",
                "properties": {
                    "name": {"title": "Name", "type": "string"},
                    "age": {"title": "Age", "type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    }],
    # Force the model to call UserDetail instead of replying in free text
    tool_choice={
        "type": "function",
        "function": {"name": "UserDetail"}
    }
)
print(output['choices'][0]['message'])
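
Since tool_choice forces a call to UserDetail, the returned message should carry the call in the OpenAI-compatible tool_calls field. Below is a sketch of parsing it; the field layout is assumed from that schema, and older llama-cpp-python versions may expose function_call instead:

import json

# Pull out the forced tool call and decode its JSON arguments
message = output['choices'][0]['message']
call = message['tool_calls'][0]['function']  # assumed OpenAI-style layout
args = json.loads(call['arguments'])
print(call['name'], args)  # expected: UserDetail {'name': 'Jason', 'age': 25}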