---
license: cc-by-nc-4.0
---

xLAM

[Homepage] | [Github] | [Blog]


Model Summary

This repo provides the GGUF format for the Llama-xLAM-2-8b-fc-r model; the original model is available here: Llama-xLAM-2-8b-fc-r. Large Action Models (LAMs) are advanced language models designed to enhance decision-making by translating user intentions into executable actions. As the brains of AI agents, LAMs autonomously plan and execute tasks to achieve specific goals, making them invaluable for automating workflows across diverse domains.

Model Overview

The new xLAM-2 series, built on our most advanced data synthesis, processing, and training pipelines, marks a significant leap in multi-turn reasoning and tool usage. It achieves state-of-the-art performance on function-calling benchmarks like BFCL and tau-bench. We've also refined the chat template and vLLM integration, making it easier to build advanced AI agents. Compared to previous xLAM models, xLAM-2 offers superior performance and seamless deployment across applications.
This model release is for research purposes only.

How to download GGUF files

  1. Install the Hugging Face CLI:
pip install huggingface-hub
  2. Log in to Hugging Face:
huggingface-cli login
  3. Download the GGUF model:
huggingface-cli download Salesforce/Llama-xLAM-2-8b-fc-r-gguf Llama-xLAM-2-8b-fc-r-gguf --local-dir . --local-dir-use-symlinks False
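
Alternatively, a file can be fetched programmatically with the huggingface_hub library installed in step 1. This is a minimal sketch; the .gguf filename below is an assumed example, so check the repository's file list for the exact name of the quantization you want:

from huggingface_hub import hf_hub_download

# Download one GGUF file from the repo into the current directory.
path = hf_hub_download(
    repo_id="Salesforce/Llama-xLAM-2-8b-fc-r-gguf",
    filename="Llama-xLAM-2-8b-fc-r.gguf",  # assumed filename; check the repo file list
    local_dir=".",
)
print(path)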

Prompt template

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{TASK_INSTRUCTION}
You have access to a set of tools. When using tools, make calls in a single JSON array: 

[{"name": "tool_call_name", "arguments": {"arg1": "value1", "arg2": "value2"}}, ... (additional parallel tool calls as needed)]

If no tool is suitable, state that explicitly. If the user's input lacks required parameters, ask for clarification. Do not interpret or respond until tool results are returned. Once they are available, process them or make additional calls if needed. For tasks that don't require tools, such as casual conversation or general advice, respond directly in plain text. The available tools are:

{AVAILABLE_TOOLS}

<|eot_id|><|start_header_id|>user<|end_header_id|>

{USER_QUERY}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{ASSISTANT_QUERY}<|eot_id|><|start_header_id|>user<|end_header_id|>

{USER_QUERY}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
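
When driving the model with a raw prompt (e.g., via llama-cli), the template above has to be filled in by hand. Here is a minimal sketch of assembling it in Python for a single user turn; the task instruction, tool definition, and query are illustrative placeholders, not part of the template itself:

import json

# Illustrative placeholder values for the template fields
task_instruction = "You are a helpful assistant."
available_tools = json.dumps([{
    "name": "get_weather",  # hypothetical example tool
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}])
user_query = "What's the weather in Tokyo?"

# The fixed tool-use rules from the template above
tool_rules = (
    'You have access to a set of tools. When using tools, make calls in a single JSON array: \n\n'
    '[{"name": "tool_call_name", "arguments": {"arg1": "value1", "arg2": "value2"}}, '
    '... (additional parallel tool calls as needed)]\n\n'
    "If no tool is suitable, state that explicitly. If the user's input lacks required "
    "parameters, ask for clarification. Do not interpret or respond until tool results "
    "are returned. Once they are available, process them or make additional calls if "
    "needed. For tasks that don't require tools, such as casual conversation or general "
    "advice, respond directly in plain text. The available tools are:"
)

prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    f"{task_instruction}\n{tool_rules}\n\n{available_tools}\n\n"
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    f"{user_query}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)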

Usage

Command Line

  1. Install the llama.cpp framework from the source here
  2. Run the inference task as below; to configure generation-related parameters, refer to the llama.cpp documentation
llama-cli -m [PATH-TO-LOCAL-GGUF]
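
Generation parameters can be passed as additional flags on the same command line; for example (the values here are illustrative, see llama-cli --help for the full list):

llama-cli -m [PATH-TO-LOCAL-GGUF] -c 4096 -n 256 --temp 0.7 -p "[PROMPT]"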

Python framework

  1. Install llama-cpp-python:
pip install llama-cpp-python
  2. Refer to the llama-cpp-python API; here's an example below
from llama_cpp import Llama

# Load the local GGUF model
llm = Llama(
    model_path="[PATH-TO-MODEL]"
)

# Ask the model to extract structured data by forcing a call to UserDetail
output = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that can use tools. You are developed by Salesforce xLAM team."
        },
        {
            "role": "user",
            "content": "Extract Jason is 25 years old"
        }
    ],
    # JSON schema describing the tool's parameters
    tools=[{
        "type": "function",
        "function": {
            "name": "UserDetail",
            "parameters": {
                "type": "object",
                "title": "UserDetail",
                "properties": {
                    "name": {"title": "Name", "type": "string"},
                    "age": {"title": "Age", "type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    }],
    # Force the model to call UserDetail instead of replying in free text
    tool_choice={
        "type": "function",
        "function": {"name": "UserDetail"}
    }
)
print(output['choices'][0]['message'])
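
Since tool_choice forces a call to UserDetail, the returned message should carry the call in the OpenAI-compatible tool_calls field. Below is a sketch of parsing it; the field layout is assumed from that schema, and older llama-cpp-python versions may expose function_call instead:

import json

# Pull out the forced tool call and decode its JSON arguments
message = output['choices'][0]['message']
call = message['tool_calls'][0]['function']  # assumed OpenAI-style layout
args = json.loads(call['arguments'])
print(call['name'], args)  # expected: UserDetail {'name': 'Jason', 'age': 25}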