Model Card for Qwen2.5-Coder-7B-Instruct-NL2SH

This model translates natural language (English) instructions to Bash commands.

Model Details

Model Description

This model is a fine-tuned version of the Qwen2.5-Coder-7B-Instruct model trained on the NL2SH-ALFA dataset for the task of natural language to Bash translation (NL2SH). For more information, please refer to the paper.

Uses

Direct Use

This model is intended for research on machine translation. The model can also be used as an educational resource for learning Bash.

Out-of-Scope Use

This model should not be used in production or automated systems without human verification.

Considerations for use in high-risk environments: the model's low accuracy and its potential for generating harmful commands make it unsuitable for such settings.

Bias, Risks, and Limitations

This model has a tendency to generate overly complex and incorrect Bash commands. It may produce harmful commands that delete data or corrupt a system. It is not intended for natural languages other than English, for scripting languages other than Bash, or for multi-line Bash scripts.

Recommendations

Users are encouraged to treat this model as a Bash reference tool and should not execute generated commands without verification.
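
For example, here is a minimal sketch of a human-in-the-loop gate that asks for confirmation before executing a generated command. It relies on the translate function from the quickstart below; the wrapper itself is illustrative and not part of the model release.

import subprocess

def run_with_confirmation(nl_instruction):
    # Translate the instruction, show the proposed command, and only
    # execute it after explicit user approval.
    command = translate(nl_instruction)
    print(f"Proposed command: {command}")
    if input("Execute? [y/N] ").strip().lower() == "y":
        subprocess.run(command, shell=True)
    else:
        print("Aborted.")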

How to Get Started with the Model

Use the code below to get started with the model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def translate(prompt):
    model_name = "westenfelder/Qwen2.5-Coder-7B-Instruct-NL2SH"
    tokenizer = AutoTokenizer.from_pretrained(model_name, clean_up_tokenization_spaces=False)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda", torch_dtype=torch.bfloat16)

    # The system prompt matches the instruction format used for NL2SH translation.
    messages = [
        {"role": "system", "content": "Your task is to translate a natural language instruction to a Bash command. You will receive an instruction in English and output a Bash command that can be run in a Linux terminal."},
        {"role": "user", "content": prompt},
    ]

    # Apply the chat template and tokenize in one step.
    tokens = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt"
    ).to(model.device)

    # No padding is used, so the attention mask is all ones.
    attention_mask = torch.ones_like(tokens)

    # Greedy decoding; sampling parameters are disabled for deterministic output.
    outputs = model.generate(
        tokens,
        attention_mask=attention_mask,
        max_new_tokens=100,
        do_sample=False,
        temperature=None,
        top_p=None,
        top_k=None,
    )

    # Strip the prompt tokens and decode only the generated command.
    response = outputs[0][tokens.shape[-1]:]
    return tokenizer.decode(response, skip_special_tokens=True)


nl = "List files in the /workspace directory that were accessed over an hour ago."
sh = translate(nl)
print(sh)
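
For the example instruction above, one reasonable reference translation is find /workspace -type f -amin +60, which lists regular files under /workspace last accessed more than 60 minutes ago; the model's exact output may differ.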

Training Details

Training Data

This model was trained on the NL2SH-ALFA dataset.

Training Procedure

Please refer to sections 4.1 and 4.3.4 of the paper for information about data pre-processing, training hyperparameters, and hardware.
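
For readers who want a starting point, the following is a minimal supervised fine-tuning sketch using TRL's SFTTrainer. It is not the authors' training script: the dataset ID, split, column names, and hyperparameters below are assumptions, and the actual procedure is described in the paper.

# NOT the authors' training script; dataset ID, split, column names
# ("nl", "bash"), and hyperparameters are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("westenfelder/NL2SH-ALFA", split="train")  # assumed ID and split

def to_messages(example):
    # Mirror the chat format used in the quickstart above.
    return {"messages": [
        {"role": "system", "content": "Your task is to translate a natural language instruction to a Bash command. You will receive an instruction in English and output a Bash command that can be run in a Linux terminal."},
        {"role": "user", "content": example["nl"]},         # assumed column
        {"role": "assistant", "content": example["bash"]},  # assumed column
    ]}

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-7B-Instruct",
    train_dataset=dataset.map(to_messages, remove_columns=dataset.column_names),
    args=SFTConfig(output_dir="nl2sh-sft", num_train_epochs=1, bf16=True),
)
trainer.train()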

Evaluation

This model was evaluated on the NL2SH-ALFA test set using the InterCode-ALFA benchmark.

Results

This model achieved an accuracy of 0.51 on the InterCode-ALFA benchmark.
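
As a rough illustration of functional-equivalence checking, the core idea behind the benchmark, the toy check below compares the stdout of a predicted and a reference command in empty scratch directories. It is far weaker than InterCode-ALFA, which evaluates commands in an isolated execution environment; treat it as a sketch only.

import subprocess, tempfile

def same_stdout(pred_cmd, ref_cmd):
    # Run each command in its own empty scratch directory and compare stdout.
    def run(cmd):
        with tempfile.TemporaryDirectory() as d:
            return subprocess.run(cmd, shell=True, cwd=d,
                                  capture_output=True, text=True).stdout
    return run(pred_cmd) == run(ref_cmd)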

Environmental Impact

Experiments were conducted using private infrastructure with an approximate carbon efficiency of 0.432 kgCO2eq/kWh. A cumulative 12 hours of computation was performed on RTX A6000 hardware (TDP of 300 W). Total emissions are estimated at 1.56 kgCO2eq, of which 0 percent was directly offset. Estimates were produced using the Machine Learning Emissions Calculator.
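
The arithmetic behind the estimate, reproduced for clarity:

# 12 h of an RTX A6000 at a 300 W TDP, at 0.432 kgCO2eq/kWh:
energy_kwh = 12 * 300 / 1000       # 3.6 kWh
emissions = energy_kwh * 0.432     # kgCO2eq
print(f"{emissions:.2f} kgCO2eq")  # 1.56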

Citation

BibTeX:

@misc{westenfelder2025llmsupportednaturallanguagebash,
      title={LLM-Supported Natural Language to Bash Translation}, 
      author={Finnian Westenfelder and Erik Hemberg and Miguel Tulla and Stephen Moskal and Una-May O'Reilly and Silviu Chiricescu},
      year={2025},
      eprint={2502.06858},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.06858}, 
}

Model Card Authors

Finn Westenfelder

Model Card Contact

Please email [email protected] or make a pull request.
