This model takes an instruction template in the FineTemplates format along with a document, and returns an instantiated instruction and answer pair.

The output will be a JSON object.
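As a quick illustration of the input payload, the snippet below builds the two-key JSON object the usage example serializes before passing it to the model. The template and document strings here are invented for illustration; real FineTemplates templates may use additional placeholder syntax.

```python
import json

# Hypothetical input payload; the keys match the usage example below,
# but the template and document contents are illustrative only.
payload = {
    "instruction_template": "Summarize the following article in two sentences.",
    "document": "Large language models can follow natural-language instructions.",
}

# The model expects this object serialized as a single JSON string.
serialized = json.dumps(payload, indent=2)
print(serialized)
```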

Simple Usage Example

import json
import re
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Helper to expand excerpts in the answer
def expand(document, text):
    """Replace each <excerpt>prefix<...>suffix</excerpt> marker in `text`
    with the full matching span from `document`. Returns None if any
    excerpt cannot be located in the document."""
    excerpt_pattern = r"<excerpt>(.*?)<\.\.\.>(.*?)</excerpt>"
    matches = re.findall(excerpt_pattern, text, flags=re.DOTALL)
    replacements = {}
    for prefix, suffix in matches:
        match = re.search(
            re.escape(prefix) + r" (.*?) " + re.escape(suffix),
            document,
            flags=re.DOTALL,
        )
        try:
            if match:
                replacements[f"<excerpt>{prefix}<...>{suffix}</excerpt>"] = match.group(
                    0
                )
            else:
                return None
        except Exception:
            return None
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text
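To illustrate what `expand` does, the snippet below applies the same regex logic to a toy document and answer (both invented for this example): an `<excerpt>prefix<...>suffix</excerpt>` marker in the generated answer is replaced with the full span between `prefix` and `suffix` in the source document.

```python
import re

# Toy inputs, invented for illustration.
document = "The quick brown fox jumps over the lazy dog near the river."
answer = "See <excerpt>The quick<...>lazy dog</excerpt> for details."

# Same expansion logic as expand() above: find each excerpt marker,
# locate the full matching span in the document, and substitute it in.
pattern = r"<excerpt>(.*?)<\.\.\.>(.*?)</excerpt>"
for prefix, suffix in re.findall(pattern, answer, flags=re.DOTALL):
    m = re.search(
        re.escape(prefix) + r" (.*?) " + re.escape(suffix),
        document,
        flags=re.DOTALL,
    )
    if m:
        answer = answer.replace(
            f"<excerpt>{prefix}<...>{suffix}</excerpt>", m.group(0)
        )

print(answer)
# -> See The quick brown fox jumps over the lazy dog for details.
```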

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('fineinstructions/template_instantiator', revision=None)
tokenizer.padding_side = 'left'
model = AutoModelForCausalLM.from_pretrained('fineinstructions/template_instantiator', revision=None)
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, pad_token_id=tokenizer.pad_token_id, return_full_text=False)

# Run inference to instantiate the instruction template and generate an answer
inputs = [json.dumps({
  "instruction_template": "...",
  "document": "..."
}, indent=2)]
prompts = [tokenizer.apply_chat_template([{'role': 'user', 'content': i}], tokenize=False, add_generation_prompt=True) for i in inputs]
generations = pipe(prompts, max_length=131072, truncation=True, temperature=None, top_p=None, do_sample=False)
output = generations[0][0]['generated_text']
output_json = json.loads(output)

# Expand the answer
output_json["answer"] = expand(document=json.loads(inputs[0])["document"], text=output_json["answer"])

# Print the output JSON
print(output_json)

##### Output JSON:
# {
# ..
# }
# 

This model was trained on a synthetic dataset generated with DataDreamer 🤖💤. The synthetic dataset card and model card can be found here. The training arguments can be found here.

Model details:
- Format: Safetensors
- Model size: 1.24B params
- Tensor type: BF16