---
license: apache-2.0
language: en
tags:
- llama
- instruction-residual
- parameter-efficient
- safetensors
- transformers
base_model:
- meta-llama/Llama-3.1-8B
- meta-llama/Llama-3.1-8B-Instruct
---

# Llama-3.1-8b-Instruct-Residual

**Full-rank instruction residual for Llama-3.1-8B**

This repository provides the **full-rank instruction residual** \(\Delta\theta = \theta_{\mathrm{instruct}} - \theta_{\mathrm{base}}\) between the instruction-tuned Llama-3.1-8B-Instruct model and its corresponding base Llama-3.1-8B model. By adding this residual to a fresh base checkpoint, you can restore instruction-following capabilities **without** running a full fine-tuning cycle.

## How it was created

We follow the *instruction residual* approach introduced by Jindal et al. (2024):

> “In this section, we describe the instruction residual approach to simply regain the instruction following capabilities. We compute the instruction residual between an instruction following LLM \(θ_{i,d_1,v_1}\) and its corresponding base model \(θ_{b,d_1}\) in the parametric space as
> \[
> Θ_{r,v_1} = θ_{i,d_1,v_1} - θ_{b,d_1}.
> \]
> This tensor subtraction extracts the instruction-specific information, which can then be added to any base model.”

The full paper is available at: https://arxiv.org/abs/2410.10739
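
Concretely, the residual is a key-wise subtraction of the two state dicts. Below is a minimal sketch of that computation (not the exact script used to build this repository; it assumes enough CPU RAM to hold both FP16 checkpoints plus the residual):

```python
from transformers import AutoModelForCausalLM
from safetensors.torch import save_file
import torch

# Load both checkpoints on CPU in FP16; a pure subtraction needs no GPU.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.float16
)
instruct = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.float16
)

# Residual = instruct - base, computed parameter by parameter.
base_sd = base.state_dict()
residual = {
    name: (param - base_sd[name]).contiguous()
    for name, param in instruct.state_dict().items()
}

save_file(residual, "pytorch_model.safetensors")
```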

## Files

- `pytorch_model.safetensors` — full-rank FP16 residual weights (~16 GB).
- `config.json` — configuration matching the Llama-3.1-8B architecture.
- `README.md` — this model card.

## Usage

Below is a minimal example showing how to apply the residual to a base model:

```python
from transformers import AutoModelForCausalLM
from safetensors.torch import load_file
import torch

# 1) Load the base model in FP16
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)

# 2) Load the residual state dict on CPU
residual_sd = load_file("pytorch_model.safetensors", device="cpu")

# 3) Add the residual to the matching base parameters
params = dict(model.named_parameters())  # build the lookup once, not per key
with torch.no_grad():
    for name, delta in residual_sd.items():
        param = params[name]
        param.add_(delta.to(device=param.device, dtype=param.dtype))

# 4) Save (or push) the merged checkpoint
model.save_pretrained("llama-3.1-8b-base-plus-instruct")
```
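
A quick way to verify the merge (a sketch; the prompt is arbitrary, and `model` is the merged model from above) is to run the Instruct chat template through it:

```python
from transformers import AutoTokenizer

# The Instruct tokenizer carries the chat template the recovered model expects.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [{"role": "user", "content": "List three tips for writing clear documentation."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```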

For full scripts, see the `examples/` folder.

## Intended Use & Limitations

- **Intended Use**: Add instruction-following capabilities to Llama-3.1-8B base models.
- **Limitations**:
  - The residual must be applied to the exact base checkpoint it was computed against (`meta-llama/Llama-3.1-8B`).
  - Stored in FP16 (~16 GB); if you work with quantized (e.g., 4-bit) weights, dequantize to full precision before adding the residual.
  - Applying it to a mismatched architecture or vocabulary will produce invalid weights; a sanity check is sketched below.
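
Before applying the residual, a minimal sanity check (assuming `model` and `residual_sd` from the Usage example above) can catch checkpoint mismatches early:

```python
# Every residual tensor must have a same-named, same-shaped base parameter.
params = dict(model.named_parameters())
for name, delta in residual_sd.items():
    assert name in params, f"unexpected key in residual: {name}"
    assert params[name].shape == delta.shape, f"shape mismatch for {name}"
```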

## License

This residual is released under the **Apache License 2.0**. See the `LICENSE` file for details.

## References
As noted above, this method was introduced by **Jindal et al., 2024** (arXiv:2410.10739):

```bibtex
@misc{jindal2024balancingcontinuouspretraininginstruction,
  title={Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs},
  author={Ishan Jindal and Chandana Badrinath and Pranjal Bharti and Lakkidi Vinay and Sachin Dev Sharma},
  year={2024},
  eprint={2410.10739},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.10739},
}
```