---
license: apache-2.0
language: en
tags:
- llama
- instruction-residual
- parameter-efficient
- safetensors
- transformers
base_model:
- meta-llama/Llama-3.1-8B
- meta-llama/Llama-3.1-8B-Instruct
---

# Llama-3.1-8b-Instruct-Residual

**Full-rank instruction residual for Llama-3.1-8B**

This repository provides the **full-rank instruction residual** \(Δθ = θ_{instruct} - θ_{base}\) between the instruction-tuned Llama-3.1-8B-Instruct model and its corresponding base Llama-3.1-8B model. By adding this residual to a fresh base checkpoint, you can restore instruction-following capabilities **without** running a full fine-tuning cycle.

## How it was created

We follow the *instruction residual* approach introduced by Jindal et al. (2024):

> “In this section, we describe the instruction residual approach to simply regain the instruction following capabilities. We compute the instruction residual between an instruction following LLM \(θ_{i,d_1,v_1}\) and its corresponding base model \(θ_{b,d_1}\) in the parametric space as
> \[
> Θ_{r,v_1} = θ_{i,d_1,v_1} - θ_{b,d_1}.
> \]
> This tensor subtraction extracts the instruction-specific information, which can then be added to any base model.”

The full paper is available at: https://arxiv.org/abs/2410.10739
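
In practice, the residual is a per-tensor subtraction over the two checkpoints. The following is a minimal sketch of how such a residual can be produced (this is not the exact export script used for this repository; the output filename simply mirrors the file shipped here):

```python
from transformers import AutoModelForCausalLM
from safetensors.torch import save_file
import torch

# Load both checkpoints in FP16 on CPU (needs enough RAM for two 8B models)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.float16
)
instruct = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.float16
)

base_sd = base.state_dict()
instruct_sd = instruct.state_dict()

# Residual = instruct - base, computed tensor-by-tensor over shared parameter names
residual_sd = {
    name: (instruct_sd[name] - base_sd[name]).contiguous()
    for name in base_sd
    if name in instruct_sd
}

# Serialize the residual (same filename as the weights file in this repo)
save_file(residual_sd, "pytorch_model.safetensors")
```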

## Files

- `pytorch_model.safetensors` — full-rank FP16 residual weights (~16 GB).
- `config.json` — configuration matching the Llama-3.1-8B architecture.
- `README.md` — this model card.

## Usage

Below is a minimal example showing how to apply the residual to a base model:

```python
from transformers import AutoModelForCausalLM
from safetensors.torch import load_file
import torch

# 1) Load the base model in FP16
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)

# 2) Load the residual weights onto CPU
residual_sd = load_file("pytorch_model.safetensors", device="cpu")

# 3) Add each residual tensor to the matching base parameter
params = dict(model.named_parameters())
with torch.no_grad():
    for name, delta in residual_sd.items():
        param = params[name]
        param.add_(delta.to(device=param.device, dtype=param.dtype))

# 4) Save (or push) the merged, instruction-following model
model.save_pretrained("llama-3.1-8b-base-plus-instruct")
```

For full scripts, see the `examples/` folder.
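
Once the residual has been applied, the merged model should behave like an instruction-tuned checkpoint. A quick smoke test, continuing from the snippet above (the prompt and generation settings are only illustrative):

```python
from transformers import AutoTokenizer

# The Instruct tokenizer provides the chat template the instruction weights expect
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [{"role": "user", "content": "Give me three tips for writing clear Python."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```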

## Intended Use & Limitations

- **Intended Use**: Add instruction-following capabilities to Llama-3.1-8B base models.
- **Limitations**:
  - The residual must be applied to the exact base checkpoint it was computed from; see the sanity-check sketch after this list.
  - Weights are stored in FP16 (~16 GB); if your base model is loaded in 4-bit, dequantize it before adding the residual.
  - Applying the residual to a mismatched architecture will produce invalid weights.
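
Before merging, it can help to verify that the residual's parameter names and shapes line up with the base model. A small sketch, assuming `model` and `residual_sd` are loaded as in the usage example above:

```python
# Check that every residual tensor has a matching base parameter with the same shape
params = dict(model.named_parameters())

missing = [name for name in residual_sd if name not in params]
shape_mismatch = [
    name for name in residual_sd
    if name in params and params[name].shape != residual_sd[name].shape
]

if missing or shape_mismatch:
    raise ValueError(
        f"Residual does not match base model: "
        f"missing={missing[:5]}, shape_mismatch={shape_mismatch[:5]}"
    )
```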

## License

This residual is released under the **Apache License 2.0**. See the `LICENSE` file for details.

## References

As mentioned above, this method was introduced by **Jindal et al., 2024** (arXiv:2410.10739):

```bibtex
@misc{jindal2024balancingcontinuouspretraininginstruction,
  title={Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs},
  author={Ishan Jindal and Chandana Badrinath and Pranjal Bharti and Lakkidi Vinay and Sachin Dev Sharma},
  year={2024},
  eprint={2410.10739},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.10739},
}
```