Add README.md

README.md CHANGED
````diff
@@ -1,12 +1,9 @@
 ---
 license: other
-datasets:
-- allenai/c4
-language:
-- en
-metrics:
-- perplexity
-- accuracy
 base_model:
 - jeffwan/llama-7b-hf
 pipeline_tag: text-generation
````
````diff
@@ -45,10 +42,10 @@ from transformers import AutoModel
 
 model = AutoModel.from_pretrained("MerantixMomentum/acip_llama1_7b", trust_remote_code=True)
 ```
-This will download and create a fully parameterized ACIP model that can be pruned to any compression
 For example,
 ```python
-model.prune_model_by_score(
 ```
 will prune `model` to 40% of its original size measured in number of parameters, i.e., a 60% compression rate.
 A unique feature of ACIP is that this operation is revertible in the sense that you can rerun `model.prune_model_by_score` as often as you like to evaluate your model at different sizes. Finally, you can "commit" to a certain ratio and run
````
````diff
@@ -65,7 +62,7 @@ to save even more memory (we have only tested 4bit quantization with `bitsandbytes`)
 
 **🚀 That's it! You can now use your compressed model for inference or fine-tuning as any other Causal Language Model from 🤗 transformers.**
 
-**Note**: The parameter `
 
 # Dependencies
 
````
````diff
@@ -1,12 +1,9 @@
 ---
 license: other
+datasets: ['allenai/c4']
+language: ['en']
+metrics: ['perplexity', 'accuracy']
+tags: ['acip', 'pytorch']
 base_model:
 - jeffwan/llama-7b-hf
 pipeline_tag: text-generation
````
````diff
@@ -45,10 +42,10 @@ from transformers import AutoModel
 
 model = AutoModel.from_pretrained("MerantixMomentum/acip_llama1_7b", trust_remote_code=True)
 ```
+This will download and create a fully parameterized ACIP model that can be pruned to any compression rate you wish.
 For example,
 ```python
+model.prune_model_by_score(size_ratio=0.4)
 ```
 will prune `model` to 40% of its original size measured in number of parameters, i.e., a 60% compression rate.
 A unique feature of ACIP is that this operation is revertible in the sense that you can rerun `model.prune_model_by_score` as often as you like to evaluate your model at different sizes. Finally, you can "commit" to a certain ratio and run
````
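The prune-evaluate loop described in the changed text above can be sketched with a small stand-in object. This is a hypothetical mock, not the real ACIP implementation: only the `prune_model_by_score(size_ratio=...)` call mirrors the README, while `MockACIPModel` and its parameter bookkeeping are illustrative.

```python
# Hypothetical sketch of the revertible pruning loop described above.
# MockACIPModel is a stand-in for the real ACIP model; only the
# prune_model_by_score(size_ratio=...) signature mirrors the README.

class MockACIPModel:
    def __init__(self, num_parameters: int):
        self.full_size = num_parameters      # parameters before any pruning
        self.current_size = num_parameters   # parameters after the last prune

    def prune_model_by_score(self, size_ratio: float) -> None:
        # Revertible: each call is computed from the full model again,
        # so you can sweep ratios freely before committing to one.
        if not 0.0 <= size_ratio <= 1.0:
            raise ValueError("size_ratio must lie in [0.0, 1.0]")
        self.current_size = round(self.full_size * size_ratio)


model = MockACIPModel(num_parameters=7_000_000_000)

# Evaluate the model at several sizes before committing to a ratio.
for ratio in (1.0, 0.6, 0.4):
    model.prune_model_by_score(size_ratio=ratio)
    print(f"size_ratio={ratio}: {model.current_size} parameters")
```

The point of the mock is the call pattern: because pruning is revertible, the sweep needs no reloading between ratios.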
````diff
@@ -65,7 +62,7 @@ to save even more memory (we have only tested 4bit quantization with `bitsandbytes`)
 
 **🚀 That's it! You can now use your compressed model for inference or fine-tuning as any other Causal Language Model from 🤗 transformers.**
 
+**Note**: The parameter `size_ratio` ranges from 1.0 to 0.0, indicating the model size after compression. For example, 0.4 means that the model has only 40% of the original number of parameters and 1.0 means no compression at all. Alternatively, you can also set `compression_rate` in `prune_model_by_score`, which is equivalent to `size_ratio = 1.0 - compression_rate`.
 
 # Dependencies
 
````
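The `size_ratio`/`compression_rate` equivalence stated in the added note can be captured in a tiny helper. The function below is a hypothetical illustration; only the formula `size_ratio = 1.0 - compression_rate` comes from the model card.

```python
def size_ratio_from_compression_rate(compression_rate: float) -> float:
    """Convert a compression rate to the equivalent size_ratio.

    Per the note above, size_ratio = 1.0 - compression_rate: a 60%
    compression rate means keeping 40% of the original parameters.
    """
    if not 0.0 <= compression_rate <= 1.0:
        raise ValueError("compression_rate must lie in [0.0, 1.0]")
    return 1.0 - compression_rate


print(size_ratio_from_compression_rate(0.6))  # prints 0.4
```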