Update README.md
README.md
CHANGED
@@ -5,10 +5,12 @@ library_name: transformers
 tags:
 - mergekit
 - merge
-
+license: apache-2.0
 ---
 # mistral-7b-instruct-v0.3-depth-upscaling
 
+
+
 This is an attempt at depth upscaling, Based on the paper [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166), this model employs a depth up-scaling technique designed to efficiently scale large language models. The process begins with structural depthwise scaling which may initially reduce performance, but this is rapidly restored during a crucial continued pretraining phase. This phase optimizes the expanded model's parameters to the new depth configuration, significantly enhancing performance.
 
 It's important to note that this represents only the initial phase of the model's development. The next critical steps involve fine-tuning,
@@ -37,4 +39,4 @@ slices:
 layer_range: [8, 32]
 merge_method: passthrough
 dtype: bfloat16
-```
+```
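
The README paragraph in this diff describes depth up-scaling, and the visible tail of the config shows a `passthrough` merge whose last slice uses `layer_range: [8, 32]`. The layer arithmetic can be sketched in Python as below. This is only an illustration under stated assumptions: a 32-layer base model, a first slice of `[0, 24]` (not visible in this hunk), and half-open layer ranges. The function name `depth_upscale_layers` is hypothetical and this is not mergekit's implementation.

```python
def depth_upscale_layers(n_layers: int, n_drop: int) -> list[int]:
    """Return the source-layer index for each layer of the up-scaled stack.

    Assumed scheme (following the SOLAR paper's description): take two copies
    of the base model, drop the last `n_drop` layers from the first copy and
    the first `n_drop` layers from the second, then concatenate them.
    """
    if not 0 <= n_drop < n_layers:
        raise ValueError("n_drop must satisfy 0 <= n_drop < n_layers")
    first_copy = list(range(0, n_layers - n_drop))   # assumed slice [0, 24]
    second_copy = list(range(n_drop, n_layers))      # slice [8, 32] from the config
    return first_copy + second_copy


# Assumption: the base model has 32 transformer layers and 8 are dropped per copy.
layers = depth_upscale_layers(n_layers=32, n_drop=8)
print(len(layers))  # 48
```

With these assumed values the stacked model has 48 layers, and source layers 8 through 23 appear twice. The duplicated block is what continued pretraining would then need to adapt.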