giannisan committed on
Commit bd71d13 · verified · 1 Parent(s): eddc666

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -5,10 +5,12 @@ library_name: transformers
 tags:
 - mergekit
 - merge
-
+license: apache-2.0
 ---
 # mistral-7b-instruct-v0.3-depth-upscaling
 
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/643eab4f05a395e2b1c727e3/elcrExK_Q5MQjcdAjYi9V.png)
+
 This is an attempt at depth upscaling. Based on the paper [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166), this model employs a depth up-scaling technique designed to efficiently scale large language models. The process begins with structural depthwise scaling, which may initially reduce performance, but this is rapidly restored during a crucial continued pretraining phase. This phase optimizes the expanded model's parameters to the new depth configuration, significantly enhancing performance.
 
 It's important to note that this represents only the initial phase of the model's development. The next critical steps involve fine-tuning,
@@ -37,4 +39,4 @@ slices:
 layer_range: [8, 32]
 merge_method: passthrough
 dtype: bfloat16
-```
+```
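The depth up-scaling step described in the README can be sketched as plain layer-index arithmetic. This is a minimal illustration under stated assumptions, not the author's config: it assumes the SOLAR recipe of a 32-layer base model with m = 8 layers trimmed from each of two copies. Only the `layer_range: [8, 32]` slice is visible in this diff, so the first slice `(0, 24)` is an assumption, ranges are treated as half-open like Python's `range()`, and `slice_ranges` / `upscaled_layer_indices` are hypothetical helpers, not part of mergekit. A passthrough merge roughly corresponds to stacking the listed slices end to end.

```python
def slice_ranges(n_layers: int = 32, m: int = 8) -> list[tuple[int, int]]:
    """Half-open (start, end) layer ranges for the two copies of the base model.

    Hypothetical helper (assumption, not mergekit API). Copy A drops the last
    m layers and copy B drops the first m layers, as in SOLAR depth up-scaling.
    """
    if not 0 < m < n_layers:
        raise ValueError("m must satisfy 0 < m < n_layers")
    return [(0, n_layers - m), (m, n_layers)]


def upscaled_layer_indices(ranges: list[tuple[int, int]]) -> list[int]:
    """Concatenate the ranges in order, i.e. stack the slices end to end."""
    return [i for start, end in ranges for i in range(start, end)]


if __name__ == "__main__":
    ranges = slice_ranges()            # assumed base: 32 layers, m = 8
    layers = upscaled_layer_indices(ranges)
    print(ranges)                      # [(0, 24), (8, 32)]
    print(len(layers))                 # 48
    print(layers[22:26])               # [22, 23, 8, 9]: the seam between copies
```

With these defaults the sketch reproduces the 48-layer layout described in the SOLAR paper. Layers 8 through 23 appear twice, and that duplicated middle block is what the continued pretraining phase is meant to adapt.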