giannisan committed (verified)
Commit d0030b9 · 1 Parent(s): fd0b337

Update README.md

Files changed (1): README.md (+3, −0)
README.md CHANGED
```diff
@@ -8,8 +8,11 @@ license: apache-2.0
 language:
 - en
 ---
+
 # mistral-7b-instruct-v0.3-depth-upscaling
 
+![image/webp](https://cdn-uploads.huggingface.co/production/uploads/643eab4f05a395e2b1c727e3/qwYq9q2PpTfYwb1nsym9u.webp)
+
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/643eab4f05a395e2b1c727e3/elcrExK_Q5MQjcdAjYi9V.png)
 
 This is an attempt at depth upscaling, based on the paper [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166), a technique for scaling large language models efficiently. The process begins with structural depthwise scaling, which may initially reduce performance; a crucial continued-pretraining phase then adapts the expanded model's parameters to the new depth configuration, rapidly restoring and ultimately enhancing performance.
```
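For concreteness, here is a minimal sketch of the depthwise-scaling step the paragraph above describes, following the SOLAR recipe applied to Mistral-7B's 32 decoder layers: keep layers 0–23 from one copy and 8–31 from another, then stack them into a 48-layer model. The exact layer split and the output path are illustrative assumptions, not this repository's confirmed merge recipe.

```python
# Sketch of SOLAR-style depthwise scaling (illustrative, not this repo's
# exact recipe): duplicate overlapping layer ranges of a 32-layer Mistral
# model to build a 48-layer model, then rely on continued pretraining.
import copy

import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3", torch_dtype=torch.bfloat16
)

layers = base.model.layers
upscaled = torch.nn.ModuleList(
    [copy.deepcopy(layers[i]) for i in range(0, 24)]    # first copy, drop last 8
    + [copy.deepcopy(layers[i]) for i in range(8, 32)]  # second copy, drop first 8
)

# Re-index the duplicated layers so KV-cache bookkeeping stays consistent.
for idx, layer in enumerate(upscaled):
    layer.self_attn.layer_idx = idx

base.model.layers = upscaled
base.config.num_hidden_layers = len(upscaled)  # now 48

# Hypothetical output path; continued pretraining would follow from here.
base.save_pretrained("mistral-7b-depth-upscaled")
```

Dropping eight layers from each copy, rather than stacking two full copies back to back, follows the paper's construction: the two stacks join with a 16-layer overlap, which softens the discontinuity at the seam that continued pretraining must heal.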