---
license: apache-2.0
language:
- en
---

# mistral-7b-instruct-v0.3-depth-upscaling

This is an attempt at depth upscaling, based on the paper [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166), a technique designed to efficiently scale large language models. The process begins with structural depthwise scaling, which may initially reduce performance; performance is then rapidly restored during a crucial continued-pretraining phase that optimizes the expanded model's parameters for the new depth configuration, significantly enhancing performance.
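
The structural depthwise-scaling step can be done with a layer-duplication merge, as in the SOLAR paper (which interleaves copies of the original decoder layers before continued pretraining). A minimal sketch of such a merge using mergekit's `passthrough` method is shown below; the exact `layer_range` values are illustrative assumptions (they mirror the SOLAR recipe for a 32-layer base model), not a statement of what this model actually used:

```yaml
# Hypothetical mergekit config: duplicate overlapping layer ranges of the
# 32-layer base model to produce a deeper (48-layer) stack, then continue
# pretraining the result to recover performance.
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.3
        layer_range: [0, 24]   # first 24 layers
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.3
        layer_range: [8, 32]   # last 24 layers, overlapping the first slice
merge_method: passthrough
dtype: bfloat16
```

The overlap between the two slices means the middle layers appear twice in the upscaled model, which is exactly the "structural depthwise scaling" the paragraph above describes; continued pretraining then adapts the duplicated layers to their new positions.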