---
license: apache-2.0
language:
- en
---

# mistral-7b-instruct-v0.3-depth-upscaling

This is an attempt at depth upscaling, based on the paper [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166), a technique designed to efficiently scale large language models. The process begins with structural depthwise scaling, which may initially reduce performance; performance is then rapidly restored during a crucial continued-pretraining phase that optimizes the expanded model's parameters for the new depth configuration, significantly enhancing performance.
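
The structural depthwise-scaling step can be done with a layer-duplication merge, as in the SOLAR paper (which interleaves copies of the original decoder layers before continued pretraining). A minimal sketch of such a merge using mergekit's `passthrough` method is shown below; the exact `layer_range` values are illustrative assumptions (they mirror the SOLAR recipe for a 32-layer base model), not a statement of what this model actually used:

```yaml
# Hypothetical mergekit config: duplicate overlapping layer ranges of the
# 32-layer base model to produce a deeper (48-layer) stack, then continue
# pretraining the result to recover performance.
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.3
        layer_range: [0, 24]   # first 24 layers
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.3
        layer_range: [8, 32]   # last 24 layers, overlapping the first slice
merge_method: passthrough
dtype: bfloat16
```

The overlap between the two slices means the middle layers appear twice in the upscaled model, which is exactly the "structural depthwise scaling" the paragraph above describes; continued pretraining then adapts the duplicated layers to their new positions.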