Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ license: apache-2.0
-This is an attempt at depth upscaling, Based on the paper [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166),
It's important to note that this represents only the initial phase of the model's development. The next critical steps involve fine-tuning,
+This is an attempt at depth upscaling , Based on the paper [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166), which is a technique designed to efficiently scale large language models. The process begins with structural depthwise scaling which may initially reduce performance, but this is rapidly restored during a crucial continued pretraining phase. This phase optimizes the expanded model's parameters to the new depth configuration, significantly enhancing performance.
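For readers unfamiliar with the structural step the README refers to, here is a minimal sketch of how depthwise scaling selects layers. It follows the recipe described in the SOLAR paper (duplicate the base stack, drop the last `n_drop` layers from the first copy and the first `n_drop` from the second, then concatenate). The function name `depth_upscale_indices` is hypothetical, and the values 32 and 8 are the SOLAR paper's figures, not necessarily the ones used for this model.

```python
def depth_upscale_indices(n_layers: int, n_drop: int) -> list[int]:
    """Return the base-model layer indices that make up the up-scaled stack.

    Hypothetical helper for illustration only: it computes which layers
    would be copied, and does not touch any model weights.
    """
    if not 0 <= n_drop < n_layers:
        raise ValueError("n_drop must satisfy 0 <= n_drop < n_layers")
    # First copy of the base model, without its last n_drop layers.
    first_copy = list(range(0, n_layers - n_drop))
    # Second copy of the base model, without its first n_drop layers.
    second_copy = list(range(n_drop, n_layers))
    # Concatenating the two gives a stack of 2 * (n_layers - n_drop) layers.
    return first_copy + second_copy


# SOLAR paper figures (assumed here): 32 base layers, 8 dropped per copy.
indices = depth_upscale_indices(32, 8)
print(len(indices))  # 48
```

The seam between the two copies (layer 23 followed by layer 8 in this configuration) is where the stack is discontinuous, which is consistent with the README's note that performance may drop at first and is recovered by continued pretraining.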