giannisan committed (verified)
Commit ad01868 · 1 Parent(s): c32defb

Update README.md

Files changed (1)
  1. README.md +40 -38
README.md CHANGED
@@ -1,38 +1,40 @@
- ---
- base_model:
- - mistralai/Mistral-7B-Instruct-v0.3
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
- # mistral-7b-instruct-v0.3-depth-upscaling
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the passthrough merge method.
-
- ### Models Merged
-
- The following models were included in the merge:
- * [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- slices:
- - sources:
-   - model: mistralai/Mistral-7B-Instruct-v0.3
-     layer_range: [0, 24]
- - sources:
-   - model: mistralai/Mistral-7B-Instruct-v0.3
-     layer_range: [8, 32]
- merge_method: passthrough
- dtype: bfloat16
- ```
 
 
 
+ ---
+ base_model:
+ - mistralai/Mistral-7B-Instruct-v0.3
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+
+ ---
+ # mistral-7b-instruct-v0.3-depth-upscaling
+
+ This is an attempt at depth upscaling. Based on the paper "SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling" ([arXiv:2312.15166](https://arxiv.org/abs/2312.15166)), this model employs a depth up-scaling technique designed to efficiently scale large language models. The process begins with structural depthwise scaling, which may initially reduce performance, but this is rapidly restored during a crucial continued pretraining phase. This phase optimizes the expanded model's parameters for the new depth configuration, significantly enhancing performance.
+
+ It's important to note that this represents only the initial phase of the model's development. The next critical steps involve continued pretraining and fine-tuning to restore and improve performance.
+
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the passthrough merge method.
+
+ ### Models Merged
+
+ The following models were included in the merge:
+ * [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ slices:
+ - sources:
+   - model: mistralai/Mistral-7B-Instruct-v0.3
+     layer_range: [0, 24]
+ - sources:
+   - model: mistralai/Mistral-7B-Instruct-v0.3
+     layer_range: [8, 32]
+ merge_method: passthrough
+ dtype: bfloat16
+ ```
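
As a quick sanity check on the configuration above: the two slices take layers 0-23 and 8-31 of the 32-layer base, so the passthrough merge stacks 48 decoder layers, with layers 8-23 duplicated. The config is the input to mergekit's `mergekit-yaml` command (e.g. `mergekit-yaml config.yaml ./output-dir`). Below is a minimal sketch, assuming the merged model is published as `giannisan/mistral-7b-instruct-v0.3-depth-upscaling` (a hypothetical repo id), that loads it with transformers, checks the layer count, and runs a short generation.

```python
# Sketch: load the depth-upscaled merge, verify the layer arithmetic, and smoke-test generation.
# The repo id is an assumption; replace it with the actual repository if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "giannisan/mistral-7b-instruct-v0.3-depth-upscaling"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the dtype declared in the merge config
    device_map="auto",            # requires the accelerate package
)

# Two 24-layer slices of the 32-layer base -> 24 + 24 = 48 stacked layers.
print(model.config.num_hidden_layers)  # expected: 48

# Short generation using the Mistral instruct chat template.
messages = [{"role": "user", "content": "Explain depth up-scaling in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since the card notes that continued pretraining and fine-tuning are still to come, generation quality from this raw merge should be expected to lag the 32-layer base model.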