OwenArli committed (verified)
Commit 9885125 · 1 Parent(s): cce793e

Update README.md

Files changed (1): README.md +13 -4
README.md CHANGED
@@ -12,7 +12,11 @@ base_model:
 
  <small>Image generated using Arli AI Image Generation https://www.arliai.com/image-generation</small>
 
- ## RpR v3 Changes:
+ ## RpR v3 Changes compared to v1:
+
+ - No longer use QwQ-abliterated as base:
+
+ v3 is a re-do of v2, but without the problems stemming from starting out with a QwQ-lorablated base. That turned out not to be a good move, as it clearly lobotomizes the model more; this was visible even in the higher training and eval loss values.
 
  - Fixed disassociated thoughts:
 
@@ -22,9 +26,13 @@ base_model:
 
  The previous RpR v1 dataset was generated with vanilla QwQ, which caused some refusals in both the thinking and response examples. With RpR v3, the dataset generation is now done using QwQ-abliterated, which prevents any refusals from coming through.
 
- - Used QwQ-abliterated as base:
-
- In an effort to further prevent random refusals and allowing the model to do anything you want it to do, RpR v3 now use an abliterated version of QwQ as the starting base for the LoRA being finetuned.
+ - Fixed nonsense words found in dataset:
+
+ A number of presumed censoring attempts were found in the open datasets used for the RPMax/RpR datasets, and these misplaced words/phrases have now been fixed to prevent the model from copying this behavior.
+
+ - Rex scheduler:
+
+ v3 is trained using the newer Rex scheduler instead of the regular cosine scheduler. Because Rex keeps the learning rate higher for longer, the model learns nuances from more of the dataset.
 
  ## RpR Series Overview: Building on RPMax with Reasoning
 
@@ -46,7 +54,7 @@ Ask questions in our new Discord Server https://discord.com/invite/t75KbPgwhk or
 
  ## Model Description
 
- QwQ-32B-ArliAI-RpR-v3 is the second release in the RpR series. It is a 32-billion parameter model fine-tuned using the RpR dataset based on the curated RPMax dataset combined with techniques to maintain reasoning abilities in long multi-turn chats.
+ QwQ-32B-ArliAI-RpR-v3 is the third release in the RpR series. It is a 32-billion-parameter model fine-tuned using the RpR dataset, which is based on the curated RPMax dataset and combined with techniques to maintain reasoning abilities in long multi-turn chats.
 
  ### Specs
 
@@ -62,6 +70,7 @@ QwQ-32B-ArliAI-RpR-v3 is the second release in the RpR series. It is a 32-billio
  * **Fine-tuning Method**: RS-QLORA+ (Rank-Stabilized LoRA + LoRA Plus 8x)
  * **Rank/Alpha**: 128-rank 128-alpha
  * **Learning Rate**: 0.00001
+ * **Scheduler**: Rex
  * **Gradient accumulation**: 32
 
  ### Very Nice Training graphs :)
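
The "Fixed nonsense words found in dataset" change in the diff is a data-cleaning pass over the source datasets. The cleaning script and the actual list of censoring artifacts are not part of this commit, so the following is only a minimal sketch of what such a pass could look like; the `CENSOR_ARTIFACTS` mapping, the `conversations`/`value` field names, and the file names are all hypothetical.

```python
import json

# Hypothetical examples of censoring artifacts and their replacements.
# The actual list used to clean the RPMax/RpR datasets is not in this commit.
CENSOR_ARTIFACTS = {
    "[content removed]": "",   # hypothetical placeholder left by an upstream filter
    "~~redacted~~": "",        # hypothetical markup-style censoring attempt
}

def clean_text(text: str) -> str:
    """Strip known censoring artifacts so the model does not learn to copy them."""
    for artifact, replacement in CENSOR_ARTIFACTS.items():
        text = text.replace(artifact, replacement)
    return " ".join(text.split())  # collapse any double spaces left behind

def clean_dataset(in_path: str, out_path: str) -> int:
    """Clean every turn of a JSONL chat dataset and return how many turns changed."""
    changed = 0
    with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            example = json.loads(line)
            for turn in example.get("conversations", []):
                cleaned = clean_text(turn["value"])
                if cleaned != turn["value"]:
                    changed += 1
                turn["value"] = cleaned
            dst.write(json.dumps(example, ensure_ascii=False) + "\n")
    return changed

# Hypothetical usage:
# clean_dataset("rpmax_raw.jsonl", "rpmax_clean.jsonl")
```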
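
The "Rex scheduler" item says the learning rate is kept higher for longer than with cosine decay. The commit does not say which Rex implementation was used, so this sketch only illustrates the commonly cited REX ("reflected exponential") decay shape and how it can be attached to a PyTorch optimizer via `LambdaLR`; treat the exact formula as an assumption.

```python
import torch

def rex_factor(step: int, total_steps: int, min_factor: float = 0.0) -> float:
    """REX-style decay: stays above cosine for most of training, then drops off.

    Uses the commonly cited form lr(z) = (1 - z) / (1 - z / 2) with z = step / total_steps;
    this is an assumed formula, not necessarily the exact one used for RpR v3.
    """
    z = min(step / max(total_steps, 1), 1.0)
    return max(min_factor, (1.0 - z) / (1.0 - 0.5 * z))

# Minimal usage sketch with a toy model; the base learning rate mirrors the spec
# list (1e-5), everything else here is illustrative.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
total_steps = 1000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: rex_factor(step, total_steps)
)

for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 8)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()  # multiplies the base lr by rex_factor(step)
```

With this form, the learning rate is still at roughly two thirds of its peak halfway through training, versus one half under cosine decay, which matches the stated motivation of letting the model learn from more of the dataset at a higher rate.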
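
The spec list pins down most of the training recipe: QLoRA on a 4-bit base, rank-stabilized LoRA at rank/alpha 128, LoRA+ with an 8x learning-rate ratio, a 1e-5 learning rate, and gradient accumulation of 32. The actual training stack is not included in this commit, so the sketch below only shows one way those settings could be expressed with `transformers` and `peft`; the base repo id, the target modules, and the hand-rolled LoRA+ parameter groups are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit (the "Q" in RS-QLORA+). The repo id is assumed;
# the model card's base_model field is authoritative.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/QwQ-32B", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Rank-stabilized LoRA at the rank/alpha from the spec list.
# target_modules is an assumption; the commit does not list which projections were tuned.
lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    use_rslora=True,  # rank-stabilized scaling: alpha / sqrt(r) instead of alpha / r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# LoRA+ ("LoRA Plus 8x"): give the lora_B matrices an 8x higher learning rate
# than lora_A, sketched here with plain optimizer parameter groups.
base_lr = 1e-5
a_params = [p for n, p in model.named_parameters() if "lora_A" in n and p.requires_grad]
b_params = [p for n, p in model.named_parameters() if "lora_B" in n and p.requires_grad]
optimizer = torch.optim.AdamW(
    [
        {"params": a_params, "lr": base_lr},
        {"params": b_params, "lr": base_lr * 8},
    ]
)
# Gradient accumulation of 32 would then be handled by the training loop or trainer.
```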