<small>Image generated using Arli AI Image Generation https://www.arliai.com/image-generation</small>

## RpR v3 Changes compared to v1:

- No longer using QwQ-lorablated as the base:

v3 is a redo of v2, but without the problems that came from starting out with a QwQ-lorablated base. That turned out not to be a good move, as it clearly lobotomizes the model, which was visible even in the higher training and eval loss values.

- Fixed dissociated thoughts:

The previous RpR v1 dataset was generated with vanilla QwQ, which caused some refusals in both the thinking and response examples. With RpR v3, dataset generation is done using QwQ-abliterated instead, which prevents any refusals from coming through.

- Fixed nonsense words found in the dataset:

A number of what appear to be censoring attempts were found in the open datasets used for the RPMax/RpR datasets, and these misplaced words/phrases have now been fixed to prevent the model from copying the behavior.

- Rex scheduler:

v3 is trained using the newer and better Rex scheduler instead of the regular cosine scheduler. Because Rex keeps the learning rate higher for longer, the model learns nuances from more of the dataset.
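
For intuition, here is a minimal sketch of a reflected-exponential (REX) decay curve next to plain cosine decay, following the published REX schedule. The exact variant a given trainer implements may differ, and the step counts below are made up for illustration:

```python
import math

def rex_lr(step: int, total_steps: int, max_lr: float) -> float:
    """REX (reflected exponential) decay: stays near the peak LR
    much longer than cosine before falling off."""
    z = step / total_steps                 # training progress in [0, 1]
    return max_lr * (1 - z) / (1 - z / 2)  # multiplier is 1 at z=0, 0 at z=1

def cosine_lr(step: int, total_steps: int, max_lr: float) -> float:
    """Plain cosine decay to zero, for comparison."""
    z = step / total_steps
    return max_lr * 0.5 * (1 + math.cos(math.pi * z))

# With this model's peak LR of 1e-5: halfway through training REX is still
# at ~67% of the peak, while cosine has already dropped to 50%.
total = 1000
for step in (250, 500, 750):
    print(f"z={step / total:.2f}  rex={rex_lr(step, total, 1e-5):.2e}  "
          f"cos={cosine_lr(step, total, 1e-5):.2e}")
```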

## RpR Series Overview: Building on RPMax with Reasoning

## Model Description

QwQ-32B-ArliAI-RpR-v3 is the third release in the RpR series. It is a 32-billion-parameter model fine-tuned on the RpR dataset, which builds on the curated RPMax dataset and combines it with techniques to maintain reasoning ability in long multi-turn chats.

### Specs

* **Fine-tuning Method**: RS-QLORA+ (Rank-Stabilized LoRA + LoRA Plus 8x)
* **Rank/Alpha**: 128-rank 128-alpha
* **Learning Rate**: 0.00001
* **Scheduler**: Rex
* **Gradient accumulation**: 32
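
As a rough illustration of how these specs map onto Hugging Face peft + bitsandbytes, here is a sketch. It is not the actual training script: the base model ID, target modules, and optimizer wiring are assumptions, and LoRA+ is hand-rolled here as an 8x learning-rate multiplier on the lora_B parameters:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA: load the base model 4-bit quantized (NF4)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/QwQ-32B",            # assumed base model ID
    quantization_config=bnb,
    device_map="auto",
)

# Rank-stabilized LoRA at the rank/alpha listed above
peft_config = LoraConfig(
    r=128,
    lora_alpha=128,
    use_rslora=True,           # rank-stabilized: scales updates by alpha/sqrt(r)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)

# LoRA+ "8x": give the lora_B matrices 8x the base learning rate
base_lr = 1e-5
b_params = [p for n, p in model.named_parameters()
            if p.requires_grad and "lora_B" in n]
other_params = [p for n, p in model.named_parameters()
                if p.requires_grad and "lora_B" not in n]
optimizer = torch.optim.AdamW(
    [{"params": other_params, "lr": base_lr},
     {"params": b_params, "lr": base_lr * 8}]
)
# Gradient accumulation (32 here) would be set in the training loop or TrainingArguments.
```

With rank equal to alpha, plain LoRA would scale updates by alpha/r = 1, whereas rsLoRA's alpha/sqrt(r) keeps the effective update larger and more stable at high ranks like 128.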

### Very Nice Training graphs :)