---
base_model:
- zerofata/L3.3-GeneticLemonade-Unleashed-70B
library_name: transformers
license: llama3
---

GENETIC LEMONADE UNLEASHED v3
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65b19c6c638328850e12d38c/_XKaHDAVin1ZkdlHyh09q.png)

01 // OVERVIEW

An experimental release.

An SFT+DPO QLoRA finetune of zerofata/GeneticLemonade-Unleashed.

This is a creative model intended to excel at character-driven RP / ERP. It has not been tested or trained on adventure stories or any large amount of general creative writing.

The model is designed to produce longer, narrative-heavy responses in which characters are portrayed accurately and proactively.

02 // SILLYTAVERN SETTINGS

Play with these; they are not the 'best' settings, just a stable baseline. A request sketch using these values follows the list below.

Recommended Samplers

> Temp: 0.9 - 1.0
> MinP: 0.03 - 0.04
> TopP: 0.9 - 1.0
> Dry: 0.8, 1.75, 4
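For reference, here is a minimal sketch of passing these sampler values to a local llama.cpp-style completion endpoint rather than through SillyTavern. The endpoint URL, the prompt, and the DRY parameter names (`dry_multiplier`, `dry_base`, `dry_allowed_length`) are assumptions that depend on your backend and its version; adjust to match whatever server you actually run.

```python
import requests

# Hedged sketch: assumes a local llama.cpp server exposing /completion.
# DRY parameter names vary by backend; check your server's docs.
payload = {
    "prompt": "Example prompt goes here",
    "temperature": 0.9,      # Temp: 0.9 - 1.0
    "min_p": 0.03,           # MinP: 0.03 - 0.04
    "top_p": 1.0,            # TopP: 0.9 - 1.0
    "dry_multiplier": 0.8,   # Dry: 0.8, 1.75, 4
    "dry_base": 1.75,
    "dry_allowed_length": 4,
    "n_predict": 512,
}

response = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=120)
print(response.json()["content"])
```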

Instruct

Use Llama-3-Instruct-Names, but you will need to uncheck "System same as user".

03 // QUANTIZATIONS

04 // TRAINING PROCESS

The model first went through SFT on a small synthetic dataset of 2.9 million tokens (approximately 750 conversations), primarily RP data with small amounts of random instruct / assistant data and creative writing.
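The SFT data uses Axolotl's `chat_template` dataset type (see the SFT config below): each JSONL line carries a `messages` list of role/content pairs. The record below is purely illustrative, the actual dataset is not published; only the field layout is taken from the config.

```python
import json

# Illustrative only: shape matches the `chat_template` dataset type
# in the SFT config (field_messages: messages, role/content mapping).
example = {
    "messages": [
        {"role": "system", "content": "You are Mira, a sarcastic tavern keeper."},
        {"role": "user", "content": "*pushes the door open* Got a room for the night?"},
        {"role": "assistant", "content": "\"Depends. Can you pay, or are you another charming freeloader?\" Mira doesn't look up from the mug she's polishing."},
    ]
}

with open("dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```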

The model then went through DPO training using approximately 1,100 chosen examples from the SFT dataset that were of exceptional quality or showed verifiable instruction following. Rejected samples were generated with another Llama 3.3 finetune known for poor instruction following.
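Each DPO record pairs a shared conversation prefix with a chosen and a rejected completion, matching the `field_messages: conversation`, `field_chosen`, and `field_rejected` mapping in the DPO config below. The record here is again hypothetical, and the exact shape expected for `chosen` / `rejected` (single message vs. list) can differ between Axolotl versions; this sketch assumes single assistant messages.

```python
import json

# Illustrative only: mirrors the chat_template.default DPO mapping
# (conversation / chosen / rejected) used in the DPO config.
example = {
    "conversation": [
        {"role": "system", "content": "You are Mira, a sarcastic tavern keeper."},
        {"role": "user", "content": "Keep your reply under three sentences."},
    ],
    "chosen": {"role": "assistant", "content": "\"Two coppers, paid up front.\" Mira slides a key across the bar."},
    "rejected": {"role": "assistant", "content": "A long, meandering reply that ignores the length instruction entirely..."},
}

with open("dpo_pairs.jsonl", "a", encoding="utf-8") as f:  # hypothetical filename
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```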

Axolotl configs

Neither config is optimized for cost / performance efficiency; YMMV.

SFT 1*H200

```yml
# ====================
# MODEL CONFIGURATION
# ====================
base_model: zerofata/L3.3-GeneticLemonade-Unleashed-70B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
special_tokens:
  pad_token: "<|finetune_right_pad_id|>"
chat_template: llama3

# ====================
# DATASET CONFIGURATION
# ====================
datasets:
  - path: ./dataset.jsonl
    type: chat_template
    split: train
    chat_template_strategy: tokenizer
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user: ["user"]
      assistant: ["assistant"]
      system: ["system"]
test_datasets:
  - path: ./validate_dataset.jsonl
    type: chat_template
    split: train
    chat_template_strategy: tokenizer
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user: ["user"]
      assistant: ["assistant"]
      system: ["system"]
dataset_prepared_path:
train_on_inputs: false  # Only train on assistant responses

# ====================
# QLORA CONFIGURATION
# ====================
adapter: qlora
load_in_4bit: true
lora_r: 64
lora_alpha: 128
lora_dropout: 0.1
lora_target_linear: true
# lora_modules_to_save:  # Uncomment only if you added NEW tokens

# ====================
# TRAINING PARAMETERS
# ====================
num_epochs: 2
micro_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 1.5e-5
optimizer: paged_adamw_8bit
lr_scheduler: rex
warmup_ratio: 0.05
weight_decay: 0.01
max_grad_norm: 1.0

# ====================
# SEQUENCE & PACKING
# ====================
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

# ====================
# HARDWARE OPTIMIZATIONS
# ====================
bf16: auto
flash_attention: true
gradient_checkpointing: true

# ====================
# EVALUATION & CHECKPOINTING
# ====================
evaluation_strategy: steps
eval_steps: 5
save_strategy: steps
save_steps: 5
save_total_limit: 5  # Keep best + last few checkpoints
load_best_model_at_end: true
metric_for_best_model: eval_loss
greater_is_better: false
early_stopping_patience: 5

# ====================
# LOGGING & OUTPUT
# ====================
output_dir: ./output_model
logging_steps: 2
save_safetensors: true

# ====================
# WANDB TRACKING
# ====================
wandb_project: project_name
# wandb_entity: your_entity
# wandb_name: your_run_name
```
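A config like this is typically launched with Axolotl's standard CLI, for example `accelerate launch -m axolotl.cli.train sft.yml` (the filename is illustrative; recent Axolotl versions also ship an `axolotl train` entry point). Exact invocation depends on your Axolotl and accelerate versions.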

DPO 2*H200

```yml
# ====================
# MODEL CONFIGURATION
# ====================
base_model: ApocalypseParty/unleashed-fulldata30
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
special_tokens: {}
chat_template: tokenizer_default

# ====================
# RL/DPO CONFIGURATION
# ====================
rl: dpo
rl_beta: 0.07

# ====================
# DATASET CONFIGURATION
# ====================
datasets:
  - path: ./dpo_cleaned-v3_deduplicated.jsonl
    type: chat_template.default
    field_messages: conversation
    field_chosen: chosen
    field_rejected: rejected
    message_property_mappings:
      role: role
      content: content
    roles:
      system: ["system"]
      user: ["user"]
      assistant: ["assistant"]
dataset_prepared_path:
train_on_inputs: false  # Only train on assistant responses

# ====================
# QLORA CONFIGURATION
# ====================
adapter: qlora
load_in_4bit: true
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true
# lora_modules_to_save:  # Uncomment only if you added NEW tokens

# ====================
# TRAINING PARAMETERS
# ====================
num_epochs: 1
micro_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 2e-6
optimizer: adamw_8bit
lr_scheduler: cosine
warmup_steps: 5
weight_decay: 0.01
max_grad_norm: 1.0

# ====================
# SEQUENCE CONFIGURATION
# ====================
sequence_len: 4096
pad_to_sequence_len: true

# ====================
# HARDWARE OPTIMIZATIONS
# ====================
bf16: auto
tf32: false
flash_attention: true
gradient_checkpointing: offload
deepspeed: deepspeed_configs/zero1.json

# ====================
# CHECKPOINTING
# ====================
save_steps: 10
save_total_limit: 10
load_best_model_at_end: true
metric_for_best_model: eval_loss
greater_is_better: false

# ====================
# LOGGING & OUTPUT
# ====================
output_dir: ./dpo_model
logging_steps: 2
save_safetensors: true

# ====================
# WANDB TRACKING
# ====================
wandb_project: project_name
# wandb_entity: your_entity
# wandb_name: your_run_name
```
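After DPO finishes, the QLoRA adapter written to `./dpo_model` still has to be merged into the base weights before the model can be quantized or shared as full safetensors; Axolotl provides a `merge_lora` CLI (`python -m axolotl.cli.merge_lora`) for this, though the exact flags vary by version, so check the Axolotl documentation for your release.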