# Phi-4 Training Space Deployment Checklist

## Critical Configuration Review

Before updating the Hugging Face Space, verify each of these items to prevent deployment issues:

### 1. Model Configuration ✓

- [ ] Confirmed model name in transformers_config.json: `unsloth/phi-4-unsloth-bnb-4bit`
- [ ] BF16 precision enabled, FP16 disabled (`"bf16": true, "fp16": false`)
- [ ] Chat template correctly set to `"phi"` in config
- [ ] LoRA parameters properly configured:
  - [ ] `r`: 32
  - [ ] `lora_alpha`: 16
  - [ ] `target_modules`: all required attention modules included
- [ ] Max sequence length matches dataset needs (default: 2048)

### 2. GPU & Memory Management ✓

- [ ] Per-device batch size set to 16 or lower
- [ ] Gradient accumulation steps set to 3 or higher
- [ ] Device mapping set to "auto" for multi-GPU
- [ ] Max memory limit set to 85% of each GPU's capacity
- [ ] `PYTORCH_CUDA_ALLOC_CONF` includes `"expandable_segments:True"`
- [ ] Gradient checkpointing enabled (`"gradient_checkpointing": true`)
- [ ] Dataloader workers reduced to 2 (from 4)
- [ ] FSDP configuration enabled for multi-GPU setups

### 3. Dataset Handling ✓

- [ ] Dataset configuration correctly specified in dataset_config.json
- [ ] Conversation structure preserved (id + conversations fields)
- [ ] SimpleDataCollator configured to use apply_chat_template
- [ ] No re-ordering or sorting of the dataset (preserves original order)
- [ ] Sequential sampler used in dataloader (no shuffling)
- [ ] Max sequence length of 2048 applied
- [ ] Format validation enabled for the first few examples

### 4. Dependency Management ✓

- [ ] requirements.txt includes all necessary packages:
  - [ ] unsloth
  - [ ] peft
  - [ ] bitsandbytes
  - [ ] einops
  - [ ] sentencepiece
  - [ ] datasets
  - [ ] transformers
- [ ] Optional packages marked as such (e.g., flash-attn)
- [ ] Dependency version constraints avoid known conflicts

### 5. Error Handling & Logging ✓

- [ ] Proper error catching for dataset loading
- [ ] Fallback mechanisms for chat template application
- [ ] Clear, concise log messages that work with the HF Space interface
- [ ] Memory usage tracking at key points (start, end, periodic)
- [ ] Third-party loggers set to WARNING to reduce noise
- [ ] Low-verbosity log format for better HF Space compatibility

### 6. Training Setup ✓

- [ ] Number of epochs properly configured (default: 3)
- [ ] Learning rate appropriate (default: 2e-5)
- [ ] Warmup ratio set (default: 0.05)
- [ ] Checkpointing frequency set to a reasonable value (default: 100 steps)
- [ ] Output directory correctly configured
- [ ] Hugging Face Hub parameters set correctly if pushing models (see the sketch below for how these defaults map onto training arguments)
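For reference, the following is a minimal sketch (not the Space's actual training script) of how the defaults from sections 1, 2, and 6 would map onto `transformers.TrainingArguments`. The `output_dir` value and keeping `push_to_hub` off are illustrative assumptions; the remaining values mirror the checklist.

```python
# Sketch only: checklist defaults expressed as TrainingArguments.
# output_dir and push_to_hub are assumptions; all other values come from the checklist.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./outputs",             # assumed path; use the Space's configured output directory
    num_train_epochs=3,                 # section 6 default
    learning_rate=2e-5,                 # section 6 default
    warmup_ratio=0.05,                  # section 6 default
    per_device_train_batch_size=16,     # section 2: 16 or lower
    gradient_accumulation_steps=3,      # section 2: 3 or higher
    bf16=True,                          # section 1: BF16 enabled, FP16 disabled
    fp16=False,
    gradient_checkpointing=True,        # section 2
    dataloader_num_workers=2,           # section 2: reduced from 4
    save_strategy="steps",
    save_steps=100,                     # section 6: checkpoint every 100 steps
    push_to_hub=False,                  # enable only if Hub parameters are configured
)
```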
### 7. Pre-Flight Verification ✓

- [ ] No linting errors or indentation issues
- [ ] Updated config values are consistent across files
- [ ] Batch size × gradient accumulation steps × number of GPUs yields a reasonable effective batch size
- [ ] Verified that requirements.txt matches actual imports in code
- [ ] Confirmed tokenizer settings match the model requirements

---

## Last-Minute Configuration Changes

If you've made any configuration changes, record them here before deployment:

| Date | Parameter Changed | Old Value | New Value | Reason | Reviewer |
|------|-------------------|-----------|-----------|--------|----------|
|      |                   |           |           |        |          |
|      |                   |           |           |        |          |

---

## Deployment Notes

**Current Space Hardware**: 4× NVIDIA L4 GPUs (24GB VRAM each)

**Expected Training Speed**: ~XXX examples/second with current configuration

**Memory Requirements**: Peak usage expected to be ~20GB per GPU

**Common Issues to Watch For**:

- OOM errors on GPU 0: if seen, reduce the batch size by 2 and increase gradient accumulation by 1
- Imbalanced GPU usage: check device mapping and FSDP configuration
- Slow training: verify that all GPUs are being utilized efficiently
- Log flooding: reduce verbosity of component logs (transformers, datasets, etc.)

---

*Last Updated: 2025-03-09*
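As a companion to the pre-flight checklist and the deployment notes above, here is a minimal verification sketch that checks the precision flags and the effective batch size before pushing to the Space. The JSON key names are assumptions about the schema of transformers_config.json; adjust them to match the actual file.

```python
# Sketch only: pre-flight sanity checks against transformers_config.json.
# Key names below are assumed, not taken from the actual config schema.
import json

NUM_GPUS = 4  # current Space hardware: 4x NVIDIA L4

with open("transformers_config.json") as f:
    cfg = json.load(f)

# Section 1: BF16 enabled, FP16 disabled
assert cfg.get("bf16") is True and cfg.get("fp16") is False, "Use BF16, not FP16"

# Section 2: batch size and gradient accumulation limits
per_device = cfg.get("per_device_train_batch_size", 16)
grad_accum = cfg.get("gradient_accumulation_steps", 3)
assert per_device <= 16, "Per-device batch size should be 16 or lower"
assert grad_accum >= 3, "Gradient accumulation steps should be 3 or higher"

# Section 7: effective batch size across all GPUs
effective = per_device * grad_accum * NUM_GPUS
print(f"Effective batch size: {per_device} x {grad_accum} x {NUM_GPUS} = {effective}")
```

If an OOM appears on GPU 0, re-run this check after applying the mitigation noted above (batch size −2, gradient accumulation +1) to confirm the effective batch size remains reasonable.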