# Phi-4 Training Critical Deployment Checklist
## Essential Configuration Requirements
### 1. Model Configuration
- [ ] Model name: `unsloth/phi-4-unsloth-bnb-4bit`
- [ ] BF16 precision enabled, FP16 disabled
- [ ] Appropriate sequence length (2048)
- [ ] LoRA parameters correctly configured (r: 32, alpha: 16)
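A minimal loading sketch for the items above, assuming Unsloth's `FastLanguageModel` API; the `target_modules` list and `lora_dropout` value are illustrative assumptions, not checklist requirements:

```python
# Sketch only: assumes Unsloth's FastLanguageModel API.
# target_modules and lora_dropout below are assumptions, not checklist values.
import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/phi-4-unsloth-bnb-4bit",
    max_seq_length=2048,        # checklist: sequence length
    dtype=torch.bfloat16,       # checklist: BF16 enabled, FP16 disabled
    load_in_4bit=True,          # matches the bnb-4bit checkpoint
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,                       # checklist: LoRA rank
    lora_alpha=16,              # checklist: LoRA alpha
    lora_dropout=0.0,           # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumption
    use_gradient_checkpointing="unsloth",
)
```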
### 2. Hardware & Resource Management
- [ ] Per-device batch size ≤ 16
- [ ] Gradient accumulation steps ≥ 3
- [ ] Gradient checkpointing enabled
- [ ] Memory usage limits properly set (85% of GPU capacity)
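A hedged sketch of the resource settings above; `output_dir` is a placeholder and the memory cap uses PyTorch's per-process fraction API, which is one way (not necessarily this project's way) to enforce the 85% limit:

```python
# Sketch only: batch/accumulation settings from this section plus a per-process
# GPU memory cap. output_dir is a placeholder.
import torch
from transformers import TrainingArguments

# Cap each process at ~85% of its GPU so allocation spikes don't OOM the run.
for device_id in range(torch.cuda.device_count()):
    torch.cuda.set_per_process_memory_fraction(0.85, device=device_id)

training_args = TrainingArguments(
    output_dir="outputs",                # placeholder
    per_device_train_batch_size=16,      # checklist: <= 16
    gradient_accumulation_steps=3,       # checklist: >= 3
    gradient_checkpointing=True,
    bf16=True,
    fp16=False,
)
```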
### 3. Critical Dataset Handling Rules
- [ ] **NO REORDERING of dataset entries** - original order must be preserved
- [ ] **NO COMBINING of separate entries** - each entry must remain distinct
- [ ] **SEQUENTIAL PROCESSING required** - entries must be processed one after another
- [ ] `sort_by_id` and `maintain_paper_order` flags properly set to preserve data sequence
- [ ] Sequential sampler used with no shuffling (`"shuffle": false`)
- [ ] Dataset sequential integrity verified with validation samples
- [ ] Conversation structure preserved (original format maintained)
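The sequential-sampling rule, expressed as a plain `DataLoader` sketch; `train_dataset` is an assumption standing in for the project's tokenized dataset, and the project-specific `sort_by_id` / `maintain_paper_order` flags live in the Space's own config and are not shown here:

```python
# Sketch only: illustrates the "sequential sampler, no shuffling" rule.
# train_dataset is an assumed stand-in for the project's tokenized dataset.
from torch.utils.data import DataLoader, SequentialSampler

loader = DataLoader(
    train_dataset,
    batch_size=16,
    # An explicit SequentialSampler replaces any shuffling: entries are yielded
    # in their original dataset order, one batch after another.
    sampler=SequentialSampler(train_dataset),
)
```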
### 4. Essential Error Handling
- [ ] Clear error catching for dataset loading issues
- [ ] Memory tracking at key training points
- [ ] Low-verbosity logging for HF Space compatibility
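A sketch of the error-handling and monitoring items above; the dataset path is a placeholder, not the Space's real dataset, and `log_gpu_memory` is a hypothetical helper name:

```python
# Sketch only: dataset-load error handling, memory tracking, and quiet logging.
# The dataset path is a placeholder; log_gpu_memory is a hypothetical helper.
import logging
import torch
from datasets import load_dataset

logging.basicConfig(level=logging.WARNING)  # low verbosity for HF Space logs
logger = logging.getLogger(__name__)

try:
    dataset = load_dataset("json", data_files="train.jsonl")  # placeholder path
except Exception as exc:
    logger.error("Dataset loading failed: %s", exc)
    raise

def log_gpu_memory(tag: str) -> None:
    """Record peak allocated GPU memory at key training points."""
    for i in range(torch.cuda.device_count()):
        peak_gb = torch.cuda.max_memory_allocated(i) / 1024**3
        logger.warning("[%s] GPU %d peak memory: %.1f GB", tag, i, peak_gb)
```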
### 5. Training Core Requirements
- [ ] Appropriate learning rate (2e-5)
- [ ] Proper checkpointing frequency
- [ ] Hub settings correctly configured for model saving
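A sketch of the core training/Hub settings, complementing the batch/memory fragment above; `output_dir`, `save_steps`, and `hub_model_id` are placeholders, not project values:

```python
# Sketch only: learning rate, checkpointing, and Hub push settings.
# output_dir, save_steps, and hub_model_id are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",                    # placeholder
    learning_rate=2e-5,                      # checklist value
    save_strategy="steps",
    save_steps=100,                          # placeholder checkpoint frequency
    push_to_hub=True,                        # save the trained model to the Hub
    hub_model_id="your-org/phi-4-finetune",  # placeholder
    hub_strategy="checkpoint",               # push checkpoints as they are saved
)
```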
---
## Pre-Deployment Verification
| Requirement | Status | Notes |
|-------------|--------|-------|
| Data sequential integrity | | Confirm entries processed in order |
| GPU memory within limits | | Check peak memory doesn't exceed 20GB per GPU |
| Training batch verification | | Verify first few batches maintain proper order |
---
**Current Hardware**: 4× NVIDIA L4 GPUs (24GB VRAM each)
**CRITICAL REMINDER**: Data sequence preservation is the highest priority - any shuffling, reordering, or combining of entries will compromise model quality.
*Last Updated: 2025-03-09*