|
# FramePack Dancing Image-to-Video Generation |
|
|
|
This repository contains the steps and scripts needed to generate videos with the Yi Chen Dancing image-to-video model. The model combines LoRA (Low-Rank Adaptation) weights with pre-trained FramePack and HunyuanVideo components to create high-quality anime-style dance videos from a source image and a textual prompt.
|
|
|
## Prerequisites |
|
|
|
Before proceeding, ensure that you have the following installed on your system: |
|
|
|
- **Ubuntu** (or a compatible Linux distribution)

- **Python 3.x**

- **pip** (Python package manager)

- **Git**

- **Git LFS** (Git Large File Storage)

- **FFmpeg**
|
|
|
## Installation |
|
|
|
1. **Update and Install Dependencies** |
|
|
|
```bash |
|
sudo apt-get update && sudo apt-get install cbm git-lfs ffmpeg |
|
``` |
|
|
|
2. **Clone the Repository** |
|
|
|
```bash |
|
git clone https://huggingface.co/svjack/YiChen_FramePack_lora_early |
|
cd YiChen_FramePack_lora_early |
|
``` |
|
|
|
3. **Install Python Dependencies** |
|
|
|
```bash |
|
pip install torch torchvision |
|
pip install -r requirements.txt |
|
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets |
|
pip install moviepy==1.0.3 |
|
pip install sageattention==1.0.6 |
|
``` |
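Two of the packages above are pinned to exact versions (`moviepy==1.0.3`, `sageattention==1.0.6`), and a mismatched version is a common source of import errors. The following sketch (a hypothetical helper, not part of this repository) checks the installed versions against those pins:

```python
from importlib import metadata

# Version pins taken from the install commands above.
REQUIRED = {"moviepy": "1.0.3", "sageattention": "1.0.6"}

def check_pins(required, installed):
    """Return (name, expected, found) for every package whose installed
    version does not match the pinned version (found is None if absent)."""
    return [(name, want, installed.get(name))
            for name, want in required.items()
            if installed.get(name) != want]

if __name__ == "__main__":
    installed = {}
    for name in REQUIRED:
        try:
            installed[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed[name] = None
    for name, want, got in check_pins(REQUIRED, installed):
        print(f"{name}: expected {want}, found {got}")
```

Running it prints one line per mismatched or missing package and nothing when both pins are satisfied.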
|
|
|
4. **Download Model Weights** |
|
|
|
```bash |
|
git clone https://huggingface.co/lllyasviel/FramePackI2V_HY |
|
git clone https://huggingface.co/hunyuanvideo-community/HunyuanVideo |
|
git clone https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged |
|
git clone https://huggingface.co/Comfy-Org/sigclip_vision_384 |
|
``` |
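If Git LFS is not set up correctly, these clones produce small pointer files instead of the actual weights. As a sanity check before generating, the sketch below (a hypothetical helper, not part of this repository) verifies that the weight files referenced by the commands in the Usage section exist on disk:

```python
from pathlib import Path

# Weight files referenced by the generation commands below.
EXPECTED = [
    "FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors",
    "HunyuanVideo/vae/diffusion_pytorch_model.safetensors",
    "HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors",
    "HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors",
    "sigclip_vision_384/sigclip_vision_patch14_384.safetensors",
]

def find_missing(paths, root="."):
    """Return the relative paths that do not exist as files under root."""
    root = Path(root)
    return [p for p in paths if not (root / p).is_file()]

if __name__ == "__main__":
    missing = find_missing(EXPECTED)
    if missing:
        print("Missing weight files (did `git lfs pull` complete?):")
        for p in missing:
            print("  " + p)
    else:
        print("All expected weight files present.")
```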
|
|
|
## Usage |
|
|
|
To generate a video, use the `fpack_generate_video.py` script with the appropriate parameters. Below are examples of how to generate videos using the Dancing model. |
|
|
|
|
|
|
|
### 1. Furina |
|
- Source Image |
|
|
|
|
|
```bash |
|
python fpack_generate_video.py \ |
|
--dit FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors \ |
|
--vae HunyuanVideo/vae/diffusion_pytorch_model.safetensors \ |
|
--text_encoder1 HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors \ |
|
--text_encoder2 HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors \ |
|
--image_encoder sigclip_vision_384/sigclip_vision_patch14_384.safetensors \ |
|
--image_path fln.png \ |
|
--prompt "In the style of Yi Chen Dancing White Background , The character's movements shift dynamically throughout the video, transitioning from poised stillness to lively dance steps. Her expressions evolve seamlessly—starting with focused determination, then flashing surprise as she executes a quick spin, before breaking into a joyful smile mid-leap. Her hands flow through choreographed positions, sometimes extending gracefully like unfolding wings, other times clapping rhythmically against her wrists. During a dramatic hip sway, her fingers fan open near her cheek, then sweep downward as her whole body dips into a playful crouch, the sequins on her costume catching the light with every motion." \ |
|
--video_size 960 544 --video_seconds 3 --fps 30 --infer_steps 25 \ |
|
--attn_mode sdpa --fp8_scaled \ |
|
--vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 \ |
|
--save_path save --output_type both \ |
|
--seed 1234 --lora_multiplier 1.0 --lora_weight framepack_yichen_output/framepack-yichen-lora-000006.safetensors |
|
|
|
|
|
``` |
|
|
|
- Without LoRA
|
|
|
- With LoRA
|
|
|
|
|
### 2. Roper |
|
- Source Image |
|
|
|
|
|
|
|
```bash |
|
python fpack_generate_video.py \ |
|
--dit FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors \ |
|
--vae HunyuanVideo/vae/diffusion_pytorch_model.safetensors \ |
|
--text_encoder1 HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors \ |
|
--text_encoder2 HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors \ |
|
--image_encoder sigclip_vision_384/sigclip_vision_patch14_384.safetensors \ |
|
--image_path shengjiang.png \ |
|
--prompt "In the style of Yi Chen Dancing White Background , The character's movements shift dynamically throughout the video, transitioning from poised stillness to lively dance steps. Her expressions evolve seamlessly—starting with focused determination, then flashing surprise as she executes a quick spin, before breaking into a joyful smile mid-leap. Her hands flow through choreographed positions, sometimes extending gracefully like unfolding wings, other times clapping rhythmically against her wrists. During a dramatic hip sway, her fingers fan open near her cheek, then sweep downward as her whole body dips into a playful crouch, the sequins on her costume catching the light with every motion." \ |
|
--video_size 960 544 --video_seconds 3 --fps 30 --infer_steps 25 \ |
|
--attn_mode sdpa --fp8_scaled \ |
|
--vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 \ |
|
--save_path save --output_type both \ |
|
--seed 1234 --lora_multiplier 1.0 --lora_weight framepack_yichen_output/framepack-yichen-lora-000006.safetensors |
|
|
|
``` |
|
|
|
- With LoRA
|
|
|
|
|
|
|
### 3. Varesa |
|
- Source Image |
|
|
|
|
|
```bash |
|
python fpack_generate_video.py \ |
|
--dit FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors \ |
|
--vae HunyuanVideo/vae/diffusion_pytorch_model.safetensors \ |
|
--text_encoder1 HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors \ |
|
--text_encoder2 HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors \ |
|
--image_encoder sigclip_vision_384/sigclip_vision_patch14_384.safetensors \ |
|
--image_path waliesha.jpg \ |
|
--prompt "In the style of Yi Chen Dancing White Background , The dancer’s energy pulses in waves—one moment a statue, poised and precise, the next a whirl of motion as her feet flicker across the floor. Her face tells its own story: brows knit in concentration, then eyes widening mid-turn as if startled by her own speed, before dissolving into laughter as she springs upward, weightless. Her arms carve the air—now arcing like ribbons unfurling, now snapping sharp as a whip’s crack, palms meeting wrists in staccato beats. A roll of her hips sends her fingers fluttering near her temple, then cascading down as she folds into a teasing dip, the beads on her dress scattering light like sparks." \ |
|
--video_size 960 544 --video_seconds 3 --fps 30 --infer_steps 25 \ |
|
--attn_mode sdpa --fp8_scaled \ |
|
--vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 \ |
|
--save_path save --output_type both \ |
|
--seed 1234 --lora_multiplier 1.0 --lora_weight framepack_yichen_output/framepack-yichen-lora-000006.safetensors |
|
|
|
``` |
|
- With LoRA
|
|
|
|
|
|
|
### 4. Scaramouche |
|
- Source Image |
|
|
|
|
|
```bash |
|
python fpack_generate_video.py \ |
|
--dit FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors \ |
|
--vae HunyuanVideo/vae/diffusion_pytorch_model.safetensors \ |
|
--text_encoder1 HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors \ |
|
--text_encoder2 HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors \ |
|
--image_encoder sigclip_vision_384/sigclip_vision_patch14_384.safetensors \ |
|
--image_path shanbing.jpg \ |
|
--prompt "In the style of Yi Chen Dancing White Background , The dancer’s energy pulses in waves—one moment a statue, poised and precise, the next a whirl of motion as her feet flicker across the floor. Her face tells its own story: brows knit in concentration, then eyes widening mid-turn as if startled by her own speed, before dissolving into laughter as she springs upward, weightless. Her arms carve the air—now arcing like ribbons unfurling, now snapping sharp as a whip’s crack, palms meeting wrists in staccato beats. A roll of her hips sends her fingers fluttering near her temple, then cascading down as she folds into a teasing dip, the beads on her dress scattering light like sparks." \ |
|
--video_size 960 544 --video_seconds 3 --fps 30 --infer_steps 25 \ |
|
--attn_mode sdpa --fp8_scaled \ |
|
--vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 \ |
|
--save_path save --output_type both \ |
|
--seed 1234 --lora_multiplier 1.0 --lora_weight framepack_yichen_output/framepack-yichen-lora-000006.safetensors |
|
|
|
``` |
|
|
|
- With LoRA
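The four examples above differ only in `--image_path` and `--prompt`; every other flag is identical. For batch generation over several source images, the invocation can be assembled programmatically. This is a sketch of a hypothetical wrapper, not a script shipped with this repository:

```python
def build_command(image_path, prompt, lora_weight, seed=1234):
    """Assemble the fpack_generate_video.py argument list used in the
    examples above, varying only the image, prompt, and LoRA weight."""
    return [
        "python", "fpack_generate_video.py",
        "--dit", "FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors",
        "--vae", "HunyuanVideo/vae/diffusion_pytorch_model.safetensors",
        "--text_encoder1", "HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors",
        "--text_encoder2", "HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors",
        "--image_encoder", "sigclip_vision_384/sigclip_vision_patch14_384.safetensors",
        "--image_path", image_path,
        "--prompt", prompt,
        "--video_size", "960", "544", "--video_seconds", "3",
        "--fps", "30", "--infer_steps", "25",
        "--attn_mode", "sdpa", "--fp8_scaled",
        "--vae_chunk_size", "32", "--vae_spatial_tile_sample_min_size", "128",
        "--save_path", "save", "--output_type", "both",
        "--seed", str(seed), "--lora_multiplier", "1.0",
        "--lora_weight", lora_weight,
    ]

if __name__ == "__main__":
    lora = "framepack_yichen_output/framepack-yichen-lora-000006.safetensors"
    prompt = "In the style of Yi Chen Dancing White Background , ..."
    for image in ["fln.png", "shengjiang.png"]:
        cmd = build_command(image, prompt, lora)
        # Pass the list to subprocess.run(cmd, check=True) to execute it.
        print(" ".join(cmd[:4]) + " ...")
```

Passing arguments as a list avoids shell-quoting issues with the long prompt strings.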
|
|
|
|
|
|
|
|
|
## Parameters |
|
|
|
* `--dit`: Path to the FramePack DiT (diffusion) model weights.

* `--vae`: Path to the HunyuanVideo VAE weights.

* `--text_encoder1`: Path to the LLaVA-LLaMA-3 text encoder weights.

* `--text_encoder2`: Path to the CLIP-L text encoder weights.

* `--image_encoder`: Path to the SigLIP vision encoder weights.

* `--image_path`: Source image for image-to-video generation.

* `--prompt`: Textual prompt for video generation.

* `--video_size`: Resolution of the generated video (e.g., `960 544`).

* `--video_seconds`: Length of the video in seconds.

* `--fps`: Frame rate of the output video.

* `--infer_steps`: Number of inference steps.

* `--attn_mode`: Attention implementation (e.g., `sdpa`).

* `--fp8_scaled`: Enable scaled FP8 precision to reduce VRAM usage (optional).

* `--vae_chunk_size`: Chunk size for VAE decoding.

* `--vae_spatial_tile_sample_min_size`: Minimum sample size for spatial VAE tiling.

* `--save_path`: Directory to save the generated video.

* `--output_type`: Output type (e.g., `both`).

* `--seed`: Random seed for reproducible generation.

* `--lora_weight`: Path to the LoRA weights.

* `--lora_multiplier`: Strength multiplier for the LoRA weights.
|
|
|
|
|
|
|
## Output |
|
|
|
The generated video and frames will be saved in the specified `save_path` directory. |
|
|
|
## Troubleshooting |
|
|
|
- Ensure all dependencies are correctly installed.

- Verify that the model weights are fully downloaded (via Git LFS) and placed in the correct locations.

- Check for any missing Python packages and install them using `pip`.
|
|
|
## License |
|
|
|
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details. |
|
|
|
## Acknowledgments |
|
|
|
- **Hugging Face** for hosting the model weights.

- **lllyasviel** for FramePack and the FramePackI2V_HY weights.

- **Tencent Hunyuan** (via the **hunyuanvideo-community** repackaging) for the pre-trained HunyuanVideo components.

- **Comfy-Org** for the repackaged text encoders and SigLIP vision weights.
|
|
|
## Contact |
|
|
|
For any questions or issues, please open an issue on the repository or contact the maintainer. |
|
|
|
--- |
|
|
|
|