---
license: mit
language:
- en
tags:
- python
- ai
---

# 🎵 GhostAI Music Generator 🎸 & VOCAL

UPDATE* `barks.py` 1.5B optimized to run on 8 GB. Will release a Large model (12-24 GB) soon.

UPDATE* Stable float16/32, working on INT8. FLOAT16/32, CUDA 11.8 & 12.1. 4-bit for lower end, 8-bit full.

Welcome to the GhostAI Music Generator! This web-based tool uses Meta AI's `musicgen-medium` model to craft high-quality instrumental tracks across genres such as Rock, Techno, Jazz, Classical, and Hip-Hop. The application structures compositions with sections like intros, verses, and choruses, all accessible through an intuitive Gradio interface. Outputs are high-quality MP3 files at 320 kbps, complete with embedded metadata. To enhance audio quality, we've integrated processing features including equalization (EQ), a chorus effect, and peak limiting for a polished sound.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/LZkcrdpN5PQXOF4pj33bu.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/sIIjdL3it8MSw9w5XBz0q.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/HcBK7X9373CVYO5zyo4YL.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/MoQb9arla6rXGepgFugNp.png)

## Project Evolution and Optimization

Initially, the project faced VRAM limitations on an NVIDIA RTX 3060 Ti with 7.69 GiB. To address this, we divided 30-second tracks into manageable chunks (first into three 10-second segments, then into two 15-second segments) to optimize memory usage. The Bark model was removed to focus solely on instrumental generation, and we standardized the output format to MP3 for broader compatibility. To achieve a more natural song flow, we varied prompts for each chunk.
For instance, the first chunk might use "dynamic intro and expressive verse," while the second employs "powerful chorus and energetic outro," providing a realistic song structure.

Audio enhancements include:

- **EQ**: Low-pass filter at 6000 Hz and high-pass filter at 100 Hz.
- **Chorus Effect**: 20 ms delay with a -4 dB gain.
- **Peak Limiting**: Strict limiting at -8.0 dB to control peaks.
- **Gain Adjustment**: +2 dB boost before crossfading to address amplitude dips.
- **Compression**: Removed to preserve dynamic range.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/b78antJwwWAx-jFfXoYHk.png)

## System Requirements

To get started, ensure your system meets the following requirements:

- **Operating System**: Ubuntu (Note: Windows/macOS are untested).
- **GPU**: CUDA-capable GPU with at least 8 GB VRAM.
- **Python**: Version 3.10.
- **ffmpeg**: Installed for audio processing.

## Installation and Setup

1. **Clone the Repository**:
   ```bash
   git clone https://huggingface.co/your-username/ghostai-music-generator
   cd ghostai-music-generator
   ```
2. **Set Up a Virtual Environment**:
   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```
3. **Install PyTorch**: For CUDA 12.1:
   ```bash
   pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
   ```
   For other CUDA versions, refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/).
4. **Install Other Dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
5. **Install ffmpeg**:
   ```bash
   sudo apt-get install ffmpeg
   ```
6. **Authenticate with Hugging Face**:
   ```bash
   huggingface-cli login
   ```
   Retrieve your token from [Hugging Face Tokens](https://huggingface.co/settings/tokens).
7. **Request Access to the Model**: Visit [facebook/musicgen-medium](https://huggingface.co/facebook/musicgen-medium) and request access.
8. **Download and Place Model Weights**:
   ```bash
   mkdir -p /home/ubuntu/ghostai_music_generator/models/musicgen-medium
   ```
   Place the model weights in the directory above. If you store the model elsewhere, update the `local_model_path` in `app.py` accordingly.

## Running the Application

Start the application by executing:

```bash
python app.py
```

This will launch a Gradio UI at `http://0.0.0.0:9999`. Open this URL in your browser to access the interface. If your browser cannot reach `0.0.0.0`, try `http://localhost:9999` on the same machine.

## Using the Interface

Within the Gradio interface:

- **Select a Genre**: Choose from Rock, Techno, Jazz, Classical, or Hip-Hop.
- **Custom Prompt**: Enter a custom prompt, such as:
  ```
  Hard rock with a dynamic intro, expressive verse, and powerful chorus, featuring electric guitars, steady heavy drums, and deep bass.
  ```
- **Adjust Parameters**:
  - **Guidance Scale (CFG)**: Default is 3.0.
  - **Top-K Sampling**: Default is 300.
  - **Top-P Sampling**: Default is 0.95.
  - **Temperature**: Default is 1.0.
  - **Total Duration**: Set to 30 seconds (range: 10-60).
  - **Crossfade Duration**: Set to 500 ms (range: 100-2000).
- **Generate Music**: Click "Generate Music" to create the track. The output will be saved as `output_cleaned.mp3` and played within Gradio.

Monitor the terminal output for VRAM usage to ensure smooth operation.

## Troubleshooting and Customization

- **Quiet Spots in Waveform**: Edit `app.py` to increase gain before crossfading:
  ```python
  # Raise the gain by 3 dB (assumes next_segment is a pydub AudioSegment)
  next_segment = next_segment + 3
  ```
  Use tools like Audacity to inspect and adjust the waveform.
- **Enhancing the Chorus**: Modify the second chunk prompt to:
  ```
  explosive chorus with soaring guitars and pounding drums
  ```
  Or increase the temperature to 1.2 and `top_k` to 350 in the UI.
- **Audio Distortion**: Reduce the chorus effect gain in `apply_chorus`:
  ```python
  # Lower the delayed copy by 6 dB (assumes segment is a pydub AudioSegment)
  delayed = segment - 6
  ```
  Adjust EQ settings in `apply_eq` with a high-pass at 80 Hz and low-pass at 5000 Hz.
- **MP3 Export Issues**: Ensure `ffmpeg` is installed:
  ```bash
  sudo apt-get install ffmpeg
  ```
  Check that the `chunk_{i}.mp3` and `output_cleaned.mp3` files exist.
- **VRAM Constraints**: Reduce the total duration to 20 seconds, use `nvidia-smi` to find and close other GPU-intensive applications, and monitor usage with:
  ```python
  import torch

  print(torch.cuda.memory_summary())
  ```

## Customization Options

- **Lock Dependencies**:
  ```bash
  pip freeze > requirements.txt
  ```
- **Add New Genres**: In `app.py`, define a new genre prompt:
  ```python
  def set_pop_prompt():
      return "Pop with a catchy intro, upbeat verse, and anthemic chorus, featuring bright synths, punchy drums, and groovy bass"
  ```
  Add a button for the new genre:
  ```python
  pop_btn = gr.Button("Pop", elem_classes="genre-btn")
  pop_btn.click(set_pop_prompt, inputs=None, outputs=[instrumental_prompt])
  ```
- **Edit MP3 Files**: Use Audacity or similar tools for more control over the final output.
- **Use a Smaller Model**: If VRAM is limited, switch to `musicgen-small` by updating `app.py`:
  ```python
  musicgen_model = MusicGen.get_pretrained('facebook/musicgen-small', device=device)
  ```

### Prerequisites

- Ubuntu system with Python 3.10 installed.
- NVIDIA RTX 3060 Ti GPU with CUDA support (CUDA 11.8 recommended).
- Internet connection to download the `musicgen-medium` model.

### Step 1: Make the Setup Script Executable

The `start_bash.sh` script sets up the virtual environment, installs dependencies, and downloads the `musicgen-medium` model. First, make the script executable:

```bash
chmod +x start_bash.sh
```

## License and Acknowledgments

This project is licensed under the MIT License. Please include a LICENSE file with the MIT License text.

Special thanks to:

- Meta AI for `musicgen-medium` and Audiocraft.
- Hugging Face for hosting and CLI tools.
- Gradio for the web interface.
- pydub for audio processing and MP3 export.
- xAI for their support.

Enjoy creating music!
If you have questions or suggestions, feel free to open an issue on the repository. Let's make some tunes! 🎉

## CUDA 12 Memory Management Update

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/fzyGz3Ondrr_snqH8yHiG.png)
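
## Sketch: Chunk Planning and Crossfading

The chunking strategy described under Project Evolution and Optimization (two 15-second chunks, one prompt per chunk, a +2 dB boost before crossfading) can be illustrated with a small standalone sketch. This is not the code in `app.py`: the function names `plan_chunks`, `chunk_prompts`, `db_to_gain`, and `crossfade` are hypothetical, the audio is modeled as plain lists of float samples rather than pydub segments, and the linear fade shape is an assumption, since this README does not say which curve the app uses.

```python
import math

# Section hints quoted from the README; cycling them for more than
# two chunks is an assumption made for this sketch.
SECTION_PROMPTS = [
    "dynamic intro and expressive verse",
    "powerful chorus and energetic outro",
]


def plan_chunks(total_s, max_chunk_s=15):
    """Split total_s seconds into equal chunks no longer than max_chunk_s."""
    n_chunks = math.ceil(total_s / max_chunk_s)
    return [total_s / n_chunks] * n_chunks


def chunk_prompts(base_prompt, n_chunks):
    """Append a section hint to the base prompt for each chunk."""
    return [
        f"{base_prompt}, {SECTION_PROMPTS[i % len(SECTION_PROMPTS)]}"
        for i in range(n_chunks)
    ]


def db_to_gain(db):
    """Convert a decibel change into a linear amplitude factor."""
    return 10 ** (db / 20)


def crossfade(a, b, overlap):
    """Join two sample lists, linearly fading a out and b in over `overlap` samples."""
    overlap = max(0, min(overlap, len(a), len(b)))
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises from near 0 to near 1
        mixed.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return a[: len(a) - overlap] + mixed + b[overlap:]


if __name__ == "__main__":
    durations = plan_chunks(30)
    prompts = chunk_prompts("Hard rock", len(durations))
    for seconds, prompt in zip(durations, prompts):
        print(f"{seconds:.0f}s -> {prompt}")

    # Boost the second chunk by +2 dB before joining, as the README describes.
    first = [0.5] * 8
    second = [s * db_to_gain(2.0) for s in [0.5] * 8]
    print(len(crossfade(first, second, overlap=4)))
```

The joined track is shorter than the sum of its chunks by the overlap length, which is worth keeping in mind when comparing the Total Duration setting with the length of the exported file.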