🎵 GhostAI Music Generator 🎸
UPDATE: Vocal update with barks.py (1.5B), optimized to run on 8 GB. A Large model (12-24 GB) will be released soon.
UPDATE: Stable on float16/32; working on INT8.
FLOAT16/32 on CUDA 11.8 & 12.1; 4-bit for lower-end hardware, 8-bit full.
Welcome to the GhostAI Music Generator! This web-based tool uses Meta AI's `musicgen-medium` model to craft instrumental tracks across genres such as Rock, Techno, Jazz, Classical, and Hip-Hop. The application structures compositions with sections like intros, verses, and choruses, all accessible through a Gradio interface. Outputs are MP3 files at 320 kbps, complete with embedded metadata. To polish the sound, we've integrated processing features including equalization (EQ), a chorus effect, and peak limiting.
Project Evolution and Optimization
Initially, the project faced VRAM limitations on an NVIDIA RTX 3060 Ti with 7.69 GiB. To address this, we divided 30-second tracks into manageable chunks (first into three 10-second segments, then into two 15-second segments) to optimize memory usage. The Bark model was removed to focus solely on instrumental generation, and we standardized the output format to MP3 for broader compatibility. To achieve a more natural song flow, we varied prompts for each chunk. For instance, the first chunk might use "dynamic intro and expressive verse," while the second employs "powerful chorus and energetic outro," providing a realistic song structure.
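The chunking scheme described above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual code: `plan_chunks`, `Chunk`, and `CHUNK_PROMPT_SUFFIXES` are hypothetical names, and the two suffix strings are the example prompts quoted in the paragraph above.

```python
from dataclasses import dataclass

# Per-chunk prompt fragments quoted in the text; the list name is hypothetical.
CHUNK_PROMPT_SUFFIXES = [
    "dynamic intro and expressive verse",
    "powerful chorus and energetic outro",
]


@dataclass
class Chunk:
    index: int
    duration_s: int
    prompt: str


def plan_chunks(base_prompt: str, total_s: int = 30, chunk_s: int = 15) -> list[Chunk]:
    """Split total_s seconds into chunks of at most chunk_s seconds each."""
    if total_s <= 0 or chunk_s <= 0:
        raise ValueError("durations must be positive")
    chunks = []
    remaining = total_s
    index = 0
    while remaining > 0:
        duration = min(chunk_s, remaining)
        # Cycle through the suffixes if there are more chunks than suffixes.
        suffix = CHUNK_PROMPT_SUFFIXES[index % len(CHUNK_PROMPT_SUFFIXES)]
        chunks.append(Chunk(index, duration, f"{base_prompt}, {suffix}"))
        remaining -= duration
        index += 1
    return chunks
```

Calling `plan_chunks("Hard rock")` yields two 15-second chunks, each with its own prompt; passing `chunk_s=10` reproduces the earlier three-segment layout.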
Audio enhancements include:
- EQ: Low-pass filter at 6000 Hz and high-pass filter at 100 Hz.
- Chorus Effect: 20ms delay with a -4 dB gain.
- Peak Limiting: Strict limiting at -8.0 dB to control peaks.
- Gain Adjustment: +2 dB boost before crossfading to address amplitude dips.
- Compression: Removed to preserve dynamic range.
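The decibel figures in this list translate to linear amplitude factors via `10 ** (dB / 20)`. The sketch below works through that arithmetic in plain Python on samples normalized to [-1.0, 1.0]. It is not the project's pydub-based code: the function names are hypothetical, and the limiter shown is a simple whole-signal gain reduction, which is an assumption about what "strict limiting" means here.

```python
import math


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def peak_dbfs(samples: list[float]) -> float:
    """Peak level in dBFS for samples normalized to [-1.0, 1.0]."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0 else 20 * math.log10(peak)


def limit_peak(samples: list[float], ceiling_db: float = -8.0) -> list[float]:
    """Scale the signal down so its peak does not exceed ceiling_db.

    Assumption: a plain gain reduction applied to the whole signal.
    Signals already under the ceiling are returned unchanged (never boosted).
    """
    over_db = peak_dbfs(samples) - ceiling_db
    if over_db <= 0:
        return list(samples)
    factor = db_to_linear(-over_db)
    return [s * factor for s in samples]
```

For reference, the +2 dB boost corresponds to a factor of about 1.26, and the -4 dB chorus gain to about 0.63.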
System Requirements
To get started, ensure your system meets the following requirements:
- Operating System: Ubuntu (Note: Windows/macOS are untested).
- GPU: CUDA-capable GPU with at least 8 GB VRAM.
- Python: Version 3.10.
- ffmpeg: Installed for audio processing.
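Some of these requirements can be checked before installing anything. The following is a minimal preflight sketch using only the standard library; `preflight` is a hypothetical helper, not part of the repository, and it does not verify how much VRAM the GPU has.

```python
import shutil
import sys


def preflight() -> dict[str, bool]:
    """Report which of the listed requirements appear to be met."""
    return {
        # The README targets Python 3.10; other versions are untested here.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # Ubuntu is the tested OS; this only checks for Linux in general.
        "linux": sys.platform.startswith("linux"),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Presence of nvidia-smi suggests an NVIDIA driver is installed.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```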
Installation and Setup
Clone the Repository:
git clone https://huggingface.co/your-username/ghostai-music-generator
cd ghostai-music-generator
Set Up a Virtual Environment:
python3 -m venv venv
source venv/bin/activate
Install PyTorch: For CUDA 12.1:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
For other CUDA versions, refer to the PyTorch installation guide.
Install Other Dependencies:
pip install -r requirements.txt
Install ffmpeg:
sudo apt-get install ffmpeg
Authenticate with Hugging Face:
huggingface-cli login
Retrieve your token from Hugging Face Tokens.
Request Access to the Model: Visit facebook/musicgen-medium and request access.
Download and Place Model Weights:
mkdir -p /home/ubuntu/ghostai_music_generator/models/musicgen-medium
Place the model weights in the directory above. If you store the model elsewhere, update `local_model_path` in `app.py` accordingly.
Running the Application
Start the application by executing:
python app.py
This launches a Gradio UI bound to 0.0.0.0:9999. Open http://localhost:9999 in a browser on the same machine, or replace localhost with the server's IP address when connecting from another machine.
Using the Interface
Within the Gradio interface:
- Select a Genre: Choose from Rock, Techno, Jazz, Classical, or Hip-Hop.
- Custom Prompt: Enter a custom prompt, such as:
Hard rock with a dynamic intro, expressive verse, and powerful chorus, featuring electric guitars, steady heavy drums, and deep bass.
- Adjust Parameters:
  - Guidance Scale (CFG): Default is 3.0.
  - Top-K Sampling: Default is 300.
  - Top-P Sampling: Default is 0.95.
  - Temperature: Default is 1.0.
  - Total Duration: Set to 30 seconds (range: 10-60).
  - Crossfade Duration: Set to 500 ms (range: 100-2000).
- Generate Music: Click "Generate Music" to create the track. The output is saved as `output_cleaned.mp3` and played within Gradio.
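The defaults and ranges listed above could be captured in a small settings object that rejects out-of-range values before generation starts. `GenerationSettings` is a hypothetical name, not something defined in `app.py`; only the duration and crossfade ranges are validated because those are the only ranges stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """UI defaults as listed above; field names are illustrative."""

    cfg_scale: float = 3.0
    top_k: int = 300
    top_p: float = 0.95
    temperature: float = 1.0
    total_duration_s: int = 30
    crossfade_ms: int = 500

    def __post_init__(self) -> None:
        # Ranges taken from the interface description.
        if not 10 <= self.total_duration_s <= 60:
            raise ValueError("total_duration_s must be between 10 and 60")
        if not 100 <= self.crossfade_ms <= 2000:
            raise ValueError("crossfade_ms must be between 100 and 2000")
```

For example, `GenerationSettings(temperature=1.2, top_k=350)` is accepted, while `GenerationSettings(total_duration_s=5)` raises `ValueError`.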
Monitor the terminal output for VRAM and GPU memory usage to ensure smooth operation.
Troubleshooting and Customization
Quiet Spots in Waveform: Edit `app.py` to increase the gain before crossfading:
next_segment = next_segment + 3
Use tools like Audacity to inspect and adjust the waveform.
Enhancing the Chorus: Modify the second chunk prompt to:
explosive chorus with soaring guitars and pounding drums
Or increase the temperature to 1.2 and `top_k` to 350 in the UI.
Audio Distortion: Reduce the chorus effect gain in `apply_chorus`:
delayed = segment - 6
Adjust EQ settings in `apply_eq` with a high-pass at 80 Hz and low-pass at 5000 Hz.
MP3 Export Issues: Ensure `ffmpeg` is installed:
sudo apt-get install ffmpeg
Check that the `chunk_{i}.mp3` and `output_cleaned.mp3` files exist.
VRAM Constraints: Reduce the total duration to 20 seconds, use `nvidia-smi` to find and close other GPU-intensive applications, and monitor usage with:
print(torch.cuda.memory_summary())
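VRAM usage can also be polled from outside the application by calling `nvidia-smi` with its CSV query flags. The sketch below is one way to do that from Python; `parse_memory_csv` and `query_gpu_memory` are hypothetical helper names, and the parser assumes every GPU reports plain integer values in MiB.

```python
import subprocess


def parse_memory_csv(output: str) -> list[tuple[int, int]]:
    """Parse 'used, total' MiB pairs, one line per GPU."""
    pairs = []
    for line in output.strip().splitlines():
        # Assumes integer fields; a device reporting "[N/A]" would raise here.
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def query_gpu_memory() -> list[tuple[int, int]]:
    """Ask nvidia-smi for used and total memory (MiB) on each GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)
```

Running `query_gpu_memory()` before and after a generation gives a rough picture of how close the run came to the card's limit.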
Customization Options
Lock Dependencies:
pip freeze > requirements.txt
Add New Genres: In `app.py`, define a new genre prompt:
def set_pop_prompt():
    return "Pop with a catchy intro, upbeat verse, and anthemic chorus, featuring bright synths, punchy drums, and groovy bass"
Add a button for the new genre:
pop_btn = gr.Button("Pop", elem_classes="genre-btn")
pop_btn.click(set_pop_prompt, inputs=None, outputs=[instrumental_prompt])
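If many genres are added, one function per genre gets repetitive. A possible alternative, shown here as a sketch rather than the way `app.py` is actually organized, keeps the prompts in a dictionary and builds the setter functions on demand. `GENRE_PROMPTS` and `make_prompt_setter` are hypothetical names.

```python
from typing import Callable

# Hypothetical registry; the Pop prompt is the one given above.
GENRE_PROMPTS: dict[str, str] = {
    "Pop": (
        "Pop with a catchy intro, upbeat verse, and anthemic chorus, "
        "featuring bright synths, punchy drums, and groovy bass"
    ),
}


def make_prompt_setter(genre: str) -> Callable[[], str]:
    """Return a zero-argument function that yields the genre's prompt."""
    prompt = GENRE_PROMPTS[genre]  # raises KeyError for unknown genres

    def set_prompt() -> str:
        return prompt

    return set_prompt
```

Each returned function takes no arguments and returns a string, so it could be passed to a button's click handler in the same way as `set_pop_prompt`.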
Edit MP3 Files: Use Audacity or similar tools for more control over the final output.
Use a Smaller Model: If VRAM is limited, switch to `musicgen-small` by updating `app.py`:
musicgen_model = MusicGen.get_pretrained('facebook/musicgen-small', device=device)
Prerequisites
- Ubuntu system with Python 3.10 installed.
- NVIDIA RTX 3060 Ti GPU with CUDA support (CUDA 11.8 recommended).
- Internet connection to download the `musicgen-medium` model.
Step 1: Make the Setup Script Executable
The `start_bash.sh` script sets up the virtual environment, installs dependencies, and downloads the `musicgen-medium` model. First, make the script executable:
chmod +x start_bash.sh
## License and Acknowledgments
This project is licensed under the MIT License. Please include a LICENSE file with the MIT License text.
Special thanks to:
- Meta AI for `musicgen-medium` and Audiocraft.
- Hugging Face for hosting and CLI tools.
- Gradio for the web interface.
- pydub for audio processing and MP3 export.
- xAI for their support.
Enjoy creating music! If you have questions or suggestions, feel free to open an issue on the repository. Let's make some tunes!
CUDA 12 MEMORY MANAGEMENT UPDATE
