---
license: mit
language:
- en
tags:
- python
- ai
---
# 🎵 GhostAI Music Generator 🎸

- **Vocal update**: `barks.py` (1.5B) is optimized to run on 8 GB. A large model for 12-24 GB will be released soon.
- **Update**: float16/32 is stable; INT8 is being worked on. Float16/32 runs on CUDA 11.8 and 12.1, with 4-bit for lower-end GPUs and 8-bit full.

Welcome to the GhostAI Music Generator! This web-based tool uses Meta AI's `musicgen-medium` model to generate instrumental tracks across genres such as Rock, Techno, Jazz, Classical, and Hip-Hop. The application structures compositions with sections like intros, verses, and choruses, all accessible through a Gradio interface. Outputs are 320 kbps MP3 files with embedded metadata. Post-processing applies equalization (EQ), a chorus effect, and peak limiting for a polished sound.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/LZkcrdpN5PQXOF4pj33bu.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/sIIjdL3it8MSw9w5XBz0q.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/HcBK7X9373CVYO5zyo4YL.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/MoQb9arla6rXGepgFugNp.png)
## Project Evolution and Optimization
Initially, the project faced VRAM limitations on an NVIDIA RTX 3060 Ti with 7.69 GiB. To address this, we divided 30-second tracks into manageable chunks—first into three 10-second segments, then into two 15-second segments—to optimize memory usage. The Bark model was removed to focus solely on instrumental generation, and we standardized the output format to MP3 for broader compatibility. To achieve a more natural song flow, we varied prompts for each chunk. For instance, the first chunk might use "dynamic intro and expressive verse," while the second employs "powerful chorus and energetic outro," providing a realistic song structure.
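The chunking scheme described above can be sketched as a small planning step. This is a minimal illustration and not the code from `app.py`: the helper `plan_chunks` and the `SECTION_HINTS` list are hypothetical names, and only the two section phrases are taken from the description above.

```python
from typing import List, Tuple

# Section phrases quoted in the text above; one per chunk.
SECTION_HINTS = [
    "dynamic intro and expressive verse",
    "powerful chorus and energetic outro",
]


def plan_chunks(base_prompt: str, total_seconds: int,
                num_chunks: int = 2) -> List[Tuple[str, int]]:
    """Split a track into equal-length chunks, each with its own prompt.

    Hypothetical helper: returns (prompt, duration_in_seconds) pairs that a
    generation loop could hand to the model one chunk at a time.
    """
    if not 1 <= num_chunks <= len(SECTION_HINTS):
        raise ValueError("num_chunks must be between 1 and len(SECTION_HINTS)")
    if total_seconds % num_chunks != 0:
        raise ValueError("total_seconds must divide evenly into num_chunks")
    chunk_seconds = total_seconds // num_chunks
    return [
        (f"{base_prompt}, {SECTION_HINTS[i]}", chunk_seconds)
        for i in range(num_chunks)
    ]


if __name__ == "__main__":
    for prompt, seconds in plan_chunks("Hard rock", 30):
        print(f"{seconds}s -> {prompt}")
```

Planning the chunks up front keeps the per-chunk duration explicit, which is the quantity that was reduced to fit the 8 GB card.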
Audio enhancements include:
- **EQ**: Low-pass filter at 6000 Hz and high-pass filter at 100 Hz.
- **Chorus Effect**: 20ms delay with a -4 dB gain.
- **Peak Limiting**: Strict limiting at -8.0 dB to control peaks.
- **Gain Adjustment**: +2 dB boost before crossfading to address amplitude dips.
- **Compression**: Removed to preserve dynamic range.
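
The decibel figures in this list map to linear amplitude factors through `10 ** (dB / 20)`. The sketch below works through that arithmetic for the +2 dB boost and the -8 dB ceiling using only the standard library. It operates on plain float samples in the range -1.0 to 1.0 and uses hard clipping as a simple stand-in for a limiter, so it illustrates the numbers rather than the processing chain in `app.py`; all function names are hypothetical.

```python
from typing import List


def db_to_ratio(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


def apply_gain(samples: List[float], gain_db: float) -> List[float]:
    """Scale every sample by the given gain in dB."""
    ratio = db_to_ratio(gain_db)
    return [s * ratio for s in samples]


def limit_peaks(samples: List[float], ceiling_db: float = -8.0) -> List[float]:
    """Hard-clip samples so no absolute value exceeds the ceiling (in dBFS)."""
    ceiling = db_to_ratio(ceiling_db)
    return [max(-ceiling, min(ceiling, s)) for s in samples]


if __name__ == "__main__":
    boosted = apply_gain([0.1, -0.5, 0.9], 2.0)  # +2 dB boost
    limited = limit_peaks(boosted, -8.0)         # -8 dB ceiling
    print([round(s, 3) for s in limited])
```

For reference, +2 dB is a factor of about 1.259 and a -8 dB ceiling is about 0.398 of full scale.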
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/b78antJwwWAx-jFfXoYHk.png)
## System Requirements
To get started, ensure your system meets the following requirements:
- **Operating System**: Ubuntu (Note: Windows/macOS are untested).
- **GPU**: CUDA-capable GPU with at least 8 GB VRAM.
- **Python**: Version 3.10.
- **ffmpeg**: Installed for audio processing.
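
A quick pre-flight check for these requirements might look like the following. It is a sketch under assumptions: `check_environment` is a hypothetical helper, and looking for `nvidia-smi` on `PATH` is only a rough proxy for a working CUDA setup, not proof of one.

```python
import shutil
import sys
from typing import List, Tuple


def check_environment(required_python: Tuple[int, int] = (3, 10)) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] != required_python:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, required_python))
        problems.append(f"Python {wanted} expected, found {found}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found; is the NVIDIA driver installed?")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks OK" if not issues else "\n".join(issues))
```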
## Installation and Setup
1. **Clone the Repository**:
```bash
git clone https://huggingface.co/your-username/ghostai-music-generator
cd ghostai-music-generator
```
2. **Set Up a Virtual Environment**:
```bash
python3 -m venv venv
source venv/bin/activate
```
3. **Install PyTorch**:
For CUDA 12.1:
```bash
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
```
For other CUDA versions, refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/).
4. **Install Other Dependencies**:
```bash
pip install -r requirements.txt
```
5. **Install ffmpeg**:
```bash
sudo apt-get install ffmpeg
```
6. **Authenticate with Hugging Face**:
```bash
huggingface-cli login
```
Retrieve your token from [Hugging Face Tokens](https://huggingface.co/settings/tokens).
7. **Request Access to the Model**:
Visit [facebook/musicgen-medium](https://huggingface.co/facebook/musicgen-medium) and request access.
8. **Download and Place Model Weights**:
```bash
mkdir -p /home/ubuntu/ghostai_music_generator/models/musicgen-medium
```
Place the model weights in the directory above. If you store the model elsewhere, update the `local_model_path` in `app.py` accordingly.
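
Before launching the app, it can help to confirm that the weights directory exists and is not empty. The helper below is a hypothetical sketch; it does not know which files the model needs and only checks that at least one file is present at the path you pass in.

```python
from pathlib import Path


def model_dir_ready(local_model_path: str) -> bool:
    """Return True if the directory exists and contains at least one file."""
    path = Path(local_model_path)
    return path.is_dir() and any(p.is_file() for p in path.iterdir())


if __name__ == "__main__":
    # Path taken from the step above; change it if your weights live elsewhere.
    target = "/home/ubuntu/ghostai_music_generator/models/musicgen-medium"
    print("ready" if model_dir_ready(target) else f"no model files found in {target}")
```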
## Running the Application
Start the application by executing:
```bash
python app.py
```
This launches a Gradio UI bound to `0.0.0.0:9999`. Open `http://localhost:9999` in a browser on the same machine, or `http://<server-ip>:9999` from another machine on the network.
## Using the Interface
Within the Gradio interface:
- **Select a Genre**: Choose from Rock, Techno, Jazz, Classical, or Hip-Hop.
- **Custom Prompt**: Enter a custom prompt, such as:
```
Hard rock with a dynamic intro, expressive verse, and powerful chorus, featuring electric guitars, steady heavy drums, and deep bass.
```
- **Adjust Parameters**:
  - **Guidance Scale (CFG)**: Default is 3.0.
  - **Top-K Sampling**: Default is 300.
  - **Top-P Sampling**: Default is 0.95.
  - **Temperature**: Default is 1.0.
  - **Total Duration**: Default is 30 seconds (range: 10-60).
  - **Crossfade Duration**: Default is 500 ms (range: 100-2000).
- **Generate Music**: Click "Generate Music" to create the track. The output will be saved as `output_cleaned.mp3` and played within Gradio.
Monitor the terminal output for VRAM usage to confirm that generation stays within your GPU's memory.
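The defaults and ranges listed above can be captured in one place, which makes them easy to reuse outside the UI. The `GenerationSettings` class below is a hypothetical sketch, not a structure from `app.py`, and it validates only the two ranges this README documents.

```python
from dataclasses import dataclass


@dataclass
class GenerationSettings:
    """UI defaults as listed above; the class itself is an illustration."""
    cfg_scale: float = 3.0
    top_k: int = 300
    top_p: float = 0.95
    temperature: float = 1.0
    total_duration_s: int = 30
    crossfade_ms: int = 500

    def validate(self) -> None:
        """Raise ValueError if a value falls outside its documented range."""
        if not 10 <= self.total_duration_s <= 60:
            raise ValueError("total_duration_s must be within 10-60 seconds")
        if not 100 <= self.crossfade_ms <= 2000:
            raise ValueError("crossfade_ms must be within 100-2000 ms")


if __name__ == "__main__":
    settings = GenerationSettings(temperature=1.2, top_k=350)
    settings.validate()
    print(settings)
```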
## Troubleshooting and Customization
- **Quiet Spots in Waveform**: Edit `app.py` to increase gain before crossfading:
```python
next_segment = next_segment + 3  # +3 dB of gain, assuming a pydub AudioSegment (the default boost is +2 dB)
```
Use tools like Audacity to inspect and adjust the waveform.
- **Enhancing the Chorus**: Modify the second chunk prompt to:
```
explosive chorus with soaring guitars and pounding drums
```
Or increase the temperature to 1.2 and `top_k` to 350 in the UI.
- **Audio Distortion**: Reduce the chorus effect gain in `apply_chorus`:
```python
delayed = segment - 6  # delayed copy 6 dB quieter, assuming a pydub AudioSegment (the default is -4 dB)
```
Adjust EQ settings in `apply_eq` with a high-pass at 80 Hz and low-pass at 5000 Hz.
- **MP3 Export Issues**: Ensure `ffmpeg` is installed:
```bash
sudo apt-get install ffmpeg
```
Check the existence of `chunk_{i}.mp3` and `output_cleaned.mp3` files.
- **VRAM Constraints**: Reduce the total duration to 20 seconds, use `nvidia-smi` to find other GPU-intensive processes and close them, and monitor usage from Python with:
```python
print(torch.cuda.memory_summary())
```
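`torch.cuda.memory_summary()` reports only what the current PyTorch process has allocated, while `nvidia-smi` shows usage for the whole device. A minimal sketch for reading the device-wide numbers from a script follows. The helper names are hypothetical, and the parser assumes every field is a plain integer, which may not hold on every GPU or driver.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' pairs in MiB, one line per GPU."""
    pairs = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Ask nvidia-smi for used and total memory (MiB) on each GPU."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(output)


if __name__ == "__main__":
    for index, (used, total) in enumerate(query_gpu_memory()):
        print(f"GPU {index}: {used} / {total} MiB used")
```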
## Customization Options
- **Lock Dependencies**:
```bash
pip freeze > requirements.txt
```
- **Add New Genres**: In `app.py`, define a new genre prompt:
```python
def set_pop_prompt():
    return "Pop with a catchy intro, upbeat verse, and anthemic chorus, featuring bright synths, punchy drums, and groovy bass"
```
Add a button for the new genre:
```python
pop_btn = gr.Button("Pop", elem_classes="genre-btn")
pop_btn.click(set_pop_prompt, inputs=None, outputs=[instrumental_prompt])
```
- **Edit MP3 Files**: Use Audacity or similar tools for more control over the final output.
- **Use a Smaller Model**: If VRAM is limited, switch to `musicgen-small` by updating `app.py`:
```python
musicgen_model = MusicGen.get_pretrained('facebook/musicgen-small', device=device)
```
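If you add several genres as described under "Add New Genres", one function per genre becomes repetitive. One possible alternative is a table of prompts plus a small factory, sketched below. `GENRE_PROMPTS` and `make_prompt_setter` are hypothetical names, not part of `app.py`; the Pop prompt is the one shown above.

```python
from typing import Callable, Dict

# Genre name -> prompt text. Add one entry per new genre.
GENRE_PROMPTS: Dict[str, str] = {
    "Pop": (
        "Pop with a catchy intro, upbeat verse, and anthemic chorus, "
        "featuring bright synths, punchy drums, and groovy bass"
    ),
}


def make_prompt_setter(genre: str) -> Callable[[], str]:
    """Build a zero-argument callback that returns the prompt for one genre."""
    prompt = GENRE_PROMPTS[genre]  # raises KeyError for an unknown genre

    def set_prompt() -> str:
        return prompt

    return set_prompt


if __name__ == "__main__":
    set_pop_prompt = make_prompt_setter("Pop")
    print(set_pop_prompt())
```

Each callback takes no arguments and returns a string, so it has the same shape as `set_pop_prompt` and could be passed to a button's `click` handler in the same way.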
### Prerequisites
- Ubuntu system with Python 3.10 installed.
- NVIDIA RTX 3060 Ti GPU with CUDA support (CUDA 11.8 recommended).
- Internet connection to download the `musicgen-medium` model.
### Step 1: Make the Setup Script Executable
The `start_bash.sh` script sets up the virtual environment, installs dependencies, and downloads the `musicgen-medium` model. First, make the script executable:
```bash
chmod +x start_bash.sh
```
## License and Acknowledgments
This project is licensed under the MIT License. Please include a LICENSE file with the MIT License text.
Special thanks to:
- Meta AI for `musicgen-medium` and Audiocraft.
- Hugging Face for hosting and CLI tools.
- Gradio for the web interface.
- pydub for audio processing and MP3 export.
- xAI for their support.
Enjoy creating music! If you have questions or suggestions, feel free to open an issue on the repository. Let's make some tunes! 🎉
## CUDA 12 Memory Management Update
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/fzyGz3Ondrr_snqH8yHiG.png)