---
license: mit
language:
- en
tags:
- python
- ai
---
# 🎵 GhostAI Music Generator 🎸

**Updates**
- Vocal update: `barks.py` (1.5B), optimized to run on 8 GB. A large model for 12-24 GB will be released soon.
- Stable float16/32; working on INT8.
- Float16/32 on CUDA 11.8 and 12.1; 4-bit for lower-end GPUs, 8-bit full.

Welcome to the GhostAI Music Generator! This web-based tool utilizes Meta AI's `musicgen-medium` model to craft high-quality instrumental tracks across genres such as Rock, Techno, Jazz, Classical, and Hip-Hop. The application structures compositions with sections like intros, verses, and choruses, all accessible through an intuitive Gradio interface. Outputs are high-quality MP3 files at 320 kbps, complete with embedded metadata. To enhance audio quality, we've integrated processing features including equalization (EQ), a chorus effect, and peak limiting for a polished sound.


![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/LZkcrdpN5PQXOF4pj33bu.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/sIIjdL3it8MSw9w5XBz0q.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/HcBK7X9373CVYO5zyo4YL.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/MoQb9arla6rXGepgFugNp.png)



## Project Evolution and Optimization

Initially, the project ran into VRAM limits on an NVIDIA RTX 3060 Ti with 7.69 GiB available. To reduce memory usage, we split each 30-second track into smaller chunks: first three 10-second segments, later two 15-second segments. The Bark model was removed to focus solely on instrumental generation, and the output format was standardized to MP3 for broader compatibility. To give the song a more natural flow, each chunk gets a different prompt. For instance, the first chunk might use "dynamic intro and expressive verse," while the second uses "powerful chorus and energetic outro," which yields a realistic song structure.
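
The chunking scheme described above can be sketched roughly as follows. This is a minimal illustration and not the code in `app.py`: `plan_chunks`, its parameters, and the `"Hard rock"` base prompt are hypothetical names chosen for the example.

```python
def plan_chunks(total_seconds, chunk_seconds, base_prompt, section_prompts):
    """Split a track into chunks and pair each chunk with a section prompt.

    Hypothetical helper for illustration; app.py may structure this differently.
    """
    if total_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("durations must be positive")
    if not section_prompts:
        raise ValueError("at least one section prompt is required")
    plan = []
    start = 0
    index = 0
    while start < total_seconds:
        # The last chunk may be shorter than chunk_seconds.
        duration = min(chunk_seconds, total_seconds - start)
        # Reuse the final section prompt if there are more chunks than prompts.
        section = section_prompts[min(index, len(section_prompts) - 1)]
        plan.append({
            "start": start,
            "duration": duration,
            "prompt": f"{base_prompt}, {section}",
        })
        start += duration
        index += 1
    return plan


sections = [
    "dynamic intro and expressive verse",
    "powerful chorus and energetic outro",
]
for chunk in plan_chunks(30, 15, "Hard rock", sections):
    print(chunk)
```

Each planned chunk would then be generated separately and the results joined with a crossfade, so that only one chunk's worth of audio has to be generated at a time.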

Audio enhancements include:
- **EQ**: Low-pass filter at 6000 Hz and high-pass filter at 100 Hz.
- **Chorus Effect**: 20ms delay with a -4 dB gain.
- **Peak Limiting**: Strict limiting at -8.0 dB to control peaks.
- **Gain Adjustment**: +2 dB boost before crossfading to address amplitude dips.
- **Compression**: Removed to preserve dynamic range.
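
To make the decibel figures in the list above more concrete, here is a small sketch that converts them to linear amplitude ratios. It assumes the limiter threshold is expressed in dBFS, and `limiter_gain_db` is a hypothetical helper that only computes a static gain; it does not model how the project's limiter actually behaves.

```python
def db_to_ratio(db):
    """Convert a gain in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


def limiter_gain_db(peak_dbfs, ceiling_dbfs=-8.0):
    """Gain in dB needed to bring a peak down to the ceiling (0 if already below).

    Hypothetical helper: a static gain calculation, not a real limiter.
    """
    return min(0.0, ceiling_dbfs - peak_dbfs)


print(f"chorus copy at -4 dB -> x{db_to_ratio(-4):.3f} amplitude")
print(f"+2 dB boost          -> x{db_to_ratio(2):.3f} amplitude")
print(f"peak at -3 dBFS needs {limiter_gain_db(-3.0):.1f} dB to meet -8 dBFS")
```

In other words, the delayed chorus copy sits at roughly 63% of the original amplitude, and the +2 dB boost raises amplitude by about 26%.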



![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/b78antJwwWAx-jFfXoYHk.png)

## System Requirements

To get started, ensure your system meets the following requirements:
- **Operating System**: Ubuntu (Note: Windows/macOS are untested).
- **GPU**: CUDA-capable GPU with at least 8 GB VRAM.
- **Python**: Version 3.10.
- **ffmpeg**: Installed for audio processing.
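
A quick way to check these requirements before installing anything is a small preflight script. The sketch below uses only the Python standard library; `check_requirements` is a hypothetical helper and not part of the repository, and it only checks that `ffmpeg` and `nvidia-smi` are on `PATH`, not the amount of VRAM.

```python
import shutil
import sys


def check_requirements(required_python=(3, 10)):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if sys.version_info[:2] != required_python:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, required_python))
        problems.append(f"Python {wanted} expected, found {found}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found; is the NVIDIA driver installed?")
    return problems


for problem in check_requirements():
    print("WARNING:", problem)
```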

## Installation and Setup

1. **Clone the Repository**:
   ```bash
   git clone https://huggingface.co/your-username/ghostai-music-generator
   cd ghostai-music-generator
   ```

2. **Set Up a Virtual Environment**:
   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

3. **Install PyTorch**:
   For CUDA 12.1:
   ```bash
   pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
   ```
   For other CUDA versions, refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/).

4. **Install Other Dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

5. **Install ffmpeg**:
   ```bash
   sudo apt-get install ffmpeg
   ```

6. **Authenticate with Hugging Face**:
   ```bash
   huggingface-cli login
   ```
   Retrieve your token from [Hugging Face Tokens](https://huggingface.co/settings/tokens).

7. **Request Access to the Model**:
   Visit [facebook/musicgen-medium](https://huggingface.co/facebook/musicgen-medium) and request access.

8. **Download and Place Model Weights**:
   ```bash
   mkdir -p /home/ubuntu/ghostai_music_generator/models/musicgen-medium
   ```
   Place the model weights in the directory above. If you store the model elsewhere, update the `local_model_path` in `app.py` accordingly.

## Running the Application

Start the application by executing:
```bash
python app.py
```
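This launches a Gradio UI bound to `0.0.0.0:9999`, meaning it listens on all network interfaces. Open `http://localhost:9999` in a browser on the same machine, or use the machine's IP address with port 9999 from another device.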

## Using the Interface

Within the Gradio interface:

- **Select a Genre**: Choose from Rock, Techno, Jazz, Classical, or Hip-Hop.
- **Custom Prompt**: Enter a custom prompt, such as:
  ```
  Hard rock with a dynamic intro, expressive verse, and powerful chorus, featuring electric guitars, steady heavy drums, and deep bass.
  ```
- **Adjust Parameters**:
  - **Guidance Scale (CFG)**: Default is 3.0.
  - **Top-K Sampling**: Default is 300.
  - **Top-P Sampling**: Default is 0.95.
  - **Temperature**: Default is 1.0.
  - **Total Duration**: Set to 30 seconds (range: 10-60).
  - **Crossfade Duration**: Set to 500 ms (range: 100-2000).
- **Generate Music**: Click "Generate Music" to create the track. The output will be saved as `output_cleaned.mp3` and played within Gradio.

Monitor the terminal output for VRAM and GPU memory usage to ensure smooth operation.
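
If you want a compact VRAM readout of your own, a sketch along these lines may help. `format_gib` and `vram_report` are hypothetical helpers, not functions from `app.py`; the PyTorch calls used (`torch.cuda.is_available`, `memory_allocated`, `memory_reserved`, `get_device_properties`) are standard, and the sketch assumes the GPU in use is device 0.

```python
def format_gib(num_bytes):
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GiB"


def vram_report():
    """Return a one-line summary of GPU memory use, or a notice if unavailable."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "CUDA is not available"
    # Assumption: the model runs on device 0.
    total = torch.cuda.get_device_properties(0).total_memory
    allocated = torch.cuda.memory_allocated(0)
    reserved = torch.cuda.memory_reserved(0)
    return (f"allocated {format_gib(allocated)}, "
            f"reserved {format_gib(reserved)}, total {format_gib(total)}")


print(vram_report())
```

Calling `vram_report()` before and after each chunk is generated shows whether memory is being released between chunks.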

## Troubleshooting and Customization

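- **Quiet Spots in Waveform**: Edit `app.py` to increase the gain applied before crossfading, for example from +2 dB to +3 dB:
  ```python
  # Assuming a pydub AudioSegment: adding a number applies that gain in dB.
  next_segment = next_segment + 3
  ```
  Use tools like Audacity to inspect and adjust the waveform.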

- **Enhancing the Chorus**: Modify the second chunk prompt to:
  ```
  explosive chorus with soaring guitars and pounding drums
  ```
  Or increase the temperature to 1.2 and `top_k` to 350 in the UI.

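- **Audio Distortion**: Reduce the chorus effect gain in `apply_chorus`, for example from -4 dB to -6 dB:
  ```python
  # Assuming a pydub AudioSegment: subtracting a number lowers the gain in dB.
  delayed = segment - 6
  ```
  You can also adjust the EQ settings in `apply_eq`, for example a high-pass at 80 Hz and a low-pass at 5000 Hz.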

- **MP3 Export Issues**: Ensure `ffmpeg` is installed:
  ```bash
  sudo apt-get install ffmpeg
  ```
  Check the existence of `chunk_{i}.mp3` and `output_cleaned.mp3` files.

- **VRAM Constraints**: Reduce the total duration to 20 seconds, use `nvidia-smi` to find other GPU-intensive processes and close them, and monitor usage with:
  ```python
  import torch

  print(torch.cuda.memory_summary())
  ```
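
For the MP3 export check mentioned above, a small script can report which expected files are missing or empty. This is a sketch under assumptions: `missing_outputs` is a hypothetical helper, and it assumes chunk files are numbered from 0 and written to the directory you pass in.

```python
from pathlib import Path


def missing_outputs(directory, num_chunks, final_name="output_cleaned.mp3"):
    """List expected MP3 files that are missing or empty in `directory`."""
    # Assumption: chunks are named chunk_0.mp3, chunk_1.mp3, ...
    expected = [f"chunk_{i}.mp3" for i in range(num_chunks)] + [final_name]
    missing = []
    for name in expected:
        path = Path(directory) / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


print(missing_outputs(".", num_chunks=2))
```

An empty list means every expected file exists and is non-empty; anything else points to the step where export failed.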

## Customization Options

- **Lock Dependencies**:
  ```bash
  pip freeze > requirements.txt
  ```

- **Add New Genres**: In `app.py`, define a new genre prompt:
  ```python
  def set_pop_prompt():
      return "Pop with a catchy intro, upbeat verse, and anthemic chorus, featuring bright synths, punchy drums, and groovy bass"
  ```
  Add a button for the new genre:
  ```python
  pop_btn = gr.Button("Pop", elem_classes="genre-btn")
  pop_btn.click(set_pop_prompt, inputs=None, outputs=[instrumental_prompt])
  ```

- **Edit MP3 Files**: Use Audacity or similar tools for more control over the final output.

- **Use a Smaller Model**: If VRAM is limited, switch to `musicgen-small` by updating `app.py`:
  ```python
  musicgen_model = MusicGen.get_pretrained('facebook/musicgen-small', device=device)
  ```


### Prerequisites
- Ubuntu system with Python 3.10 installed.
- NVIDIA RTX 3060 Ti GPU with CUDA support (CUDA 11.8 recommended).
- Internet connection to download the `musicgen-medium` model.

### Step 1: Make the Setup Script Executable
The `start_bash.sh` script sets up the virtual environment, installs dependencies, and downloads the `musicgen-medium` model. First, make the script executable:

```bash
chmod +x start_bash.sh
```

## License and Acknowledgments

This project is licensed under the MIT License. Please include a LICENSE file with the MIT License text.

Special thanks to:
- Meta AI for `musicgen-medium` and Audiocraft.
- Hugging Face for hosting and CLI tools.
- Gradio for the web interface.
- pydub for audio processing and MP3 export.
- xAI for their support.

Enjoy creating music! If you have questions or suggestions, feel free to open an issue on the repository. Let's make some tunes! 🎉


## CUDA 12 Memory Management Update



![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/fzyGz3Ondrr_snqH8yHiG.png)