🎵 GhostAI Music Generator 🎸

VOCAL UPDATE: barks.py (1.5B), optimized to run on 8 GB. A large model (12-24 GB) will be released soon.

UPDATE: float16/32 is stable; INT8 is in progress.

Float16/32 on CUDA 11.8 and 12.1. 4-bit for lower-end GPUs, 8-bit full.

Welcome to the GhostAI Music Generator! This web-based tool utilizes Meta AI's musicgen-medium model to craft high-quality instrumental tracks across genres such as Rock, Techno, Jazz, Classical, and Hip-Hop. The application structures compositions with sections like intros, verses, and choruses, all accessible through an intuitive Gradio interface. Outputs are high-quality MP3 files at 320 kbps, complete with embedded metadata. To enhance audio quality, we've integrated processing features including equalization (EQ), a chorus effect, and peak limiting for a polished sound.

Project Evolution and Optimization

Initially, the project faced VRAM limitations on an NVIDIA RTX 3060 Ti with 7.69 GiB. To address this, we divided 30-second tracks into manageable chunks, first into three 10-second segments and later into two 15-second segments, to optimize memory usage. The Bark model was removed to focus solely on instrumental generation, and we standardized the output format to MP3 for broader compatibility. To achieve a more natural song flow, we varied prompts for each chunk. For instance, the first chunk might use "dynamic intro and expressive verse," while the second employs "powerful chorus and energetic outro," providing a realistic song structure.
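
The chunking scheme described above can be sketched roughly as follows. This is only an illustration, not the code in app.py: the helper name `plan_chunks`, its signature, and the way the base prompt is joined to each section description are assumptions. The two section descriptions are the ones quoted above.

```python
def plan_chunks(base_prompt, total_seconds=30, chunk_seconds=15):
    """Split a track into fixed-size chunks and give each chunk its own prompt.

    Hypothetical helper: name and signature are illustrative only.
    """
    # Section descriptions quoted in the text; extra chunks reuse the last one.
    sections = [
        "dynamic intro and expressive verse",
        "powerful chorus and energetic outro",
    ]
    plan = []
    start = 0
    while start < total_seconds:
        # The final chunk may be shorter than chunk_seconds.
        duration = min(chunk_seconds, total_seconds - start)
        section = sections[min(len(plan), len(sections) - 1)]
        plan.append((duration, f"{base_prompt}, {section}"))
        start += duration
    return plan


for duration, prompt in plan_chunks("Hard rock"):
    print(f"{duration}s -> {prompt}")
```

Generating each chunk separately keeps peak memory tied to the chunk length rather than the full track length, which is the reason given above for splitting in the first place.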

Audio enhancements include:

  • EQ: Low-pass filter at 6000 Hz and high-pass filter at 100 Hz.
  • Chorus Effect: 20ms delay with a -4 dB gain.
  • Peak Limiting: Strict limiting at -8.0 dB to control peaks.
  • Gain Adjustment: +2 dB boost before crossfading to address amplitude dips.
  • Compression: Removed to preserve dynamic range.
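
To make the decibel figures in this list concrete, here is a plain-Python sketch of the arithmetic on a list of float samples in the range -1.0 to 1.0. It is not the project's implementation: the function names are hypothetical, and the limiter is modeled as a simple downward peak scaling, which may differ from what app.py does.

```python
def db_to_ratio(db):
    """Convert a gain in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


def apply_gain(samples, db):
    """Scale every sample by the given gain in dB (e.g. +2 dB before crossfading)."""
    ratio = db_to_ratio(db)
    return [s * ratio for s in samples]


def chorus_mix(samples, sample_rate, delay_ms=20, gain_db=-4.0):
    """Mix the signal with a delayed, attenuated copy of itself."""
    delay = int(sample_rate * delay_ms / 1000)
    # Pad the attenuated copy with silence so it starts `delay` samples late.
    wet = [0.0] * delay + apply_gain(samples, gain_db)
    return [dry + wet[i] for i, dry in enumerate(samples)]


def limit_peak(samples, ceiling_db=-8.0):
    """Scale the whole signal down if its peak exceeds the ceiling (in dBFS)."""
    ceiling = db_to_ratio(ceiling_db)
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= ceiling:
        return list(samples)
    return [s * ceiling / peak for s in samples]


signal = [0.0, 0.5, -1.0, 0.25]
processed = limit_peak(chorus_mix(signal, sample_rate=100))
print(round(db_to_ratio(-4.0), 3), round(db_to_ratio(2.0), 3))
```

A gain of -4 dB multiplies amplitude by about 0.63, and +2 dB by about 1.26, so the chorus copy sits clearly below the dry signal while the pre-crossfade boost is a modest lift.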

System Requirements

To get started, ensure your system meets the following requirements:

  • Operating System: Ubuntu (Note: Windows/macOS are untested).
  • GPU: CUDA-capable GPU with at least 8 GB VRAM.
  • Python: Version 3.10.
  • ffmpeg: Installed for audio processing.
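
A quick way to check these requirements before installing anything else is a small script like the one below. It is a sketch under assumptions: the function name `check_environment` is hypothetical, and the CUDA check only reports a result once PyTorch has been installed.

```python
import shutil
import sys


def check_environment():
    """Return a dict mapping each requirement to True or False."""
    results = {
        "python_3_10": sys.version_info[:2] == (3, 10),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }
    try:
        import torch  # only importable after the PyTorch install step

        results["cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        results["cuda_available"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```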

Installation and Setup

  1. Clone the Repository:

    git clone https://huggingface.co/your-username/ghostai-music-generator
    cd ghostai-music-generator
    
  2. Set Up a Virtual Environment:

    python3 -m venv venv
    source venv/bin/activate
    
  3. Install PyTorch: For CUDA 12.1:

    pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
    

    For other CUDA versions, refer to the PyTorch installation guide.

  4. Install Other Dependencies:

    pip install -r requirements.txt
    
  5. Install ffmpeg:

    sudo apt-get install ffmpeg
    
  6. Authenticate with Hugging Face:

    huggingface-cli login
    

    Retrieve your access token from the Hugging Face Tokens page.

  7. Request Access to the Model: Visit facebook/musicgen-medium and request access.

  8. Download and Place Model Weights:

    mkdir -p /home/ubuntu/ghostai_music_generator/models/musicgen-medium
    

    Place the model weights in the directory above. If you store the model elsewhere, update the local_model_path in app.py accordingly.
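
Before launching the app, it can help to confirm that the model directory actually contains files. The check below is a hypothetical helper, not part of app.py; it only tests that the directory exists and is non-empty, since this README does not list the expected weight file names.

```python
from pathlib import Path

# Path from the mkdir step above; change it if you store the weights elsewhere.
local_model_path = Path("/home/ubuntu/ghostai_music_generator/models/musicgen-medium")


def model_dir_ready(path):
    """Return True if the directory exists and holds at least one file."""
    path = Path(path)
    return path.is_dir() and any(p.is_file() for p in path.iterdir())


if not model_dir_ready(local_model_path):
    print(f"No model files found in {local_model_path}")
```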

Running the Application

Start the application by executing:

python app.py

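This launches a Gradio UI bound to http://0.0.0.0:9999. Open this URL in your browser to access the interface. Because 0.0.0.0 is a bind address, some browsers will not open it directly; in that case use http://localhost:9999 on the same machine, or the host's IP address from another device.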

Using the Interface

Within the Gradio interface:

  • Select a Genre: Choose from Rock, Techno, Jazz, Classical, or Hip-Hop.
  • Custom Prompt: Enter a custom prompt, such as:
    Hard rock with a dynamic intro, expressive verse, and powerful chorus, featuring electric guitars, steady heavy drums, and deep bass.
    
  • Adjust Parameters:
    • Guidance Scale (CFG): Default is 3.0.
    • Top-K Sampling: Default is 300.
    • Top-P Sampling: Default is 0.95.
    • Temperature: Default is 1.0.
    • Total Duration: Set to 30 seconds (range: 10-60).
    • Crossfade Duration: Set to 500 ms (range: 100-2000).
  • Generate Music: Click "Generate Music" to create the track. The output will be saved as output_cleaned.mp3 and played within Gradio.
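
The defaults and ranges listed above can be captured in one small settings object, which makes it easy to validate values before starting a long generation. This is a sketch only: the class name `GenerationSettings` and its field names are hypothetical and do not come from app.py.

```python
from dataclasses import dataclass


@dataclass
class GenerationSettings:
    """Defaults and ranges as listed above; the class itself is hypothetical."""

    cfg_scale: float = 3.0
    top_k: int = 300
    top_p: float = 0.95
    temperature: float = 1.0
    total_duration_s: int = 30
    crossfade_ms: int = 500

    def __post_init__(self):
        # Only the two ranges documented in the interface are enforced here.
        if not 10 <= self.total_duration_s <= 60:
            raise ValueError("total duration must be between 10 and 60 seconds")
        if not 100 <= self.crossfade_ms <= 2000:
            raise ValueError("crossfade must be between 100 and 2000 ms")


settings = GenerationSettings(temperature=1.2, top_k=350)
print(settings)
```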

Monitor the terminal output for VRAM and GPU memory usage to ensure smooth operation.
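
If you want the same memory numbers outside the app's own log output, one option is to query nvidia-smi from a separate script. The sketch below is an assumption-laden example rather than project code: the function names are hypothetical, and it relies on nvidia-smi's CSV query output giving used and total memory in MiB, one line per GPU.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs, one line per GPU, into a list of tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory use; return [] if it cannot be read."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return parse_gpu_memory(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return []


for index, (used, total) in enumerate(query_gpu_memory()):
    print(f"GPU {index}: {used} / {total} MiB")
```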

Troubleshooting and Customization

  • Quiet Spots in Waveform: Edit app.py to increase gain before crossfading:

    next_segment = next_segment + 3
    

    Use tools like Audacity to inspect and adjust the waveform.

  • Enhancing the Chorus: Modify the second chunk prompt to:

    explosive chorus with soaring guitars and pounding drums
    

    Or increase the temperature to 1.2 and top_k to 350 in the UI.

  • Audio Distortion: Reduce the chorus effect gain in apply_chorus:

    delayed = segment - 6
    

    Adjust EQ settings in apply_eq with a high-pass at 80 Hz and low-pass at 5000 Hz.

  • MP3 Export Issues: Ensure ffmpeg is installed:

    sudo apt-get install ffmpeg
    

    Check the existence of chunk_{i}.mp3 and output_cleaned.mp3 files.

  • VRAM Constraints: Reduce the total duration to 20 seconds, use nvidia-smi to find and close other GPU-intensive applications, and monitor usage with:

    print(torch.cuda.memory_summary())
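
For the MP3 export check mentioned in the list above, a short script can report which expected files are missing or empty. This is a hypothetical helper: the function name is invented for illustration, and it assumes chunk files are numbered from 0, which this README does not confirm.

```python
from pathlib import Path


def missing_exports(output_dir, chunk_count):
    """List expected MP3 files that are absent or empty in output_dir."""
    # Assumption: chunks are numbered chunk_0.mp3, chunk_1.mp3, ...
    expected = [f"chunk_{i}.mp3" for i in range(chunk_count)]
    expected.append("output_cleaned.mp3")
    missing = []
    for name in expected:
        path = Path(output_dir) / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


print(missing_exports(".", chunk_count=2))
```

An empty list means every expected file exists and has content; any names returned point to the stage where the export failed.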
    

Customization Options

  • Lock Dependencies:

    pip freeze > requirements.txt
    
  • Add New Genres: In app.py, define a new genre prompt:

    def set_pop_prompt():
        return "Pop with a catchy intro, upbeat verse, and anthemic chorus, featuring bright synths, punchy drums, and groovy bass"
    

    Add a button for the new genre:

    pop_btn = gr.Button("Pop", elem_classes="genre-btn")
    pop_btn.click(set_pop_prompt, inputs=None, outputs=[instrumental_prompt])
    
  • Edit MP3 Files: Use Audacity or similar tools for more control over the final output.

  • Use a Smaller Model: If VRAM is limited, switch to musicgen-small by updating app.py:

    musicgen_model = MusicGen.get_pretrained('facebook/musicgen-small', device=device)
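
If you add several genres, the one-function-per-genre pattern from the "Add New Genres" item above can get repetitive. One alternative, shown here as a sketch and not as the structure app.py uses, keeps the prompts in a dictionary and builds the callbacks from it. The names `GENRE_PROMPTS` and `make_prompt_setter` are hypothetical.

```python
# Prompt text copied from the Pop example above; add more genres as needed.
GENRE_PROMPTS = {
    "Pop": (
        "Pop with a catchy intro, upbeat verse, and anthemic chorus, "
        "featuring bright synths, punchy drums, and groovy bass"
    ),
}


def make_prompt_setter(genre):
    """Return a zero-argument callback that yields the prompt for one genre."""

    def set_prompt():
        return GENRE_PROMPTS[genre]

    return set_prompt


set_pop_prompt = make_prompt_setter("Pop")
print(set_pop_prompt())
```

The resulting callback takes no arguments and returns a string, so it can be passed to `pop_btn.click(...)` in the same way as the hand-written `set_pop_prompt` shown earlier.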
    

Prerequisites

  • Ubuntu system with Python 3.10 installed.
  • NVIDIA RTX 3060 Ti GPU with CUDA support (CUDA 11.8 recommended).
  • Internet connection to download the musicgen-medium model.

Step 1: Make the Setup Script Executable

The start_bash.sh script sets up the virtual environment, installs dependencies, and downloads the musicgen-medium model. First, make the script executable:

chmod +x start_bash.sh

## License and Acknowledgments

This project is licensed under the MIT License. Please include a LICENSE file with the MIT License text.

Special thanks to:
- Meta AI for `musicgen-medium` and Audiocraft.
- Hugging Face for hosting and CLI tools.
- Gradio for the web interface.
- pydub for audio processing and MP3 export.
- xAI for their support.

Enjoy creating music! If you have questions or suggestions, feel free to open an issue on the repository. Let's make some tunes! 🎉


CUDA 12 MEMORY MANAGEMENT UPDATE



![image/png](https://cdn-uploads.huggingface.co/production/uploads/6421b1c68adc8881b974a89d/fzyGz3Ondrr_snqH8yHiG.png)

