---
title: Qwen2.5 Omni 7B Demo
emoji: 🏆
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: mit
short_description: A space exploring omni modality capabilities
---

# Qwen2.5-Omni Multimodal Chat Demo

This Space demonstrates Qwen2.5-Omni, an end-to-end multimodal model that perceives text, images, audio, and video and generates both text and natural speech responses in a streaming fashion.

## Features

- Omni-modal Understanding: Process text, images, audio, and video inputs
- Multimodal Responses: Generate both text and natural speech outputs
- Real-time Interaction: Stream responses as they're generated
- Customizable Voice: Choose between male and female voice outputs
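
For reference, the sketch below shows how text and speech output with a selectable voice can be produced through the `transformers` integration. It is a minimal illustration, not this Space's `app.py`: the class names, the `speaker` argument, and the voice names `Chelsie`/`Ethan` follow the Qwen2.5-Omni model card and are assumed to match the versions pinned here.

```python
# Minimal sketch of text + speech generation with a selectable voice.
# Assumes a transformers build with Qwen2.5-Omni support; names follow the model card.
import soundfile as sf
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor

MODEL_ID = "Qwen/Qwen2.5-Omni-7B"
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = Qwen2_5OmniProcessor.from_pretrained(MODEL_ID)

# The model card requires this exact system prompt for audio output.
conversation = [
    {"role": "system", "content": [{"type": "text", "text": (
        "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, "
        "capable of perceiving auditory and visual inputs, as well as generating "
        "text and speech.")}]},
    {"role": "user", "content": [{"type": "text", "text": "Explain quantum computing in simple terms."}]},
]

text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text, return_tensors="pt", padding=True).to(model.device)

# "Chelsie" (female) and "Ethan" (male) are the documented voice options.
text_ids, audio = model.generate(**inputs, speaker="Chelsie")

print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
sf.write("reply.wav", audio.reshape(-1).detach().cpu().numpy(), samplerate=24000)
```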

## How to Use

- Text Input: Type your message in the text box and click "Send Text"
- Multimodal Input (see the sketch after this list):
  - Upload images, audio files, or videos
  - Optionally add accompanying text
  - Click "Send Multimodal Input"
- Voice Settings:
  - Toggle audio output on/off
  - Select preferred voice type
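
Under the hood, each multimodal submission becomes one chat turn whose content list mixes media entries with optional text. The sketch below is a rough illustration of that format, assuming the `qwen-omni-utils` helper package referenced by the model card; `example.jpg` is a placeholder path, and the exact keyword names may differ from this Space's `app.py`.

```python
# Sketch of how one multimodal turn (an uploaded image plus accompanying text) is packaged.
# Assumes the qwen-omni-utils package from the model card; "example.jpg" is a placeholder path.
from qwen_omni_utils import process_mm_info
from transformers import Qwen2_5OmniProcessor

processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "example.jpg"},          # uploaded file path or URL
            {"type": "text", "text": "Describe what you see"},  # optional accompanying text
        ],
    },
]

# Render the chat template and collect the media referenced in the conversation.
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=True)

inputs = processor(text=text, audio=audios, images=images, videos=videos,
                   return_tensors="pt", padding=True)
# `inputs` is then passed to model.generate(...), as in the text-only sketch above.
```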

## Examples

Try these interactions:
- Upload an image and ask "Describe what you see"
- Upload an audio clip and ask "What is being said here?"
- Upload a video and ask "What's happening in this video?"
- Ask complex questions like "Explain quantum computing in simple terms"

## Technical Details

This demo uses:
- Qwen2.5-Omni-7B model
- FlashAttention-2 for accelerated inference
- Gradio for the interactive interface
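
As a rough picture of how these pieces fit together, the sketch below loads the checkpoint with FlashAttention-2 enabled and exposes a text-only path through a small Gradio interface. It is a simplified stand-in for `app.py`, not its actual contents; `attn_implementation="flash_attention_2"` additionally requires the `flash-attn` package and a supported GPU, and `return_audio=False` is assumed (per the model card) to be the switch that disables speech output.

```python
# Simplified, text-only stand-in for app.py: FlashAttention-2 loading plus a minimal Gradio UI.
import torch
import gradio as gr
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor

MODEL_ID = "Qwen/Qwen2.5-Omni-7B"
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,               # FlashAttention-2 needs fp16/bf16 weights
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="auto",
)
processor = Qwen2_5OmniProcessor.from_pretrained(MODEL_ID)

def chat(message: str) -> str:
    conversation = [{"role": "user", "content": [{"type": "text", "text": message}]}]
    text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
    inputs = processor(text=text, return_tensors="pt", padding=True).to(model.device)
    # return_audio=False keeps this path text-only; the real demo also streams speech.
    text_ids = model.generate(**inputs, return_audio=False, max_new_tokens=256)
    new_tokens = text_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]

gr.Interface(fn=chat, inputs="text", outputs="text").launch()
```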