metadata

title: Qwen2.5 Omni 7B Demo
emoji: 🏆
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: mit
short_description: A space exploring omni modality capabilities

Qwen2.5-Omni Multimodal Chat Demo

This Space demonstrates the capabilities of Qwen2.5-Omni, an end-to-end multimodal model that can perceive and generate text, images, audio, and video.

Features

Omni-modal Understanding: Process text, images, audio, and video inputs
Multimodal Responses: Generate both text and natural speech outputs
Real-time Interaction: Stream responses as they're generated
Customizable Voice: Choose between male and female voice outputs

How to Use

Text Input: Type your message in the text box and click "Send Text"
Multimodal Input:
- Upload images, audio files, or videos
- Optionally add accompanying text
- Click "Send Multimodal Input"
Voice Settings:
- Toggle audio output on/off
- Select preferred voice type

Examples

Try these interactions:

Upload an image and ask "Describe what you see"
Upload an audio clip and ask "What is being said here?"
Upload a video and ask "What's happening in this video?"
Ask complex questions like "Explain quantum computing in simple terms"

Technical Details

This demo uses:

Qwen2.5-Omni-7B model
FlashAttention-2 for accelerated inference
Gradio for the interactive interface