aimeri's picture
Add application file
039d869

A newer version of the Gradio SDK is available: 5.26.0

Upgrade
metadata
title: Qwen2.5 Omni 7B Demo
emoji: 🏆
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: mit
short_description: A space exploring omni modality capabilities

Qwen2.5-Omni Multimodal Chat Demo

This Space demonstrates the capabilities of Qwen2.5-Omni, an end-to-end multimodal model that can perceive and generate text, images, audio, and video.

Features

  • Omni-modal Understanding: Process text, images, audio, and video inputs
  • Multimodal Responses: Generate both text and natural speech outputs
  • Real-time Interaction: Stream responses as they're generated
  • Customizable Voice: Choose between male and female voice outputs

How to Use

  1. Text Input: Type your message in the text box and click "Send Text"
  2. Multimodal Input:
    • Upload images, audio files, or videos
    • Optionally add accompanying text
    • Click "Send Multimodal Input"
  3. Voice Settings:
    • Toggle audio output on/off
    • Select preferred voice type

Examples

Try these interactions:

  • Upload an image and ask "Describe what you see"
  • Upload an audio clip and ask "What is being said here?"
  • Upload a video and ask "What's happening in this video?"
  • Ask complex questions like "Explain quantum computing in simple terms"

Technical Details

This demo uses:

  • Qwen2.5-Omni-7B model
  • FlashAttention-2 for accelerated inference
  • Gradio for the interactive interface