ZeroGPU Explorers

community

AI & ML interests

None defined yet.

Recent Activity

wanghaofan authored a paper 7 days ago

InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework

Nealeon authored a paper 14 days ago

Kimi-VL Technical Report

Lakonik authored a paper 17 days ago

Gaussian Mixture Flow Matching Models


zero-gpu-explorers's activity

victor

posted an update 1 day ago


DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B

Jiayi-Pan

authored a paper 1 day ago

Learning Adaptive Parallel Reasoning with Language Models

Paper • 2504.15466 • Published 3 days ago • 38

LXT

authored 13 papers 9 days ago

RelationBooth: Towards Relation-Aware Customized Object Generation

Paper • 2410.23280 • Published Oct 30, 2024 • 1

MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning

Paper • 2409.15179 • Published Sep 23, 2024

PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners

Paper • 2410.04733 • Published Oct 7, 2024

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Paper • 2501.04670 • Published Jan 8

Point Cloud Mamba: Point Cloud Learning via State Space Model

Paper • 2403.00762 • Published Mar 1, 2024

An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

Paper • 2401.02361 • Published Jan 4, 2024

MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection

Paper • 2404.06564 • Published Apr 9, 2024

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Paper • 2412.04292 • Published Dec 5, 2024

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Paper • 2410.10676 • Published Oct 14, 2024

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

Paper • 2503.17350 • Published Mar 21 • 1

PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild

Paper • 2504.11326 • Published 10 days ago • 6

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Paper • 2504.10465 • Published 11 days ago • 28

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Paper • 2504.10462 • Published 11 days ago • 15

ajibawa-2023

posted an update 15 days ago


Hi all, I recently released two audio datasets generated from my earlier dataset: ajibawa-2023/Children-Stories-Collection

First Audio Dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large has 5600+ stories in .mp3 format.

Second Audio Dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection has 600 stories in .mp3 format.


LXT

authored a paper 16 days ago

An Empirical Study of GPT-4o Image Generation Capabilities

Paper • 2504.05979 • Published 17 days ago • 61

mrfakename

posted an update 22 days ago


Papla P1 from Papla Media is now available on the TTS Arena!

Try out Papla's new ultra-realistic TTS model + compare it with other leading models on the TTS Arena: TTS-AGI/TTS-Arena

Aurelien-Morgan

posted an update 27 days ago


Almost there!
https://test.pypi.org/project/test-010-retrain-pipelines/

osanseviero

authored a paper 28 days ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published about 1 month ago • 48