Ho Kei Cheng PRO

hkchengrex

https://hkchengrex.com/

AI & ML interests

None yet

Recent Activity

upvoted a paper 11 days ago

Perception Encoder: The best visual embeddings are not at the output of the network

upvoted a paper 11 days ago

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

upvoted a paper 13 days ago

Gaussian Mixture Flow Matching Models

Organizations

None yet

hkchengrex's activity

upvoted 2 papers 11 days ago

Perception Encoder: The best visual embeddings are not at the output of the network

Paper • 2504.13181 • Published 12 days ago • 31

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published 12 days ago • 48

upvoted a paper 13 days ago

Gaussian Mixture Flow Matching Models

Paper • 2504.05304 • Published 22 days ago • 12

updated a Space 13 days ago

675

MMAudio — generating synchronized audio from video/text

🔊

Generate audio from video or text prompts

upvoted 2 papers about 1 month ago

CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

Paper • 2503.18886 • Published Mar 24 • 21

Tokenize Image as a Set

Paper • 2503.16425 • Published Mar 20 • 16

New activity in hkchengrex/MMAudio about 1 month ago

Runtime error

#15 opened about 1 month ago by CzanCzan

ZeroGPU: No CUDA GPUs are available

#3 opened 5 months ago by hkchengrex

authored a paper about 2 months ago

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

Paper • 2503.10636 • Published Mar 13 • 3

upvoted a paper about 2 months ago

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

Paper • 2503.10636 • Published Mar 13 • 3

commented on a paper about 2 months ago

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

Paper • 2503.10636 • Published Mar 13 • 3

liked a dataset 2 months ago

Loie/VGGSound

Viewer • Updated Mar 26, 2023 • 1 • 1.86k • 32

upvoted a paper 2 months ago

TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

Paper • 2502.19400 • Published Feb 26 • 49

New activity in hkchengrex/MMAudio 2 months ago

WHY DID IT START DOING THE TALKING BEN BURPS FOR NO REASON AT ALL

#14 opened 2 months ago by jerrythejohnson

Why do videos get much bigger by processing them?

#13 opened 2 months ago by KurtWoloch

upvoted a paper 2 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 143

New activity in hkchengrex/MMAudio 4 months ago

iframe

#9 opened 4 months ago by Hamed744

upvoted a paper 4 months ago

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Paper • 2501.01423 • Published Jan 2 • 43

liked 2 datasets 4 months ago

deepghs/gelbooru_full

Preview • Updated 26 days ago • 2.25k • 43

deepghs/sankaku_full

Updated Jan 3 • 6.16k • 76