GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Paper • 2503.03751 • Published Mar 5 • 22 • 4
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing Paper • 2402.15151 • Published Feb 23, 2024 • 7 • 2
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages Paper • 2303.12582 • Published Mar 22, 2023 • 20 • 3
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing Paper • 2306.14435 • Published Jun 26, 2023 • 20 • 5
Agile Catching with Whole-Body MPC and Blackbox Policy Learning Paper • 2306.08205 • Published Jun 14, 2023 • 8 • 1
RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation Paper • 2306.11706 • Published Jun 20, 2023 • 7 • 1