cool-papers - a hbkang Collection

cool-papers

updated 9 days ago

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 103
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 52
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Paper • 2406.04338 • Published Jun 6, 2024 • 40
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 115
GraCo: Granularity-Controllable Interactive Segmentation

Paper • 2405.00587 • Published May 1, 2024
Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 178
APOLLO: SGD-like Memory, AdamW-level Performance

Paper • 2412.05270 • Published Dec 6, 2024 • 39
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 99
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published Jan 7 • 53
The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9 • 92
Infecting Generative AI With Viruses

Paper • 2501.05542 • Published Jan 9 • 13
MatAnyone: Stable Video Matting with Consistent Memory Propagation

Paper • 2501.14677 • Published Jan 24 • 35
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

Paper • 2502.04320 • Published Feb 6 • 37
Diffusion Models without Classifier-free Guidance

Paper • 2502.12154 • Published Feb 17 • 7
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13 • 7
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator

Paper • 2502.19204 • Published Feb 26 • 11
UniTok: A Unified Tokenizer for Visual Generation and Understanding

Paper • 2502.20321 • Published Feb 27 • 30
How far can we go with ImageNet for Text-to-Image generation?

Paper • 2502.21318 • Published Feb 28 • 25
AI-Invented Tonal Languages: Preventing a Machine Lingua Franca Beyond Human Understanding

Paper • 2503.01063 • Published Mar 2 • 5
Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14 • 112
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective

Paper • 2503.01933 • Published Mar 3 • 11
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Paper • 2503.04724 • Published Mar 6 • 69
Forgetting Transformer: Softmax Attention with a Forget Gate

Paper • 2503.02130 • Published Mar 3 • 29
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models

Paper • 2503.08417 • Published Mar 11 • 8
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12 • 68
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

Paper • 2503.10636 • Published Mar 13 • 3
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Paper • 2503.11647 • Published Mar 14 • 134
Tokenize Image as a Set

Paper • 2503.16425 • Published 30 days ago • 15
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation

Paper • 2503.16660 • Published 30 days ago • 72
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models

Paper • 2503.20240 • Published 24 days ago • 21
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling

Paper • 2503.21732 • Published 23 days ago • 8
X²-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction

Paper • 2503.21779 • Published 23 days ago • 3
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

Paper • 2503.19693 • Published 25 days ago • 75
Scaling Language-Free Visual Representation Learning

Paper • 2504.01017 • Published 18 days ago • 26
Gaussian Mixture Flow Matching Models

Paper • 2504.05304 • Published 12 days ago • 11
DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published 11 days ago • 70