LeafInTheTree (Feuilleaubois)

upvoted a paper 5 months ago

3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark

Paper • 2412.07825 • Published Dec 10, 2024 • 11

upvoted a paper 7 months ago

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Paper • 2410.13754 • Published Oct 17, 2024 • 76

upvoted a collection 8 months ago

Moshi v0.1 Release

Collection

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 15 items • Updated about 1 month ago • 228

upvoted an article 8 months ago

Article

Mixture of Experts Explained

By … and 5 others • Dec 11, 2023 • 624

upvoted 5 collections 9 months ago

upvoted a paper 9 months ago

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Paper • 2408.16768 • Published Aug 29, 2024 • 29

upvoted a collection 9 months ago

video

Collection

273 items • Updated 10 days ago • 7

upvoted 5 papers 9 months ago

CogVLM2: Visual Language Models for Image and Video Understanding

Paper • 2408.16500 • Published Aug 29, 2024 • 58

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published Aug 29, 2024 • 96

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Paper • 2408.15881 • Published Aug 28, 2024 • 22

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16, 2024 • 101

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 61

upvoted 4 collections 9 months ago

Multi-modality LVM

Collection

28 items • Updated Dec 16, 2024 • 1

Multimodal LLM

Collection

215 items • Updated 3 days ago • 21

multimodal

Collection

261 items • Updated 4 days ago • 10

MFM - Multimodal Foundation Models

Collection

32 items • Updated Jan 28 • 1

Collections upvoted 9 months ago (5):

VisionLM

General Multimodal Learning

Marqo-FashionCLIP and Marqo-FashionSigLIP

Multimodal Benchmarks

3d
