Xin Li PRO

lixin4ever

https://lixin4ever.github.io/

AI & ML interests

Natural Language Processing, Machine Learning

Recent Activity

liked a dataset 3 days ago

facebook/PE-Video

liked a dataset 6 days ago

BAAI/ShareRobot

liked a model 11 days ago

allenai/OLMo-2-0325-32B-Instruct

lixin4ever's activity

upvoted a paper 20 days ago

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published 27 days ago • 50

upvoted a paper 23 days ago

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Paper • 2503.21696 • Published 25 days ago • 22

upvoted 3 papers about 1 month ago

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 119

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

Paper • 2503.10582 • Published Mar 13 • 22

KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding

Paper • 2503.02951 • Published Mar 4 • 29

upvoted a paper about 2 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 142

upvoted a paper 2 months ago

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Paper • 2502.13922 • Published Feb 19 • 26

upvoted a collection 2 months ago

VideoRefer

6 items • Updated Mar 11 • 2

upvoted a paper 2 months ago

Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22

upvoted a collection 2 months ago

Ovis2

Our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated 27 days ago • 59

upvoted a paper 2 months ago

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 113

upvoted 2 collections 3 months ago

🖼️ MLLM by the Chinese community - 2025

14 items • Updated 10 days ago • 1

VideoLLaMA3

Frontier Multimodal Foundation Models for Video Understanding • 14 items • Updated Mar 11 • 14

upvoted 3 papers 3 months ago

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published Jan 22 • 91

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21 • 86

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 99

upvoted 2 papers 4 months ago

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 48

upvoted a collection 4 months ago

PixMo

A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated Mar 13 • 68

upvoted a collection 5 months ago

Inf-CL

The corresponding demos/checkpoints/papers/datasets of Inf-CL. • 2 items • Updated Mar 11 • 3