YangWang92

yangwang92

yangwang92's activity

upvoted 5 papers 4 days ago

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published 5 days ago • 53

BitNet b1.58 2B4T Technical Report

Paper • 2504.12285 • Published 4 days ago • 58

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Paper • 2504.11456 • Published 5 days ago • 11

Seedream 3.0 Technical Report

Paper • 2504.11346 • Published 5 days ago • 37

How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients

Paper • 2504.10766 • Published 6 days ago • 37

upvoted a paper 9 days ago

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published 19 days ago • 80

upvoted a paper 16 days ago

Inference-Time Scaling for Generalist Reward Modeling

Paper • 2504.02495 • Published 18 days ago • 52

upvoted a paper about 2 months ago

Process-based Self-Rewarding Language Models

Paper • 2503.03746 • Published Mar 5 • 39

upvoted a collection about 2 months ago

Qwen2.5-1M

Collection

The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Feb 26 • 117

upvoted a paper about 2 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 181

upvoted a paper 2 months ago

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

Paper • 2502.10248 • Published Feb 14 • 55

upvoted a collection 2 months ago

CodeI/O

Collection

Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated Feb 13 • 6

upvoted a paper 2 months ago

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published Feb 11 • 48

upvoted an article 2 months ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 844

upvoted 2 papers 2 months ago

Matryoshka Quantization

Paper • 2502.06786 • Published Feb 10 • 30

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

Paper • 2502.05003 • Published Feb 7 • 44

upvoted a collection 3 months ago

Reasoning Datasets

Collection

Distilled synthetic Reasoning datasets • 7 items • Updated Feb 2 • 60

upvoted a paper 3 months ago

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 8