Jiarui Yao

FlippyDora

Recent Activity

updated a model 1 day ago

ScaleML-RLHF/Qwen2.5-Math-7B-raft-plusplus-numina_math_em-cliphigher0.35-n8-8-iter1

published a model 1 day ago

ScaleML-RLHF/Qwen2.5-Math-7B-raft-plusplus-numina_math_em-cliphigher0.35-n8-8-iter1

updated a model 2 days ago

ScaleML-RLHF/Qwen2.5-Math-7B-raftpp-cliphigher-n8-step130

FlippyDora's activity

upvoted 2 papers 5 days ago

OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published 6 days ago • 31

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published 10 days ago • 39

upvoted a paper about 2 months ago

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26 • 84