Exploration with a more stable RL pipeline with outcome-only reward and scaled-up LLMs.
Bowen
PeterJinGo
AI & ML interests
None yet
Recent Activity
updated a model 4 days ago
rubricrm/rubric_rm_qwen2.5_7B_LR1.0e-6_filtered_sky_code_8k_math_10k_rubric_evidence_classify_4k4k_PPO
published a model 4 days ago
rubricrm/rubric_rm_qwen2.5_7B_LR1.0e-6_filtered_sky_code_8k_math_10k_rubric_evidence_classify_4k4k_PPO
updated a model 10 days ago
rubricrm/qwen2.5_7B_LR1.0e-6_evidence_rubric_4k4k_separate_PPO
Collections 2
Preliminary checkpoints with outcome-only RL.
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 28
- PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-ppo
  Updated • 129
- PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-grpo
  Updated • 2
- PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-it-em-ppo
  Updated • 40
models 31
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-em-ppo-v0.2 • Updated • 2
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-it-em-ppo-v0.2 • Updated • 2
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-3b-em-ppo-v0.2 • Updated • 9
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-3b-it-em-ppo-v0.2 • Updated • 23
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-14b-em-ppo-v0.2 • Updated
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-14b-it-em-ppo-v0.2 • Updated • 3
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-7b-it-em-ppo-v0.2 • Updated • 3
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-7b-em-ppo-v0.2 • Updated • 3
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-it-em-grpo-v0.2 • Updated • 14
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-grpo-v0.2 • Updated • 5
datasets 13
PeterJinGo/wiki-18-e5-index-HNSW64 • Updated • 62
PeterJinGo/wiki-18-bm25-index • Updated • 101
PeterJinGo/nq_hotpotqa_train • Viewer • Updated • 221k • 439 • 1
PeterJinGo/wiki-18-e5-index • Updated • 2.99k
PeterJinGo/wiki-18-corpus • Updated • 1.92k
PeterJinGo/ultrafeedback_first_5000 • Viewer • Updated • 5k • 12
PeterJinGo/gsm8k-chat • Viewer • Updated • 7.47k • 19
PeterJinGo/math-zeroshot-chat • Viewer • Updated • 7.5k • 25
PeterJinGo/math-zeroshot • Viewer • Updated • 7.5k • 20
PeterJinGo/math2 • Viewer • Updated • 7.5k • 21