ZHLiu627/verl_agent_alfworld-GRPO-coef0.9-Llama-3.1-8B-Instruct-150step-150step Updated about 7 hours ago
ZHLiu627/verl_agent_alfworld-GRPO-coef1.1-Llama-3.1-8B-Instruct-150step-150step Updated about 7 hours ago
ZHLiu627/verl_agent_alfworld-GRPO-wo6-coef1.1-Llama-3.1-8B-Instruct-150step Updated about 9 hours ago
ZHLiu627/verl_agent_alfworld-GRPO-wo6-coef0.9-Llama-3.1-8B-Instruct-150step Updated about 10 hours ago
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v1 Updated Feb 27 • 29.3k • 31
ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filtered_v1_v1 Updated Feb 27 • 29.3k • 62 • 1
ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filtered_v1 Updated Feb 22 • 29.3k • 34
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v2filtered Updated Feb 19 • 28.9k • 39
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v2 Updated Feb 19 • 29.3k • 26
ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filteredd Updated Feb 19 • 29.3k • 26
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v1filtered Updated Feb 19 • 29.1k • 24
ZHLiu627/qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v2 Updated Feb 18 • 29.3k • 29