core_leaderboard

Zachary Siegel committed on Oct 1, 2024

Commit

8fafb33

1 Parent(s): 22212ce

submit to any of the three levels

Files changed (1)

agent_submission.md CHANGED

@@ -17,7 +17,7 @@
    If you have any trouble implementing this, feel free to reach out to us for support.
-2. **Run your agent** on all tasks of the test set. You will almost certainly need to run your agent using our Azure VM harness (with the `--use_azure` flag) to avoid long experiment times. Set the `--experiment_name` flag to be the name of your agent. We highly encourage you to run your agent on all three levels of the benchmark: CORE-Bench-Easy, CORE-Bench-Medium, and CORE-Bench-Hard, but you can choose to run on any subset of these levels.
+2. **Run your agent** on all tasks of the test set. You will almost certainly need to run your agent using our Azure VM harness (with the `--use_azure` flag) to avoid long experiment times. Set the `--experiment_name` flag to be the name of your agent. You can submit results for any of the three levels of the benchmark: CORE-Bench-Easy, CORE-Bench-Medium, or CORE-Bench-Hard.
 3. **Submit the following two directories from the harness**:
    - `benchmark/results/[experiment_name]`: Contains the results of your agent on each task.