Zachary Siegel committed on
Commit 8fafb33 · 1 Parent(s): 22212ce

submit to any of the three levels

Files changed (1)
  1. agent_submission.md +1 -1
agent_submission.md CHANGED
@@ -17,7 +17,7 @@
 
  If you have any trouble implementing this, feel free to reach out to us for support.
 
- 2. **Run your agent** on all tasks of the test set. You will almost certainly need to run your agent using our Azure VM harness (with the `--use_azure` flag) to avoid long experiment times. Set the `--experiment_name` flag to be the name of your agent. We highly encourage you to run your agent on all three levels of the benchmark: CORE-Bench-Easy, CORE-Bench-Medium, and CORE-Bench-Hard, but you can choose to run on any subset of these levels.
+ 2. **Run your agent** on all tasks of the test set. You will almost certainly need to run your agent using our Azure VM harness (with the `--use_azure` flag) to avoid long experiment times. Set the `--experiment_name` flag to be the name of your agent. You can submit results for any of the three levels of the benchmark: CORE-Bench-Easy, CORE-Bench-Medium, or CORE-Bench-Hard.
 
  3. **Submit the following two directories from the harness**:
  - `benchmark/results/[experiment_name]`: Contains the results of your agent on each task.