Zachary Siegel committed · Commit 8fafb33 · 1 Parent(s): 22212ce
submit to any of the three levels
agent_submission.md CHANGED +1 -1
@@ -17,7 +17,7 @@
 
 If you have any trouble implementing this, feel free to reach out to us for support.
 
-2. **Run your agent** on all tasks of the test set. You will almost certainly need to run your agent using our Azure VM harness (with the `--use_azure` flag) to avoid long experiment times. Set the `--experiment_name` flag to be the name of your agent.
+2. **Run your agent** on all tasks of the test set. You will almost certainly need to run your agent using our Azure VM harness (with the `--use_azure` flag) to avoid long experiment times. Set the `--experiment_name` flag to be the name of your agent. You can submit results for any of the three levels of the benchmark: CORE-Bench-Easy, CORE-Bench-Medium, or CORE-Bench-Hard.
 
 3. **Submit the following two directories from the harness**:
 - `benchmark/results/[experiment_name]`: Contains the results of your agent on each task.