"""
gradio app that reads results.csv and display it in a table, title is "AgentRewardBench Leaderboard"
"""
import gradio as gr
import pandas as pd
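
# results.csv is assumed to hold the leaderboard scores, including
# Precision, Recall, and F1 columns; only the precision score is shown.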
def load_data():
    # read the CSV file
    df = pd.read_csv("./results.csv")
    # remove the Recall and F1 columns
    df = df.drop(columns=["Recall", "F1"])
    # return the dataframe
    return df

with gr.Blocks() as demo:
    gr.Markdown(
        """
        # AgentRewardBench Leaderboard

        | [**💾Code**](https://github.com/McGill-NLP/agent-reward-bench) | [**📄Paper**](https://arxiv.org/abs/2504.08942) | [**🌐Website**](https://agent-reward-bench.github.io) |
        | :--: | :--: | :--: |
        | [**🤗Dataset**](https://huggingface.co/datasets/McGill-NLP/agent-reward-bench) | [**💻Demo**](https://huggingface.co/spaces/McGill-NLP/agent-reward-bench-demo) | [**🏆Leaderboard**](https://huggingface.co/spaces/McGill-NLP/agent-reward-bench-leaderboard) |

        This is the leaderboard for AgentRewardBench. The scores are based on the results of the agents on the benchmark; we report the *precision* score.

        [Open an issue to submit your results to the leaderboard](https://github.com/McGill-NLP/agent-reward-bench/issues/new?template=add-results-to-leaderboard.yml). We will review your results and add them to the leaderboard.
        """
    )

    df = load_data()
    table = gr.DataFrame(df, show_label=False)
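
# enable the request queue so the app can serve many visitors at once,
# allowing up to 40 events to run concurrently per listener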
demo.queue(default_concurrency_limit=40).launch()