Update results.csv
results.csv CHANGED (+1 -1)
@@ -1,6 +1,6 @@
 Judge,Overall,Recall,F1,AB,VWA,WA,Work,Work++,URL,Author
 Rule-based,83.8,55.9,67.1,25.0,85.2,79.0,100.0,83.3,https://arxiv.org/abs/2504.08942,Lù et al.
-WebJudge,73.7
+WebJudge,73.7,-,-,66.7,69.8,72.6,92.3,75.0,https://arxiv.org/pdf/2504.01382,Xue et al.
 AER-C (GPT-4o),67.7,71.9,69.7,83.3,56.0,68.8,100.0,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
 AER-V (GPT-4o),67.6,71.5,69.5,83.3,61.2,67.6,96.4,59.3,https://arxiv.org/abs/2504.08942,Lù et al.
 NNetNav (Llama-3.3 70B),52.5,82.4,64.1,20.8,54.5,54.3,77.3,43.2,https://arxiv.org/abs/2504.08942,Lù et al.
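The updated WebJudge row uses `-` in the Recall and F1 columns, so a reader that converts every score with `float()` would fail on it. Below is a minimal sketch of one way to load the file, assuming `-` means "not reported" (the commit does not say so explicitly). The names `parse_score`, `load_results`, and `NUMERIC_COLUMNS` are hypothetical helpers for illustration, and the rows are copied from the diff above rather than read from disk.

```python
import csv
import io

# Rows copied from the updated results.csv shown in the diff above.
RESULTS_CSV = """\
Judge,Overall,Recall,F1,AB,VWA,WA,Work,Work++,URL,Author
Rule-based,83.8,55.9,67.1,25.0,85.2,79.0,100.0,83.3,https://arxiv.org/abs/2504.08942,Lù et al.
WebJudge,73.7,-,-,66.7,69.8,72.6,92.3,75.0,https://arxiv.org/pdf/2504.01382,Xue et al.
AER-C (GPT-4o),67.7,71.9,69.7,83.3,56.0,68.8,100.0,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
AER-V (GPT-4o),67.6,71.5,69.5,83.3,61.2,67.6,96.4,59.3,https://arxiv.org/abs/2504.08942,Lù et al.
NNetNav (Llama-3.3 70B),52.5,82.4,64.1,20.8,54.5,54.3,77.3,43.2,https://arxiv.org/abs/2504.08942,Lù et al.
"""

# Columns that hold numeric scores; the rest (Judge, URL, Author) stay strings.
NUMERIC_COLUMNS = ["Overall", "Recall", "F1", "AB", "VWA", "WA", "Work", "Work++"]


def parse_score(cell):
    """Return a float, or None for a "-" placeholder or an empty cell.

    Treating "-" as "not reported" is an assumption of this sketch.
    """
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    return float(cell)


def load_results(text):
    """Parse the CSV text into a list of dicts with numeric score columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = dict(record)
        for column in NUMERIC_COLUMNS:
            row[column] = parse_score(record[column])
        rows.append(row)
    return rows


rows = load_results(RESULTS_CSV)
webjudge = next(r for r in rows if r["Judge"] == "WebJudge")
print(webjudge["Overall"], webjudge["Recall"], webjudge["F1"])
```

Keeping missing scores as `None` rather than `0.0` means they cannot be mistaken for real values when sorting or averaging a column.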