Update results.csv
results.csv +15 -14
@@ -1,14 +1,15 @@
-Judge,Overall,Recall,F1,AB,VWA,WA,Work,Work
-Rule-based,83.8,55.9,67.1,25.0,85.2,79.0,100.0,83.3
-
-AER-
-
-
-
-GPT-4o
-
-
-
-
-GPT-4o
-
+Judge,Overall,Recall,F1,AB,VWA,WA,Work,Work++,URL,Author
+Rule-based,83.8,55.9,67.1,25.0,85.2,79.0,100.0,83.3,https://arxiv.org/abs/2504.08942,Lù et al.
+WebJudge,73.7,N/A,66.7,69.8,72.6,92.3,75.0,https://arxiv.org/pdf/2504.01382,Xue et al.
+AER-C (GPT-4o),67.7,71.9,69.7,83.3,56.0,68.8,100.0,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
+AER-V (GPT-4o),67.6,71.5,69.5,83.3,61.2,67.6,96.4,59.3,https://arxiv.org/abs/2504.08942,Lù et al.
+NNetNav (Llama-3.3 70B),52.5,82.4,64.1,20.8,54.5,54.3,77.3,43.2,https://arxiv.org/abs/2504.08942,Lù et al.
+Claude 3.7 S. (A),68.8,81.6,74.7,87.5,61.0,69.3,85.0,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
+GPT-4o (A),69.8,83.1,75.9,77.8,63.0,70.2,94.6,63.0,https://arxiv.org/abs/2504.08942,Lù et al.
+GPT-4o Mini (A),61.5,86.1,71.7,80.0,57.9,63.5,84.2,49.4,https://arxiv.org/abs/2504.08942,Lù et al.
+Llama 3.3 (A),67.7,79.0,72.9,75.0,59.6,68.2,94.3,62.7,https://arxiv.org/abs/2504.08942,Lù et al.
+Qwen2.5-VL (A),64.3,89.8,75.0,72.7,59.3,63.6,87.2,60.3,https://arxiv.org/abs/2504.08942,Lù et al.
+Claude 3.7 S. (S),69.4,76.3,72.7,71.4,64.8,69.3,85.3,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
+GPT-4o (S),68.1,80.3,73.7,77.8,60.7,69.9,93.8,59.6,https://arxiv.org/abs/2504.08942,Lù et al.
+GPT-4o Mini (S),64.5,78.3,70.8,80.0,57.4,66.9,90.3,54.8,https://arxiv.org/abs/2504.08942,Lù et al.
+Qwen2.5-VL (S),64.5,86.1,73.7,70.0,58.5,62.9,93.8,64.4,https://arxiv.org/abs/2504.08942,Lù et al.
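
This change widens the header to eleven columns (adding URL and Author), so a field-count check on the added rows may be worth running before merging. In the lines above, the WebJudge row carries ten fields against the header's eleven. The sketch below is one way to surface such rows. It assumes the file is available as text, and `find_ragged_rows` is a hypothetical helper, not something from this repository.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (line_number, field_count) for rows whose width differs from the header.

    Hypothetical helper for illustration; line numbers are 1-based, header is line 1.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    expected = len(rows[0])  # the header defines the expected width
    return [
        (line_no, len(row))
        for line_no, row in enumerate(rows[1:], start=2)
        if len(row) != expected
    ]


# Header and first two data rows copied from the added side of the diff.
sample = (
    "Judge,Overall,Recall,F1,AB,VWA,WA,Work,Work++,URL,Author\n"
    "Rule-based,83.8,55.9,67.1,25.0,85.2,79.0,100.0,83.3,"
    "https://arxiv.org/abs/2504.08942,Lù et al.\n"
    "WebJudge,73.7,N/A,66.7,69.8,72.6,92.3,75.0,"
    "https://arxiv.org/pdf/2504.01382,Xue et al.\n"
)

print(find_ragged_rows(sample))  # [(3, 10)]
```

The check only reports that a row is short. It cannot tell which metric column is missing, so the fix has to come from the cited source.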