xhluca committed on
Commit 40751ac · verified · 1 Parent(s): 71b9983

Update results.csv

Files changed (1)
  1. results.csv +15 -14
results.csv CHANGED
@@ -1,14 +1,15 @@
-Judge,Overall,Recall,F1,AB,VWA,WA,Work,Work++
-Rule-based,83.8,55.9,67.1,25.0,85.2,79.0,100.0,83.3
-AER-C (GPT-4o),67.7,71.9,69.7,83.3,56.0,68.8,100.0,66.7
-AER-V (GPT-4o),67.6,71.5,69.5,83.3,61.2,67.6,96.4,59.3
-NNetNav (Llama-3.3 70B),52.5,82.4,64.1,20.8,54.5,54.3,77.3,43.2
-Claude 3.7 S. (A),68.8,81.6,74.7,87.5,61.0,69.3,85.0,66.7
-GPT-4o (A),69.8,83.1,75.9,77.8,63.0,70.2,94.6,63.0
-GPT-4o Mini (A),61.5,86.1,71.7,80.0,57.9,63.5,84.2,49.4
-Llama 3.3 (A),67.7,79.0,72.9,75.0,59.6,68.2,94.3,62.7
-Qwen2.5-VL (A),64.3,89.8,75.0,72.7,59.3,63.6,87.2,60.3
-Claude 3.7 S. (S),69.4,76.3,72.7,71.4,64.8,69.3,85.3,66.7
-GPT-4o (S),68.1,80.3,73.7,77.8,60.7,69.9,93.8,59.6
-GPT-4o Mini (S),64.5,78.3,70.8,80.0,57.4,66.9,90.3,54.8
-Qwen2.5-VL (S),64.5,86.1,73.7,70.0,58.5,62.9,93.8,64.4
+Judge,Overall,Recall,F1,AB,VWA,WA,Work,Work++,URL,Author
+Rule-based,83.8,55.9,67.1,25.0,85.2,79.0,100.0,83.3,https://arxiv.org/abs/2504.08942,Lù et al.
+WebJudge,73.7,N/A,66.7,69.8,72.6,92.3,75.0,https://arxiv.org/pdf/2504.01382,Xue et al.
+AER-C (GPT-4o),67.7,71.9,69.7,83.3,56.0,68.8,100.0,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
+AER-V (GPT-4o),67.6,71.5,69.5,83.3,61.2,67.6,96.4,59.3,https://arxiv.org/abs/2504.08942,Lù et al.
+NNetNav (Llama-3.3 70B),52.5,82.4,64.1,20.8,54.5,54.3,77.3,43.2,https://arxiv.org/abs/2504.08942,Lù et al.
+Claude 3.7 S. (A),68.8,81.6,74.7,87.5,61.0,69.3,85.0,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
+GPT-4o (A),69.8,83.1,75.9,77.8,63.0,70.2,94.6,63.0,https://arxiv.org/abs/2504.08942,Lù et al.
+GPT-4o Mini (A),61.5,86.1,71.7,80.0,57.9,63.5,84.2,49.4,https://arxiv.org/abs/2504.08942,Lù et al.
+Llama 3.3 (A),67.7,79.0,72.9,75.0,59.6,68.2,94.3,62.7,https://arxiv.org/abs/2504.08942,Lù et al.
+Qwen2.5-VL (A),64.3,89.8,75.0,72.7,59.3,63.6,87.2,60.3,https://arxiv.org/abs/2504.08942,Lù et al.
+Claude 3.7 S. (S),69.4,76.3,72.7,71.4,64.8,69.3,85.3,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
+GPT-4o (S),68.1,80.3,73.7,77.8,60.7,69.9,93.8,59.6,https://arxiv.org/abs/2504.08942,Lù et al.
+GPT-4o Mini (S),64.5,78.3,70.8,80.0,57.4,66.9,90.3,54.8,https://arxiv.org/abs/2504.08942,Lù et al.
+Qwen2.5-VL (S),64.5,86.1,73.7,70.0,58.5,62.9,93.8,64.4,https://arxiv.org/abs/2504.08942,Lù et al.
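The updated results.csv can be sanity-checked mechanically. The sketch below uses only Python's standard-library `csv` module and embeds the new file contents verbatim. It does two things. First, it flags rows whose field count differs from the 11-column header: the WebJudge row has only 10 fields, so one of its score columns appears to be missing, and the diff alone does not show which one. Second, it checks that each F1 value equals the harmonic mean 2PR/(P+R) of the Overall and Recall columns. Treating Overall as precision P is an assumption inferred from the numbers and is not stated in the file. The helper names (`parse_score`, `find_malformed_rows`, `f1_mismatches`) and the 0.1-point tolerance are hypothetical choices made for this illustration.

```python
import csv
import io

# New results.csv contents, copied verbatim from the "+" lines of the diff.
RESULTS_CSV = """\
Judge,Overall,Recall,F1,AB,VWA,WA,Work,Work++,URL,Author
Rule-based,83.8,55.9,67.1,25.0,85.2,79.0,100.0,83.3,https://arxiv.org/abs/2504.08942,Lù et al.
WebJudge,73.7,N/A,66.7,69.8,72.6,92.3,75.0,https://arxiv.org/pdf/2504.01382,Xue et al.
AER-C (GPT-4o),67.7,71.9,69.7,83.3,56.0,68.8,100.0,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
AER-V (GPT-4o),67.6,71.5,69.5,83.3,61.2,67.6,96.4,59.3,https://arxiv.org/abs/2504.08942,Lù et al.
NNetNav (Llama-3.3 70B),52.5,82.4,64.1,20.8,54.5,54.3,77.3,43.2,https://arxiv.org/abs/2504.08942,Lù et al.
Claude 3.7 S. (A),68.8,81.6,74.7,87.5,61.0,69.3,85.0,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
GPT-4o (A),69.8,83.1,75.9,77.8,63.0,70.2,94.6,63.0,https://arxiv.org/abs/2504.08942,Lù et al.
GPT-4o Mini (A),61.5,86.1,71.7,80.0,57.9,63.5,84.2,49.4,https://arxiv.org/abs/2504.08942,Lù et al.
Llama 3.3 (A),67.7,79.0,72.9,75.0,59.6,68.2,94.3,62.7,https://arxiv.org/abs/2504.08942,Lù et al.
Qwen2.5-VL (A),64.3,89.8,75.0,72.7,59.3,63.6,87.2,60.3,https://arxiv.org/abs/2504.08942,Lù et al.
Claude 3.7 S. (S),69.4,76.3,72.7,71.4,64.8,69.3,85.3,66.7,https://arxiv.org/abs/2504.08942,Lù et al.
GPT-4o (S),68.1,80.3,73.7,77.8,60.7,69.9,93.8,59.6,https://arxiv.org/abs/2504.08942,Lù et al.
GPT-4o Mini (S),64.5,78.3,70.8,80.0,57.4,66.9,90.3,54.8,https://arxiv.org/abs/2504.08942,Lù et al.
Qwen2.5-VL (S),64.5,86.1,73.7,70.0,58.5,62.9,93.8,64.4,https://arxiv.org/abs/2504.08942,Lù et al.
"""


def parse_score(cell):
    """Return the cell as a float, or None for a missing cell or the "N/A" placeholder."""
    return None if cell is None or cell.strip() == "N/A" else float(cell)


def find_malformed_rows(text):
    """Return (judge, field_count) for every row whose width differs from the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return [(row[0], len(row)) for row in reader if row and len(row) != len(header)]


def f1_mismatches(text, tolerance=0.1):
    """Return (judge, reported_f1, expected_f1) where F1 != 2PR/(P+R) beyond `tolerance`.

    Assumption (not stated in the file): "Overall" is precision P and
    "Recall" is recall R. Rows with the wrong width or an N/A value are skipped.
    """
    mismatches = []
    for row in csv.DictReader(io.StringIO(text)):
        # DictReader stores surplus fields under the key None and fills
        # missing trailing fields with None, so either one marks a malformed row.
        if None in row or None in row.values():
            continue
        p, r, f1 = (parse_score(row[k]) for k in ("Overall", "Recall", "F1"))
        if p is None or r is None or f1 is None:
            continue
        expected = 2 * p * r / (p + r)
        if abs(expected - f1) > tolerance:
            mismatches.append((row["Judge"], f1, round(expected, 1)))
    return mismatches


if __name__ == "__main__":
    print(find_malformed_rows(RESULTS_CSV))  # [('WebJudge', 10)]
    print(f1_mismatches(RESULTS_CSV))  # []
```

With this data, every fully populated row agrees with the harmonic-mean formula to within 0.1 points, so the F1 check returns an empty list. The only row flagged is WebJudge, because its width does not match the header.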