BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation Paper • 2506.00482 • Published 7 days ago
FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding Paper • 2505.17330 • Published 15 days ago
Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems Paper • 2505.18366 • Published 14 days ago
SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use Paper • 2505.17332 • Published 15 days ago