Nandan Thakur

nthakur

https://thakur-nandan.github.io

AI & ML interests

NLP, IR, QA

Recent Activity

updated a dataset 5 days ago

freshstack/corpus-oct-2024

updated a dataset 5 days ago

freshstack/queries-oct-2024

updated a dataset 6 days ago

nthakur/bge-retrieval-data-7-datasets-680K-hn-removed

nthakur's activity

upvoted a paper 9 days ago

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

Paper • 2504.13128 • Published 10 days ago • 5

upvoted a collection 11 days ago

Multimodal DSE Retrievers

A collection of DSE models for multimodal retrieval • 5 items • Updated 12 days ago • 14

upvoted 2 collections about 1 month ago

🌐 NoMIRACL Dataset [EMNLP'24]

A collection of multilingual relevance assessment datasets. We also have SFT fine-tuned models (Mistral-7B & Llama-3 8B) • 7 items • Updated 26 days ago • 1

🏜️MIRAGE-Bench [NAACL'25]

Dataset Collection from the MIRAGE-Bench paper • 13 items • Updated 26 days ago • 2

upvoted a collection about 2 months ago

DRAMA

A collection of small (sub-1B) multilingual dense retrievers that generalize well across a number of tasks and languages. • 3 items • Updated Feb 26 • 6

upvoted a paper 3 months ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 388

upvoted a paper 5 months ago

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation

Paper • 2312.11361 • Published Dec 18, 2023 • 1

upvoted a paper 11 months ago

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 47

upvoted an article 11 months ago

Training and Finetuning Embedding Models with Sentence Transformers v3

Article • Published May 28, 2024 • 215

upvoted a collection 12 months ago

🦢SWIM-IR Dataset [NAACL'24]

29 million Synthetic Wikipedia-based Multilingual Retrieval Training Pairs. • 4 items • Updated 26 days ago • 7

upvoted a paper about 1 year ago

Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval

Paper • 2311.05800 • Published Nov 10, 2023 • 3

upvoted a paper over 1 year ago

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Paper • 2104.08663 • Published Apr 17, 2021 • 3