---
base_model:
- BAAI/bge-m3
library_name: treehop-rag
license: mit
pipeline_tag: text-ranking
tags:
- Information Retrieval
- Retrieval-Augmented Generation
- model_hub_mixin
- multi-hop question answering
- pytorch_model_hub_mixin
---
# TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering
[Paper](https://arxiv.org/abs/2504.20114)
[Model on Hugging Face](https://huggingface.co/allen-li1231/treehop-rag)
![License: MIT](https://img.shields.io/badge/license-MIT-blue)
[Python](https://www.python.org/downloads/)
## Table of Contents
- [Introduction](#introduction)
- [Why TreeHop for Multi-hop Retrieval?](#why-treehop-for-multi-hop-retrieval)
- [System Requirements](#system-requirements)
- [Python Environment](#python-environment)
- [Preliminaries](#preliminaries)
- [Embedding Databases](#embedding-databases)
- [Multi-hop Retrieval with TreeHop: How-to-Use](#multi-hop-retrieval-with-treehop-how-to-use)
- [Paper Reproduction](#paper-reproduction)
- [Training TreeHop](#training-treehop)
- [Citation](#citation)
## Introduction
TreeHop is a lightweight, embedding-level framework designed to address the computational inefficiencies of traditional recursive retrieval paradigms in Retrieval-Augmented Generation (RAG). By eliminating iterative LLM-based query rewriting, TreeHop significantly reduces latency while maintaining state-of-the-art retrieval performance. It achieves this through dynamic query embedding updates and pruning strategies, enabling a streamlined "Retrieve-Embed-Retrieve" workflow.
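Conceptually, the loop looks like the following minimal sketch; `encode`, `search` and `tree_hop_update_and_prune` are hypothetical stand-ins for the embedding model, the vector index and the TreeHop step, not this repository's API.

```python
# Conceptual sketch of the "Retrieve-Embed-Retrieve" loop; `encode`, `search`
# and `tree_hop_update_and_prune` are hypothetical stand-ins, not this repo's API.
def retrieve_embed_retrieve(question, n_hop=2, top_n=5):
    query_embs = [encode(question)]            # embed the original question once
    retrieved = []
    for _ in range(n_hop):
        hits = [search(q, top_n) for q in query_embs]   # vector search per query embedding
        retrieved.extend(p for hop in hits for p in hop)
        # TreeHop updates each query embedding from its retrieved passages and
        # prunes redundant branches, so no LLM query rewriting happens in the loop
        query_embs = tree_hop_update_and_prune(query_embs, hits)
    return retrieved
```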

## Why TreeHop for Multi-hop Retrieval?
- **Handle Complex Queries**: Real-world questions often require multiple hops to retrieve relevant information, which traditional retrieval methods struggle with.
- **Cost-Effective**: 25M parameters vs. billions in existing query rewriters, significantly reducing computational overhead.
- **Speed**: 99% faster inference compared to iterative LLM approaches, ideal for industrial applications where response speed is crucial.
- **Performant**: Maintains high recall with a controlled number of retrieved passages, ensuring relevance without overwhelming the system.

## System Requirements
> Ubuntu 18.04 LTS+ or macOS Big Sur+. \
> Nvidia GPU or Apple Metal with at least 32GB of memory. \
> 16GB of system RAM for [reproduction](#paper-reproduction), 64GB for [training](#training-treehop). \
> 50GB of free space on the hard drive.
### Python Environment
Please refer to [requirements.txt](/requirements.txt).
## Preliminaries
This repository comes with [evaluation embedding databases](./embedding_data/) for reproduction purposes. Enable [Git LFS](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) to clone the repository with `git lfs clone [LINK_TO_REPO]`, or pull the data inside an existing local clone using:
```sh
git lfs pull
```
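Note that `git lfs pull` requires Git LFS to be registered with your Git installation; if it is not, run this once first:
```sh
git lfs install
```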
### Embedding Databases
We adopt [BGE-m3](https://arxiv.org/abs/2402.03216) for embedding generation, on top of which we also train our TreeHop model for multi-hop retrieval.
The following two scripts generate all necessary training and evaluation embedding databases.
You do **not** need to run them unless you plan to train TreeHop, as all necessary evaluation embedding databases are provided in the repository.
```sh
python init_train_vectors.py
python init_multihop_rag.py
```
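For a custom corpus, dense BGE-M3 vectors could be produced roughly as in the sketch below, assuming the FlagEmbedding package; the output file name is a placeholder, and the scripts above remain the authoritative pipeline for this repository.

```python
# Rough sketch: encode a custom corpus with BGE-M3 dense embeddings
# (assumes the FlagEmbedding package; file name is a placeholder).
import numpy as np
from FlagEmbedding import BGEM3FlagModel

texts = ["passage one ...", "passage two ..."]              # your passage texts
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
dense = model.encode(texts, batch_size=32, max_length=8192)["dense_vecs"]
np.save("my_corpus_dense.npy", dense)                       # shape: (num_passages, 1024)
```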
## Multi-hop Retrieval with TreeHop: How-to-Use
The example below uses the [MultiHop RAG evaluation dataset](https://arxiv.org/abs/2401.15391).
The repository comes with the files the example needs; see [Preliminaries](#preliminaries).
```python
from tree_hop import TreeHopModel
from passage_retrieval import MultiHopRetriever

EVALUATE_DATASET = "multihop_rag"

# load TreeHop model from HuggingFace
tree_hop_model = TreeHopModel.from_pretrained("allen-li1231/treehop-rag")

# load retriever
retriever = MultiHopRetriever(
    "BAAI/bge-m3",
    passages=f"embedding_data/{EVALUATE_DATASET}/eval_passages.jsonl",
    passage_embeddings=f"embedding_data/{EVALUATE_DATASET}/eval_content_dense.npy",
    # uncomment this if the faiss index has already been built, for faster loading
    # faiss_index=f"embedding_data/{EVALUATE_DATASET}/index.faiss",
    tree_hop_model=tree_hop_model,
    projection_size=1024,
    save_or_load_index=True,
    indexing_batch_size=10240,
    index_device="cuda"  # or "cpu" on Apple Metal
)
```
> :bell: Notes
> * The passages jsonl file contains the id, title and text of each passage in the retrieval database.
> * The `passage_embeddings` and `faiss_index` files contain the passage embeddings as a numpy array and as a faiss index, respectively. To replace them with your own database, refer to the logic in [init_multihop_rag.py](init_multihop_rag.py); a minimal sketch follows these notes.
> * For the detailed structure of the passages file, refer to the [MultiHop RAG evaluation passages file](embedding_data/multihop_rag/eval_passages.jsonl).
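The sketch below is illustrative only: it writes a passages jsonl with the fields described above and builds a flat inner-product faiss index from precomputed dense embeddings. File names are placeholders, and the authoritative logic lives in [init_multihop_rag.py](init_multihop_rag.py).

```python
# Illustrative sketch only: write a passages jsonl with "id"/"title"/"text"
# fields and build a flat inner-product faiss index from precomputed
# BGE-M3 dense embeddings. File names are placeholders.
import json
import faiss
import numpy as np

passages = [
    {"id": "0", "title": "Example Title", "text": "Example passage text."},
    {"id": "1", "title": "Another Title", "text": "Another passage text."},
]
with open("my_passages.jsonl", "w", encoding="utf-8") as f:
    for p in passages:
        f.write(json.dumps(p, ensure_ascii=False) + "\n")

dense = np.load("my_corpus_dense.npy").astype("float32")  # (num_passages, 1024)
index = faiss.IndexFlatIP(dense.shape[1])                 # inner product over dense vectors
index.add(dense)
faiss.write_index(index, "my_index.faiss")
```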
The `retriever` has a `multihop_search_passages` method that supports both single-query and batched retrieval.

For a single query:
```python
retrieve_result = retriever.multihop_search_passages(
    "Did Engadget report a discount on the 13.6-inch MacBook Air "
    "before The Verge reported a discount on Samsung Galaxy Buds 2?",
    n_hop=2,
    top_n=5
)
```
For batch queries:
```python
LIST_OF_QUESTIONS = [
    "Did Engadget report a discount on the 13.6-inch MacBook Air "
    "before The Verge reported a discount on Samsung Galaxy Buds 2?",
    "Did 'The Independent - Travel' report on Tremblant Ski Resort "
    "before 'Essentially Sports' mentioned Jeff Shiffrin's skiing habits?"
]

retrieve_result = retriever.multihop_search_passages(
    LIST_OF_QUESTIONS,
    n_hop=2,
    top_n=5,
    # adjust the batch sizes to your device for best performance
    index_batch_size=2048,
    generate_batch_size=1024
)
```
To access retrieved passages and corresponding multi-hop retrieval paths:
```python
# retrieved passages for the questions
print(retrieve_result.passage)

# employ a networkx graph to depict the multi-hop retrieval
retrieve_result = retriever.multihop_search_passages(
    LIST_OF_QUESTIONS,
    n_hop=2,
    top_n=5,
    index_batch_size=2048,
    generate_batch_size=1024,
    return_tree=True  # simply add this argument
)

# `retrieve_result.tree_hop_graph` is a list of networkx objects
# corresponding to the retrieval paths of the queries in LIST_OF_QUESTIONS.
# Take the first query for example; to draw its retrieval path:
retrieval_tree = retrieve_result.tree_hop_graph[0]
retrieval_tree.plot_tree()

# nodes represent passages in the retrieval graph and
# store metadata for the original passages:
print(retrieval_tree.nodes(data=True))
```
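Because each retrieval tree is a plain networkx graph, standard networkx APIs also work on it. Below is a small sketch; the attribute keys depend on what TreeHop stores, so inspect `nodes(data=True)` first.

```python
# Walk the retrieval graph with plain networkx calls; edge/node attribute
# names are whatever TreeHop stored, so inspect them before relying on keys.
for parent, child, attrs in retrieval_tree.edges(data=True):
    print(f"hop: {parent} -> {child}", attrs)
```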
## Paper Reproduction
To evaluate the multi-hop retrieval performance of TreeHop, run the command below. Here we take the 2WikiMultihop dataset with recall@5 under three hops as an example.
The script prints the recall rate and the average number of retrieved passages at each hop, as well as statistics broken down by question type.
> :bell: Notes
> * To change the evaluation dataset, replace `2wiki` with `musique` or `multihop_rag`.
> * Revise `n_hop` and `top_n` to change the number of hops and the number of passages retrieved per hop.
> * Toggle `redundant_pruning` and `layerwise_top_pruning` to reproduce our ablation study on stopping criteria (see the example after the command below).
```sh
python evaluation.py \
    --dataset_name 2wiki \
    --revision paper-reproduction \
    --n_hop 3 \
    --top_n 5 \
    --redundant_pruning True \
    --layerwise_top_pruning True
```
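For example, the pruning ablation mentioned in the notes above can be run by flipping the two flags; this is only a sketch, with all other arguments kept as in the command above.

```sh
# ablation: disable both stopping criteria, other arguments unchanged
python evaluation.py \
    --dataset_name 2wiki \
    --revision paper-reproduction \
    --n_hop 3 \
    --top_n 5 \
    --redundant_pruning False \
    --layerwise_top_pruning False
```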
## Training TreeHop
Run the following command to build the training graph and train TreeHop. Refer to the `parse_args` function in [training.py](./training.py) for the arguments this script accepts.
For generating the training embeddings, refer to [init_train_vectors.py](./init_train_vectors.py).
```sh
python training.py --graph_cache_dir ./train_data/
```
## Citation
```bibtex
@misc{li2025treehopgeneratefilterquery,
      title={TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering},
      author={Zhonghao Li and Kunpeng Zhang and Jinghuai Ou and Shuliang Liu and Xuming Hu},
      year={2025},
      eprint={2504.20114},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2504.20114},
}
```