Visual Document Retrieval · Transformers · Safetensors · ColPali · English · pretraining
tonywu71 committed (verified)
Commit dd59f4e · 1 parent: edc1d95

Update README.md

Files changed (1): README.md (+1, −5)
README.md CHANGED
@@ -29,11 +29,7 @@ The HuggingFace `transformers` 🤗 implementation was contributed by Tony Wu ([
 
 ## Model Description
 
-This model is built iteratively starting from an off-the-shelf [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) model.
-We finetuned it to create [BiSigLIP](https://huggingface.co/vidore/bisiglip) and fed the patch-embeddings output by SigLIP to an LLM, [PaliGemma-3B](https://huggingface.co/google/paligemma-3b-mix-448) to create [BiPali](https://huggingface.co/vidore/bipali).
-
-One benefit of inputting image patch embeddings through a language model is that they are natively mapped to a latent space similar to textual input (query).
-This enables leveraging the [ColBERT](https://arxiv.org/abs/2004.12832) strategy to compute interactions between text tokens and image patches, which enables a step-change improvement in performance compared to BiPali.
+Read the `transformers` 🤗 model card: https://huggingface.co/docs/transformers/en/model_doc/colpali.
 
 ## Model Training
 
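For context on the description removed by this commit: the ColBERT-style "late interaction" it refers to scores a query against a page by comparing every query-token embedding with every image-patch embedding. The sketch below is only an illustration of that MaxSim scoring, not part of this commit or of the `transformers` API; the function name, tensor shapes, and embedding dimension are assumptions made for the example.

```python
import torch

def maxsim_score(query_emb: torch.Tensor, patch_emb: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late-interaction score (illustrative only).

    query_emb: (num_query_tokens, dim) -- one embedding per query token
    patch_emb: (num_patches, dim)      -- one embedding per image patch
    For each query token, take the maximum similarity over all patches ("MaxSim"),
    then sum over query tokens to obtain a single relevance score.
    """
    sim = query_emb @ patch_emb.T           # (num_query_tokens, num_patches)
    return sim.max(dim=1).values.sum()      # MaxSim over patches, summed over tokens

# Toy usage with random, L2-normalized embeddings (shapes and dim chosen arbitrarily).
q = torch.nn.functional.normalize(torch.randn(16, 128), dim=-1)
d = torch.nn.functional.normalize(torch.randn(1024, 128), dim=-1)
print(maxsim_score(q, d))
```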