flash attention not working with model

#1 opened by XVII

If you try to use sentence-transformers with flash_attention_2, you get the error NameError: name '_flash_supports_window_size' is not defined.
If you uncomment lines 49-53 in modeling_qwen.py, everything works fine.
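
For context, I haven't verified the exact contents of the repo's file, but lines 49-53 presumably match the standard flash-attn import guard from transformers' Qwen2 modeling code, roughly:

import inspect

from transformers.utils import is_flash_attn_2_available

if is_flash_attn_2_available():
    from flash_attn import flash_attn_func, flash_attn_varlen_func
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input

    # This is the name the NameError complains about: it probes whether the
    # installed flash-attn build accepts a window_size argument (sliding window).
    _flash_supports_window_size = "window_size" in list(
        inspect.signature(flash_attn_func).parameters
    )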

Code to reproduce:

from sentence_transformers import SentenceTransformer
import torch

class InfRetrieverV1Embedder:
    def __init__(self):
        self.model = SentenceTransformer(
            "infly/inf-retriever-v1",
            trust_remote_code=True,
            device='cuda',
            model_kwargs={
                'attn_implementation': 'flash_attention_2',
                'torch_dtype': torch.bfloat16,
            },
        )

        self.embedding_dims = 3584
        self.max_length = 4096
        self.batch_size = 8
        self.model_name = "inf-retriever-v1"

        self.model.max_seq_length = self.max_length

    def encode(self, texts, mode='document'):
        assert mode in ('query', 'document')
        if mode == 'document':
            res = self.model.encode(texts, batch_size=self.batch_size)
        else:
            res = self.model.encode(
                texts,
                prompt="You are given code snippet with incomplete line. Retrieve relevant code snippets that help to complete this line.",
                batch_size=self.batch_size,
            )
        return res.tolist()

embedder = InfRetrieverV1Embedder()

load = ['def hello_world' * 10000] * 256
embedder.encode(load)

Versions: transformers 4.49.0, flash-attn 2.7.1.post1, sentence-transformers 3.4.1.

Is it safe to modify this code, or have you faced some hidden consequences of using flash attention?

inftech.ai org

Is it safe to modify this code, or have you faced some hidden consequences of using flash attention?

We commented out lines 49-53 just for convenience, to remove the dependency on flash_attn. You can safely uncomment those lines.
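
If it helps, here is a minimal sanity check (a sketch, assuming flash-attn's public flash_attn_func) to confirm your installed build exposes the window_size parameter that _flash_supports_window_size probes for:

import inspect

from flash_attn import flash_attn_func

# Should print True on flash-attn >= 2.3, where sliding-window support
# (the window_size parameter) was added.
print("window_size" in inspect.signature(flash_attn_func).parameters)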
