flash attention not working with model

#1 opened by XVII

If you try to use sentence-transformers with flash_attention_2, you get the error NameError: name '_flash_supports_window_size' is not defined.
If you uncomment lines 49-53 in modeling_qwen.py, everything works fine.
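
For context, I haven't verified the exact contents of the repo's file, but lines 49-53 presumably match the standard flash-attn import guard from transformers' Qwen2 modeling code, roughly:

import inspect

from transformers.utils import is_flash_attn_2_available

if is_flash_attn_2_available():
    from flash_attn import flash_attn_func, flash_attn_varlen_func
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input

    # This is the name the NameError complains about: it probes whether the
    # installed flash-attn build accepts a window_size argument (sliding window).
    _flash_supports_window_size = "window_size" in list(
        inspect.signature(flash_attn_func).parameters
    )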

Code to reproduce:

from sentence_transformers import SentenceTransformer
import torch

class InfRetrieverV1Embedder:
    def __init__(self):
        self.model = SentenceTransformer(
            "infly/inf-retriever-v1",
            trust_remote_code=True,
            device='cuda',
            model_kwargs={
                'attn_implementation': 'flash_attention_2',
                'torch_dtype': torch.bfloat16,
            },
        )

        self.embedding_dims = 3584
        self.max_length = 4096
        self.batch_size = 8
        self.model_name = "inf-retriever-v1"

        self.model.max_seq_length = self.max_length

    def encode(self, texts, mode='document'):
        assert mode in ('query', 'document')
        if mode == 'document':
            res = self.model.encode(texts, batch_size=self.batch_size)
        else:
            res = self.model.encode(
                texts,
                prompt="You are given code snippet with incomplete line. Retrieve relevant code snippets that help to complete this line.",
                batch_size=self.batch_size,
            )
        return res.tolist()

embedder = InfRetrieverV1Embedder()

load = ['def hello_world' * 10000] * 256
embedder.encode(load)

Versions: transformers 4.49.0, flash-attn 2.7.1.post1, sentence-transformers 3.4.1.

Is it safe to modify this code, or have you faced some hidden consequences of using flash attention?

inftech.ai org

Is it safe to modify this code, or have you faced some hidden consequences of using flash attention?

We commented out lines 49-53 just for convenience, to remove the dependency on flash_attn. You can safely uncomment those lines.
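
If it helps, here is a minimal sanity check (a sketch, assuming flash-attn's public flash_attn_func) to confirm your installed build exposes the window_size parameter that _flash_supports_window_size probes for:

import inspect

from flash_attn import flash_attn_func

# Should print True on flash-attn >= 2.3, where sliding-window support
# (the window_size parameter) was added.
print("window_size" in inspect.signature(flash_attn_func).parameters)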
