flash attention not working with model
#1
by XVII - opened
If you try to use sentence-transformers with flash_attention_2, you get the error: NameError: name '_flash_supports_window_size' is not defined.
If you uncomment lines 49-53 in modeling_qwen.py, everything works fine.
Code to reproduce:
from sentence_transformers import SentenceTransformer
import torch


class InfRetrieverV1Embedder:
    def __init__(self):
        self.model = SentenceTransformer(
            "infly/inf-retriever-v1",
            trust_remote_code=True,
            device='cuda',
            model_kwargs={
                'attn_implementation': 'flash_attention_2',
                "torch_dtype": torch.bfloat16
            }
        )
        self.embedding_dims = 3584
        self.max_length = 4096
        self.batch_size = 8
        self.model_name = "inf-retriever-v1"
        self.model.max_seq_length = self.max_length

    def encode(self, texts, mode='document'):
        assert mode in ('query', 'document')
        if mode == 'document':
            res = self.model.encode(texts, batch_size=self.batch_size)
        else:
            res = self.model.encode(
                texts,
                prompt="You are given code snippet with incomplete line. Retrieve relevant code snippets that help to complete this line.",
                batch_size=self.batch_size
            )
        return res.tolist()


embedder = InfRetrieverV1Embedder()
load = ['def hello_world' * 10000] * 256
embedder.encode(load)
Environment: transformers 4.49.0, flash-attn 2.7.1.post1, sentence-transformers 3.4.1.
Is it safe to modify this code, or have you faced some hidden consequences of using flash attention?
We commented out lines 49-53 just for convenience, to remove the dependency on flash_attn. You can safely uncomment those lines.