Does Llama 4 use chunked attention in the generation phase?

#64

by vanshils - opened Apr 15

Same as title.
I know a chunked attention mask is applied in the context (prefill) phase. Does Llama 4 also apply a chunked attention mask in the generation (decode) phase?
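
To make the question concrete, here is a minimal sketch of the two cases. It is an illustration under assumptions, not Llama 4's actual implementation: it assumes "chunked attention" means a query attends only to earlier keys in its own fixed-size chunk, and the function names and the chunk size of 4 are hypothetical.

```python
def chunked_causal_mask(seq_len, chunk_size):
    """Prefill case: mask[i][j] is True when query i may attend to key j.

    Assumption: a query sees a key only if the key is not in the future
    and both positions fall in the same chunk.
    """
    return [
        [j <= i and i // chunk_size == j // chunk_size for j in range(seq_len)]
        for i in range(seq_len)
    ]


def decode_step_mask(position, chunk_size):
    """Decode case: mask row for one new token at `position`.

    The keys are the cached positions 0..position. If chunking also applies
    during generation, cached keys from earlier chunks are masked out.
    """
    return [j // chunk_size == position // chunk_size for j in range(position + 1)]


# Hypothetical sizes, chosen only to keep the example readable.
prefill = chunked_causal_mask(seq_len=6, chunk_size=4)
step = decode_step_mask(position=5, chunk_size=4)

# If generation is chunked, the decode row matches the last prefill row.
print(step == prefill[5])
```

The question is whether the decode step behaves like `decode_step_mask` above, or whether the new token attends to the full KV cache instead.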
