Jul. 23, 2023

Flash Attention

Much progress in AI over the past few years has been fueled by the transformer architecture. Transformers are the closest thing we have right now to machine learnable programs. They can be trained to generate images, text, videos, audio, video games, or even raw byte sequences, you name it. Behind the transformer, powering many of these applications, there are two key operations which make up 99% of the FLOPs: attention and feed-forward layers.