
Flash Attention

Overview

An IO-aware attention algorithm that reduces reads and writes to GPU high-bandwidth memory by tiling the attention computation into blocks and computing the softmax online, so the full N×N attention matrix is never materialized. This enables faster training of long-context transformer models.
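The tiling idea above can be sketched in NumPy. This is a minimal illustrative sketch, not the actual fused-kernel implementation: real FlashAttention runs as a single GPU kernel that keeps blocks in on-chip SRAM, whereas this version only demonstrates the block-wise loop and the online softmax (running max `m` and running normalizer `l`) that let the full score matrix go unmaterialized. The function names and block size are hypothetical choices for the example.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference implementation: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Sketch of the tiling scheme: K and V are processed one block at a
    # time, and the softmax is computed online, so only a block of
    # scores exists at any point.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)             # running (unnormalized) output
    m = np.full(n, -np.inf)          # running row-wise max of the scores
    l = np.zeros(n)                  # running softmax denominator
    for j in range(0, n, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T * scale                  # scores for this block only
        m_new = np.maximum(m, S.max(axis=-1))
        p = np.exp(S - m_new[:, None])        # block probabilities (unnormalized)
        corr = np.exp(m - m_new)              # rescale earlier accumulators
        l = l * corr + p.sum(axis=-1)
        O = O * corr[:, None] + p @ Vj
        m = m_new
    return O / l[:, None]
```

Because the online softmax rescales its running accumulators whenever a new block raises the row maximum, the result matches the naive computation exactly (up to floating-point rounding) while touching only one block of scores at a time.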

Cross-References

Deep Learning
