In the new paper Self-attention Does Not Need O(n²) Memory, a Google Research team presents simple, novel algorithms for attention and self-attention that require only constant memory for single-query attention and logarithmic memory for self-attention, reducing the self-attention memory overhead by 59x for inference and by 32x for differentiation at a sequence length of 16384.
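
The core idea is a streaming formulation of softmax attention: rather than materializing all n attention scores for a query, one maintains a running maximum (for numerical stability), a running normalizer, and a running weighted sum of values, updated one key/value pair at a time. Below is a minimal NumPy sketch of that idea for a single query; it is not the authors' JAX implementation, and the function and variable names are illustrative.

```python
# Sketch: constant-memory attention for one query via streaming accumulation.
import numpy as np

def streaming_attention(q, K, V):
    """Attention for a single query q against keys K (n, d) and values V (n, d),
    using O(1) extra memory in the sequence length n."""
    m = -np.inf                  # running maximum score (numerical stability)
    s = 0.0                      # running sum of exp(score - m)
    acc = np.zeros(V.shape[1])   # running sum of exp(score - m) * value
    for k, v in zip(K, V):
        score = float(q @ k)
        m_new = max(m, score)
        # Rescale previous accumulators when the running maximum changes.
        correction = np.exp(m - m_new) if np.isfinite(m) else 0.0
        w = np.exp(score - m_new)
        s = s * correction + w
        acc = acc * correction + w * v
        m = m_new
    return acc / s

# Sanity check against standard softmax attention.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
scores = q @ K.T
weights = np.exp(scores - scores.max())
reference = (weights / weights.sum()) @ V
assert np.allclose(streaming_attention(q, K, V), reference)
```

In practice the paper processes keys and values in chunks rather than one at a time, which keeps the memory advantage while remaining efficient on accelerators.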

Here is a quick read: Google Proposes a ‘Simple Trick’ for Dramatically Reducing Transformers’ (Self-)Attention Memory Requirements.

The paper Self-attention Does Not Need O(n²) Memory is on arXiv.


