A team from Tsinghua University and Microsoft Research Asia proposes Fastformer, an efficient Transformer variant based on additive attention that achieves effective context modelling with complexity that is linear in the sequence length.

Here is a quick read: "Tsinghua U & Microsoft Propose Fastformer: An Additive Attention Based Transformer With Linear Complexity".

The paper "Fastformer: Additive Attention Can Be All You Need" is on arXiv.
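
To give a rough sense of why additive attention can scale linearly, here is a minimal sketch of additive attention pooling in plain Python. It is an illustration under stated assumptions, not the paper's full architecture: the function name `additive_attention_pool`, the scaling by the square root of the dimension, and the learned vector `w` are choices made for this example. The point is that each token receives one scalar score, so summarizing n tokens of dimension d costs on the order of n * d operations instead of comparing every pair of tokens.

```python
import math
from typing import List

Vector = List[float]


def additive_attention_pool(vectors: List[Vector], w: Vector) -> Vector:
    """Summarize a sequence of vectors into one global vector.

    Hypothetical helper for illustration only. Each input vector gets a
    single scalar score (its dot product with `w`, scaled by sqrt(d)),
    so the work grows linearly with the number of vectors.
    """
    d = len(w)
    # One scalar score per token: O(n * d) in total.
    scores = [
        sum(wi * xi for wi, xi in zip(w, x)) / math.sqrt(d)
        for x in vectors
    ]
    # Numerically stable softmax over the n scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the inputs gives the global summary vector.
    return [
        sum(a * x[j] for a, x in zip(weights, vectors))
        for j in range(d)
    ]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    # With an all-zero `w`, every token scores the same, so the result
    # is the plain average of the tokens.
    print(additive_attention_pool(tokens, [0.0, 0.0]))
```

In a full model, `w` would be learned, and a pooled vector like this would then be combined with the per-token representations; those further steps are left out of this sketch.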
