Facebook AI Research proposes NormFormer, an approach that improves pretraining perplexity and downstream task performance for both causal and masked language models. For causal language models, it reaches GPT3-Large (1.3B) zero-shot performance 60 percent faster; for masked language models, it improves fine-tuned GLUE performance by 1.9 percent on average.

Here is a quick read: Facebook AI’s NormFormer Employs Extra Normalization to Significantly Improve Transformer Pretraining.

The code to train NormFormer models is available on the project’s GitHub. The paper NormFormer: Improved Transformer Pretraining with Extra Normalization is on arXiv.
