A Google Research team conducts a systematic study comprising more than 4,800 experiments on Vision Transformers, MLP-Mixers and ResNets, with parameter counts ranging from 10 million to 10 billion, evaluated on more than 20 downstream image recognition tasks. The aim is to capture the nonlinear relationship between performance on upstream and downstream tasks.

Here is a quick read: Google Researchers Explore the Limits of Large-Scale Model Pretraining.

The paper Exploring the Limits of Large Scale Pre-training is on arXiv.
