A research team from Google Cloud AI, Google Research and Rutgers University simplifies vision transformers’ complex design, proposing nested transformers (NesT) that simply stack basic transformer layers to process non-overlapping image blocks individually. The approach achieves superior ImageNet classification accuracy and improves model training efficiency.

Here is a quick read: Google & Rutgers’ Aggregating Nested Transformers Yield Better Accuracy, Data Efficiency and Convergence.

The paper Aggregating Nested Transformers is on arXiv.

Source link