Nvidia researchers propose AdaViT, an input-dependent mechanism that adaptively adjusts vision transformers’ inference cost by halting the compute of different tokens at different depths to reserve compute for discriminative tokens.

Here is a quick read: NVIDIA’s AdaViT Halts Token Computation to Adaptively Adjust ViT Inference Cost on Images of Different Complexity.

The paper AdaViT: Adaptive Tokens for Efficient Vision Transformer is on arXiv.



Source link