An architecture for accelerated large-scale inference of transformer-based language models

Unified batch and online transformer inference.

February 2021 · Amir Ganiev, Colton Chapin, Anderson de Andrade, Chen Liu