An architecture for accelerated large-scale inference of transformer-based language models

Unified batch and online transformer inference.