Rate-distortion optimization for transformer inference

Split computing for language models, extending the theory of usable information.

April 2026 · Anderson de Andrade, Alon Harell, Ivan V. Bajić

An architecture for accelerated large-scale inference of transformer-based language models

Unified batch and online transformer inference.

February 2021 · Amir Ganiev, Colton Chapin, Anderson de Andrade, Chen Liu