Transformers

Rate-distortion optimization for transformer inference

Split computing for language models, extending the theory of usable information.

Unified batch and online transformer inference.