Rate-distortion optimization for transformer inference

Split computing for language models, extending the theory of usable information.