Rate-distortion optimization for transformer inference
Split computing for language models, extending the theory of usable information.
Theoretical considerations and evaluation of split and distillation points.
Improving the shared channel in coding for machines (CfM).