An architecture for accelerated large-scale inference of transformer-based language models

Abstract

This work demonstrates the development process of a machine learning architecture for inference that can scale to a large volume of requests. In our experiments, we used a BERT model that was fine-tuned for emotion analysis, returning a probability distribution of emotions given a paragraph. The model was deployed as a gRPC service on Kubernetes. Apache Spark was used to perform inference in batches by calling the service. We encountered some performance and concurrency challenges and created solutions to achieve faster running time. Starting with 3.3 successful inference requests per second, we were able to achieve as high as 300 successful requests per second with the same batch job resource allocation. As a result, we successfully stored emotion probabilities for 95 million paragraphs within 96 hours.
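The throughput figures in the abstract can be sanity-checked with a little arithmetic. The sketch below is not from the paper. It assumes constant request rates and uses only the four numbers quoted above; the function and constant names are hypothetical. Under those assumptions, the improvement from 3.3 to 300 requests per second is roughly a 91x speedup, and finishing 95 million paragraphs in 96 hours requires an average of about 275 successful requests per second, which is consistent with the reported peak of 300.

```python
# Back-of-the-envelope check of the throughput figures quoted in the abstract.
# Assumption: request rates are treated as constant over the whole job.
PARAGRAPHS = 95_000_000   # paragraphs scored (from the abstract)
WALL_CLOCK_HOURS = 96     # upper bound on total running time (from the abstract)
INITIAL_RPS = 3.3         # successful requests per second at the start
PEAK_RPS = 300.0          # highest successful requests per second reported


def hours_to_process(items: int, requests_per_second: float) -> float:
    """Hours needed to process `items` at a constant request rate."""
    return items / requests_per_second / 3600


def required_rps(items: int, hours: float) -> float:
    """Average request rate needed to finish `items` within `hours`."""
    return items / (hours * 3600)


if __name__ == "__main__":
    print(f"speedup: {PEAK_RPS / INITIAL_RPS:.1f}x")
    print(f"average rate needed: {required_rps(PARAGRAPHS, WALL_CLOCK_HOURS):.1f} req/s")
    print(f"hours at initial rate: {hours_to_process(PARAGRAPHS, INITIAL_RPS):.0f}")
    print(f"hours at peak rate: {hours_to_process(PARAGRAPHS, PEAK_RPS):.1f}")
```

At the initial rate the same job would take on the order of 8,000 hours, so the batch job would not have been practical without the optimizations.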

Citation

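Amir Ganiev, Colt Chapin, Anderson de Andrade, & Chen Liu. (2021). “An Architecture for Accelerated Large-Scale Inference of Transformer-Based Language Models.” In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers (NAACL-HLT 2021), pp. 163-169. Association for Computational Linguistics.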

@inproceedings{DBLP:conf/naacl/GanievCAL21,
  author       = {Amir Ganiev and
                  Colton Chapin and
                  Anderson de Andrade and
                  Chen Liu},
  editor       = {Young{-}bum Kim and
                  Yunyao Li and
                  Owen Rambow},
  title        = {An Architecture for Accelerated Large-Scale Inference of Transformer-Based
                  Language Models},
  booktitle    = {Proceedings of the 2021 Conference of the North American Chapter of
                  the Association for Computational Linguistics: Human Language Technologies:
                  Industry Papers, {NAACL-HLT} 2021, Online, June 6-11, 2021},
  pages        = {163--169},
  publisher    = {Association for Computational Linguistics},
  year         = {2021},
  url          = {https://doi.org/10.18653/v1/2021.naacl-industry.21},
  doi          = {10.18653/V1/2021.NAACL-INDUSTRY.21},
  timestamp    = {Wed, 15 Nov 2023 13:49:17 +0100},
  biburl       = {https://dblp.org/rec/conf/naacl/GanievCAL21.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}