Deploying BLOOM: A 176B Parameter Multi-Lingual Large Language Model

In this talk, we will present the technology and infrastructure that enabled the deployment of BLOOM, the largest open-access multilingual language model released to date.

We will begin by outlining the challenges presented by large language models (LLMs) and the steps taken to address them. We will cover how we scaled our modelling code through 2D parameter and activation partitioning. We will present our hardware considerations and how we optimised for throughput. Following this, we will outline how we ported the system from a stand-alone model to a public Hugging Face demo that accepts user requests and returns generated text. We will conclude by discussing how we open-sourced our code and the platform this provides for practitioners in the community.
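To give a flavour of the partitioning approach the talk will cover, the sketch below shows 2D sharding in JAX: parameters are split along one mesh axis and activations along another. The mesh axis names (`data`, `model`), shapes, and toy arrays are illustrative assumptions, not BLOOM's actual configuration.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 2D device mesh: one axis for activations ('data'),
# one for parameters ('model'). Axis sizes here are illustrative.
devices = np.array(jax.devices()).reshape(1, -1)
mesh = Mesh(devices, axis_names=("data", "model"))

# A toy weight matrix, sharded column-wise along the 'model' axis.
w = jnp.ones((8, 16))
w = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

# A toy activation batch, sharded along the 'data' axis.
x = jnp.ones((4, 8))
x = jax.device_put(x, NamedSharding(mesh, P("data", None)))

# The matmul runs with both parameters and activations partitioned;
# JAX inserts the required cross-device collectives automatically.
y = x @ w
print(y.shape)
```

On a single host the mesh degenerates to one device, but the same code scales out when more accelerators are available, which is the property that makes this style of partitioning attractive for models at BLOOM's scale.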

About the speaker

Sanchit Gandhi

Research Engineer at Hugging Face

An ML Engineer on the open-source speech team, Sanchit is a contributor to and maintainer of Hugging Face Transformers, currently the most popular repository for state-of-the-art machine learning models. Sanchit is pioneering the integration of JAX-based models into Transformers, enabling efficient and scalable inference for large language models.

Sanchit’s research interests lie in robust speech recognition, namely the use of pre-trained encoder/decoder checkpoints for generalisable and extensible speech systems.

Prior to working at Hugging Face, Sanchit completed his Master's degree at the University of Cambridge, writing his thesis on the topic of "Interpretability for Deep Learning" under the supervision of Professor Mark Gales.


Suraj Patil

Machine Learning Engineer at Hugging Face



Sessions: October 4 – 6
Trainings: October 11 – 14
