Language-based Pre-training for Drug Discovery

Pretraining has taken the NLP world by storm as ever larger language models have broken successive benchmarks.

In this talk, I’ll review some recent work applying pretraining to scientific problems, and in particular will discuss the challenges of pretraining for molecular machine learning.

I’ll introduce our new architecture, ChemBERTa, which explores the use of BERT-style pretraining for machine learning problems inspired by drug discovery applications.

About the speaker

Bharath Ramsundar 

Co-Founder & CEO at Computable Labs

Bharath received a BA and BS from UC Berkeley in EECS and Mathematics and was valedictorian of his graduating class in mathematics. He did his PhD in computer science at Stanford University, where he studied the application of deep learning to problems in drug discovery.

At Stanford, Bharath created an open-source project to grow the deep drug discovery open-source community, co-created a benchmark suite to facilitate development of molecular algorithms, and more. Bharath’s graduate education was supported by a Hertz Fellowship, the most selective graduate fellowship in the sciences. After his PhD, Bharath co-founded Computable, a startup that built better tools for collaborative dataset management. Bharath is currently the CEO of Deep Forest Sciences, a deep tech R&D company that builds AI for deep tech applications. Bharath is also the lead author of “TensorFlow for Deep Learning: From Linear Regression to Reinforcement Learning”, a developer’s introduction to modern machine learning, with O’Reilly Media, and the lead author of “Deep Learning for the Life Sciences”.