Analyzing Biomedical and Clinical Text with the Stanza Python NLP Library

The growing interest in biomedical and clinical research has created a widespread need to analyze and understand text in these domains. While today’s open-source NLP tools have integrated sophisticated neural architectures that improve their performance on general-domain text, they often lack convenient support for analyzing biomedical text at the same level of accuracy.

In this talk, I will introduce the out-of-the-box biomedical and clinical packages in the Stanza Python NLP toolkit.

I will start by describing the fully neural architectural design of Stanza, which allows it to generalize with ease to over 70 languages and multiple domains.

Then I will explain how we extended this design to build and evaluate the biomedical and clinical pipelines for Stanza, which provide near state-of-the-art performance on linguistic analysis and entity recognition tasks.

Lastly, I will showcase how these biomedical and clinical models can be used in common research and text analysis scenarios. You can try out an online demo of these packages at:

About the speaker

Yuhao Zhang

Researcher at Stanford University & Stanza Committer

Yuhao Zhang is a Ph.D. candidate at Stanford University. He is jointly advised by Prof. Chris Manning in the Stanford NLP group and Prof. Curtis Langlotz in the Stanford Center for Artificial Intelligence in Medicine & Imaging.

His research has focused on various aspects of natural language processing, including information extraction, text summarization, and multimodal modeling, as well as their applications in the biomedical domain.

He is also a co-author of the widely used Stanza Python NLP library and leads the efforts in extending Stanza’s functionality to more than 60 human languages and to the biomedical domain.