BERTopic: Topic Modeling by Combining the Old with the New

In this talk, Maarten will introduce BERTopic, a package for topic modeling that combines classical techniques with recent developments in large language models, clustering techniques, and dimensionality reduction.

Next, he will share a brief overview of the underlying theory, algorithms, and philosophy of BERTopic.

Finally, he will focus on concrete examples of how BERTopic can be used to model topics across large numbers of documents. Its main features will be discussed along with the pros and cons of using such a package. Other packages developed by Maarten will also briefly make an appearance.

About the speaker

Maarten Grootendorst

Data Scientist at IKNL (Integraal Kankercentrum Nederland)

Maarten Grootendorst is a psychologist turned data scientist and currently works at IKNL (Netherlands Comprehensive Cancer Organisation). He likes to work at the intersection of artificial intelligence and human behavior, particularly NLP-based solutions. He is the author of open-source packages such as BERTopic, PolyFuzz, KeyBERT, and Concept.



Sessions: October 4 – 6
Trainings: October 11 – 14

