Connecting the Dots in Clinical Document Understanding and Information Extraction

Electronic health records (EHRs) are the primary source of information for clinicians tracking the care of their patients. Because extracting information from unstructured text is inherently difficult and the healthcare domain demands a high level of precision, manual data abstraction has been prevalent in the industry.

Despite several efforts to apply machine learning (ML) to information extraction from EHRs, deeper extraction that captures not only the ‘what’ but also the ‘how’ and ‘why’ has been limited. Drawing a high-level picture of a patient's entire journey across multiple documents spanning years has also been practically impossible.

In this talk, Veysel presents an end-to-end clinical document parsing pipeline that uses state-of-the-art named entity recognition (NER), text classification, assertion status detection, and relation classification models, all powered by the Spark NLP library. The pipeline is deployed in a Kubernetes cluster and can serve run-time requests over REST APIs as well as parse large volumes of documents on an Apache Spark cluster.

This system has already been deployed in a hospital setting and has saved hundreds of thousands of manual abstraction hours so far.

About the speaker

Veysel Kocaman 

Principal Data Scientist and ML Engineer at John Snow Labs

Veysel is a Lead Data Scientist at John Snow Labs, improving the Spark NLP for Healthcare library and delivering hands-on projects in Healthcare and Life Science. He is a seasoned data scientist with over ten years of experience and a strong background in every aspect of data science, including machine learning, artificial intelligence, and big data.

He’s also pursuing his Ph.D. in ML at Leiden University, Netherlands, and delivers graduate-level lectures in ML and Distributed Data Processing.

Veysel has broad experience consulting for start-ups, boot camps, and companies around the globe in Statistics, Data Science, Software Architecture, DevOps, Machine Learning, and AI.

He also speaks at Data Science & AI events, conferences, and workshops, and has delivered more than 20 talks at international and national conferences and meetups.