Answering Patient Level Questions from Raw Clinical Data

A common need across healthcare is to combine all the known information about a group of patients, including both structured data (EHR tables, claims, FHIR resources, …) and unstructured data (clinical notes, radiology reports, PDFs of lab reports, …), to create a holistic longitudinal view of each patient. This view is then exposed through a user-friendly interface – such as a chatbot, search, or visual query builder – either to ask questions about a specific patient (“has she ever been on an SSRI before?”) or to find a cohort of patients who are candidates for a clinical trial, population health, or research effort (“find all patients who have had a heart attack in the last six months and don’t take beta blockers yet”).

This session presents a solution that enables organizations with massive amounts of noisy clinical data to provide a natural language interface that can answer such questions automatically, at scale, and with full privacy and compliance. Since this cannot be achieved by simply applying an LLM or a RAG-based LLM solution, the solution starts with a healthcare-specific data pre-processing pipeline that performs the following tasks:

  • De-Identification – of both structured and unstructured data, including consistent tokenization and obfuscation
  • Information Extraction – detection of 400+ medical facts from the raw input documents
  • Patient Level Reasoning – using a fine-tuned model to infer patient-level facts (such as cancer staging)
  • Data Modeling – transforming all the inferred facts into an OMOP relational data model
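
As a rough illustration of what "consistent tokenization" in the de-identification step can mean, the sketch below replaces a patient identifier with a keyed hash, so the same patient receives the same pseudonym whether the identifier appears in a structured table or in the metadata of a clinical note. This is an assumption about one common approach, not a description of the pipeline presented in the session; the key, the field names, and the helper functions are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real deployment would load it from a key vault.
SECRET_KEY = b"example-key-not-for-production"


def tokenize(value: str, key: bytes = SECRET_KEY, length: int = 16) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    # Normalize so that trivial formatting differences do not produce different tokens.
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def deidentify_record(record: dict, id_fields: tuple = ("patient_id", "mrn")) -> dict:
    """Return a copy of the record with identifier fields replaced by tokens."""
    return {
        field: tokenize(str(value)) if field in id_fields else value
        for field, value in record.items()
    }


# The same patient seen in a structured row and in a note's metadata (hypothetical data).
structured_row = {"patient_id": "MRN-00123", "diagnosis": "hypertension"}
note_metadata = {"patient_id": "mrn-00123 ", "doc_type": "discharge summary"}

row_out = deidentify_record(structured_row)
note_out = deidentify_record(note_metadata)

# Both sources now carry the same pseudonym, so records can still be linked per patient.
print(row_out["patient_id"] == note_out["patient_id"])
```

Because the hash is keyed, the mapping cannot be reproduced without the secret, while the longitudinal link between a patient's records is preserved.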

Once the raw data has been prepared this way, which can be done privately and at scale on commodity hardware, other models can be used to query it.
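
To make the querying step more concrete, here is a minimal sketch of how the cohort question from the introduction (heart attack in the last six months, no beta blocker) might look once the facts sit in OMOP-style tables. It uses an in-memory SQLite database with a small subset of the columns in the OMOP `condition_occurrence` and `drug_exposure` tables. The concept ids, the sample rows, and the `find_cohort` helper are placeholders invented for illustration, and the session's actual query layer may work differently.

```python
import sqlite3

# Placeholder concept ids for illustration only; real OMOP concept ids differ.
MI_CONCEPT_ID = 1001            # stands in for "myocardial infarction"
BETA_BLOCKER_CONCEPT_ID = 2001  # stands in for a beta blocker ingredient

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE condition_occurrence (
    person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT);
CREATE TABLE drug_exposure (
    person_id INTEGER, drug_concept_id INTEGER, drug_exposure_start_date TEXT);
""")

# Hypothetical patients: 1 = recent MI, untreated; 2 = recent MI, on a beta blocker;
# 3 = MI more than six months before the reference date.
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?)", [
    (1, MI_CONCEPT_ID, "2024-02-10"),
    (2, MI_CONCEPT_ID, "2024-01-20"),
    (3, MI_CONCEPT_ID, "2022-05-01"),
])
conn.executemany("INSERT INTO drug_exposure VALUES (?, ?, ?)", [
    (2, BETA_BLOCKER_CONCEPT_ID, "2024-01-25"),
])

COHORT_SQL = """
SELECT DISTINCT c.person_id
FROM condition_occurrence AS c
WHERE c.condition_concept_id = :mi
  AND c.condition_start_date >= date(:as_of, '-6 months')
  AND c.condition_start_date <= :as_of
  AND NOT EXISTS (
      SELECT 1 FROM drug_exposure AS d
      WHERE d.person_id = c.person_id AND d.drug_concept_id = :bb)
ORDER BY c.person_id
"""


def find_cohort(connection: sqlite3.Connection, as_of: str) -> list:
    """Return person_ids with an MI in the six months before as_of and no beta blocker."""
    params = {"mi": MI_CONCEPT_ID, "bb": BETA_BLOCKER_CONCEPT_ID, "as_of": as_of}
    return [row[0] for row in connection.execute(COHORT_SQL, params)]


cohort = find_cohort(conn, "2024-04-02")
print(cohort)  # [1]
```

A natural language interface would need to produce something equivalent to `COHORT_SQL` from the user's question; the value of the relational model is that the resulting query is explicit and can be inspected.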

We’ll show a chatbot that provides a natural language interface, so that users can ask about patients and cohorts in plain English and get current and explainable results.


About the speaker

Veysel Kocaman

Head of Data Science at John Snow Labs

Veysel is Head of Data Science at John Snow Labs, improving the Spark NLP for Healthcare library and delivering hands-on projects in Healthcare and Life Science. Holding a PhD in ML, Dr. Kocaman has authored more than 25 papers in peer-reviewed journals and conferences in the last few years, focusing on solving real-world problems in healthcare with NLP.

He is a seasoned data scientist with over ten years of experience and a strong background in every aspect of data science, including machine learning, artificial intelligence, and big data. Veysel has consulted broadly in Statistics, Data Science, Software Architecture, DevOps, Machine Learning, and AI for several start-ups, boot camps, and companies around the globe.

He also speaks at Data Science & AI events, conferences and workshops, and has delivered more than a hundred talks at international as well as national conferences and meetups.



Sessions: April 2nd – 3rd 2024
Trainings: April 15th – 19th 2024

