Automated Medical Data De-Identification and Obfuscation

This talk examines why protected health information (PHI) in unstructured patient-level data must be de-identified, so that the data's potential can be harnessed while complying with legal and privacy requirements. Healthcare providers and industry stakeholders manage an abundance of sensitive data, and de-identification makes it possible to build innovative healthcare and pharma solutions on top of it, benefiting all parties involved. The presentation covers the rationale for de-identification and the stringent requirements of the HIPAA and GDPR frameworks, emphasizing the balance between privacy and data utility.

The talk also compares manual and automated de-identification methods in terms of accuracy and cost-effectiveness. Manual de-identification struggles with accuracy and consistency and is expensive, particularly for large datasets, whereas automated de-identification based on natural language processing (NLP) offers a practical alternative. In this context, the presentation outlines the capabilities of John Snow Labs' Healthcare NLP library, which has achieved state-of-the-art results on standard benchmarks.

Built on the Apache Spark big data framework, the library offers tailored de-identification solutions that can process millions of records on large Spark or Databricks clusters. John Snow Labs has steadily improved its solution, reaching an F1 score of 98.2% on the English n2c2 de-identification benchmark in 2022, with comparable results in other European languages. The talk highlights the significance of this result, which represents a 70% reduction in error rate compared with human benchmarks.
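The headline figures can be related with simple arithmetic. The sketch below assumes that "error rate" means 1 - F1 and that the 70% reduction is relative to the human error rate; neither definition is stated in the abstract. Under those assumptions, a 98.2% F1 corresponds to a 1.8% error rate and implies a human baseline of roughly 6% error (about 94% F1). The function names are hypothetical and only illustrate the calculation.

```python
def error_rate(f1: float) -> float:
    """Error rate taken as 1 - F1 (an assumption, not a definition from the talk)."""
    return 1.0 - f1


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


def implied_baseline_error(new_error: float, reduction: float) -> float:
    """Baseline error implied by a relative reduction: new = baseline * (1 - reduction)."""
    return new_error / (1.0 - reduction)


# Figures from the abstract: 98.2% F1 and a 70% error reduction vs. human benchmarks.
model_error = error_rate(0.982)
human_error = implied_baseline_error(model_error, 0.70)

print(f"model error: {model_error:.3f}")
print(f"implied human error: {human_error:.3f}, implied human F1: {1 - human_error:.3f}")
print(f"check reduction: {relative_reduction(human_error, model_error):.2f}")
```

If the benchmark instead measured error on a different metric (for example, token-level recall), the implied human baseline would change accordingly.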

About the speaker

Veysel Kocaman

Lead Data Scientist at John Snow Labs

Veysel is a Lead Data Scientist and ML Engineer at John Snow Labs, where he improves the Spark NLP for Healthcare library and delivers hands-on projects in healthcare and life sciences.

He is a seasoned data scientist with over ten years of experience and a strong background in every aspect of data science, including machine learning, artificial intelligence, and big data. He is also pursuing a Ph.D. in ML at Leiden University in the Netherlands and delivers graduate-level lectures in ML and distributed data processing.

Veysel has provided consulting in statistics, data science, software architecture, DevOps, machine learning, and AI to start-ups, boot camps, and companies around the globe. He also speaks at data science and AI events, conferences, and workshops, and has delivered more than a hundred talks at international and national conferences and meetups.


Jiri Dobes

Head of Solutions at John Snow Labs

Jiri Dobes is the Head of Solutions at John Snow Labs. He has been leading the development of machine learning solutions in healthcare and other domains for the past five years. Jiri is a PMP-certified project manager.

His previous experience includes delivering large projects in the power generation sector and consulting at the Boston Consulting Group and for large pharmaceutical companies. Jiri holds a Ph.D. in mathematical modeling.



Online Event: April 4-5, 2023


