Benefits and challenges in de-identifying and linking unstructured records


April 6th at 2:35 PM – 3:05 PM ET

Register – Free

Linking de-identified patient records to build a holistic view of each patient offers tremendous research benefits, especially for studies of drug development and patient outcomes.

Today, most research focuses only on structured datasets because de-identifying every record is complex. In this talk, we will present the benefits of, and lessons learned from, a de-identification pipeline built with Spark NLP and tokenization.

About the speaker

Mark Ungerer

Head of Product at Datavant

Mark leads Datavant’s Product team on a mission to connect the world’s health data. With over 400 organizations using Datavant software to de-identify and tokenize over 100 billion patient records (and counting), Datavant’s products are the most widely used in the industry. Before joining Datavant, Mark developed enterprise data products that provided insights into the performance of millions of mobile apps.

Mark received a BSE in Operations Research & Financial Engineering from Princeton University, with a certificate in Computer Science. Outside of work, Mark can be found on the tennis court or running the hills of San Francisco.

About the speaker

Linda Chen

Strategic Partnerships Development Manager at John Snow Labs

Linda leads John Snow Labs’ sales and business development efforts. She has over two decades of experience in general management, program management, and business development roles at big data and machine learning software companies, including Microsoft.

Her current focus is helping healthcare organizations understand and make the best use of state-of-the-art NLP, and seeing them succeed in uncovering insights from unstructured text. She also dedicates her time to inspiring young girls to pursue STEM with confidence.



Sessions: April 6 – 7
Trainings: April 8 – 9

