From Unstructured Documents to a Longitudinal Health Record

In this talk, Timothy Laurent and Isaac Palka will show how Invitae uses NLP to transform unstructured medical documents into structured patient health records.

They will introduce LayoutLMv2, a machine learning model open-sourced by Microsoft that is trained to interpret both the text and the layout of a document. They’ll discuss how they trained this model on their own corpus of medical records, labeled with a custom OCR labeling interface; how they scaled the model in the cloud; and how it fits into larger information extraction workflows (e.g., table comprehension and relation extraction).

Finally, they’ll show how the extracted entities are used in their proprietary app to construct FHIR resources and to generate high-quality patient records at large volume that are searchable and interoperable.

About the speakers

Timothy Laurent

Principal Engineer – Machine Learning, Invitae

Timothy Laurent is a Principal ML Engineer at Invitae. Tim has a background in biology and computer science and has helped develop many components of Invitae’s “scalable genetics” software stack. Most recently, Tim has been developing natural language processing pipelines for clinical information extraction, along with the underlying infrastructure that supports machine learning at Invitae.


Isaac Palka

Senior Director of Engineering at Invitae

Isaac Palka is an experienced software engineer, engineering leader, and entrepreneur. He is currently Senior Director of Engineering at Invitae, where he leads application development for their health platform.

He has deep experience in the healthcare space, having previously worked at Signify Health and Cigna Health insurance, and holds several patents. Isaac lives in NYC with his wife and daughter, and enjoys playing classical piano and electric guitar.



Sessions: October 4 – 6
Trainings: October 11 – 14

