Application of Spark NLP for Development of Multi-Modal Prediction Model from EHR

Different data sources, such as structured data, clinical notes, and laboratory measurements, capture information about the human body at different time scales.

There is no large-scale study that recognizes the multi-scale nature of these multi-modal data sources and demonstrates the value of such integration.


While the value of natural language clinical notes has long been recognized by the medical informatics community, integrating them has always been difficult because of the inherent technical challenges of extending core natural language processing techniques, such as entity and relation extraction, to domain-specific problems.


This talk will present a novel approach for such integration where we extract information from natural language clinical notes into a structured data realm via Spark NLP and then demonstrate the impact of such integration on prediction performance via a self-supervised graph transformer approach.
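
As a rough illustration of what moving note content into the structured data realm can look like, the sketch below merges concepts found in clinical notes with a patient's structured features. It is not the speaker's pipeline: the keyword lookup is a hypothetical stand-in for the Spark NLP extraction step, and the names TERM_TO_CONCEPT, extract_concepts, and build_patient_record, as well as the sample values, are assumptions made only for this example.

```python
from collections import Counter

# Hypothetical stand-in for entity extraction. A real pipeline would use
# Spark NLP clinical models here instead of a keyword lookup.
TERM_TO_CONCEPT = {
    "shortness of breath": "dyspnea",
    "chest pain": "chest_pain",
    "metformin": "drug_metformin",
}


def extract_concepts(note):
    """Return the concepts whose trigger term appears in the note.

    This naive match ignores negation and context ("chest pain resolved"
    still counts), which is one reason a real NLP model is needed.
    """
    text = note.lower()
    return [concept for term, concept in TERM_TO_CONCEPT.items() if term in text]


def build_patient_record(structured, notes):
    """Combine structured features with per-concept mention counts."""
    counts = Counter(c for note in notes for c in extract_concepts(note))
    record = dict(structured)  # copy so the input is not mutated
    for concept, n in counts.items():
        record["note_" + concept] = n
    return record


# Made-up example data, for illustration only.
structured = {"age": 67, "hba1c": 8.1}
notes = [
    "Patient reports chest pain and shortness of breath.",
    "Continues metformin; chest pain resolved.",
]
record = build_patient_record(structured, notes)
print(record)
```

The resulting record holds the original structured fields alongside note-derived counts, so a downstream prediction model can consume both modalities from a single row.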


About the speaker

Sutanay Choudhury

Chief Scientist at PNNL

Dr. Sutanay Choudhury is a Chief Scientist at PNNL with 10+ years of experience in large-scale graph analytics and machine learning. His research focuses on learning high-fidelity representations of structure from complex data sources and on developing methods for reasoning and prediction on such representations.

Dr. Choudhury serves as a research thrust lead for PNNL’s internal initiative on AI research. He has served as PI, Co-PI, and contributor on multiple projects funded by the DoD, DHS, DOE, and PNNL’s internal research programs, developing multiple systems focused on graph analytics and graph neural networks.

He developed StreamWorks, a streaming graph analytics system that received an R&D 100 Award for novel applications in cybersecurity.