Suicide Risk Prediction with Databricks, Spark NLP & NLP Lab
April 4th at 2:35 pm EST
Every year nearly 800,000 people around the world take their own lives. Suicide is the 12th leading cause of death in the USA, and the 3rd leading cause of death for teenagers and young adults. Many factors can increase the risk of suicide or protect against it. For example, people who have experienced violence, including child abuse, bullying, or sexual violence, have a higher suicide risk. Suicide is connected to other forms of injury, violence, substance use, stigma, discrimination, and several underlying diseases including depression and anxiety. Being connected to family and community support and having easy access to health care may decrease suicidal thoughts and behaviors.
This project aims to train a predictive model to identify patients at risk for suicidal ideation and behavior. Since many of the key features for such a model are only available in unstructured clinical notes, we train custom natural language processing models and pipelines for extracting and normalizing them. We will present the full workflow from data annotation to model training and validation, including seamless integration of John Snow Labs’ NLP Lab and Healthcare NLP in the Databricks platform. The code is freely available as a ready-to-run Databricks solution accelerator.

Amir Kermany
Global Technical Director, Healthcare and Life Sciences at Databricks
Amir is the Technical Industry Lead for Healthcare & Life Sciences at Databricks, where he focuses on developing advanced analytics solution accelerators to help health care and life sciences organizations on their data and AI journey.
Amir’s past positions include Sr. Data Scientist at Shopify, Sr. Staff Scientist at AncestryDNA, and Research Scholar in Human Genetics at the Howard Hughes Medical Institute. He holds a PhD in Mathematical Biology, an M.A.Sc. in Electrical Engineering, and a B.Sc. in Physics.

Jiri Dobes
Head of Solutions at John Snow Labs