Automatic mining of adverse drug reactions from social media posts and unstructured chats


Adverse drug reactions (ADRs) are estimated to cost around $30 billion per year in the US alone. Yet most ADRs remain hidden: only around 5% are reported to the regulator.

Marketing authorization holders, i.e., pharma companies, are required to monitor all of their own communication channels for suspected ADRs. This includes web pages they own, discussions in patient groups and disease-specific groups, and chats in mobile apps.

Many research studies address the use of general social media, such as Twitter and Reddit, for adverse event mining.

We present a technology for mining ADRs in social media posts and unstructured texts. Each document is first classified for the presence of an ADR. The adverse event is then extracted and linked to the corresponding drug.
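
The three steps above (classify, extract, relate) can be illustrated with a toy sketch. This is not the presented system and does not use the Spark NLP API: the lexicons, function names, and the "nearest drug mention" linking rule are all hypothetical stand-ins for the trained classification, entity recognition, and relation extraction models a real pipeline would use.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons; a real system would use trained models instead.
DRUGS = {"ibuprofen", "metformin", "lipitor"}
ADR_TERMS = {"nausea", "headache", "rash", "dizziness", "muscle pain"}


@dataclass(frozen=True)
class Span:
    text: str   # matched lexicon term (lowercased)
    start: int  # character offset in the post
    label: str  # "DRUG" or "ADR"


def find_spans(text, lexicon, label):
    """Return all whole-word lexicon matches in text, ordered by position."""
    lowered = text.lower()
    spans = []
    for term in lexicon:
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            spans.append(Span(term, match.start(), label))
    return sorted(spans, key=lambda s: s.start)


def contains_adr(text):
    """Step 1: document-level check for the presence of an ADR mention."""
    return bool(find_spans(text, ADR_TERMS, "ADR"))


def extract_entities(text):
    """Step 2: extract drug and adverse event mentions."""
    return find_spans(text, DRUGS, "DRUG"), find_spans(text, ADR_TERMS, "ADR")


def relate(drugs, adrs):
    """Step 3: link each adverse event to the nearest drug mention.

    Nearest-by-offset is an assumed heuristic used only for illustration.
    """
    pairs = []
    for adr in adrs:
        if drugs:
            nearest = min(drugs, key=lambda d: abs(d.start - adr.start))
            pairs.append((nearest.text, adr.text))
    return pairs


def mine_adrs(post):
    """Run the three steps in order and return (drug, adverse event) pairs."""
    if not contains_adr(post):
        return []
    drugs, adrs = extract_entities(post)
    return relate(drugs, adrs)


if __name__ == "__main__":
    print(mine_adrs("I took Lipitor and got terrible muscle pain."))
```

A keyword lookup like this cannot tell a side effect from the reason a drug was taken (for example, ibuprofen taken for a headache), which is one reason real systems rely on learned classifiers and relation models rather than lexicon matching.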

The presented system is based on Spark NLP/Spark NLP for Healthcare, the most widely used NLP library in the industry.

About the speaker
Jiri Dobes

Head of Solutions at John Snow Labs

Jiri Dobes is the Head of Solutions at John Snow Labs. He has been leading the development of machine learning solutions in healthcare and other domains for the past five years.
Jiri is a PMP-certified project manager. His previous experience includes delivering large projects in the power generation sector and consulting for the Boston Consulting Group and large pharma.
Jiri holds a Ph.D. in mathematical modeling.


Sessions: April 5th – 6th 2022
Trainings: April 12th – 15th 2022

