Automated Classification and Entity Extraction from essential documents pertaining to Clinical Trials

This session presents an AI-based solution that uses transfer learning to deliver a future-proof model for converting source-agnostic unstructured data into structured data. It supports the classification of artifacts and sub-artifacts and the extraction of metadata defined in the TMF Reference Model.

The core pipeline comprises OCR-based text extraction, language detection, layout- and content-based document classifiers, and more than 40 deep learning (DL) based named entity recognition models. Each model is trained on a set of document types and extracts the target entities for a given document type. The pipeline also includes handwritten text detection, handwritten date extraction, and artifact-based post-processing rules. Together, these components automate migration between different document management systems in an air-gapped network.
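
As a rough illustration of how the stages described above might fit together, here is a minimal Python sketch. It is not the speakers' implementation: every name in it (`Document`, `detect_language`, `classify`, `NER_BY_TYPE`, `run_pipeline`) is hypothetical, the keyword rules and regular expressions are toy stand-ins for the trained classifiers and NER models, and the OCR, handwriting, and post-processing stages are left out. It only shows the routing idea, where the predicted document type selects which entity extractor runs.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Entity = Tuple[str, str]  # (label, matched text)


@dataclass
class Document:
    text: str  # assumed to be the output of the OCR stage
    language: str = "unknown"
    doc_type: str = "unknown"
    entities: List[Entity] = field(default_factory=list)


def detect_language(text: str) -> str:
    # Toy stand-in for a trained language detector (assumption).
    words = set(text.lower().split())
    return "de" if {"und", "der"} & words else "en"


def classify(text: str) -> str:
    # Toy keyword rules standing in for layout and content-based classifiers.
    lowered = text.lower()
    if "curriculum vitae" in lowered:
        return "cv"
    if "protocol" in lowered:
        return "protocol"
    return "unknown"


def protocol_entities(text: str) -> List[Entity]:
    # Toy regex standing in for an NER model trained on protocol documents.
    return [("PROTOCOL_ID", m) for m in re.findall(r"\b[A-Z]{3}\d{3}\b", text)]


def cv_entities(text: str) -> List[Entity]:
    # Toy regex standing in for an NER model trained on CVs.
    return [("DATE", m) for m in re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)]


# One extractor per document type: the predicted type picks the model.
NER_BY_TYPE: Dict[str, Callable[[str], List[Entity]]] = {
    "protocol": protocol_entities,
    "cv": cv_entities,
}


def run_pipeline(ocr_text: str) -> Document:
    doc = Document(text=ocr_text)
    doc.language = detect_language(doc.text)
    doc.doc_type = classify(doc.text)
    extractor = NER_BY_TYPE.get(doc.doc_type)
    if extractor is not None:  # unknown types get no entities
        doc.entities = extractor(doc.text)
    return doc
```

Calling `run_pipeline` on a string of OCR text returns a `Document` with its language, type, and extracted entities filled in. Keeping the extractors in a dictionary keyed by document type means a new document type can be supported by registering one more entry, without changing the pipeline function.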

About the speaker

Nirjhar Sarkar

Technical Design Expert at Novartis

Nirjhar is a Technical Design Expert in Regulatory Service Delivery at Novartis and is passionate about developing solutions that reduce drug development time so that medicines can reach patients faster. He brings extensive operational acumen, process improvement experience, and effective leadership skills.

He is pursuing his MS in Regulatory Affairs and Health Policy at Massachusetts College of Pharmacy and Health Sciences. Analytical by nature, he maintains strong interpersonal skills, an adaptive work method, a robust understanding of the industry, and a commitment to organizational integrity.


Veysel Kocaman

Principal Data Scientist and ML Engineer at John Snow Labs

Veysel is a Lead Data Scientist at John Snow Labs, improving the Spark NLP for Healthcare library and delivering hands-on projects in healthcare and life sciences. He is a seasoned data scientist with over ten years of experience and a strong background in every aspect of data science, including machine learning, artificial intelligence, and big data.

He’s also pursuing his Ph.D. in ML at Leiden University, Netherlands, and delivers graduate-level lectures in ML and Distributed Data Processing.

Veysel has broad experience consulting in Statistics, Data Science, Software Architecture, DevOps, Machine Learning, and AI for start-ups, boot camps, and companies around the globe. He also speaks at Data Science and AI events, conferences, and workshops, and has delivered more than a hundred talks at international and national conferences and meetups.



Sessions: October 5 – 7
Trainings: October 4, 12 – 15

