How This NLP-Driven Literature Search Engine Can Help In Extracting Relevant COVID Information For Medical Innovation

With COVID-19 spreading rapidly across the globe, there is an urgent need for approaches that can break the chain of transmission, if not cure the disease.

A significant body of research and literature already exists on similar situations during previous epidemics. It may not be specific to the current situation, but it is still very valuable, and it might point us to the right approaches and improved policy measures to aid us in this fight.

As Natural Language Processing (NLP) researchers, we hope to leverage this research, along with ideas, reports, and any other available data, to find close-to-accurate and quickly actionable insights for controlling the spread through medical or non-pharmaceutical interventions. To that end, we present our approach, which can help community members find the right literature using NLP, deep learning, and search methods.

As part of our R&D activity, we participated in the renowned TREC (Text REtrieval Conference) organized by NIST (National Institute of Standards and Technology). Specifically, we took part in TREC-COVID, a challenge that aims to build a pandemic retrieval test collection and asks participants to develop models for information retrieval from the CORD-19 dataset of literature articles.

As part of this activity, we developed an NLP and deep learning-enabled engine that accepts dynamic natural language (free text) queries and retrieves the top N articles from offline repositories of the PubMed Central, WHO, bioRxiv, and medRxiv corpora. The algorithm also highlights the specific sentences or sections where the answer to the input query can be found.

It also computes a confidence score for every hit, indicating how well that hit matches the input query. The beta version of the solution is available and can be accessed using the link provided below.
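The retrieval flow described above (free text query in, top N articles out, each with a highlighted sentence and a score) can be illustrated with a minimal sketch. This is not the engine itself: it substitutes a plain TF-IDF cosine similarity for the deep learning model, and the article texts, identifiers, and function names (`ARTICLES`, `search`, and so on) are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the offline article repositories.
ARTICLES = {
    "a1": "Face masks reduce transmission of respiratory viruses. "
          "The trial enrolled adults in urban clinics.",
    "a2": "School closures were studied during earlier influenza epidemics. "
          "Results varied by region.",
    "a3": "Vaccine cold chain logistics remain a challenge. "
          "Storage temperature must be monitored.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, articles, top_n=3):
    """Return up to top_n hits, each with a score and its best-matching sentence."""
    doc_tokens = {aid: tokenize(text) for aid, text in articles.items()}
    idf = build_idf(list(doc_tokens.values()))
    query_vec = vectorize(tokenize(query), idf)

    hits = []
    for aid, text in articles.items():
        score = cosine(query_vec, vectorize(doc_tokens[aid], idf))
        if score == 0.0:
            continue  # no shared terms, so not a hit
        # "Highlight" the sentence that is most similar to the query.
        best = max(
            split_sentences(text),
            key=lambda s: cosine(query_vec, vectorize(tokenize(s), idf)),
            default="",
        )
        hits.append({"article_id": aid, "score": score, "sentence": best})

    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:top_n]


if __name__ == "__main__":
    for hit in search("Do face masks reduce transmission?", ARTICLES, top_n=2):
        print(hit["article_id"], round(hit["score"], 3), hit["sentence"])
```

Here the cosine value plays the role of the per-hit score. A production system would more likely rely on learned embeddings or a neural re-ranker and a calibrated confidence measure, but the shape of the result (article, highlighted sentence, score) would stay the same.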

In further phases of this solution, we are working towards adding a Question-Answering (QnA) system that would fetch exact answers to questions, instead of longer sentences, paragraphs, or documents in which the user must find the answer. Also, depending on feedback from business and R&D users, a later phase of the solution will incorporate reinforcement learning concepts.

We hope to improve our QnA system using reinforcement learning techniques to dynamically improve the retrieval process of the engine. As the next step, this engine is planned to be integrated with other COVID-19 applications developed at Merck.

About the speaker

Prathamesh P Karmalkar

Senior Data Scientist – AI, NLP & Text Analytics at Merck Life Science, India

Prathamesh is an experienced professional with skills and expertise in Artificial Intelligence, Natural Language Processing, Text Analytics, Deep Learning & Machine Learning.

His ability to apply design thinking to AI & ML solutions for complex business problems in areas such as Healthcare, Life Sciences, Patient Safety, Medical Information, Manufacturing, and R&D is quite impressive. Prathamesh has published & presented several papers in the areas of AI, NLP, Machine Learning, Deep Learning, and Advanced Analytics.

He has also delivered several talks at various global conferences & symposiums. Prathamesh was honored with India’s Top 40 Under 40 Data Scientists Award by Analytics India Magazine. He was selected as a member of the Advisory & Technical Committee of the International Conference on Next Generation Computing Technologies, organized by the Centre for Information Technology-UPES, for its 2015, 2016 & 2017 editions.

Prathamesh also played the lead role in the review committee for NLU, Text Analytics & AI at the “International Conference on Machine Intelligence and Data Science Applications (MIDAS) 2020”, held in association with IET.

In his current role of Principal Data Scientist at Merck KGaA, he is bringing thought leadership & expertise to help set up the practice for NLP, Text Analytics, Deep Learning, Machine Learning & AI.