Umesh Nandal – Elsevier
Senior Natural Language Processing Scientist
Umesh is the Senior Machine Learning (ML) & Natural Language Processing (NLP) Scientist in the Content & Innovation (C&I) department at Elsevier. The C&I team focuses on the content production pipelines that enable Elsevier to turn content into answers. Specifically, the team combines NLP and ML methods with domain expertise to enrich content into data structures. This drives the analytics that Elsevier’s products require, with the quality that Elsevier’s customers expect.
With a background in Chemistry and Computational Biology, Umesh applies state-of-the-art ML and NLP methods to improve Elsevier’s existing life science products and build new ones that help researchers get correct answers to their questions quickly.
Umesh has several years of experience in data analytics. Prior to joining Elsevier, he used various ML and computational approaches to analyse molecular data generated by high-throughput technologies in order to understand biological processes in healthy and diseased organisms. During his PhD, he worked intensively on comparing mouse models with humans by building mathematical models that compare their biological networks.
At ConTech Forum Umesh will be discussing the following:
AI for text mining in chemistry
In commercial R&D projects, new chemical compounds are often first disclosed publicly in patents. Only a small proportion of these compounds are published in journals, usually a few years after the patent. Content databases such as Elsevier’s Reaxys provide such information mostly through manual excerption, which is time-consuming and costly. Automatic text-mining approaches help overcome some of the limitations of the manual process. Several text-mining approaches exist for extracting chemical entities from patents, but the relevancy of a compound to a patent depends on the patent’s context. We have designed an automated system that extracts chemical entities from patents and classifies their relevance. For named entity recognition, we used a BiLSTM-CRF model that combines pre-trained word embeddings, character-level word representations and contextualized ELMo word representations. For relevancy classification, we used AI models trained on manual annotations. Our system can extract chemical compounds from patents and classify their relevance with high performance.
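As a rough illustration of the entity extraction step, the sketch below shows how per-token labels from a sequence tagger such as a BiLSTM-CRF could be grouped into chemical entity mentions. It is not the system described in the talk: the BIO labelling scheme, the `CHEM` label, the `bio_to_spans` helper and the example sentence are all assumptions made for illustration, and the tags are written by hand rather than predicted by a model.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group B-/I- tagged tokens into (text, start, end) spans; end is exclusive.

    Hypothetical helper: assumes a BIO scheme where "B-X" opens an entity,
    "I-X" continues it, and "O" marks tokens outside any entity.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans = []
    start = None  # index of the first token of the currently open entity
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A new entity begins; close the previous one if still open.
            if start is not None:
                spans.append((" ".join(tokens[start:i]), start, i))
            start = i
        elif tag.startswith("I-") and start is not None:
            # Continuation of the open entity.
            continue
        else:
            # "O" (or a stray "I-" with nothing open) closes any open entity.
            if start is not None:
                spans.append((" ".join(tokens[start:i]), start, i))
            start = None
    if start is not None:
        spans.append((" ".join(tokens[start:]), start, len(tokens)))
    return spans


# Hand-written example tags (assumed, not model output).
tokens = ["The", "mixture", "of", "acetic", "acid", "and", "ethanol", "was", "stirred"]
tags = ["O", "O", "O", "B-CHEM", "I-CHEM", "O", "B-CHEM", "O", "O"]
spans = bio_to_spans(tokens, tags)
```

In a pipeline of the kind outlined above, each extracted span, together with its surrounding patent text, would then be passed to a separate relevancy classifier.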