Content Graphs: Multi-Task NLP Approach for Cataloging
At Chegg, students can get homework help through Chegg Study, study for tests with flashcards, practice for exams, and learn relevant concepts. Together, this represents a large body of academic content authored by experts and learners that, once organized, can make our search, recommendation, cognitive learning, and personalization systems more effective.
We have developed an SME-curated Knowledge Graph whose nodes represent a hierarchical scheme of courses and topics, along the same lines as curriculums at educational institutions. This talk will give an overview of this taxonomy and of how we used multi-task learning to create a Content Graph by automatically labeling content with one or more of these graph nodes. Together, the taxonomy and the Content Graph form the backbone of how our students discover, navigate, and consume content, and ultimately excel.
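As a rough illustration of the relationship between the taxonomy and the Content Graph, the minimal Python sketch below stores a tiny hierarchy as a child-to-parent map and indexes each piece of content under its labeled nodes and all of their ancestors. The node names, content ids, and helper functions are hypothetical and chosen only for illustration; this is not Chegg's actual schema or pipeline, and the labels here are hard-coded rather than predicted by a model.

```python
from collections import defaultdict

# Hypothetical taxonomy: child node -> parent node (None marks a root).
PARENT = {
    "Mathematics": None,
    "Calculus": "Mathematics",
    "Derivatives": "Calculus",
    "Integrals": "Calculus",
    "Physics": None,
    "Kinematics": "Physics",
}


def ancestors(node):
    """Return the path from a node up to its root, starting with the node."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def build_content_graph(labels):
    """Map each taxonomy node to the content ids labeled with it or a descendant."""
    graph = defaultdict(set)
    for content_id, nodes in labels.items():
        for node in nodes:
            for n in ancestors(node):
                graph[n].add(content_id)
    return graph


# Hypothetical labels: each content item carries one or more taxonomy nodes.
labels = {"q1": ["Derivatives"], "q2": ["Integrals", "Kinematics"]}
content_graph = build_content_graph(labels)
```

Indexing content under ancestor nodes as well is one possible design choice: it lets a lookup at a course-level node such as "Calculus" return everything labeled with any of its topics.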
The talk will dive into the details of the NLP model, its outcomes, the challenges (such as the vast hierarchical label space and classes with sparse training data), and strategies for overcoming them. It will also touch upon the knowledge graph, annotation collection, state-of-the-art NLP techniques such as transformers, and other adopted techniques such as Siamese networks and multi-task learning, and will conclude with a brief overview of the current use cases that stemmed from this initiative.
Staff Data Scientist at Chegg Inc.
Sakshi Bhargava is a Staff Data Scientist at Chegg, a leading student-first connected learning platform. She focuses on researching, developing, and deploying scalable AI systems rooted in interdisciplinary fields (such as learning, data, and computer sciences) to help accelerate education.
She has extensive experience in NLP, including document classification, text generation, summarization, recommendation systems, language modeling, information retrieval, and semantic analysis.
Sakshi also oversees multiple projects across a broad range of ML work, integrating AI into multiple product and business units while designing and coordinating processes.