Behavioral Testing | Testing NLP Models

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

We will present CheckList, a task-agnostic methodology and tool for testing NLP models, inspired by principles of behavioral testing in software engineering. We will show many of the bugs we discovered with CheckList, both in commercial models (Microsoft, Amazon, Google) and in research models (BERT and RoBERTa for sentiment analysis, QQP, and SQuAD).

We’ll also compare CheckList with the status quo, in a case study at Microsoft and a user study with researchers and engineers. We show that CheckList is a helpful process and tool for testing and finding bugs in NLP models, for practitioners and researchers alike.
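To make the idea of behavioral testing concrete, here is a minimal sketch in plain Python. It does not use the CheckList tool or its API; `toy_sentiment`, `failure_rate`, and the templates are hypothetical names invented for this illustration, and the keyword-based model is a deliberately naive stand-in for a real one. The sketch treats the model as a black box and checks specific behaviors (simple sentiment, negation, robustness to a name swap) instead of reporting one accuracy number.

```python
# Hypothetical stand-in for a real sentiment model (assumption for illustration).
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "terrible", "awful"}


def toy_sentiment(text: str) -> str:
    """Label text by counting keyword hits; it deliberately ignores negation."""
    words = text.lower().replace(".", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def failure_rate(cases, predict):
    """Return the fraction of (text, expected_label) cases the model gets wrong."""
    wrong = [text for text, expected in cases if predict(text) != expected]
    return len(wrong) / len(cases)


# Behavior 1: simple sentiment, with cases generated from a template.
simple_cases = [
    (f"The {noun} was {adj}.", label)
    for noun in ("food", "service")
    for adj, label in (("great", "positive"), ("terrible", "negative"))
]

# Behavior 2: negation of a positive word should read as negative.
negation_cases = [
    (f"The {noun} was not {adj}.", "negative")
    for noun in ("food", "service")
    for adj in ("good", "great")
]

# Behavior 3: swapping a person's name should not change the prediction.
name_template = "{name} said the movie was great."
name_predictions = {toy_sentiment(name_template.format(name=n)) for n in ("Mary", "John")}

simple_rate = failure_rate(simple_cases, toy_sentiment)
negation_rate = failure_rate(negation_cases, toy_sentiment)
name_invariant = len(name_predictions) == 1

print(f"simple sentiment failure rate: {simple_rate:.0%}")
print(f"negation failure rate: {negation_rate:.0%}")
print(f"invariant to name swap: {name_invariant}")
```

The toy model passes the simple cases and the name swap but fails every negation case, which is the kind of targeted failure that an aggregate accuracy score can hide.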

About the speaker

Marco Tulio Ribeiro

Senior Researcher at Microsoft Research

Marco Tulio Ribeiro is a Senior Researcher at Microsoft Research.

His work focuses on facilitating communication between humans and machine learning models, including interpretability, trust, debugging, feedback, robustness, and testing. He received his Ph.D. from the University of Washington.

Healthcare NLP Summit 2021

Free & Online

Tue, Apr 6, 2021, 12:00 PM –
Thu, Apr 7, 2021, 3:00 PM EST

NLP Summit 2021

Free & Online

Tue, Oct 5, 2021, 12:00 PM –
Thu, Oct 8, 2021, 4:00 PM EST
