New Methods, Old Problems: Ethics and Bias in Modern Natural Language Processing
Major developments in NLP in the past few years have revolutionized the way we do everything from translation to fraud detection.
But, as with all machine learning technologies, these techniques are subject to the limitations and biases in the data from which they learn.
From racist language infecting Microsoft’s Tay chatbot, to gender bias derailing Amazon’s resume-ranking system, to Facebook’s algorithms failing to detect dangerous posts, there are numerous examples where these limitations have had serious consequences for companies and the public.
In this talk, I will provide an overview of key issues to consider when training and using word embeddings and language models, along with methods and tools that can help address them.
Though no approach is perfect, without inclusive development processes and careful design and review, these methods can do more harm than good.
Data Scientist at Ciox Health
Ben Batorsky is a Data Scientist in Ciox Health’s Real World Data division. He received a Ph.D. in Policy Analysis and has been working in Data Science since 2014.
His major focus is Natural Language Processing; he has presented on the subject at several conferences and teaches it at Harvard Extension School.
He has built data science capacity in government, industry, and academia. He is also deeply engaged in the open-source community and is a lead organizer of the Boston chapter of PyData.