OPINION PAPER
Using unethical data to build a more ethical world
How CallMiner handles imperfections in speech recognition

Jamie Brandon, CallMiner, Waltham, USA ([email protected])

Received: 27 August 2020 / Accepted: 29 August 2020
© Springer Nature Switzerland AG 2020
Abstract
Data scientists use data to train models. Those models calculate probabilities to capture patterns in the data. It's difficult to build ethical models when the available training data contains racism, sexism, or other stereotypes. Contact center data, including calls, chats, texts, and emails, is no exception. Instead of building a model to automate decision-making processes, we use the unethical findings from our model as insights. We discuss debiasing options for removing racism from the model but find that removing this bias also removes a crucial insight that an analyst deserves to know. By leaving the model with all the biases learned from the training data, we can provide better analytics, and analysts can recommend solutions that start to dismantle the systemic racism present in our society. Debiasing is not always appropriate: censoring the model makes it harder to identify what can be done to prevent racism in our procedures and in society.

Keywords Ethics · NLP · Word embeddings · Debiasing
1 Introduction

When a model performs poorly, it's easy to blame the data. After all, the model simply captures patterns from the training data, like quick restaurant service correlating with a positive review. That model might be used to predict whether a new, unlabeled review is positive or negative. Models can also be used descriptively, showing insights into what might be causing the positive or negative reviews.
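As a rough illustration of the kind of model described above (a sketch only, not CallMiner's pipeline; the review snippets and labels are invented), the following trains a bag-of-words sentiment classifier, uses it predictively on a new review, and then uses it descriptively by listing the terms with the largest learned weights:

```python
# Toy review-sentiment model, used both predictively and descriptively.
# All data below is fabricated for illustration.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = [
    "quick service and friendly staff",
    "the food came out fast and hot",
    "slow service and cold food",
    "waited an hour, rude waiter",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
model = LogisticRegression().fit(X, labels)

# Predictive use: score a new, unlabeled review.
print(model.predict(vectorizer.transform(["quick service but cold food"])))

# Descriptive use: which terms push a review toward negative or positive?
terms = vectorizer.get_feature_names_out()
order = np.argsort(model.coef_[0])
print("most negative terms:", [terms[i] for i in order[:3]])
print("most positive terms:", [terms[i] for i in order[-3:]])
```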
When poor performance occurs, someone might blame the data. Maybe there's not enough data to differentiate between positive and negative. Maybe there should be a category for neutral sentiment too. Perhaps data instances were labeled incorrectly, skewing the classifier in the wrong direction. There are plenty of ways the data can be incomplete, inconsistent, or inaccurate. Dirty data affects model performance, but model performance should not be the sole indicator of success. A model with a 20% accuracy score should not be put into production. A model that makes racist decisions 80% of the time should not either.
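To make the point concrete that an aggregate score can hide unacceptable behavior, here is a small hypothetical check (the labels, predictions, and group tags are fabricated): a model can report a respectable overall accuracy while failing badly for one group.

```python
# Hypothetical example: overall accuracy can mask group-level disparities.
from collections import defaultdict

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0]
group  = ["A"] * 6 + ["B"] * 4

correct, total = defaultdict(int), defaultdict(int)
for truth, pred, g in zip(y_true, y_pred, group):
    total[g] += 1
    correct[g] += int(truth == pred)

overall = sum(correct.values()) / sum(total.values())
print(f"overall accuracy: {overall:.0%}")            # 80% overall
for g in sorted(total):
    print(f"group {g} accuracy: {correct[g] / total[g]:.0%}")  # 100% vs. 50%
```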
That is to say, models and data can be ethically dirty too. A model trained on unethical data may carry harmful notions about race, gender, etc. despite performing well on a test set. When a model captures unethical bias from the training data, it's easy to accidentally perpetuate harmful stereotypes. As practitioners, we can do more to protect marginalized groups. It's not enough to simply blame the data for an unethical model. Data scientists usually carry no intention of building an unethical model; the bias exists in the training data, so the model captures that pattern. For example, when selecting features to build a model, someone might include zip code. They know that a person's residen