🔦 Battling Bias in Large Language Models

by Patrícia Rocha, Junior Data Scientist at Automaise

These days, AI is everywhere. It may not resemble the sci-fi versions fed to us through popular culture, but its potential grows every year, and it will impact every industry and every business, from the products we use to the work we do and the way we drive.

One of the fields that has probably gained the most from big data is NLP. Increasingly large language models tend to improve their performance when given tremendous amounts of data. GPT-3 is one of the most sophisticated language models to date. With roughly 175 billion parameters, it takes any prompt you type and predicts which words are likeliest to come next. While its capabilities are arguably impressive (it can behave like a chatbot, summarize text, generate essays), the model is far from perfect. Ask it any question you can think of and it will always give you an answer, but now and then it will deliver sentences that make little sense.
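At its core, "predicting which words come next" means the model assigns a score to every token in its vocabulary and samples from the resulting probability distribution. A minimal sketch of that single step, with an invented toy vocabulary and scores standing in for a real model's output:

```python
import math
import random

def softmax(logits):
    """Turn raw model scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=None):
    """Sample one next token; lower temperature sharpens the distribution."""
    rng = rng or random.Random(0)
    probs = softmax([score / temperature for score in logits])
    return rng.choices(vocab, weights=probs, k=1)[0]

# Toy example: a real model scores tens of thousands of tokens per step.
vocab = ["cat", "sat", "mat"]
logits = [0.1, 3.0, 0.5]
```

A model trained on web text reproduces whatever word associations dominate that text, which is exactly where the biases discussed below come from.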

Since data is one of the key ingredients to any AI-powered application, one of the major concerns surrounding GPT-3 is the chance of it replicating the human biases present in the training data.

GPT-3 learned its language from the Internet — it was trained essentially on data scraped from the web. Therefore, it can disseminate abusive language and hate speech towards individuals or specific groups of people.

GPT-3 exhibits a wide variety of racial, religious, and gender biases, among others. Research on religious biases, for instance, demonstrated that GPT-3 heavily associates the word Muslim with terrorism and violence, and even though careful prompt design reduces this behavior, it remains more common than for other religious groups (Abid et al.). There is also a chance that some biases remain unidentified. The very definition of toxicity is not consensual and keeps shifting.

OpenAI’s Playground depicting a GPT-3 completion for a prompt containing the word ‘Muslims’

These issues sparked debates on the vulnerabilities and potential misuses of large language models. After Facebook’s head of AI, Jerome Pesenti, called out bias in content created by GPT-3, OpenAI quickly offered a solution: a content filter API that classifies text as safe, sensitive, or unsafe (Epstein). Few details are provided on how this filter operates, though. Should it be up to big tech companies like OpenAI to make such judgments on behalf of society?
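OpenAI has not published how the filter works internally; conceptually, though, it amounts to a three-way text classifier plus a threshold policy. The sketch below is an invented keyword-based stand-in (the term lists and the `filter_label` function are illustrative placeholders, not OpenAI's method), meant only to show the shape of such a component:

```python
# Toy three-way content filter. The labels mirror the safe / sensitive /
# unsafe categories described in press coverage; the matching logic is a
# deliberately crude placeholder for a learned classifier.
SENSITIVE_TERMS = {"religion", "politics"}   # hypothetical watch-list
UNSAFE_TERMS = {"slurword"}                  # placeholder, not a real slur

def filter_label(text):
    """Return 'unsafe', 'sensitive', or 'safe' for a piece of text."""
    words = set(text.lower().split())
    if words & UNSAFE_TERMS:
        return "unsafe"
    if words & SENSITIVE_TERMS:
        return "sensitive"
    return "safe"
```

Even this toy version makes the governance question concrete: someone has to decide what goes on those lists, and that decision is a value judgment.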

More recently (June 10, 2021), OpenAI published a study in which they claim to have mitigated bias in GPT-3 (Solaiman and Dennison). To do so, they created a Process for Adapting Language Models to Society (PALMS), which relies on a values-targeted dataset of carefully curated question-answer pairs on sensitive topics.

They assessed three versions of GPT-3: a baseline, a control (fine-tuned on a neutral dataset), and a values-targeted GPT-3 (fine-tuned on PALMS). Results demonstrated that GPT-3 fine-tuned on PALMS consistently scored lower for toxicity. However, by covering only a limited set of sensitive topics, the PALMS dataset helps only to a certain degree. Additionally, OpenAI stresses that it is unclear which authority should govern model behavior, since “safe” behavior is itself a subjective concept.
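The evaluation setup can be pictured as scoring outputs from each variant for toxicity and comparing the averages. The numbers below are invented placeholders, not OpenAI's results; the point is only the shape of the comparison:

```python
from statistics import mean

# Hypothetical per-output toxicity scores (0 = benign, 1 = toxic) for the
# three model variants described in the PALMS study. Values are made up.
scores = {
    "baseline":        [0.42, 0.38, 0.51],
    "control":         [0.35, 0.33, 0.40],
    "values_targeted": [0.21, 0.19, 0.25],
}

# Rank variants from least to most toxic on average.
ranked = sorted(scores, key=lambda variant: mean(scores[variant]))
```

In the study's actual results, the values-targeted model comes out lowest, as the toy ranking illustrates.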

At Automaise, we take some steps to avoid potentially harmful outputs. It is worth remembering that GPT-3 and its predecessor GPT-2 were trained on unfiltered data scraped from the web, and that this content can be offensive. So it is only reasonable that the first step should be fine-tuning our generative models on smaller datasets of interactions between clients and operators, which helps the model adjust to the desired behavior without losing its capabilities. Besides that, we keep a human in the loop: the model suggests a set of replies, from which an operator chooses the most fitting one before it reaches the end-user. Although these measures give us greater control over the output, there is still a great deal to be done.
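The human-in-the-loop step amounts to generating several candidate replies and letting an operator pick which one is sent. This is a generic sketch of that flow, not Automaise's actual pipeline; the function names and the stub model are invented for illustration:

```python
def suggest_replies(generate, prompt, n=3):
    """Collect n candidate replies from a generative model.
    `generate` is any callable (prompt, seed) -> reply string."""
    return [generate(prompt, seed) for seed in range(n)]

def operator_select(candidates, choose):
    """Human-in-the-loop step: the operator (`choose`) decides which
    candidate actually reaches the end-user."""
    return candidates[choose(candidates)]

# Stub model and operator standing in for the real components.
stub_model = lambda prompt, seed: f"reply {seed} to: {prompt}"
replies = suggest_replies(stub_model, "Where is my order?")
chosen = operator_select(replies, choose=lambda cs: 0)
```

The design choice here is that the model never talks to the end-user directly; every output passes through a human decision first.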

Despite the undeniable difficulties in detecting, isolating, and mitigating biases, it can’t be this easy for a model to throw out sexist and racist slurs when presented with seemingly neutral prompts.

Although OpenAI’s position was clear from the start (to keep increasing their understanding of the technology’s potential harms across a variety of use cases, hence releasing it via an API that makes potential misuses easier to control), there has to be more progress towards safe and responsible AI before deploying such models. While there is no one-size-fits-all solution, the question arises whether we ought to take a step back and invest more time and resources in curating and documenting data.

Bibliography

Abid, Abubakar, et al. “Persistent Anti-Muslim Bias in Large Language Models.” 2021, https://arxiv.org/pdf/2101.05783.pdf.

Epstein, Sophia. “How do you control an AI as powerful as OpenAI’s GPT-3?” WIRED UK, 2021, https://www.wired.co.uk/article/gpt-3-openai-examples. Accessed 9 June 2021.

Solaiman, Irene, and Christy Dennison. “Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets.” 2021.

Automaise Team