Paper Spotlight

🔦 Battling Bias in Large Language Models

13/06/2021
3 mins 24 secs

by Patrícia Rocha, Junior Data Scientist at Automaise

These days, AI is everywhere. It may not resemble the sci-fi versions of it, fed to us through popular culture, but its potential is growing every year, and it is going to impact every industry and every business, from the products we use to the work we do and the way we drive.

One of the fields that has probably gained the most from big data is NLP. Language models tend to perform better as they grow larger and are trained on ever greater amounts of data. GPT-3 is one of the most sophisticated language models to date. It has roughly 175 billion parameters: type any prompt, and it will essentially output the words that are likeliest to come next. While its capabilities are impressive (it can behave like a chatbot, summarize text, and generate essays), the model is far from perfect. Ask it any question you can think of and it will always give you an answer, but now and then it will deliver sentences that make little sense.
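To make "the words that are likeliest to come next" concrete, here is a toy sketch in Python. It is not how GPT-3 works internally: GPT-3 is a large neural network, whereas this is a simple word-pair counter, and the function names and the tiny corpus are made up for illustration. It does show the key property: the model can only echo patterns found in its training text.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def most_likely_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Made-up training text, for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_counts(corpus)

print(most_likely_next(model, "the"))  # "cat" follows "the" twice, more than any other word
print(most_likely_next(model, "dog"))  # None: "dog" never appeared in the corpus
```

Whatever associations dominate the training text dominate the predictions, which is why the contents of that text matter so much.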

Since data is one of the key ingredients of any AI-powered application, one of the major concerns surrounding GPT-3 is the chance of it replicating the human biases present in its training data.

GPT-3 learned its language from the Internet: it was trained essentially on data scraped from the web. As a result, it can reproduce the abusive language and hate speech found there, directed at individuals or at specific groups of people.

GPT-3 exhibits a wide variety of racial, religious, and gender biases, among others. Research on religious bias, for instance, showed that GPT-3 heavily associates the word "Muslim" with terrorism and violence. Careful prompt design reduces this behavior, but the association remains more common than for other religious groups (Abid et al.). Some biases may also still be unidentified: there is no consensus on the very definition of toxicity, and it keeps shifting.

Figure: OpenAI’s Playground showing a GPT-3 completion for a prompt containing the word “Muslims”.

These issues sparked debates on the vulnerabilities and potential misuses of large language models. After Facebook’s head of AI, Jerome Pesenti, called out bias in content created by GPT-3, OpenAI quickly offered a solution: a content filter API that classifies text as safe, sensitive, or unsafe (Epstein). However, few details are provided on how this filter operates. Should it be up to big tech companies like OpenAI to make such judgments on behalf of society?
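As a rough illustration of how an application might act on such three-way labels, here is a minimal Python sketch. It does not call OpenAI's API, and it does not reflect how their filter works internally. The classifier is passed in as a plain function, and both `toy_classify` and the handling policy (block unsafe text, flag sensitive text for review) are assumptions made up for this example.

```python
from typing import Callable, Optional

LABELS = ("safe", "sensitive", "unsafe")


def gate_completion(completion: str, classify: Callable[[str], str]) -> Optional[str]:
    """Decide what to do with a completion based on its filter label.

    Assumed policy (not OpenAI's): drop "unsafe" text, mark "sensitive"
    text for human review, and pass "safe" text through unchanged.
    """
    label = classify(completion)
    if label not in LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    if label == "unsafe":
        return None
    if label == "sensitive":
        return "[needs review] " + completion
    return completion


def toy_classify(text: str) -> str:
    """Hypothetical keyword-based stand-in for a real content filter."""
    lowered = text.lower()
    if "<offensive>" in lowered:  # placeholder token standing in for abusive content
        return "unsafe"
    if any(topic in lowered for topic in ("religion", "politics")):
        return "sensitive"
    return "safe"


print(gate_completion("Your parcel ships tomorrow.", toy_classify))
print(gate_completion("Let's talk about politics.", toy_classify))
print(gate_completion("Some <offensive> text.", toy_classify))
```

Even in this toy version, someone has to decide which topics count as sensitive and what happens to flagged text, which is exactly the judgment the question above is about.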

More recently (June 10, 2021), OpenAI published a study in which they claim to have mitigated bias in GPT-3 (Solaiman and Dennison). To do so, they proposed the Process for Adapting Language Models to Society (PALMS), which fine-tunes the model on a values-targeted dataset of carefully curated question-answer pairs covering sensitive topics.

They assessed three versions of GPT-3: a baseline, a control (fine-tuned on a neutral dataset), and a values-targeted GPT-3 (fine-tuned on the values-targeted dataset). The values-targeted model consistently scored lower for toxicity. However, because the dataset covers only a limited set of sensitive topics, it helps only to a certain degree. Additionally, OpenAI stresses that it is unclear which authority should govern model behavior, since “safe” behavior is also a subjective concept.

At Automaise, we take some steps to avoid potentially harmful outputs. It is worth remembering that GPT-3 and its predecessor GPT-2 were trained on unfiltered data scraped from the web, and that this content can be offensive. So the first step is to fine-tune our generative models on smaller datasets containing interactions between clients and operators, which helps the model adjust to the desired behavior without losing its capabilities. Besides that, we keep a human in the loop: the model suggests a set of replies, and an operator chooses the most fitting one before it reaches the end user. Although these measures give us greater control over the output, there is still a great deal to be done.
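The human-in-the-loop step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Automaise's actual implementation: the function names are invented, and `fake_generate` is a hypothetical stand-in for a fine-tuned generative model.

```python
from typing import Callable, List, Optional


def suggest_replies(
    generate: Callable[[str, int], List[str]],
    customer_message: str,
    n_suggestions: int = 3,
) -> List[str]:
    """Ask the model for several candidate replies to one customer message."""
    candidates = generate(customer_message, n_suggestions)
    # Drop empty strings and exact duplicates while keeping the original order.
    seen = set()
    unique = []
    for reply in candidates:
        reply = reply.strip()
        if reply and reply not in seen:
            seen.add(reply)
            unique.append(reply)
    return unique


def operator_select(suggestions: List[str], choice: Optional[int]) -> Optional[str]:
    """Return the reply the operator picked, or None if they rejected all of them."""
    if choice is None:
        return None
    if not 0 <= choice < len(suggestions):
        raise IndexError(f"choice {choice} is out of range")
    return suggestions[choice]


def fake_generate(message: str, n: int) -> List[str]:
    """Hypothetical stand-in for a fine-tuned generative model."""
    canned = [
        "Thanks for reaching out! Could you share your order number?",
        "Sorry about the delay. Let me check that for you.",
        "Thanks for reaching out! Could you share your order number?",
    ]
    return canned[:n]


suggestions = suggest_replies(fake_generate, "Where is my order?")
sent = operator_select(suggestions, choice=1)  # the operator picks the second suggestion
print(sent)
```

Nothing reaches the end user unless `operator_select` returns a reply, so an operator can reject every suggestion and write their own answer instead.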

Despite the undeniable difficulties in detecting, isolating, and mitigating biases, it should not be this easy for a model to produce sexist and racist slurs when presented with seemingly neutral prompts.

OpenAI’s position was clear from the start: to keep increasing their understanding of the technology’s potential harms in a variety of use cases, and therefore to release it via an API that makes potential misuse easier to control. Even so, there has to be more progress towards safe and responsible AI before such models are deployed. While there isn’t a one-size-fits-all solution, the question arises whether we ought to take a step back and invest more time and resources into curating and documenting data.

Bibliography

Abid, Abubakar, et al. “Persistent Anti-Muslim Bias in Large Language Models.” 2021, https://arxiv.org/pdf/2101.05783.pdf.

Epstein, Sophia. “How do you control an AI as powerful as OpenAI’s GPT-3?” WIRED UK, 2021, https://www.wired.co.uk/article/gpt-3-openai-examples. Accessed 09 06 2021.

Solaiman, Irene, and Christy Dennison. “Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets.” 2021.
