How bias in AI can damage marketing data and what you can do about it
Mitigating bias in AI is essential for marketers who want to work with the best possible data. Here's what you need to know
Algorithms are at the heart of marketing and martech. They power the artificial intelligence used for data analysis, data collection, audience segmentation and much, much more. Marketers rely on that AI to provide neutral, reliable data. It doesn’t always do that.
We like to think of algorithms as sets of rules without bias or intent. In themselves, that’s exactly what they are. They don’t have opinions. But those rules are built on the suppositions and values of their creators. That’s one way bias gets into AI. The other, and perhaps more important, way is through the data the AI is trained on.
For example, facial recognition systems are trained on sets of images made up mostly of lighter-skinned people. As a result, they are notoriously bad at recognizing darker-skinned people. In one instance, 28 members of Congress, disproportionately people of color, were incorrectly matched with mugshot images. The failure of attempts to correct this has led some companies, most notably Microsoft, to stop selling these systems to police departments.
ChatGPT, Google’s Bard and other AI-powered chatbots are autoregressive language models that use deep learning to produce text. Those models are trained on a huge data set, possibly encompassing everything posted on the internet during a given time period: a data set riddled with errors, disinformation and, of course, bias.
Only as good as the data it gets
“If you give it access to the internet, it inherently has whatever bias exists,” says Paul Roetzer, founder and CEO of The Marketing AI Institute. “It’s just a mirror on humanity in many ways.”
The builders of these systems are aware of this.
“In [ChatGPT creator] OpenAI’s disclosures and disclaimers they say negative sentiment is more closely associated with African American female names than any other name set within there,” says Christopher Penn, co-founder and chief data scientist at TrustInsights.ai. “So if you have any kind of fully automated black box sentiment modeling and you’re judging people’s first names, if Letitia gets a lower score than Laura, you have a problem. You are reinforcing these biases.”
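That kind of name-level check can be run directly against whatever scoring model a team already uses. The sketch below is an illustration only: it sends identical sentences that differ by first name through an off-the-shelf sentiment model via Hugging Face’s transformers pipeline, and the templates and name sets are assumptions chosen for demonstration, not any vendor’s recommended test.

```python
from statistics import mean
from transformers import pipeline  # Hugging Face transformers; any sentiment model can stand in

# Identical sentences that differ only in the first name (illustrative templates).
TEMPLATES = [
    "{name} left a detailed review of our product.",
    "{name} contacted customer support this morning.",
    "We received an email from {name} about the campaign.",
]

# Illustrative name sets only; build sets that reflect your own audience.
NAME_SETS = {
    "set_a": ["Laura", "Emily", "Megan"],
    "set_b": ["Letitia", "Aisha", "Keisha"],
}

classifier = pipeline("sentiment-analysis")  # downloads a default English model on first run

def signed_score(text: str) -> float:
    """Map the classifier output to a signed score: positive > 0, negative < 0."""
    result = classifier(text)[0]
    return result["score"] if result["label"] == "POSITIVE" else -result["score"]

for label, names in NAME_SETS.items():
    avg = mean(signed_score(t.format(name=n)) for t in TEMPLATES for n in names)
    print(label, round(avg, 3))
# A persistent gap between the two averages is exactly the problem Penn describes.
```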
OpenAI’s best practices documentation also says, “From hallucinating inaccurate information, to offensive outputs, to bias, and much more, language models may not be suitable for every use case without significant modifications.”
What’s a marketer to do?
Mitigating bias is essential for marketers who want to work with the best possible data. Eliminating it will forever be a moving target, a goal to pursue but not necessarily achieve.
“What marketers and martech companies should be thinking is, ‘How do we apply this on the training data that goes in so that the model has fewer biases to start with that we have to mitigate later?’” says Penn. “Don’t put garbage in, you don’t have to filter garbage out.”
There are tools that can help detect and mitigate bias. Here are five of the best known:
- What-If from Google is an open-source tool that helps detect the existence of bias in a model by manipulating data points, generating plots and specifying criteria to test whether changes impact the end result.
- AI Fairness 360 from IBM is an open-source toolkit to detect and eliminate bias in machine learning models.
- Fairlearn from Microsoft is designed to help navigate trade-offs between fairness and model performance (a minimal sketch of this kind of check follows the list).
- Local Interpretable Model-Agnostic Explanations (LIME), created by researcher Marco Tulio Ribeiro, lets users manipulate different components of a model to better understand and point out the source of bias, if one exists.
- FairML from MIT’s Julius Adebayo is an end-to-end toolbox for auditing predictive models by quantifying the relative significance of the model’s inputs.
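To make the kind of check these tools perform concrete, here is a minimal sketch using Fairlearn; the toy labels, predictions and gender column are assumptions invented for illustration, and a real audit would feed in your own model’s outputs and sensitive attributes.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, demographic_parity_difference

# Toy data standing in for real model outputs: actual outcomes, predictions
# and a sensitive feature (here, gender) for each record.
df = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 1, 0, 0],
    "y_pred": [1, 0, 1, 0, 0, 1, 1, 0],
    "gender": ["f", "f", "f", "f", "m", "m", "m", "m"],
})

# Accuracy broken out by group surfaces the fairness vs. performance trade-off.
frame = MetricFrame(
    metrics=accuracy_score,
    y_true=df["y_true"],
    y_pred=df["y_pred"],
    sensitive_features=df["gender"],
)
print("Overall accuracy:", frame.overall)
print("Accuracy by group:\n", frame.by_group)

# Difference in selection rates between groups; 0 means parity.
print("Demographic parity difference:",
      demographic_parity_difference(df["y_true"], df["y_pred"],
                                    sensitive_features=df["gender"]))
```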
“They are good when you know what you’re looking for,” says Penn. “They are less good when you’re not sure what’s in the box.”
Judging inputs is the easy part
For example, he says, with AI Fairness 360, you can give it a series of loan decisions and a list of protected classes — age, gender, race, etc. It can then identify any biases in the training data or in the model and sound an alarm when the model starts to drift in a direction that’s biased.
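A minimal sketch of that loan-decision workflow with AI Fairness 360 might look like the following; the column names, toy records and privileged/unprivileged groupings are assumptions for illustration, not a prescription.

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy loan decisions; in practice this would be your training data or model output.
df = pd.DataFrame({
    "loan_approved": [1, 1, 0, 1, 0, 0, 1, 0],
    "race":          [1, 1, 1, 1, 0, 0, 0, 0],  # 1 = privileged group, 0 = unprivileged
    "income":        [55, 72, 40, 65, 58, 47, 61, 39],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["loan_approved"],
    protected_attribute_names=["race"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"race": 1}],
    unprivileged_groups=[{"race": 0}],
)

# Disparate impact is the ratio of favorable outcomes for the unprivileged group
# to the privileged group; values well below 1.0 flag bias. Recomputing these
# numbers on fresh batches is one way to sound the alarm when a model drifts.
print("Disparate impact:", metric.disparate_impact())
print("Statistical parity difference:", metric.statistical_parity_difference())
```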
“When you’re doing generation it’s a lot harder to do that, particularly if you’re doing copy or imagery,” Penn says. “The tools that exist right now are mainly meant for tabular rectangular data with clear outcomes that you’re trying to mitigate against.”
The systems that generate content, like ChatGPT and Bard, are incredibly compute-intensive. Adding safeguards against bias will have a significant impact on their performance. That adds to the already difficult task of building them, so don’t expect a resolution soon.
Can’t afford to wait
Because of the risk to their brands, marketers can’t afford to sit around and wait for the models to fix themselves. The mitigation they need to be doing for AI-generated content is to constantly ask what could go wrong. The best people to ask are those involved in diversity, equity and inclusion (DEI) efforts.
“Organizations give a lot of lip service to DEI initiatives,” says Penn, “but this is where DEI actually can shine. [Have the] diversity team … inspect the outputs of the models and say, ‘This is not OK or this is OK.’ And then have that be built into processes, like DEI has given this its stamp of approval.”
How a company defines and mitigates bias in all these systems will be a significant marker of its culture.
“Each organization is going to have to develop their own principles about how they develop and use this technology,” says Roetzer. “And I don’t know how else it’s solved other than at that subjective level of ‘this is what we deem bias to be and we will, or will not, use tools that allow this to happen.’”