Scraping Bias

Thomas B James
7 min read · Feb 24, 2024


Google’s misadventures in promoting diversity and eliminating bias in its AI tools reflect a larger social problem.

[Image: old drawing of a robot wreaking havoc on society]

Organic Bias

AI researchers began noticing bias in AI algorithms some time ago. In 2018, for example, Amazon reportedly found that an AI-based recruitment program was biased against female applicants. The tool had been trained to identify patterns in the resumes submitted to the company. One pattern it identified was that more men than women submitted resumes. From that accurate observation it drew the illogical conclusion that male applicants were preferred over female applicants. Upon becoming aware of the problem, Amazon scrapped the program.

A 2023 Washington Post article also exposed race and gender bias in AI tools. To make its point, the Post asked Stable Diffusion to generate images of specified kinds of people. In response to the prompt, “Attractive people,” the program generated mostly images of light-skinned women.

OpenAI has acknowledged that its image generator, DALL-E 3, displays a “tendency toward a Western point of view” and that its images disproportionately depict young white women.

The Washington Post also reported that a prompt to Stable Diffusion to generate an image of a “productive person” generated images of white people in suits. Asked to generate an image of a person receiving social services, the AI produced images of non-white people.

Bloomberg also experimented with AI prompts. The company reported that AI produced more images of lighter-skinned people in response to prompts for people with high-paying jobs, while images of people with darker skin tones were more prevalent when fast-food workers were requested. More than 80% of the images Stable Diffusion generated in response to the prompt “inmate” depicted people with darker skin, even though people with darker skin make up less than half of the U.S. prison population. And every image Stable Diffusion generated in response to requests for images of inmates was of a man.

These kinds of biases do not appear to have been consciously built into the tools. Stable Diffusion draws its raw data from billions of images and captions that people and companies have published to the Internet. The images are collected programmatically, without human selection or curation. The AI simply parrots back the biases and prejudices of the people and companies with an active online presence. Garbage In, Garbage Out.
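To see why, consider a deliberately simplified sketch. The numbers below are hypothetical, and the code is nothing like Stable Diffusion’s actual training pipeline; it is just a frequency counter that “learns” from an uncurated set of captions and then samples from what it learned. Whatever skew the scraped data contains comes straight back out:

```python
import random
from collections import Counter, defaultdict

# Hypothetical "scraped" captions: 80 of 100 images labeled "CEO" happen to
# depict men. Nobody chose that ratio; it is simply what the uncurated
# corpus contains.
scraped_captions = [("CEO", "man")] * 80 + [("CEO", "woman")] * 20

# "Training": count how often each attribute co-occurs with each prompt.
counts = defaultdict(Counter)
for prompt, attribute in scraped_captions:
    counts[prompt][attribute] += 1

def generate(prompt: str) -> str:
    """Sample an attribute with the same frequency it had in the corpus."""
    attrs = counts[prompt]
    return random.choices(list(attrs), weights=list(attrs.values()), k=1)[0]

# The "model" reproduces the skew of its inputs: roughly 800 "man" results
# for every 200 "woman" results.
print(Counter(generate("CEO") for _ in range(1_000)))
```

Real generative models are vastly more complicated than a frequency table, but the underlying point is the same: with no curation step, the output distribution mirrors whatever the scraper happened to find.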

Overcorrected Bias in Pictorial Responses

Google has been striving for more diversity, equity, and inclusion in its Gemini AI output. As reported in The Verge, the results have been bizarre:

  • A request for an image of a 19th-century U.S. Senator yielded images of Black and Native American women — seemingly erasing the history of race and sex discrimination in America.
  • A request for images of the Founding Fathers yielded images of non-white people.
  • A request for images of a 1943 German soldier yielded images of a Black man, an Asian woman, and a white woman in Nazi military uniforms.

This caused an uproar so loud that Google has disabled Gemini’s ability to create and display any images of people at all, at least for now.

Undercorrected Bias in Text Responses

Google is to be commended for discontinuing AI generation of images of people until it can do a better job of eliminating bias. Its attempts to address bias in text responses, however, are a dismal failure.

Here is what happened when I asked it to write a brief description of an employee appreciation celebration in an all-male office:

Sounds right, doesn’t it? Gemini refused to honor my prompt because I asked it to describe an event in a setting that is not gender-inclusive. I should not have asked about an office that excludes women in the first place, right? My bad, right? Well, let’s see…

Hmm. Let’s explore this some more. What advice will Gemini give to an all-male office that is interested in promoting diversity, equity, and inclusion?

Hire more women. That certainly sounds like good advice. But let’s see what happens if we pose exactly the same question about an all-female office that is interested in promoting diversity, equity, and inclusion.

To promote greater diversity and inclusion in an all-male office, Gemini suggests hiring more women. It does not suggest hiring more men to promote greater diversity and inclusion in an all-female office, however. Instead, it talks about ensuring equal access to promotions, training opportunities, etc. for women, “regardless of seniority or background within the female workforce.” In AI land, excluding females is discriminatory but excluding males promotes diversity and inclusivity.

Let’s see what happens if we move from a workplace to an educational setting. After all, there are still some all-male and all-female schools in America, so maybe we won’t get the same kind of “Stop being biased; only I get to be biased” kind of response from Gemini.

Or maybe we will.

Let’s try a scenario where women want to exclude a man from a business networking club.

Gemini gives some helpful tips for excluding the man, reminding us that “It’s important to remember that the group has the right to create a space that is exclusive to women.”

According to Gemini, then, we apparently have a right to create spaces that are exclusive to our own gender. Well, let’s find out.

Hmm.

Now let’s try a question on the subject of domestic violence.

Ask exactly the same question but with the genders switched and look at what happens:

Neither question indicated who was guilty. Yet Gemini assumed, in the first scenario, that the man was guilty and should be arrested. There was no suggestion that anyone should consider whether he was the primary aggressor, as there was when the question was framed in terms of a woman being arrested.

Now let’s move out of schools, workplaces, and homes, and into a social setting that is open to the public.

(Gemini initially refused to answer because I used the wrong pronoun. After I explained it was a typo, Gemini provided an answer.)

Gemini warned that a man sitting in a playground could be a danger to children, might interact with children in an unhealthy way, and might be mentally ill.

I asked the same question but with the genders switched, and got this:

Gemini found nothing unfair about judging the man based on his age and location, but when the person in question was a woman, we were told that “judging based solely on her age and location is unfair.” Not only that, but Gemini offered three innocent explanations for her presence: innocent enjoyment, supervising others, or community engagement. All very positive things. Gemini did not give the man the benefit of any of those “counterpoints.” For him, the only potential explanations were that he might be mentally ill, pathologically lonely, a vagrant, and/or a danger to children.

Unexplained Bias

The foregoing examples reflect societal biases that AI might have scraped from the Internet. Some biases in responses, however, are not as easily explained. When I asked Gemini for a definition of misogyny, it provided one that it sourced to an online dictionary. When I asked it for a definition of misandry, however, it asserted that there is no such word in the English language.

In reality, misandry is an English word. It appears in the very same online dictionary that Gemini cited for its definition of misogyny, and it is listed in both Merriam-Webster and Dictionary.com.

Concluding Thoughts

Readers might not be particularly worried about Gemini’s misandric biases. Who cares if some man is offended, right? The problem has wider implications than that, though. For example, an employer that relies on AI to draft a gender-biased diversity, equity, and inclusion policy statement, or to make sex-based hiring, promotion, and training decisions, could find itself on the losing end of a discrimination complaint.

Moreover, the garbage that AI tools are scraping and spitting back out suggests that it might be a good time for Google executives and people who publish online content to give some thought to whether a just society is preferable to an unjust one. I believe that non-hypocrisy is a sine qua non of a just society, as is equality under the law. The underpinnings of those ideas, however, are discussions for another day.

— — — — — —

Check out my blog

If you are interested in the copyright aspects of AI, please have a look at my blog, The Cokato Copyright Attorney.


Thomas James is a Cokato, Minnesota attorney, also known as Tom James.