Ever since large language models (LLMs) like ChatGPT became widely used, experts have warned that these systems could perpetuate or even worsen existing societal biases. Now, a new study has confirmed just that: AI models show a strong preference for white-associated names in hiring simulations, raising concerns about their role in perpetuating discrimination.
Two decades ago, economists conducted a landmark study where they sent out thousands of fictitious job applications to companies in Boston and Chicago. The applications were identical, except for the names — some were traditionally black-sounding, while others were white-sounding. The results were staggering: applicants with white names received 50% more callbacks.
Although the gap has narrowed over time, the bias remains. A study published this year sent out 83,000 fake job applications and found a 10% difference in callback rates. Despite promises that AI would reduce human bias, there are signs that these models are not living up to that expectation.
AI seems to dislike black applicants
Researchers from the University of Washington tested three cutting-edge LLMs against more than 500 job descriptions and 500 resumes. They focused on nine occupations, including CEO, teacher, accountant, and engineer.
The objective was to evaluate whether AI systems favored resumes with signals for race (black vs. white) and gender (male vs. female). They also analyzed whether these biases compounded for intersectional identities, such as black women.
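To get a sense of how such a test works in practice, here is a minimal, hypothetical sketch in Python. Everything in it, the names, the resume template, and the `score()` placeholder, is invented for illustration; the researchers' actual pipeline used real resumes and job postings and scored them with the models themselves.

```python
# Hypothetical sketch of a name-swap resume audit (not the study's actual code).
# Idea: build resumes that are identical except for the candidate's name, score
# each against a job description with the model under test, and tally how often
# each name group "wins".

import random
from itertools import product

# Toy inputs; a real audit would use hundreds of real resumes and job postings.
JOB_DESCRIPTIONS = ["Seeking an accountant with 5+ years of auditing experience..."]
RESUME_TEMPLATE = "{name}\nCertified accountant with 6 years of auditing experience..."

# Illustrative stand-ins for the demographic name signals being tested.
NAME_GROUPS = {
    "white_male": "Greg Walsh",
    "black_male": "Jamal Robinson",
    "white_female": "Emily Baker",
    "black_female": "Lakisha Washington",
}

def score(job: str, resume: str) -> float:
    """Placeholder for the model under test.

    A real audit would call an embedding model or LLM here and return a
    relevance score (for example, similarity between job and resume).
    Random noise keeps this sketch runnable end to end.
    """
    return random.random()

def run_audit() -> dict:
    wins = {group: 0 for group in NAME_GROUPS}
    comparisons = 0
    for job, g1, g2 in product(JOB_DESCRIPTIONS, NAME_GROUPS, NAME_GROUPS):
        if g1 >= g2:  # compare each unordered pair of name groups once
            continue
        resume_1 = RESUME_TEMPLATE.format(name=NAME_GROUPS[g1])
        resume_2 = RESUME_TEMPLATE.format(name=NAME_GROUPS[g2])
        winner = g1 if score(job, resume_1) > score(job, resume_2) else g2
        wins[winner] += 1
        comparisons += 1
    return {group: count / comparisons for group, count in wins.items()}

if __name__ == "__main__":
    print(run_audit())
```

Run over many resumes, postings, and names, the win rates produced by this kind of bookkeeping are the sort of preference percentages the study reports.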
The results were striking. Across three million resume-job comparisons, resumes with white-associated names were favored by the AI models in 85% of cases. In contrast, resumes with black-associated names were selected only 8.6% of the time. Although gender bias was less pronounced, male-associated names still had a slight advantage, being preferred just over 50% of the time.
Black males, in particular, were significantly disadvantaged. In some scenarios, they were completely overlooked in favor of white male candidates. Black female names fared slightly better but still faced substantial disadvantages compared to their white counterparts.
Why these biases appear
In some ways, LLMs still operate as a “black box”: it’s not always clear why they make the decisions they do. However, researchers believe they can explain at least part of this effect.
For starters, there’s the training data. These models were trained on huge amounts of text, much of it from the internet, and that text carries the same biases we hold as a society, sometimes in amplified form. In a way, the models “learn” our social stereotypes.
The second factor is a frequency effect. If black people have historically been underrepresented in certain fields, an LLM trained on text reflecting that reality can simply reproduce the pattern when ranking candidates.
Other factors may be at play as well, but it’s hard to disentangle these from the racial and gender influences.
How to eliminate the bias
At first glance, you’d say the answer is easy: just remove the name from CVs. This idea has been floating around for a while, but it may not be all that effective. The name is just one of the racial identifiers that AIs can detect. Educational institutions, locations, and even particular word choices can signal gender and racial identities. Removing the name can address a part of the problem, but only a part of it. Plus, removing names doesn’t address the root cause — the biases embedded in the language models themselves.
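As a rough illustration of why this only goes so far, here is a hypothetical redaction step in Python. It blanks out the name, but the remaining lines still carry signals a model can pick up on (the example resume and names are made up).

```python
import re

def redact_name(resume_text: str, candidate_name: str) -> str:
    """Blank out the candidate's name wherever it appears (illustrative only)."""
    return re.sub(re.escape(candidate_name), "[REDACTED]", resume_text, flags=re.IGNORECASE)

resume = (
    "Lakisha Washington\n"
    "B.A., Howard University\n"                          # the school can still signal race
    "Volunteer, Black Women in Tech, Atlanta chapter\n"   # so can affiliations and location
)

print(redact_name(resume, "Lakisha Washington"))
```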
A Salesforce spokesperson told GeekWire that the company doesn’t just use these AI models blindly. “Any models offered for production use go through rigorous testing for toxicity and bias before they’re released, and our AI offerings include guardrails and controls to protect customer data and prevent harmful outputs.” However, such claims are hard to verify independently.
A more thorough solution would be to modify the training data, adjust the algorithms to disregard specific identity markers, or debias the embeddings themselves. However, as the study notes, these fixes often reduce people’s identities to “same vs. different,” without acknowledging the unique challenges that marginalized groups face.
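Debiasing embeddings typically means estimating a direction in the model’s vector space that tracks a demographic signal and projecting it out. Here is a minimal sketch with NumPy, using made-up vectors rather than any real model’s embeddings:

```python
import numpy as np

def bias_direction(pairs: list[tuple[np.ndarray, np.ndarray]]) -> np.ndarray:
    """Estimate a bias direction from embedding pairs that differ only in a
    demographic signal (e.g., resumes identical except for the name)."""
    diffs = np.stack([a - b for a, b in pairs])
    direction = diffs.mean(axis=0)
    return direction / np.linalg.norm(direction)

def debias(embedding: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of the embedding that lies along the bias direction."""
    return embedding - np.dot(embedding, direction) * direction

# Made-up 4-dimensional "embeddings", purely for illustration.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=4), rng.normal(size=4)) for _ in range(10)]
direction = bias_direction(pairs)
resume_vec = rng.normal(size=4)
print(debias(resume_vec, direction))
```

Even in this idealized form, the method treats the demographic signal as a single direction to be erased, which is precisely the “same vs. different” framing the study cautions against.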
Perhaps the most challenging solution, but also potentially the most effective, is changing how we conceptualize professionalism. For instance, if certain words or phrases commonly associated with women (like “cared” or “collaborated”) are valued less by AI systems, we may need to reevaluate what we consider a “strong” resume. Language is context-dependent. Words associated with empathy or teamwork should be just as valued as those associated with leadership and assertiveness.
You should care about this
AI is poised to transform job recruitment. Tools like ChatGPT have made it easier to generate tailored job applications, while companies are increasingly using AI to screen resumes. And you’re probably already starting to see how this can be a problem.
If companies adopt these systems uncritically, they’re simply perpetuating existing biases. And often, they’re not actually hiring the best people for the job. This is both a social and a productivity problem. By replicating and even amplifying biases, AI-based resume screening tools could make it harder for certain groups to advance their careers. Ultimately, this can impact the economic and social mobility of entire communities.
In addition, these findings underscore the importance of transparent audits and regulatory oversight for AI hiring tools. It’s one thing to automate repetitive tasks, but when it comes to shaping people’s careers and livelihoods, fairness must be prioritized.