Bias isn’t the only problem with credit scores—and no, AI can’t help
We already knew that biased data and biased algorithms skew automated decision-making in a way that disadvantages low-income and minority groups. For example, software used by banks to predict whether or not someone will pay back credit-card debt typically favors wealthier white applicants. Many researchers and a slew of start-ups are trying to fix the problem by making these algorithms more fair.
But in the biggest-ever study of real-world mortgage data, economists Laura Blattner at Stanford University and Scott Nelson at the University of Chicago show that differences in mortgage approval between minority and majority groups are not just down to bias, but also to the fact that minority and low-income groups have less data in their credit histories.
This means that when that data is used to calculate a credit score, and that credit score is used to predict loan defaults, the prediction will be less precise. It is this lack of precision that leads to inequality, not just bias.
The implications are stark: fairer algorithms won’t fix the problem.
“It’s a really striking result,” says Ashesh Rambachan, who studies machine learning and economics at Harvard University, but was not involved in the study. Bias and patchy credit records have been hot issues for some time, but this is the first large-scale experiment that looks at loan applications of millions of real people.
Credit scores squeeze a range of socio-economic data, such as employment history, financial records, and purchasing habits, into a single number. As well as deciding loan applications, credit scores are now used to make many life-changing decisions, including decisions about insurance, hiring, and housing.
To work out why minority and majority groups were treated differently by mortgage lenders, Blattner and Nelson collected credit reports for 50 million anonymized US consumers, and tied each of those consumers to their socio-economic details taken from a marketing dataset, their property deeds and mortgage transactions, and data about the mortgage lenders who provided them with loans.
One reason this is the first study of its kind is that these datasets are often proprietary and not publicly available to researchers. “We went to a credit bureau and basically had to pay them a lot of money to do this,” says Blattner.
Noisy data
They then experimented with different predictive algorithms to show that credit scores were not simply biased but “noisy,” a statistical term for data riddled with random error that limits how accurate any prediction built on it can be. Take a minority applicant with a credit score of 620. In a biased system, we might expect this score to always overstate the risk of that applicant and that a more accurate score would be 625, for example. In theory, this bias could then be accounted for via some form of algorithmic affirmative action, such as lowering the threshold for approval for minority applicants.
But Blattner and Nelson show that adjusting for bias had no effect. They found that a minority applicant’s score of 620 was indeed a poor proxy for her creditworthiness but that this was because the error could go both ways: a 620 might be 625, or it might be 615.
This difference may seem subtle, but it matters. Because the inaccuracy comes from noise in the data rather than bias in the way that data is used, it cannot be fixed by making better algorithms.
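To see the distinction in concrete terms, here is a minimal toy sketch in Python. It is not the researchers’ model, and every number in it (the 5-point shift, the 30-point noise, the 620 cutoff) is an illustrative assumption; it simply shows why a systematic offset can be undone by shifting the approval threshold, while random error in both directions cannot.

```python
# Toy sketch (illustrative only, not the study's model): bias vs. noise.
# Assumption: "true" creditworthiness is a latent score; observed scores are
# either shifted down by a fixed amount (bias) or carry extra random error
# (noise). Approval is decided by a single cutoff of 620.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_score = rng.normal(650, 50, n)   # latent creditworthiness (made-up numbers)
threshold = 620

biased = true_score - 5                        # systematically understated scores
noisy = true_score + rng.normal(0, 30, n)      # unbiased on average, but imprecise

def error_rate(observed, cutoff):
    """Share of applicants whose decision disagrees with their true score."""
    return np.mean((observed >= cutoff) != (true_score >= threshold))

# Bias can be corrected by adjusting the threshold for the affected group...
print("biased scores, raw threshold:     ", error_rate(biased, threshold))
print("biased scores, adjusted threshold:", error_rate(biased, threshold - 5))
# ...but no threshold adjustment removes the mistakes caused by noise.
print("noisy scores, raw threshold:      ", error_rate(noisy, threshold))
print("noisy scores, adjusted threshold: ", error_rate(noisy, threshold - 5))
```

In this sketch the adjusted threshold drives the error rate for the biased scores to zero, while the error rate for the noisy scores barely moves, which is the sense in which better algorithms alone cannot fix the problem.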
“It’s a self-perpetuating cycle,” says Blattner. “We give the wrong people loans and a chunk of the population never gets the chance to build up the data needed to give them a loan in the future.”
Blattner and Nelson then tried to measure how big the problem was. They built their own simulation of a mortgage lender’s prediction tool and estimated what would have happened if borderline applicants who had been accepted or rejected because of inaccurate scores had their decisions reversed. To do this they used a variety of techniques, such as comparing rejected applicants to similar ones who had been accepted, or looking at other lines of credit that rejected applicants had received, such as auto loans.
Putting all of this together, they plugged these hypothetical “accurate” loan decisions into their simulation and measured the difference between groups again. They found that when decisions about minority and low-income applicants were assumed to be as accurate as those for wealthier, white ones the disparity between groups dropped by 50%. For minority applicants, nearly half of this gain came from removing errors where the applicant should have been approved but wasn’t. Low-income applicants saw a smaller gain because it was offset by removing errors that went the other way: applicants who should have been rejected but weren’t.
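The logic of that counterfactual can be sketched in a few lines of Python. This is a hedged illustration, not the authors’ code: it assumes two hypothetical groups with identical underlying creditworthiness but different score precision (a stand-in for thin versus thick credit files), approves everyone above a cutoff, and then asks what happens to the approval gap once both groups’ scores are made equally precise. In the real study the groups also differ in their underlying finances, so the gap shrinks rather than vanishes.

```python
# Hedged sketch of the counterfactual exercise (illustrative numbers only).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
threshold = 620

# Same underlying creditworthiness for both hypothetical groups...
true_majority = rng.normal(660, 50, n)
true_minority = rng.normal(660, 50, n)
# ...but scores for the minority group are measured with more noise.
obs_majority = true_majority + rng.normal(0, 10, n)   # precise scores
obs_minority = true_minority + rng.normal(0, 40, n)   # noisy scores

def approval_rate(scores):
    return np.mean(scores >= threshold)

gap_observed = approval_rate(obs_majority) - approval_rate(obs_minority)

# Counterfactual: re-score the minority group as precisely as the majority.
obs_minority_accurate = true_minority + rng.normal(0, 10, n)
gap_counterfactual = approval_rate(obs_majority) - approval_rate(obs_minority_accurate)

print(f"approval gap with noisy scores:    {gap_observed:.3f}")
print(f"approval gap with equally accurate scores: {gap_counterfactual:.3f}")
```

Even with identical underlying creditworthiness, the noisier group ends up with a lower approval rate; once the scores are made equally precise, the gap in this toy setup all but disappears.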
Blattner points out that addressing this inaccuracy would benefit lenders as well as underserved applicants. “The economic approach allows us to quantify the costs of the noisy algorithms in a meaningful way,” she says. “We can estimate how much credit misallocation occurs because of it.”
Righting wrongs
But fixing the problem won’t be easy. There are many reasons that minority groups have noisy credit data, says Rashida Richardson, a lawyer and researcher who studies technology and race at Northeastern University. “There are compounded social consequences where certain communities may not seek traditional credit because of distrust of banking institutions,” she says. Any fix will have to deal with the underlying causes. Reversing generations of harm will require myriad solutions, including new banking regulations and investment in minority communities: “The solutions are not simple because they must address so many different bad policies and practices.”
One option in the short term may be for the government simply to push lenders to accept the risk of issuing loans to minority applicants who are rejected by their algorithms. This would allow lenders to start collecting accurate data about these groups for the first time, which would benefit both applicants and lenders in the long run.
A few smaller lenders are starting to do this already, says Blattner: “If the existing data doesn’t tell you a lot, go out and make a bunch of loans and learn about people.” Rambachan and Richardson also see this as a necessary first step. But Rambachan thinks it will take a cultural shift for larger lenders. The idea makes a lot of sense to the data science crowd, he says. Yet when he talks to those teams inside banks they admit it’s not a mainstream view. “They’ll sigh and say there’s no way they can explain it to the business team,” he says. “And I’m not sure what the solution to that is.”
Blattner also thinks that credit scores should be supplemented with other data about applicants, such as bank transactions. She welcomes the recent announcement from a handful of banks, including JPMorgan Chase, that they will start sharing data about their customers’ bank accounts as an additional source of information for individuals with poor credit histories. But more research will be needed to see what difference this will make in practice. And watchdogs will need to ensure that greater access to credit does not go hand in hand with predatory lending behavior, says Richardson.
Many people are now aware of the problems with biased algorithms, says Blattner. She wants people to start talking about noisy algorithms too. The focus on bias—and the belief that it has a technical fix—means that researchers may be overlooking the wider problem.
Richardson worries that policymakers will be persuaded that tech has the answers when it doesn’t. “Incomplete data is troubling because detecting it will require researchers to have a fairly nuanced understanding of societal inequities,” she says. “If we want to live in an equitable society where everyone feels like they belong and are treated with dignity and respect, then we need to start being realistic about the gravity and scope of issues we face.”