Predictive policing is still racist—whatever data it uses

It’s no secret that predictive policing tools are racially biased. A number of studies have shown that racist feedback loops can arise if algorithms are trained on police data, such as arrests. But new research shows that training predictive tools in a way meant to lessen bias has little effect.

Arrest data biases predictive tools because police are known to arrest more people in Black and other minority neighborhoods, which leads algorithms to direct more policing to those areas, which leads to more arrests. The result is that predictive tools misallocate police patrols: some neighborhoods are unfairly designated crime hot spots while others are underpoliced.
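
To make the loop concrete, here is a minimal toy sketch, not any vendor's actual algorithm. It assumes two neighborhoods with identical true crime rates, a single daily patrol sent to whichever neighborhood has the most recorded arrests, and arrests that are only recorded where the patrol goes. The function name, the starting counts, and the rates are all hypothetical values chosen for illustration.

```python
def simulate_feedback_loop(initial_arrests, true_crime_rate, days):
    """Toy model: each day one patrol goes to the area with the most
    recorded arrests, and new arrests are recorded only in that area."""
    recorded = list(initial_arrests)
    for _ in range(days):
        # The "prediction": the area with the most recorded arrests so far.
        target = max(range(len(recorded)), key=lambda i: recorded[i])
        # Crime happens everywhere, but it is only recorded where police are.
        recorded[target] += true_crime_rate[target]
    return recorded


# Hypothetical inputs: equal true crime rates, slightly unequal arrest history.
recorded = simulate_feedback_loop(
    initial_arrests=[12, 10],
    true_crime_rate=[1.0, 1.0],
    days=100,
)
share_of_records = recorded[0] / sum(recorded)
print(recorded, round(share_of_records, 2))
```

Under these assumptions the first neighborhood starts with about 55% of recorded arrests and ends with more than 90%, even though both neighborhoods have the same underlying crime rate. The second neighborhood's record never changes, because no one is there to record anything.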

In their defense, many developers of predictive policing tools say that they have started using victim reports to get a more accurate picture of crime rates in different neighborhoods. In theory, victim reports should be less biased because they aren’t affected by police prejudice or feedback loops.

But Nil-Jana Akpinar and Alexandra Chouldechova at Carnegie Mellon University show that the view provided by victim reports is also skewed. The pair built their own predictive algorithm using the same model found in several popular tools, including PredPol, the most widely used system in the US. They trained the model on victim report data for Bogotá, Colombia, one of very few cities for which independent crime reporting data is available at a district-by-district level.

When they compared their tool’s predictions against actual crime data for each district, they found that it made significant errors. For example, in a district where few crimes were reported, the tool identified only around 20% of the actual hot spots (locations with a high rate of crime). On the other hand, in a district with a high number of reports, the tool predicted 20% more hot spots than there really were.
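
The mechanism behind both kinds of error can be illustrated with a small sketch. It is not the researchers' model and uses no real data: the grid cells, crime counts, reporting rates, and hot-spot threshold below are all made-up assumptions. It ranks cells by reported crime, flags as many hot spots as truly exist citywide, and then compares the flags with the true picture in each district.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    district: str
    true_crimes: int


# Hypothetical share of crimes that victims report in each district.
REPORTING_RATE = {"low_reporting": 0.2, "high_reporting": 0.8}
# Hypothetical cutoff: a cell is a true hot spot at 50 or more crimes.
HOT_SPOT_THRESHOLD = 50

cells = (
    [Cell("low_reporting", n) for n in (60, 58, 55, 52, 51, 20)]
    + [Cell("high_reporting", n) for n in (60, 55, 45, 40, 30, 10)]
)


def hot_spot_summary(cells):
    """Compare true hot spots with those flagged from reported crime."""
    true_hot = {
        i for i, c in enumerate(cells) if c.true_crimes >= HOT_SPOT_THRESHOLD
    }
    # What the tool sees: expected reports, not the true crime count.
    reported = [c.true_crimes * REPORTING_RATE[c.district] for c in cells]
    ranked = sorted(range(len(cells)), key=lambda i: reported[i], reverse=True)
    # Flag the same number of hot spots as truly exist citywide.
    predicted_hot = set(ranked[: len(true_hot)])

    summary = {}
    for district in REPORTING_RATE:
        actual = {i for i in true_hot if cells[i].district == district}
        predicted = {i for i in predicted_hot if cells[i].district == district}
        summary[district] = {
            "actual": len(actual),
            "predicted": len(predicted),
            "found": len(actual & predicted),
        }
    return summary


summary = hot_spot_summary(cells)
print(summary)
```

With these invented numbers, the low-reporting district has five true hot spots but only two are flagged, while the high-reporting district has two true hot spots but five cells are flagged. The errors come entirely from the difference in reporting rates, since the tool never sees the true counts.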

For Rashida Richardson, a lawyer and researcher who studies algorithmic bias at the AI Now Institute in New York, these results reinforce existing work that highlights problems with data sets used in predictive policing. “They lead to biased outcomes that do not improve public safety,” she says. “I think many predictive policing vendors like PredPol fundamentally do not understand how structural and social conditions bias or skew many forms of crime data.”

So why did the algorithm get it so wrong? The problem with victim reports is that Black people are more likely to be reported for a crime than white people. Richer white people are more likely to report a poorer Black person than the other way around. And Black people are also more likely to report other Black people. As with arrest data, this leads to Black neighborhoods being flagged as crime hot spots more often than they should be.

Other factors distort the picture too. “Victim reporting is also related to community trust or distrust of police,” says Richardson. “So if you are in a community with a historically corrupt or notoriously racially biased police department, that will affect how and whether people report crime.” In this case, a predictive tool might underestimate the level of crime in an area, so that area will not get the policing it needs.

No quick fix

Worse, there’s still no obvious technical fix. Akpinar and Chouldechova tried to adjust their Bogotá model to account for the biases they observed but did not have enough data to make much difference—despite there being more district-level data for Bogotá than for any US city. “Ultimately, it is unclear if mitigating the bias in this case is any easier than previous efforts that worked to debias arrest-data-based systems,” says Akpinar.

What can be done? Richardson thinks that public pressure to dismantle racist tools and the policies behind them is the only answer. “It is just a question of political will,” she says. She notes that early adopters of predictive policing tools, like Santa Cruz, have announced they will no longer use them and that there have been scathing official reports on the use of predictive policing by the LAPD and Chicago PD. “But the responses in each city were different,” she says.

Chicago suspended the use of predictive policing but reinvested in a database for policing gangs, which Richardson says has many of the same problems.

“It is concerning that even when government investigations and reports find significant issues with these technologies, it is not enough for politicians and police officials to say it shouldn’t be used,” she says.