The complex math of counterfactuals could help Spotify pick your next favorite song

A new kind of machine-learning model built by a team of researchers at the music-streaming firm Spotify captures for the first time the complex math behind counterfactual analysis, a precise technique that can be used to identify the causes of past events and predict the effects of future ones.

The model, described earlier this year in the scientific journal Nature Machine Intelligence, could improve the accuracy of automated decision making, especially personalized recommendations, in a range of applications from finance to health care.

The basic idea behind counterfactuals is to ask what would have happened in a situation had certain things been different. It’s like rewinding the world, changing a few crucial details, and then hitting play to see what happens. By tweaking the right things, it’s possible to separate true causation from correlation and coincidence.

“Understanding cause and effect is super important for decision making,” says Ciaran Gilligan-Lee, leader of the Causal Inference Research Lab at Spotify, who co-developed the model. “You want to understand what impact a choice you take now will have on the future.”

In Spotify’s case, that might mean choosing what songs to show you or when artists should drop a new album. Spotify isn’t yet using counterfactuals, says Gilligan-Lee. “But they could help answer questions that we deal with every day.”

Counterfactuals are intuitive. People often make sense of the world by imagining how things would have played out if this had happened instead of that. But they are monstrous to put into math.

“Counterfactuals are very strange-looking statistical objects,” says Gilligan-Lee. “They’re weird things to contemplate. You’re asking the likelihood of something occurring given that it didn’t occur.”

Gilligan-Lee and his coauthors started working together after reading about each other’s work in an MIT Technology Review story. They based their model on a theoretical framework for counterfactuals called twin networks.

Twin networks were invented in the 1990s by the computer scientists Andrew Balke and Judea Pearl. In 2011, Pearl won the Turing Award—computer science’s Nobel Prize—for his work on causal reasoning and artificial intelligence.

Pearl and Balke used twin networks to work through a handful of simple examples, says Gilligan-Lee. But applying the mathematical framework to larger and more complicated real-world cases by hand is hard.

That’s where machine learning comes in. Twin networks treat counterfactuals as a pair of probabilistic models: one representing the actual world, the other representing the fictional one. The models are linked in such a way that the model of the actual world constrains the model of the fictional one, keeping it the same in every way except for the facts you want to change.
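
To make the twin-network idea concrete, here is a minimal sketch in Python. It is not the Spotify team's neural network. It is a hand-written toy in which the probabilities (0.5, 0.8, 0.2) and the names `outcome`, `twin_sample`, and `counterfactual_probability` are all assumptions made up for illustration. The point it shows is the link between the two worlds: both copies of the model share the same hidden background factor `u_y`, and only the fact being changed (`x`) differs.

```python
import random


def outcome(x, u_y):
    """Toy mechanism (assumed): the outcome is 1 with probability 0.8 when x is 1,
    and with probability 0.2 when x is 0. u_y is the hidden background factor."""
    threshold = 0.8 if x == 1 else 0.2
    return 1 if u_y < threshold else 0


def twin_sample(rng, x_counterfactual):
    """Draw one joint sample from the actual world and the fictional world."""
    u_x = rng.random()
    u_y = rng.random()  # shared by both worlds: this is what links the twins
    x_actual = 1 if u_x < 0.5 else 0
    y_actual = outcome(x_actual, u_y)
    y_fictional = outcome(x_counterfactual, u_y)  # same u_y, different x
    return x_actual, y_actual, y_fictional


def counterfactual_probability(n_samples=200_000, seed=0):
    """Estimate P(y would be 1 had x been 0 | we actually saw x = 1 and y = 1)."""
    rng = random.Random(seed)
    kept = 0
    hits = 0
    for _ in range(n_samples):
        x, y, y_fictional = twin_sample(rng, x_counterfactual=0)
        if x == 1 and y == 1:  # keep only samples that match the actual world
            kept += 1
            hits += y_fictional
    return hits / kept


estimate = counterfactual_probability()
print(round(estimate, 2))
```

In this toy setup the exact answer can be worked out by hand: the outcome happened in the actual world, so `u_y` must be below 0.8, and it would still have happened in the fictional world only if `u_y` is below 0.2, giving 0.2 / 0.8 = 0.25. The sampled estimate should land close to that. Real cases have many more variables and unknown mechanisms, which is why doing this by hand does not scale.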

Gilligan-Lee and his colleagues used the framework of twin networks as a blueprint for a neural network and then trained it to make predictions about how events would play out in the fictional world. The result is a general-purpose computer program for doing counterfactual reasoning. “It lets you answer any counterfactual question about a scenario that you want,” says Gilligan-Lee.

Dirty water

The Spotify team tested their model using several real-world case studies, including one looking at credit approval in Germany, one looking at an international clinical trial for stroke medication, and another looking at the safety of the water supply in Kenya.

In 2020 researchers investigated whether installing pipes and concrete containers to protect springs from bacterial contamination in a region of Kenya would reduce levels of childhood diarrhea. They found a positive effect. But you need to be sure what caused it, says Gilligan-Lee. Before installing concrete walls around wells across the country, you need to be sure that the drop in sickness was in fact caused by that intervention and not a side effect of it.

It’s possible that when researchers came in to do the study and install concrete walls around the wells, it made people more aware of the risks of contaminated water and they started boiling it at home. In that case, “education would be a cheaper way to scale up the intervention,” says Gilligan-Lee.

Gilligan-Lee and his colleagues ran this scenario through their model, asking whether children who got sick after drinking from an unprotected well in the actual world also got sick after drinking from a protected well in the fictional world. They found that changing just the detail of where the child drank and maintaining other conditions, such as how the water was treated at home, did not have a significant impact on the outcome, suggesting that the reduced levels of childhood diarrhea were not (directly) caused by installing pipes and concrete containers.

This replicates the result of the 2020 study, which also used counterfactual reasoning. But those researchers built a bespoke statistical model by hand just to ask that one question, says Gilligan-Lee. In contrast, the Spotify team’s machine-learning model is general purpose and can be used to ask multiple counterfactual questions about many different scenarios.

Spotify is not the only tech company racing to build machine-learning models that can reason about cause and effect. In the last few years, firms such as Meta, Amazon, LinkedIn, and TikTok’s owner ByteDance have also begun to develop the technology.

“Causal reasoning is critical for machine learning,” says Nailong Zhang, a software engineer at Meta. Meta is using causal inference in a machine-learning model that manages how many and what kinds of notifications Instagram should send its users to keep them coming back.

Romila Pradhan, a data scientist at Purdue University in Indiana, is using counterfactuals to make automated decision making more transparent. Organizations now use machine-learning models to choose who gets credit, jobs, parole, even housing (and who doesn’t). Regulators have started to require organizations to explain the outcome of many of these decisions to those affected by them. But reconstructing the steps made by a complex algorithm is hard.

Pradhan thinks counterfactuals can help. Let’s say a bank’s machine-learning model rejects your loan application and you want to know why. One way to answer that question is with counterfactuals. Given that the application was rejected in the actual world, would it have been rejected in a fictional world in which your credit history was different? What about if you had a different zip code, job, income, and so on? Building the ability to answer such questions into future loan approval programs, Pradhan says, would give banks a way to offer customers reasons rather than just a yes or no.
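
The loan example can be sketched in a few lines of Python. Everything here is hypothetical: the `Application` fields, the scoring rule in `approve`, and the thresholds are invented stand-ins, not any bank's model or Pradhan's method. The sketch also takes a shortcut by simply re-running a fixed rule with one detail changed at a time, which is a simpler "what if" probe than a full causal counterfactual.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Application:
    credit_score: int
    annual_income: int
    years_employed: int


def approve(app):
    """Hypothetical stand-in for a bank's model: approve at 3 points or more."""
    points = 0
    points += 2 if app.credit_score >= 650 else 0
    points += 1 if app.annual_income >= 40_000 else 0
    points += 1 if app.years_employed >= 2 else 0
    return points >= 3


def explain_rejection(app, alternatives):
    """Return the fields whose change, on its own, would flip a rejection."""
    reasons = []
    for field_name, new_value in alternatives.items():
        # Fictional world: identical application except for one changed detail.
        fictional = replace(app, **{field_name: new_value})
        if approve(fictional):
            reasons.append(field_name)
    return reasons


rejected = Application(credit_score=600, annual_income=45_000, years_employed=3)
alternatives = {"credit_score": 700, "annual_income": 90_000, "years_employed": 10}
reasons = explain_rejection(rejected, alternatives)
print(reasons)
```

For this made-up applicant, only a better credit score would have changed the decision. A higher income or a longer work history would not have, which is the kind of reason a bank could pass on to a customer.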

Counterfactuals are important because they reflect how people think about different outcomes, says Pradhan: “They are a good way to capture explanations.”

They can also help companies predict people’s behavior. Because counterfactuals make it possible to infer what might happen in a particular situation, not just on average, tech platforms can use them to pigeonhole people with more precision than ever.

The same logic that can disentangle the effects of dirty water or lending decisions can be used to hone the impact of Spotify playlists, Instagram notifications, and ad targeting. If we play this song, will that user listen for longer? If we show this picture, will that person keep scrolling? “Companies want to understand how to give recommendations to specific users rather than the average user,” says Gilligan-Lee.