Algorithms Are No Better at Predicting Repeat Offenders Than Inexperienced Humans
This is telling.
Recidivism is the likelihood that a person convicted of a crime will offend again. Currently, this likelihood is often estimated by predictive algorithms. The outcome can affect everything from sentencing decisions to whether or not a person receives parole.
To determine how accurate these algorithms actually are in practice, a team led by Dartmouth College researchers Julia Dressel and Hany Farid conducted a study of a widely used commercial risk assessment tool known as Correctional Offender Management Profiling for Alternative Sanctions (COMPAS). The software predicts whether or not a person will re-offend within two years following their conviction.
The study revealed that COMPAS is no more accurate at predicting recidivism than a group of volunteers with no criminal justice experience. Dressel and Farid recruited volunteers through a crowdsourcing website, then randomly assigned them small lists of defendants. The volunteers were told each defendant’s sex, age, and previous criminal history, then asked to predict whether the defendant would re-offend within the next two years.
The volunteers’ predictions had a mean accuracy of 62.1 percent and a median accuracy of 64.0 percent, very close to COMPAS’ accuracy of 65.2 percent.
Additionally, researchers found that even though COMPAS has 137 features, linear predictors with just two features (the defendant’s age and their number of previous convictions) worked just as well for predicting recidivism rates.
The Problem of Bias
One area of concern for the team was the potential for algorithmic bias. In their study, the human volunteers and COMPAS exhibited similar false positive rates when predicting recidivism for black defendants, even though the volunteers weren’t told the defendants’ race when making their predictions. The volunteers’ false positive rate was 37 percent for black defendants and 27 percent for white defendants. These rates were fairly close to those from COMPAS: 40 percent for black defendants and 25 percent for white defendants.
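For readers who want the metric spelled out: a false positive here is a defendant who was predicted to re-offend but did not, and the false positive rate is the share of such cases among everyone who did not re-offend. The sketch below computes that rate separately for each group. The function name and the toy records are made up for illustration and are not the study's code or data.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """Return {group: FP / (FP + TN)} for (group, predicted, actual) records.

    predicted and actual are booleans; True means "re-offends within two years".
    """
    false_pos = defaultdict(int)  # predicted to re-offend, did not
    true_neg = defaultdict(int)   # predicted not to re-offend, did not
    for group, predicted, actual in records:
        if actual:
            continue  # only people who did NOT re-offend count toward this rate
        if predicted:
            false_pos[group] += 1
        else:
            true_neg[group] += 1
    groups = set(false_pos) | set(true_neg)
    return {g: false_pos[g] / (false_pos[g] + true_neg[g]) for g in groups}

# Made-up toy records, for illustration only (not the study's data).
toy_records = [
    ("A", True, False), ("A", False, False), ("A", True, True), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", False, True),
]
rates = false_positive_rate_by_group(toy_records)
```

Because the rate is computed within each group, two groups can see very different false positive rates even when overall accuracy looks similar, which is the kind of disparity the figures above describe.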
In the paper’s discussion, the team pointed out that “differences in the arrest rate of black and white defendants complicate the direct comparison of false-positive and false-negative rates across race.” This is consistent with NAACP data, which show, for example, that “African Americans and whites use drugs at similar rates, but the imprisonment rate of African Americans for drug charges is almost 6 times that of whites.”
The authors noted that even though a person’s race was not explicitly stated, certain aspects of the data could potentially correlate to race, leading to disparities in the results. In fact, when the team repeated the study with new participants and did provide racial data, the results were about the same. The team concluded that “the exclusion of race does not necessarily lead to the elimination of racial disparities in human recidivism prediction.”
COMPAS has been used to evaluate over 1 million people since it was developed in 1998 (though its recidivism prediction component wasn’t included until 2000). With that context in mind, the study’s findings — that a group of untrained volunteers with little to no experience in criminal justice perform on par with the algorithm — were alarming.
The obvious conclusion would be that the predictive algorithm is simply not sophisticated enough and is long overdue to be updated. However, to validate their findings, the team trained a more powerful nonlinear support vector machine (NL-SVM) on the same data. When it produced very similar results, the team faced backlash, as some assumed they had fit the new algorithm too closely to the data.
Dressel and Farid said they specifically trained the algorithm on 80 percent of the data, then ran their tests on the remaining 20 percent in order to avoid so-called “over-fitting,” which occurs when an algorithm fits its training data so closely that its accuracy on new data suffers.
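The 80/20 hold-out idea can be sketched in a few lines. This is a minimal illustration, not the researchers' actual pipeline: the function name, the fixed seed, and the placeholder rows are assumptions made for the example.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder records standing in for defendant data (hypothetical).
rows = list(range(1000))
train, test = train_test_split(rows)
# A model would be fit on `train` only and scored on `test` only.
```

Because the test rows are never seen during training, a model that has merely memorized its training data would score poorly on them, which is why this kind of split is a standard guard against over-fitting.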
The researchers concluded that perhaps the data in question is not linearly separable, which could mean that predictive algorithms, no matter how sophisticated, are simply not an effective method for predicting recidivism. Considering that defendants’ futures hang in the balance, the team at Dartmouth asserted that the use of such algorithms to make these determinations should be carefully considered.
As they stated in the study’s discussion, the results of their study show that to rely on an algorithm for that assessment is no different than putting the decision “in the hands of random people who respond to an online survey because, in the end, the results from these two approaches appear to be indistinguishable.”
“Imagine you’re a judge, and you have a commercial piece of software that says we have big data, and it says this person is high risk,” Farid told Wired. “Now imagine I tell you I asked 10 people online the same question, and this is what they said. You’d weigh those things differently.”
Predictive algorithms aren’t just used in the criminal justice system. In fact, we encounter them every day: from products advertised to us online to music recommendations on streaming services. But an ad popping up in our newsfeed is of far less consequence than the decision to convict someone of a crime.