For a number of years now, privacy law scholars have been writing about, discussing, and worrying over the effect of big data on different aspects of our lives. Last year my own law school hosted a conference on big data, which covered government regulation of big data, its economic impact, and its effect on industries as diverse as health, education, and city planning. Until recently, however, there has not been much discussion of the use of big data in the criminal law context. This is now starting to change, with a handful of articles addressing the inevitable future in which courts begin to consider the use of big data in various aspects of the criminal justice system.
First, a definition: when people talk about big data, they are usually referring to the practice of accumulating extraordinarily large amounts of information from a variety of different sources and then processing that information to learn new facts or provide valuable services. Private companies have been using big data for quite some time now. Retailers use it to predict customer behavior and influence shopping habits (as reported in a famous New York Times Magazine cover story, Target uses large amounts of seemingly random purchasing data to determine which customers are pregnant, so that the store can send those customers coupons for pregnancy and new-baby items). Insurance companies rely on big data to try to determine who the safest drivers and healthiest people are. And all sorts of companies buy and sell this data to each other, seeking to mine it for information about their customers that they can use for economic advantage.
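To make the mechanics concrete, here is a deliberately simplified sketch of the kind of propensity scoring the Target story describes: a handful of purchase categories, each given a weight, summed into a score that triggers a marketing decision. The item names, weights, and threshold below are all invented for illustration; Target's actual model has never been published.

```python
# Toy illustration of propensity scoring from purchase data. The item names,
# weights, and threshold are hypothetical; Target's real model is not public.

# Purchase categories that (in this toy model) are treated as pregnancy signals.
PREGNANCY_SIGNALS = {
    "unscented_lotion": 0.30,
    "prenatal_vitamins": 0.45,
    "cotton_balls_bulk": 0.10,
    "large_tote_bag": 0.05,
}

def pregnancy_score(purchase_history):
    """Sum the weights of any signal items in a customer's purchase history."""
    return sum(PREGNANCY_SIGNALS.get(item, 0.0) for item in purchase_history)

customer_purchases = ["unscented_lotion", "prenatal_vitamins", "bread"]
if pregnancy_score(customer_purchases) > 0.5:
    print("flag customer for a baby-item coupon mailing")
```

Real systems rely on machine-learned models trained over millions of transactions, but the basic move, turning scattered purchases into an inference about a person, is the same.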
The two most intriguing aspects of big data as it relates to criminal law are that (1) it can reveal otherwise unknowable information about individuals from public sources; and (2) it can predict future behavior. These two facts make it very likely that big data will revolutionize the criminal justice system over the next decade. Police have already been using massive amounts of data to help decide where to deploy resources, as exemplified by the famous crime mapping software found in police COMPSTAT programs. And the NSA’s massive metadata collection program, which is currently being reviewed by various district courts (see here and here), is another example of law enforcement collecting, analyzing, and using big data to try to detect criminal activity, perhaps in violation of the Fourth Amendment. But as the amount of data about individuals grows and becomes more and more accessible, we will see big data being used at every stage of the criminal justice system.
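For readers unfamiliar with how the crime mapping mentioned above works at its simplest, the sketch below buckets reported incident coordinates into grid cells and ranks the busiest cells as candidates for extra patrols. The incident data, cell size, and ranking rule are invented for illustration; real COMPSTAT-style deployments use far richer spatial and temporal models.

```python
# Bare-bones hotspot analysis of the sort behind crime-mapping tools: bucket
# incident coordinates into grid cells and rank the busiest cells. All of the
# incident data below is made up.
from collections import Counter

CELL_SIZE = 0.01  # grid cell size in degrees of latitude/longitude (assumption)

incidents = [  # (latitude, longitude) of hypothetical reported incidents
    (41.505, -81.681), (41.506, -81.682),
    (41.499, -81.693), (41.505, -81.684),
]

def grid_cell(lat, lon, size=CELL_SIZE):
    """Snap a coordinate to the corner of the grid cell that contains it."""
    return (round(lat // size * size, 3), round(lon // size * size, 3))

hotspots = Counter(grid_cell(lat, lon) for lat, lon in incidents)
for cell, count in hotspots.most_common(3):
    print(f"cell {cell}: {count} incidents")  # candidate areas for extra patrols
```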
The next use of big data will probably be with regard to Terry stops. Professor Andrew Ferguson of the University of the District of Columbia Law School wrote about this in a recent article in the University of Pennsylvania Law Review entitled “Big Data and Predictive Reasonable Suspicion.” As Professor Ferguson notes, Terry was originally developed (and has so far been applied) in a “small data” context, in which police officers use their own individual observations of the suspect, perhaps combined with their knowledge of the neighborhood, to develop reasonable suspicion for a stop. But the growing amount of networked information about individuals, combined with the speed at which law enforcement can now access it, allows police to generate useful information about any individual they may see on the street. Professor Ferguson re-imagines Detective McFadden observing John Terry in a modern-day setting:
He observes John Terry and, using facial recognition technology, identifies him and begins to investigate using big data. Detective McFadden learns through a database search that Terry has a prior criminal record, including a couple of convictions and a number of arrests. McFadden learns, through pattern-matching links, that Terry is an associate (a “hanger on”) of a notorious, violent local gangster—Billy Cox—who had been charged with several murders. McFadden also learns that Terry has a substance abuse problem and is addicted to drugs. These factors—all true, but unknown to the real Detective McFadden—are individualized and particularized to Terry. Alone, they may not constitute reasonable suspicion that Terry is committing or about to commit a particular crime. But in conjunction with Terry’s observed actions of pacing outside a store with two associates, the information makes the reasonable suspicion finding easier and, likely, more reliable.
Indeed, the standard of “reasonable suspicion” is so low that police officers may be able to use big data to justify stopping a suspect even when he is not engaged in any suspicious activity at the time, so long as a reliable algorithm predicts that he is at heightened risk of carrying a gun or narcotics.
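What might such a prediction look like in practice? The sketch below combines the kinds of database-derived factors from Professor Ferguson’s hypothetical (prior arrests, convictions, gang association, substance abuse) into a single weighted risk score. Every factor, weight, and value here is invented for illustration; no actual policing system is being described, and the sketch deliberately ignores the accuracy and bias problems discussed in the next paragraph.

```python
# A deliberately crude sketch of the kind of predictive scoring contemplated
# here. Every factor, weight, and value is hypothetical.

WEIGHTS = {  # invented weights for illustration only
    "prior_arrests": 0.05,        # per arrest
    "prior_convictions": 0.10,    # per conviction
    "known_gang_association": 0.25,
    "substance_abuse_flag": 0.15,
}

def risk_score(profile):
    """Combine weighted, database-derived factors into one predicted-risk number."""
    return sum(weight * float(profile.get(factor, 0))
               for factor, weight in WEIGHTS.items())

suspect = {  # factors drawn from Ferguson's hypothetical, values invented
    "prior_arrests": 4,
    "prior_convictions": 2,
    "known_gang_association": True,
    "substance_abuse_flag": True,
}

print(f"predicted risk: {risk_score(suspect):.2f}")  # 0.80 on this toy scale
```

Note that nothing in such a score depends on what the suspect is doing at the moment of the stop, which is exactly what makes its interaction with the reasonable suspicion standard so consequential.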
Professor Ferguson notes a number of benefits from this use of big data, such as improved accuracy in Terry stops; the ability to use big data to allay suspicions and thus avoid an intrusive police/citizen encounter; and greater accountability for police actions. He also discusses the obvious dangers of widespread use of this data: the data may not be accurate; there will inevitably be false positives; and those who are poor or disenfranchised may be overrepresented in the “criminal propensity” data sets. Indeed, the entire idea of police deciding whom to stop based on a science that predicts future criminal activity has a dystopian science fiction feel to it. Professor Ferguson suggests some changes both to legal doctrine and to how we collect and use big data in order to alleviate these concerns. He also notes that the “old-fashioned” method of relying on individual police officers’ observations, and their unavoidably biased interpretations of those observations, is hardly a perfect system.
Other articles have begun to apply big data concepts to other aspects of the criminal justice system, such as parole decisions, analyzing criminal court rulings, and jury selection. But there are still more applications that have yet to be explored. What is the impact when police use big data analysis in search warrant applications? What about prosecutors and defense attorneys predicting flight risk during bail hearings? What about judges predicting future dangerousness during sentencing hearings? And what about the criminal trial itself? The rules of evidence allow a defendant to bring in opinion and reputation evidence to show that he is not the “type” of person who would have committed the crime in question; why not allow him to bring in far more accurate, big-data-based evidence of how unlikely he was to have committed the crime? The courts, no doubt, will be slow to accept this kind of information, and slower still to craft sensible rules for how to deal with it, but there is little doubt that the change will come.