We’ve all read the news stories: study after study shows that facial recognition algorithms are not always reliable, and that error rates spike significantly for the faces of folks of color, especially Black women, as well as trans and nonbinary people. Yet this technology is widely used by law enforcement to identify suspects in criminal investigations. By refusing to disclose the specifics of that process, law enforcement has effectively prevented criminal defendants from challenging the reliability of the technology that ultimately led to their arrests.
This week, EFF, along with EPIC and NACDL, filed an amicus brief in State of New Jersey v. Francisco Arteaga, urging a New Jersey appellate court to allow robust discovery regarding law enforcement’s use of facial recognition technology. In this case, a facial recognition search conducted by the NYPD for New Jersey police was used to determine that Francisco Arteaga was a “match” for the perpetrator of an armed robbery. Despite the centrality of the match to the case, nothing was disclosed to the defense about the algorithm that generated it, not even the name of the software used. Mr. Arteaga asked for detailed information about the search process, with an expert testifying to the necessity of that material, but the court denied those requests.
Comprehensive discovery regarding law enforcement’s facial recognition searches is crucial because, far from being an infallible tool, the process entails numerous steps, each of which carries a substantial risk of error. These steps include selecting the “probe” photo of the person police are seeking, editing the probe photo, choosing the photo databases to which the edited probe photo is compared, running the search with a particular algorithm, and human review of the algorithm’s results.
Police analysts often select a probe photo from a video still or a cell phone photo, which are more likely to be low quality. The characteristics of the chosen image, including its resolution, clarity, face angle, and lighting, all affect the accuracy of the subsequent algorithmic search. Shockingly, analysts may also significantly edit the probe photo using tools closely resembling those in Photoshop: removing facial expressions or inserting eyes, combining face photographs of two different people even though only one is of the perpetrator, using a blur effect to add pixels to a low-quality image, or using cloning tools or 3D modeling to add parts of a subject’s face not visible in the original photo. In one outrageous instance, when the original probe photo returned no potential matches from the algorithm, the analyst from the NYPD Facial Identification Section, who thought the subject looked like actor Woody Harrelson, ran another search using the celebrity’s photo instead. Needless to say, these changes significantly elevate the risk of misidentification.
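To see why such edits matter, here is a minimal, purely illustrative Python sketch, using the Pillow imaging library and a synthetic stand-in image rather than real evidence, of how blur and clone-style edits fabricate pixel data that the original photo never contained:

```python
# Illustrative sketch only: simple photo edits of the kind described above
# change the pixel data that a matching algorithm actually sees.
from PIL import Image, ImageFilter

# Synthetic stand-in for a low-quality probe photo (real probes often come
# from video stills or cell phone photos).
probe = Image.new("RGB", (120, 160), color=(90, 80, 70))

# "Blur" editing: smooths the image, inventing pixel values that were never captured.
edited = probe.filter(ImageFilter.GaussianBlur(radius=3))

# Clone-style editing: copy a patch from one region and paste it over another,
# fabricating parts of the image that were not visible in the original.
patch = edited.crop((10, 10, 40, 40))
edited.paste(patch, (60, 100))

# The edited image, not the original evidence, is what gets searched.
edited.save("edited_probe.png")
```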
The database of photos to which the probe photo is compared, which could include mugshots, DMV photos, or other sources, can also affect the accuracy of the results, depending on the population that makes up those databases. Mugshot databases will often include more photos of folks in over-policed communities, and the resulting search errors are more likely to fall on members of those groups.
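As a rough illustration of that point, the following toy simulation uses entirely made-up numbers and the simplifying assumption that every non-matching photo has the same small chance of being falsely returned as a candidate; even then, an unbalanced gallery skews who bears the errors:

```python
# Illustrative sketch only: why database composition matters, under the
# assumption (for illustration) that every photo has the same false-match rate.
import random

random.seed(0)

# A hypothetical gallery in which Group A is heavily overrepresented,
# much as mugshot databases overrepresent over-policed communities.
gallery = ["group_A"] * 8000 + ["group_B"] * 2000
false_match_rate = 0.001  # identical per-photo error rate for everyone

false_candidates = [photo for photo in gallery if random.random() < false_match_rate]

print(false_candidates.count("group_A"))  # roughly 8 erroneous candidates
print(false_candidates.count("group_B"))  # roughly 2 erroneous candidates
# Even with an identical per-photo error rate, most erroneous candidates
# come from the overrepresented group.
```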
The algorithms used by law enforcement are typically developed by private companies and are “black box” technology: it is impossible to know exactly how the algorithms reach their conclusions without looking at their source code. Each algorithm is developed by different designers and trained using different datasets. The algorithms create “templates,” also known as “facial vectors,” of the probe photograph and the photographs in the database, but different algorithms will focus on different points of a face in creating those templates. Unsurprisingly, even when comparing the same probe photo to the same databases, different algorithms will produce different results.
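For readers curious what “comparing templates” looks like in practice, here is a minimal sketch using invented three-dimensional vectors and cosine similarity; real templates are proprietary and high-dimensional, and they vary by vendor, which is exactly why the same photos can be ranked differently by different algorithms:

```python
# Illustrative sketch only: a model maps each photo to a numeric "template"
# (facial vector), and the search ranks gallery photos by how similar their
# vectors are to the probe's. All vectors below are invented.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank(probe, gallery):
    # Sort gallery photos from most to least similar to the probe template.
    return sorted(gallery, key=lambda name: cosine_similarity(probe, gallery[name]), reverse=True)

# Hypothetical templates produced by "Algorithm 1" for a probe and three gallery photos.
probe_v1 = np.array([0.9, 0.1, 0.3])
gallery_v1 = {
    "photo_A": np.array([0.8, 0.2, 0.4]),
    "photo_B": np.array([0.2, 0.9, 0.1]),
    "photo_C": np.array([0.7, 0.1, 0.6]),
}

# A different algorithm focuses on different points of the face, so it produces
# different templates for the very same photos.
probe_v2 = np.array([0.1, 0.8, 0.5])
gallery_v2 = {
    "photo_A": np.array([0.3, 0.2, 0.9]),
    "photo_B": np.array([0.2, 0.9, 0.4]),
    "photo_C": np.array([0.9, 0.3, 0.1]),
}

print(rank(probe_v1, gallery_v1))  # ['photo_A', 'photo_C', 'photo_B']
print(rank(probe_v2, gallery_v2))  # ['photo_B', 'photo_A', 'photo_C'] -- same photos, different order
```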
Although human analysts review the probe photo and the candidate list generated by the algorithm before any match is investigated, numerous studies have shown that humans are prone to misidentifying unfamiliar faces and are subject to the same biases present in facial recognition systems. Human review is also affected by many other factors, including the analyst’s innate ability to analyze faces, motivation to find a match, fatigue from performing a repetitive task, time limitations, and cognitive and contextual biases.
Despite the grave risk of error, law enforcement remains reticent about its facial recognition systems. In filing this brief, EFF continues to advocate for transparency regarding law enforcement technology.