The Gender Shades Audit: How Facial Recognition Misleads and Harms in the Era of Deep Learning and Computer-Science Research
As a graduate student in computer science, Buolamwini was frustrated that commercial facial-recognition systems failed to identify her face in photographs and video footage. She hypothesized that this was due, in part, to dark-skinned faces being underrepresented in the data sets used to train the programs she was studying. Buolamwini collaborated with Gebru to conduct an audit of commercial facial-analysis systems and to show how they perform differently depending on the skin colour of the person in the image. The work became known as the Gender Shades audit.
Still, the bottom-up method isn’t perfect. In particular, these systems are largely bounded by the data they’re provided. As the tech writer Rob Horning puts it, technologies of this kind “presume a closed system.” In the Gender Shades audit, Microsoft’s gender classifier had an error rate of roughly 20 percent for darker-skinned women, while its error rate for lighter-skinned men was close to zero, a gap traceable to discrepancies in the training data. Training biases affect performance, which is why technology ethicists began preaching the importance of dataset diversity. “Garbage in, garbage out,” as the popular saying in artificial intelligence goes.
The ImageNet data set, a large-scale collection of images that is considered the gold standard in computer vision, has had a pivotal role in positioning computer-vision research at the core of the ‘deep-learning revolution’ of the past decade. The proliferation of facial recognition technology in public spaces is a serious invasion of privacy and enables worrying surveillance practices. New algorithms, even when they’re designed on the basis of more diverse image sets, are still at risk of being used for inherently harmful and oppressive purposes.
How do we start building tools with neural networks? A consideration of what we’ve learned from Generative Adversarial Networks (GANs)
As we look closer at how this proposal might affect both our tools and our relationship with them, the shadows of this seemingly convenient solution begin to take shape.
Computer vision has existed in some form for over half a century. Early researchers attempted to build tools top down, manually defining rules to identify a desired class of images. These rules would be converted into a computational formula, then programmed into a computer to help it search for pixel patterns that corresponded to those of the described object. It was difficult, though, to account for the many subjects, angles and lighting conditions that could make up a photo.
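To make the contrast concrete, here is a toy sketch of the top-down approach: a hand-written rule, chosen by a human rather than learned from data, scanned across an image in search of matching pixel patterns. The rule, thresholds and function names are illustrative inventions, not a reconstruction of any specific historical system.

```python
# Illustrative sketch of a "top-down" detector: a human writes the rule,
# and the computer only searches for pixel patterns that satisfy it.
import numpy as np

def matches_rule(patch: np.ndarray) -> bool:
    """Hand-coded rule: the centre of the patch is noticeably darker than
    its border. Purely illustrative; the threshold is chosen by hand."""
    h, w = patch.shape
    centre = patch[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
    border = np.concatenate([patch[0], patch[-1], patch[:, 0], patch[:, -1]])
    return centre.mean() < border.mean() * 0.8

def scan(image: np.ndarray, size: int = 32, stride: int = 16):
    """Slide the hand-written rule over the image and collect hits."""
    hits = []
    for y in range(0, image.shape[0] - size, stride):
        for x in range(0, image.shape[1] - size, stride):
            if matches_rule(image[y : y + size, x : x + size]):
                hits.append((y, x))
    return hits

# The brittleness the paragraph describes: a change in lighting, angle or
# subject breaks the rule, because the threshold was written, not learned.
```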
Over time, an increase in publicly available images made a more bottom-up process via machine learning possible. Masses of aggregated, labeled data are fed into a system, and the algorithm learns from them to discriminate between the desired categories. The technique is much more flexible because it doesn’t rely on rules that could change depending on the conditions. By training itself on a variety of inputs, the machine can identify relevant similarities between images of a given class without being told explicitly what those similarities are.
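A minimal sketch of the bottom-up alternative, assuming scikit-learn and using synthetic arrays as stand-ins for labeled photographs: the classifier infers the discriminating pattern from the examples themselves rather than from a hand-written rule.

```python
# Minimal sketch of the bottom-up idea: the system infers what separates
# two classes from labeled examples, with no human-authored rule.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend "images": flattened 8x8 pixel arrays whose statistics differ
# slightly, standing in for labeled photos of class A and class B.
class_a = rng.normal(0.3, 0.1, size=(500, 64))
class_b = rng.normal(0.6, 0.1, size=(500, 64))
X = np.vstack([class_a, class_b])
y = np.array([0] * 500 + [1] * 500)

# The learner extracts the discriminating pattern itself; nobody told it
# which pixels matter or what threshold to apply.
model = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", model.score(X, y))
```

The flexibility comes at the cost the previous section notes: whatever regularities, gaps or biases sit in the labeled data are exactly what the model learns.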
The same maxim applies to image generators, which likewise rely on large datasets to train themselves in the art of representation. Most facial generators today employ Generative Adversarial Networks (or GANs) as their foundational architecture. At their core, GANs work by having two networks, a Generator and a Discriminator, in play with each other. While the Generator produces images from noise inputs, the Discriminator attempts to sort the generated fakes from the real images provided by a training set. Over time, this adversarial interplay enables the Generator to improve and create images that the Discriminator is unable to identify as fakes. The initial inputs serve as the anchor to this process: tens of thousands of images are typically needed to produce realistic results, underscoring the importance of a diverse training set in the development of these tools.
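The sketch below, written with PyTorch, shows that adversarial loop in miniature. It is not a facial generator; the “real images” here are small synthetic vectors and both networks are tiny, but the structure, a Generator learning to fool a Discriminator that is simultaneously learning to catch it, is the one described above.

```python
# Minimal GAN sketch to make the two-network setup concrete. Real facial
# generators (e.g. StyleGAN-family models) are far larger and trained on
# tens of thousands of photographs; here the "real images" are synthetic
# 64-dimensional vectors so the training loop stays readable.
import torch
import torch.nn as nn

noise_dim, image_dim = 16, 64

generator = nn.Sequential(
    nn.Linear(noise_dim, 128), nn.ReLU(),
    nn.Linear(128, image_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(image_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def real_batch(n=64):
    # Stand-in for a batch sampled from the training set of real images.
    return torch.randn(n, image_dim) * 0.3 + 0.5

for step in range(1000):
    # Discriminator step: tell real samples from the Generator's fakes.
    real = real_batch()
    fake = generator(torch.randn(real.size(0), noise_dim)).detach()
    d_loss = (loss_fn(discriminator(real), torch.ones(real.size(0), 1)) +
              loss_fn(discriminator(fake), torch.zeros(fake.size(0), 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: produce fakes the Discriminator scores as real.
    fake = generator(torch.randn(64, noise_dim))
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

Everything the Generator learns about what a “realistic” image looks like is mediated by the Discriminator’s exposure to the training set, which is why the composition of those initial inputs matters so much.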