Bodies, Minds, and the Artificial Intelligence Industrial Complex, Part 3
“The Navy revealed the embryo of an electronic computer today,” announced a New York Times article, “that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”1
A few paragraphs into the article, “the Navy” was quoted as saying the new “perceptron” would be the first non-living mechanism “capable of receiving, recognizing and identifying its surroundings without any human training or control.”
This example of AI hype wasn’t the first and won’t be the last, but it is a bit dated. To be precise, the Times story was published on July 8, 1958.
Due to its incorporation of a simple “neural network” loosely analogous to the human brain, the perceptron of 1958 is recognized as a forerunner of today’s most successful “artificial intelligence” projects – from facial recognition systems to text extruders like ChatGPT. It’s worth considering this early device in some detail.
In particular, what about the claim that the perceptron could identify its surroundings “without any human training or control”? More than sixty years on, the descendants of the perceptron have “learned” a great deal, and can now identify, describe and even transform millions of images. But that “learning” has involved not only billions of transistors, and trillions of watts, but also millions of hours of labour in “human training and control.”
Seeing is not perceiving
When we look at a real-world object – for example, a tree – sensors in our eyes pass messages through a network of neurons and through various specialized areas of the brain. Eventually, assuming we are old enough to have learned what a tree looks like, and both our eyes and the required parts of our brains are functioning well, we might say “I see a tree.” In short, our eyes see a configuration of light, our neural network processes that input, and the result is that our brains perceive and identify a tree.
Accomplishing that perception with electronic computing, it turns out, is no easy feat.
The perceptron invented by Dr. Frank Rosenblatt in the 1950s used a 20 pixel by 20 pixel image sensor, paired with an IBM 704 computer. Let’s look at some simple images, and how a perceptron might process the data to produce a perception.
In the illustration at left above, what the camera “sees” at the most basic level is a column of pixels that are “on”, with all the other pixels “off”. However, if we train the computer by giving it nothing more than labelled images of the numerals from 0 to 9, the perceptron can recognize the input as matching the numeral “1”. If we then add training data in the form of labelled images of the characters in the Latin-script alphabet in a sans serif font, the perceptron can determine that it matches, equally well, the numeral “1”, the lower-case letter “l”, or an upper-case letter “I”.
The figure at right is considerably more complex. Here our perceptron is still working with a low-resolution grid, but pixels can be not only “on” or “off” – black or white – but various shades of grey. To complicate things further, suppose more training data has been added, in the form of hand-written letters and numerals, plus printed letters and numerals in an oblique sans serif font. The perceptron might now determine the figure is a numeral “1” or a lower-case “l” or upper-case “I”, either hand-written or printed in an oblique font, each with an equal probability. The perceptron is learning how to be an optical character recognition (OCR) system, though to be very good at the task it would need the ability to use context to rank the probabilities of a numeral “1”, a lower-case “l”, or an upper-case “I”.
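To make the idea of ranking by context a little more concrete, here is a minimal sketch in Python. It is not a description of how any real OCR system works. The probability values and the two context names are invented for illustration, and the function name `rank_with_context` is hypothetical. The sketch assumes the image alone gives the three characters equal scores, then multiplies those scores by an assumed "how likely is this character here" weighting (Bayes' rule) and sorts the result.

```python
# Image evidence alone: the three characters look identical (assumed values).
likelihood = {"1": 1 / 3, "l": 1 / 3, "I": 1 / 3}

# Assumed context weightings. These numbers are illustrative, not measured.
context_priors = {
    "between digits": {"1": 0.90, "l": 0.05, "I": 0.05},
    "inside a lower-case word": {"1": 0.05, "l": 0.85, "I": 0.10},
}


def rank_with_context(likelihood, prior):
    """Combine image evidence with a context prior and rank the candidates."""
    unnormalised = {char: likelihood[char] * prior[char] for char in likelihood}
    total = sum(unnormalised.values())
    posterior = {char: value / total for char, value in unnormalised.items()}
    # Most probable candidate first.
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


for context, prior in context_priors.items():
    best_char, probability = rank_with_context(likelihood, prior)[0]
    print(f"{context}: most likely '{best_char}' ({probability:.2f})")
```

Because the image evidence is a tie, the ranking in this toy case is decided entirely by the context, which is the point: the pixels alone cannot settle the question.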
The possibilities multiply infinitely when we ask the perceptron about real-world objects. In the figure below, a bit of context, in the form of a visual ground, is added to the images.
Depending, again, on the labelled training data already input to the computer, the perceptron may “see” the image at left as a tall tower, a bare tree trunk, or the silhouette of a person against a bright horizon. The perceptron might see, on the right, a leaning tree or a leaning building – perhaps the Leaning Tower of Pisa. With more training images and with added context in the input image – shapes of other buildings, for example – the perceptron might output with high statistical confidence that the figure is actually the Leaning Tower of Leeuwarden.
Today’s perceptrons can and do, with widely varying degrees of accuracy and reliability, identify and name faces in crowds, label the emotions shown by someone in a recorded job interview, analyse images from a surveillance drone and indicate that a person’s activities and surroundings match the “signature” of terrorist operations, or identify a crime scene by comparing an unlabelled image with photos of known settings from around the world. Whether right or wrong, the systems’ perceptions sometimes have critical consequences: people can be monitored, hired, fired, arrested – or executed in an instant by a US Air Force Reaper drone.
As we will discuss below, these capabilities have been developed with the aid of millions of hours of poorly-paid or unpaid human labour.
The Times article of 1958, however, described Dr. Rosenblatt’s invention this way: “the machine would be the first device to think as the human brain. As do human beings, Perceptron will make mistakes at first, but will grow wiser as it gains experience ….” The kernel of truth in that claim lies in the concept of a neural network.
Rosenblatt told the Times reporter “he could explain why the machine learned only in highly technical terms. But he said the computer had undergone a ‘self-induced change in the wiring diagram.’”
I can empathize with that Times reporter. I still hope to find a person sufficiently intelligent to explain the machine learning process so clearly that even a simpleton like me can fully understand. However, New Yorker magazine writers in 1958 made a good attempt. As quoted in Matteo Pasquinelli’s book The Eye of the Master, the authors wrote:
“If a triangle is held up to the perceptron’s eye, the association units connected with the eye pick up the image of the triangle and convey it along a random succession of lines to the response units, where the image is registered. The next time the triangle is held up to the eye, its image will travel along the path already travelled by the earlier image. Significantly, once a particular response has been established, all the connections leading to that response are strengthened, and if a triangle of a different size and shape is held up to the perceptron, its image will be passed along the track that the first triangle took.”2
With hundreds, thousands, millions and eventually billions of steps in the perception process, the computer gets better and better at interpreting visual inputs.
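For readers who find code easier to follow than wiring diagrams, here is a minimal sketch of the textbook perceptron learning rule in Python. It is an assumption-laden toy, not a reconstruction of Rosenblatt's machine: the 3 by 3 "images", the labels "upright" and "leaning", and the function names `predict` and `train` are all invented for illustration. The "strengthening" of connections described by the New Yorker writers corresponds, loosely, to the line that adjusts the weights whenever the machine guesses wrong.

```python
# Tiny 3x3 "images", flattened row by row: 1 = pixel on, 0 = pixel off.
# Label +1 means "upright stroke", -1 means "leaning stroke" (made-up data).
training_data = [
    ([1, 0, 0, 1, 0, 0, 1, 0, 0], +1),  # left column
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], +1),  # middle column
    ([0, 0, 1, 0, 0, 1, 0, 0, 1], +1),  # right column
    ([1, 0, 0, 0, 1, 0, 0, 0, 1], -1),  # diagonal, top-left to bottom-right
    ([0, 0, 1, 0, 1, 0, 1, 0, 0], -1),  # diagonal, top-right to bottom-left
]


def predict(weights, bias, pixels):
    """Weighted sum of the pixels, thresholded at zero."""
    total = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if total > 0 else -1


def train(data, max_epochs=100):
    """Repeat passes over the labelled data until every example is right."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for pixels, label in data:
            if predict(weights, bias, pixels) != label:
                # Wrong guess: nudge the weights of the lit pixels toward
                # the correct label. This is the whole "learning" step.
                weights = [w + label * p for w, p in zip(weights, pixels)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            return weights, bias, epoch
    raise RuntimeError("no error-free pass within max_epochs")


weights, bias, epochs = train(training_data)
print(f"Error-free after {epochs} passes over the labelled examples")
```

Note what the sketch cannot do without help: every example arrives with a label that a human supplied. Remove the labels and the loop has nothing to correct itself against.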
Yet this improvement in machine perception comes at a high ecological cost. A September 2021 article entitled “Deep Learning’s Diminishing Returns” explained:
“[I]n 2012 AlexNet, the model that first showed the power of training deep-learning systems on graphics processing units (GPUs), was trained for five to six days using two GPUs. By 2018, another model, NASNet-A, had cut the error rate of AlexNet in half, but it used more than 1,000 times as much computing to achieve this.”
The authors concluded that, “Like the situation that Rosenblatt faced at the dawn of neural networks, deep learning is today becoming constrained by the available computational tools.”3
The steep increase in the computing demands of AI is illustrated in a graph by Anil Ananthaswamy.
Behold the Mechanical Turk
In the decades since Rosenblatt built the first perceptron, there were periods when progress in this field seemed stalled. Additional theoretical advances in machine learning, a many orders-of-magnitude increase in computer processing capability, and vast quantities of training data were all prerequisites for today’s headline-making AI systems. In Atlas of AI, Kate Crawford gives a fascinating account of the struggle to acquire that data.
Up to the 1980s artificial intelligence researchers didn’t have access to large quantities of digitized text or digitized images, and the type of machine learning that makes news today was not yet possible. The lengthy antitrust proceedings against IBM provided an unexpected boost to AI research, in the form of a hundred million digital words from legal proceedings. In the early 2000s, legal proceedings against Enron collected more than half a million email messages sent among Enron employees. This provided text exchanges in everyday English, though Crawford notes the wording “represented the gender, race, and professional skews of those 158 workers.”
And the data floodgates were just beginning to open. As Crawford describes the change,
“The internet, in so many ways, changed everything; it came to be seen in the AI research field as something akin to a natural resource, there for the taking. As more people began to upload their images to websites, to photo-sharing services, and ultimately to social media platforms, the pillaging began in earnest. Suddenly, training sets could reach a size that scientists in the 1980s could never have imagined.”4
It took two decades for that data flood to become a tsunami. Even then, although images were often labelled and classified for free by social media users, the labels and classifications were not always consistent or even correct. There remained a need for humans to look at millions of images and create or check the labels and classifications.
Developers of the image database ImageNet collected 14 million images and eventually organized them into over twenty thousand categories. They initially hired students in the US for labelling work, but concluded that even at $10/hour, this work force would quickly exhaust the budget.
Enter the Mechanical Turk.
The original Mechanical Turk was a chess-playing scam first set up in 1770 by a Hungarian inventor. An apparently autonomous mechanical human model, dressed in the Ottoman fashion of the day, moved chess pieces and could beat most human chess players. Decades went by before it was revealed that a skilled human chess player was concealed inside the machine for each exhibition, controlling all the motions.
In the early 2000s, Amazon developed a web platform by which AI developers, among others, could contract gig workers for many tasks that were ostensibly being done by artificial intelligence. These tasks might include, for example, labelling and classifying photographic images, or making judgements about outputs from AI-powered chat experiments. In a rare fit of honesty, Amazon labelled the process “artificial artificial intelligence”5 and launched its service, Amazon Mechanical Turk, in 2005.
Screen shot taken 3 February 2024, from the opening page at mturk.com.
“ImageNet would become, for a time, the world’s largest academic user of Amazon’s Mechanical Turk, deploying an army of piecemeal workers to sort an average of fifty images a minute into thousands of categories.”6
Chloe Xiang described this organization of work for Motherboard in an article entitled “AI Isn’t Artificial or Intelligent”:
“[There is a] large labor force powering AI, doing jobs that include looking through large datasets to label images, filter NSFW content, and annotate objects in images and videos. These tasks, deemed rote and unglamorous for many in-house developers, are often outsourced to gig workers and workers who largely live in South Asia and Africa ….”7
Laura Forlano, Associate Professor of Design at Illinois Institute of Technology, told Xiang “what human labor is compensating for is essentially a lot of gaps in the way that the systems work.”
“Like other global supply chains, the AI pipeline is greatly imbalanced. Developing countries in the Global South are powering the development of AI systems by doing often low-wage beta testing, data annotating and labeling, and content moderation jobs, while countries in the Global North are the centers of power benefiting from this work.”
In a study published in late 2022, Kelle Howson and Hannah Johnston described why “platform capitalism”, as embodied in Mechanical Turk, is an ideal framework for exploitation, given that workers bear nearly all the costs while contractors take no responsibility for working conditions. The platforms are able to enroll workers from many countries in large numbers, so that workers are constantly low-balling to compete for ultra-short-term contracts. Contractors are also able to declare that the work submitted is “unsatisfactory” and therefore will not be paid, knowing the workers have no effective recourse and can be replaced by other workers for the next task. Workers are given an estimated “time to complete” before accepting a task, but if the work turns out to require two or three times as many hours, the workers are still only paid for the hours specified in the initial estimate.8
A survey of 700 cloudwork employees (or “independent contractors” in the fictive lingo of the gig work platforms) found about 34% of the time they spent on these platforms was unpaid. “One key outcome of these manifestations of platform power is pervasive unpaid labour and wage theft in the platform economy,” Howson and Johnston wrote.9 From the standpoint of major AI ventures at the top of the extraction pyramid, pervasive wage theft is not a bug in the system; it is a feature.
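A little arithmetic shows what these practices do to a worker's real pay. The sketch below, in Python, is only an illustration: the posted rate of $4.00 an hour is a hypothetical figure chosen for round numbers, not one taken from Howson and Johnston, and the function names are my own. The 34% unpaid share and the "two or three times as many hours" overrun are the only inputs drawn from the study as described above.

```python
def effective_hourly_rate(posted_rate, unpaid_share):
    """Pay per hour actually spent, when a share of platform time is unpaid."""
    return posted_rate * (1 - unpaid_share)


def rate_when_task_overruns(posted_rate, estimated_hours, actual_hours):
    """Pay per hour worked when only the estimated hours are paid."""
    return posted_rate * estimated_hours / actual_hours


posted_rate = 4.00  # hypothetical posted rate, in dollars per hour

# About 34% of time on the platform unpaid, per the survey.
print(f"{effective_hourly_rate(posted_rate, 0.34):.2f}")

# A task estimated at one hour that takes two, then three, hours.
print(f"{rate_when_task_overruns(posted_rate, 1, 2):.2f}")
print(f"{rate_when_task_overruns(posted_rate, 1, 3):.2f}")
```

Whatever the posted rate, the proportions hold: a third of the time unpaid means roughly a third off the real hourly wage, and a task that runs to three times its estimate pays a third of the advertised rate.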
The apparently dazzling brilliance of AI-model creators and semi-conductor engineers gets the headlines in western media. But without low-paid or unpaid work by employees in the Global South, “AI systems won’t function,” Crawford writes. “The technical AI research community relies on cheap, crowd-sourced labor for many tasks that can’t be done by machines.”10
Whether vacuuming up data that has been created by the creative labour of hundreds of millions of people, or relying on tens of thousands of low-paid workers to refine the perception process for reputedly super-intelligent machines, the AI value chain is another example of extractivism.
“AI image and text generation is pure primitive accumulation,” James Bridle writes, “expropriation of labour from the many for the enrichment and advancement of a few Silicon Valley technology companies and their billionaire owners.”11
“All seven emotions”
New AI implementations don’t usually start with a clean slate, Crawford says – they typically borrow classification systems from earlier projects.
“The underlying semantic structure of ImageNet,” Crawford writes, “was imported from WordNet, a database of word classifications first developed at Princeton University’s Cognitive Science Laboratory in 1985 and funded by the U.S. Office of Naval Research.”12
But classification systems are unavoidably political when it comes to slotting people into categories. In the ImageNet groupings of pictures of humans, Crawford says, “we see many assumptions and stereotypes, including race, gender, age, and ability.”
“In ImageNet the category ‘human body’ falls under the branch Natural Object → Body → Human Body. Its subcategories include ‘male body,’ ‘person,’ ‘juvenile body,’ ‘adult body,’ and ‘female body.’ The ‘adult body’ category contains the subclasses ‘adult female body’ and ‘adult male body.’ There is an implicit assumption here that only ‘male’ and ‘female’ bodies are recognized as ‘natural.’”13
Readers may have noticed that US military agencies were important funders of some key early AI research: Frank Rosenblatt’s perceptron in the 1950s, and the WordNet classification scheme in the 1980s, were both funded by the US Navy.
For the past six decades, the US Department of Defense has also been interested in systems that might detect and measure the movements of muscles in the human face, and in so doing, identify emotions. Crawford writes, “Once the theory emerged that it is possible to assess internal states by measuring facial movements and the technology was developed to measure them, people willingly adopted the underlying premise. The theory fit what the tools could do.”14
Several major corporations now market services with roots in this military-funded research into machine recognition of human emotion – even though, as many people have insisted, the emotions people express on their faces don’t always match the emotions they are feeling inside.
Affectiva is a corporate venture spun out of the Media Lab at Massachusetts Institute of Technology. On their website they claim “Affectiva created and defined the new technology category of Emotion AI, and evangelized its many uses across industries.” The opening page of affectiva.com spins their mission as “Humanizing Technology with Emotion AI.”
Who might want to contract services for “Emotion AI”? Media companies, perhaps, want to “optimize content and media spend by measuring consumer emotional responses to videos, ads, movies and TV shows – unobtrusively and at scale.” Auto insurance companies, perhaps, might want to keep their (mechanical) eyes on you while you drive: “Using in-cabin cameras our AI can detect the state, emotions, and reactions of drivers and other occupants in the context of a vehicle environment, as well as their activities and the objects they use. Are they distracted, tired, happy, or angry?”
Affectiva’s capabilities, the company says, draw on “the world’s largest emotion database of more than 80,000 ads and more than 14.7 million faces analyzed in 90 countries.”15 As reported by The Guardian, the videos are screened by workers in Cairo, “who watch the footage and translate facial expressions to corresponding emotions.”16
There is a slight problem: there is no clear and generally accepted definition of an emotion, nor general agreement on just how many emotions there might be. But “emotion AI” companies don’t let those quibbles get in the way of business.
Amazon’s Rekognition service announced in 2019 “we have improved accuracy for emotion detection (for all 7 emotions: ‘Happy’, ‘Sad’, ‘Angry’, ‘Surprised’, ‘Disgusted’, ‘Calm’ and ‘Confused’)” – but they were proud to have “added a new emotion: ‘Fear’.”17
Facial- and emotion-recognition systems, with deep roots in military and intelligence agency research, are now widely employed not only by these agencies but also by local police departments. Their use is not confined to governments: they are used in the corporate world for a wide range of purposes. And their production and operation likewise crosses public-private lines; though much of the initial research was government-funded, the commercialization of the technologies today allows corporate interests to sell the resulting services to public and private clients around the world.
What is the likely impact of these AI-aided surveillance tools? Dan McQuillan sees it this way:
“We can confidently say that the overall impact of AI in the world will be gendered and skewed with respect to social class, not only because of biased data but because engines of classification are inseparable from systems of power.”18
In our next installment we’ll see that biases in data sources and classification schemes are reflected in the outputs of the GPT large language model.
Title credit: the title of this post quotes a lyric of “Data Inadequate”, from the 1998 album Live at Glastonbury by Banco de Gaia.
1 “New Navy Device Learns By Doing,” New York Times, July 8, 1958, page 25.
2 “Rival”, in The New Yorker, by Harding Mason, D. Stewart, and Brendan Gill, November 28, 1958. Quoted by Matteo Pasquinelli in The Eye of the Master: A Social History of Artificial Intelligence, Verso Books, October 2023, page 137.
3 “Deep Learning’s Diminishing Returns”, by Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, and Gabriel F. Manso, IEEE Spectrum, 24 September 2021.
4 Crawford, Kate, Atlas of AI, Yale University Press, 2021.
5 This phrase is cited by Elizabeth Stevens and attributed to Jeff Bezos, in “The mechanical Turk: a short history of ‘artificial artificial intelligence’”, Cultural Studies, 08 March 2022.
6 Crawford, Atlas of AI.
7 Chloe Xiang, “AI Isn’t Artificial or Intelligent: How AI innovation is powered by underpaid workers in foreign countries,” Motherboard, 6 December 2022.
8 Kelle Howson and Hannah Johnston, “Unpaid labour and territorial extraction in digital value networks,” Global Networks, 26 October 2022.
9 Howson and Johnston, “Unpaid labour and territorial extraction in digital value networks.”
10 Crawford, Atlas of AI.
11 James Bridle, “The Stupidity of AI”, The Guardian, 16 Mar 2023.
12 Crawford, Atlas of AI.
13 Crawford, Atlas of AI.
14 Crawford, Atlas of AI.
15 Quotes from Affectiva taken from www.affectiva.com on 5 February 2024.
16 Oscar Schwarz, “Don’t look now: why you should be worried about machines reading your emotions,” The Guardian, 6 March 2019.
17 From Amazon Web Services Rekognition website, accessed on 5 February 2024; italics added.
18 Dan McQuillan, “Post-Humanism, Mutual Aid,” in AI for Everyone? Critical Perspectives, University of Westminster Press, 2021.