Lying Sophia &
Mocking Alexa


Text Iris Long 文 龙星如

Exhibition Structure 展览结构图

Sophia, the humanoid robot who became a Saudi Arabian citizen, is interpreted by many as a story of ambiguity and deception co-written by the mass media and technology companies. Alexa, the cloud-based virtual assistant developed by Amazon, was reported to have let out unsettling, eerie laughter, recordings of which soon went viral on YouTube.

Sophia and Alexa seem to be two contemporary metaphors for machine life, two thin slices inserted among the overlapping discourses on artificial intelligence. Sophia symbolizes the image of AI cast by the mass media, film, and television: a highly human-like appearance, alert and responsive, even diplomatic; a quasi-human being living among us. Alexa, on the other hand, is an “assistant” or “servant” that keeps a machine’s appearance and resides in domestic corners. Its laughter points to the opaque, unruly, even voyeuristic and subversive dimension of the artificial intelligence black box: a “mistake” to be corrected.

Sophia’s lies are projections of poetic imagination; Alexa’s mockery is a glitch in the algorithmic black box. What they share is a quantum-state-like condition of uncertainty, like “La Zona” in Andrei Tarkovsky’s Stalker. In the succession and evolution of technologies, we have rarely encountered a subject like artificial intelligence: it is paradoxical, mind-stimulating, and suggestive of manifold futures. Even though AI is now ubiquitous in microchips, processors, and data mining and analysis, forming the new frontier of global technological competition, it remains imperceptible and equivocal to an ordinary citizen. Wrapped in mass-media information, AI has become a story that is both the easiest to tell and the most difficult to narrate.

In Tarkovsky’s script, the stalker guides a writer and a scientist as they ride a cable car, evade a police chase, pass through tunnels of dripping water, skirt rooms filled with sand dunes, and finally approach the core of “La Zona”: a “Room” that makes beliefs come true. The writer fears the dark side of human nature that the Room suggests, while the scientist wishes to destroy the Room lest villains take advantage of it. The exhibition sets up a metaphorical “La Zona” that embodies our contemporary situation: a time-space in which science and art are both deprived of their autocratic power and their believable narratives, and which is filled with the chatter of writers and scientists.

The artists and researchers in this exhibition blend the perspectives of Sophia (bright, poetic, media imagination) and Alexa (dark, black-box, technological criticism). They investigate how AI reshuffles global technopolitics and reshapes the earth’s geology, the absurdity of quantifying human emotions, the dark and exhausting human labor that trains “human-like” algorithms, the impulse to project the entire architecture of the human spirit onto a single technological form, and the fairy tale that the mass media builds around AI.

“Sophia” and “Alexa” are embodied in the exhibition as text and sound generated by AI algorithms, and weave through it as dialogues. Walking through the exhibition is like an act of “stalking”: it interweaves the richness, non-computability, and vitality of the psychological world. Will all that we are experiencing “break all the prophecies”, like the event horizon in Vernor Steffen Vinge’s account, or will it be “the biggest mistake we have ever made”, as Stephen Hawking warned?

被授予沙特国籍的机器人“索菲亚” 被阐释为媒体和技术企业撰写的暧昧骗局。亚马逊的智能助手艾莉克莎(Alexa)屡次被录下发出“可怖笑声” 的瞬间,一时成为风行于YouTube 的都市传说。

“艾莉克莎” 和“索菲亚” 像是关于机器生命的两个当代隐喻,两块安插在人工智能庞杂话题间的薄片。索菲亚象征媒体和影视里对AI 具有高度拟真容貌、机敏回复力,甚至懂外交的想象——一个行走于我们之间的类人。艾莉克莎是拥有机器外形、存活于私家角落的“助手” 或“仆从”,它的笑声象征关于AI “黑盒” 之不透明、不可规训和潜在窥伺、颠覆的面向——一个需要被修正的错误。

索菲亚的谎言是诗意想象的投射,艾莉克莎的嘲讽是算法黑箱的裂痕,她们共享一种不明朗的处境,犹如《潜行者 》(塔可夫斯基)里的“区” (La Zona)。在科技更迭里,我们很少遇见像AI 一样内含重重悖论,刺激心智,进而指向未来多种可能的课题。哪怕今天AI 已经普世地运用于芯片、处理器、数据收集与分析层面,形成全球技术竞争的新前线,对一个普通人来说,它依然不直接可知、模棱两可,是一个在大众媒体的信息包裹里最好讲也最难讲的故事。

在塔可夫斯基的脚本里,潜行者带着作家与科学家坐缆车,躲过警察追击,穿过滴水隧道,绕过充满沙丘之屋,才接近了“区” 的核心:一个信念成真的房间。作家恐惧于它所暗示的卑陋人性,而科学家希望摧毁房间以免它为恶人所用。展览建构的“区” 犹如今天我们的处境,是科学和文艺同时失去独裁力的架空之所,充斥着“作家”们与“科学家”们的喋喋不休。

展览邀请的艺术家与研究者兼容了索菲亚(光明、诗歌、媒体想象)与艾莉克莎(阴翳、黑盒、技术批判)的视角,探讨AI 对全球技术政治洗牌、其涉及的资源和地质改造、量化感情的荒诞、用真人训练“人性”算法的黑色劳动、投射人类整体精神建筑的动因、AI 的媒介化包装等议题。

索菲亚与艾莉克莎两个角色,也化身人工智能程序所生成的文本及声音,以“对话”的形式贯穿展览。穿越展览的过程如一场“潜行”,它交织着心理世界的饱满、无法计算与一线生机。我们正在经历的一切,究竟会如黑洞的事件穹界一般“打破所有预言”(弗诺· 文奇),还是“我们犯过的最大错误”(史蒂芬· 霍金)?




Text HE Di 文 贺笛

In recent years, deep learning has pushed the limits of many real-world applications, including speech recognition [1], image classification [2], and machine translation [3]. Models based on deep neural networks have even achieved superhuman performance in many challenging game environments such as Go [4], StarCraft [5], and Dota 2 [6]. The success of deep learning rests on many factors, including advanced neural network architectures [2,3], modern optimization algorithms [7], massive data, and huge computational power [4,6].

In this project, we mainly use deep learning models for natural language processing. The conversations are generated in three steps: conditional sentence generation in English, text-to-text translation from English to Chinese, and text-to-speech synthesis. Below we briefly introduce the deep learning models we used.

In the conditional sentence generation step, we use the GPT-2 model [8], which is the current state-of-the-art language generation model based on the Transformer architecture [3,9]. The model is trained on 8 million English web documents, roughly 40 GB of plain text, to predict the distribution of the next word conditioned on the preceding words in a sentence. Because the model can predict plausible words given any context, we can use it to generate a sentence word by word, autoregressively.
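The word-by-word loop described above can be sketched with a toy stand-in for the language model. This is a minimal illustration, not the project's code: `TOY_MODEL` is a hypothetical hand-written table that conditions only on the previous word, whereas GPT-2 conditions on all preceding words, and the `<s>` and `</s>` markers are assumed start and end symbols.

```python
import random

# Hypothetical toy "language model": next-word probabilities given the
# previous word only. GPT-2 conditions on the whole preceding context;
# this table is an illustrative stand-in, not the model used in the project.
TOY_MODEL = {
    "<s>": {"Alexa": 0.5, "Sophia": 0.5},
    "Alexa": {"can": 0.7, "laughs": 0.3},
    "Sophia": {"can": 0.6, "smiles": 0.4},
    "can": {"help": 0.8, "listen": 0.2},
    "help": {"</s>": 1.0},
    "listen": {"</s>": 1.0},
    "laughs": {"</s>": 1.0},
    "smiles": {"</s>": 1.0},
}


def generate(prefix, max_len=16, rng=None):
    """Extend `prefix` one word at a time until </s> or `max_len` words."""
    rng = rng or random.Random()
    words = list(prefix)
    while len(words) < max_len:
        # Look up the distribution of the next word given the last word.
        dist = TOY_MODEL[words[-1] if words else "<s>"]
        # Sample the next word in proportion to its probability.
        nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        if nxt == "</s>":
            break
        words.append(nxt)
    return words


print(" ".join(generate(["Alexa", "can"])))
```

Each sampled word is appended to the context before the next prediction, which is what makes the procedure autoregressive.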

We use the open-source GPT-2 medium model, which contains 330 million parameters. For Alexa and Sophia, we feed the GPT-2 model with handcrafted sentence beginnings. For example, we create the sentence beginning “Alexa can help human” and use the GPT-2 model to complete the sentence automatically. Because the neural language model is a probabilistic generative model, we can sample different outputs in different rounds. For each sentence beginning for Alexa and Sophia, we randomly sample 512 sentences, following the hyperparameters suggested in [8]. We set the temperature to 1.0, set top-k to 40 to balance accuracy and diversity, and set the maximum sentence length to 128. We create 81 different sentence beginnings for Alexa and Sophia and finally obtain 80,000 sentences with 1,000,000 words in total. We randomly arrange the sentences from Alexa and Sophia to form conversations.
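To make the two sampling settings concrete, here is a minimal sketch of temperature scaling followed by top-k filtering over a list of raw scores. The function name `top_k_sample` and the example scores are assumptions made for illustration; this is not the GPT-2 sampling code itself.

```python
import math
import random


def top_k_sample(logits, k=40, temperature=1.0, rng=None):
    """Sample an index from `logits` after temperature scaling and top-k filtering."""
    rng = rng or random.Random()
    # Temperature 1.0 leaves the scores unchanged; lower values sharpen them.
    scaled = [x / temperature for x in logits]
    # Keep only the indices of the k highest-scoring candidates.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax over the survivors (subtracting the max for numerical stability).
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    return rng.choices(top, weights=weights)[0]


# Hypothetical scores for a four-word vocabulary; with k=2 only the two
# best-scoring candidates (indices 1 and 3) can ever be drawn.
print(top_k_sample([0.0, 5.0, 1.0, 4.0], k=2))
```

Discarding everything outside the top 40 candidates removes unlikely words that would hurt accuracy, while sampling among the remaining ones keeps the outputs varied from round to round.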

Given the generated English sentences, we translate each one from English to Chinese using Google Translate. As far as we know, Google Translate uses a Transformer model trained on millions of bilingual sentence pairs. Generally speaking, given a sentence in English, the Transformer encoder first encodes the sentence into context representations, which are usually real-valued vectors. The Transformer decoder then decodes these representations using a stack of attention layers and generates the word sequence in Chinese. In the last step, we convert the text into speech using APIs from iFLYTEK.

[1]. Hinton, Geoffrey, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior et al. "Deep neural networks for acoustic modeling in speech recognition." IEEE Signal processing magazine 29 (2012).

[2]. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." CVPR 2016.

[3]. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." NIPS 2017.

[4]. AlphaGo. DeepMind, 2017.

[5]. AlphaStar: Mastering the Real-Time Strategy Game StarCraft II. DeepMind, 2019.

[6]. OpenAI Five. OpenAI, 2019.

[7]. Du, Simon S., Jason D. Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai. "Gradient descent finds global minima of deep neural networks." ICML 2019.

[8]. Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. "Language models are unsupervised multitask learners." OpenAI Blog 1, no. 8 (2019).

[9]. Lu, Yiping, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Liwei Wang, and Tie-Yan Liu. "Understanding and improving Transformer from a multi-particle dynamic system point of view." arXiv preprint arXiv:1906.02762 (2019).




Three Thousand Years of Algorithmic Rituals: The Emergence of AI from the Computation of Space



Text Matteo Pasquinelli 文 马蒂欧·帕斯克奈利

图片来源:弗里茨·施塔尔,《希腊与吠陀几何学》,印度哲学期刊 27.1 (1999): 105-127.
Illustration from Frits Staal, "Greek and Vedic geometry" Journal of Indian Philosophy 27.1 (1999): 105-127.

1. Recomposing a Dismembered God

In a fascinating myth of cosmogenesis from the ancient Vedas, it is said that the god Prajapati was shattered into pieces by the act of creating the universe. After the birth of the world, the supreme god is found dismembered, undone. In the corresponding Agnicayana ritual, Hindu devotees symbolically recompose the fragmented body of the god by building a fire altar according to an elaborate geometric plan.2 The fire altar is laid down by aligning thousands of bricks of precise shape and size to create the profile of a falcon. Each brick is numbered and placed while reciting its dedicated mantra, following step-by-step instructions. Each layer of the altar is built on top of the previous one, conforming to the same area and shape. Solving a logical riddle that is the key of the ritual, each layer must keep the same shape and area of the contiguous ones, but using a different configuration of bricks. Finally, the falcon altar must face east, a prelude to the symbolic flight of the reconstructed god towards the rising sun—an example of divine reincarnation by geometric means.

The Agnicayana ritual is described in the Shulba Sutras, composed around 800 BCE in India to record a much older oral tradition. The Shulba Sutras teach the construction of altars of specific geometric forms to secure gifts from the gods: for instance, they suggest that “those who wish to destroy existing and future enemies should construct a fire-altar in the form of a rhombus.”3 The complex falcon shape of the Agnicayana evolved gradually from a schematic composition of only seven squares. In the Vedic tradition, it is said that the Rishi vital spirits created seven square-shaped Purusha (cosmic entities, or persons) that together composed a single body, and it was from this form that Prajapati emerged once again. While art historian Wilhelm Worringer argued in 1907 that primordial art was born in the abstract line found in cave graffiti, one may assume that the artistic gesture also emerged through the composing of segments and fractions, introducing forms and geometric techniques of growing complexity.4 In his studies of Vedic mathematics, Italian mathematician Paolo Zellini has discovered that the Agnicayana ritual was used to transmit techniques of geometric approximation and incremental growth (in other words, algorithmic techniques) comparable to the modern calculus of Leibniz and Newton.5 Agnicayana is among the most ancient documented rituals still practiced today in India, and a primordial example of algorithmic culture.

But how can we define a ritual as ancient as the Agnicayana as algorithmic? To many, it may appear an act of cultural appropriation to read ancient cultures through the paradigm of the latest technologies. Nevertheless, claiming that abstract techniques of knowledge and artificial metalanguages belong uniquely to the modern industrial West is not only historically inaccurate but also an act of implicit epistemic colonialism towards cultures of other places and other times.6 The French mathematician Jean-Luc Chabert has noted that “algorithms have been around since the beginning of time and existed well before a special word had been coined to describe them. Algorithms are simply a set of step by step instructions, to be carried out quite mechanically, so as to achieve some desired result.”7 Today some may see algorithms as a recent technological innovation implementing abstract mathematical principles. On the contrary, algorithms are among the most ancient and material practices, predating many human tools and all modern machines:

Algorithms are not confined to mathematics … The Babylonians used them for deciding points of law, Latin teachers used them to get the grammar right, and they have been used in all cultures for predicting the future, for deciding medical treatment, or for preparing food … We therefore speak of recipes, rules, techniques, processes, procedures, methods, etc., using the same word to apply to different situations. The Chinese, for example, use the word shu (meaning rule, process or stratagem) both for mathematics and in martial arts … In the end, the term algorithm has come to mean any process of systematic calculation, that is a process that could be carried out automatically. Today, principally because of the influence of computing, the idea of finiteness has entered into the meaning of algorithm as an essential element, distinguishing it from vaguer notions such as process, method or technique.8

Before the consolidation of mathematics and geometry, ancient civilizations were already big machines of social segmentation that marked human bodies and territories with abstractions that remained, and continue to remain, operative for millennia. Drawing also on the work of historian Lewis Mumford, Gilles Deleuze and Félix Guattari offered a list of such old techniques of abstraction and social segmentation: “tattooing, excising, incising, carving, scarifying, mutilating, encircling, and initiating.”9 Numbers were already components of the “primitive abstract machines” of social segmentation and territorialization that would make human culture emerge: the first recorded census, for instance, took place around 3800 BCE in Mesopotamia. Logical forms that were made out of social ones, numbers materially emerged through labor and rituals, discipline and power, marking and repetition.

In the 1970s, the field of “ethnomathematics” began to foster a break from the Platonic loops of elite mathematics, revealing the historical subjects behind computation.10 The political question at the center of the current debate on computation and the politics of algorithms is ultimately very simple, as Diane Nelson has reminded us: Who counts?11 Who computes? Algorithms and machines do not compute for themselves; they always compute for someone else, for institutions and markets, for industries and armies.

Illustration from Frank Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, (Cornell Aeronautical Laboratory, Buffalo NY, 1961).

2. What Is an Algorithm?

The term “algorithm” comes from the Latinization of the name of the Persian scholar al-Khwarizmi. His tract On the Calculation with Hindu Numerals, written in Baghdad in the ninth century, is responsible for introducing Hindu numerals to the West, along with the corresponding new techniques for calculating them, namely algorithms. In fact, the medieval Latin word “algorismus” referred to the procedures and shortcuts for carrying out the four fundamental mathematical operations—addition, subtraction, multiplication, and division—with Hindu numerals. Later, the term “algorithm” would metaphorically denote any step-by-step logical procedure and become the core of computing logic. In general, we can distinguish three stages in the history of the algorithm: in ancient times, the algorithm can be recognized in procedures and codified rituals to achieve a specific goal and transmit rules; in the Middle Ages, the algorithm was the name of a procedure to help mathematical operations; in modern times, the algorithm qua logical procedure becomes fully mechanized and automated by machines and then digital computers.

Looking at ancient practices such as the Agnicayana ritual and the Hindu rules for calculation, we can sketch a basic definition of “algorithm” that is compatible with modern computer science: (1) an algorithm is an abstract diagram that emerges from the repetition of a process, an organization of time, space, labor, and operations: it is not a rule that is invented from above but emerges from below; (2) an algorithm is the division of this process into finite steps in order to perform and control it efficiently; (3) an algorithm is a solution to a problem, an invention that bootstraps beyond the constraints of the situation: any algorithm is a trick; (4) most importantly, an algorithm is an economic process, as it must employ the least amount of resources in terms of space, time, and energy, adapting to the limits of the situation.

Today, amidst the expanding capacities of AI, there is a tendency to perceive algorithms as an application or imposition of abstract mathematical ideas upon concrete data. On the contrary, the genealogy of the algorithm shows that its form has emerged from material practices, from a mundane division of space, time, labor, and social relations. Ritual procedures, social routines, and the organization of space and time are the source of algorithms, and in this sense they existed even before the rise of complex cultural systems such as mythology, religion, and especially language. In terms of anthropogenesis, it could be said that algorithmic processes encoded into social practices and rituals were what made numbers and numerical technologies emerge, and not the other way around. Modern computation, just looking at its industrial genealogy in the workshops studied by both Charles Babbage and Karl Marx, evolved gradually from concrete towards increasingly abstract forms.

Illustration from Frank Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, (Cornell Aeronautical Laboratory, Buffalo NY, 1961).

3. The Rise of Machine Learning as Computational Space

In 1957, at the Cornell Aeronautical Laboratory in Buffalo, New York, the cognitive scientist Frank Rosenblatt invented and constructed the Perceptron, the first operative artificial neural network and the grandmother of all the matrices of machine learning, which at the time was a classified military secret.12 The first prototype of the Perceptron was an analogue computer composed of an input device of 20 × 20 photocells (called the “retina”) connected through wires to a layer of artificial neurons that resolved into one single output (a light bulb turning on or off, to signify 0 or 1). The “retina” of the Perceptron recorded simple shapes such as letters and triangles and passed electric signals to a multitude of neurons that would compute a result according to a threshold logic. The Perceptron was a sort of photo camera that could be taught to recognize a specific shape, i.e., to make a decision with a margin of error (making it an “intelligent” machine). The Perceptron was the first machine-learning algorithm, a basic “binary classifier” that could determine whether a pattern fell within a specific class or not (whether the input image was a triangle or not, a square or not, etc.). To achieve this, the Perceptron progressively adjusted the values of its nodes in order to resolve a large numerical input (a spatial matrix of four hundred numbers) into a simple binary output (0 or 1). The Perceptron gave the result 1 if the input image was recognized within a specific class (a triangle, for instance); otherwise it gave the result 0. Initially, a human operator was necessary to train the Perceptron to learn the correct answers (manually switching the output node to 0 or 1), hoping that the machine, on the basis of these supervised associations, would correctly recognize similar shapes in the future. The Perceptron was designed not to memorize a specific pattern but to learn how to recognize potentially any pattern.
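The threshold logic and the progressive adjustment of values described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a hypothetical 3 × 3 “retina” instead of 20 × 20 photocells, a made-up class (“the top row is fully lit”), and the textbook error-correction rule standing in for the training procedure, whose details the essay does not specify.

```python
# Hypothetical training set on a 3 x 3 "retina", flattened row by row.
# Label 1 means "the top row is fully lit"; label 0 means it is not.
# The labels play the role of the human operator's correct answers.
EXAMPLES = [
    ([1, 1, 1, 0, 0, 0, 0, 0, 0], 1),
    ([1, 1, 1, 0, 1, 0, 0, 1, 0], 1),
    ([1, 1, 1, 1, 0, 0, 0, 0, 0], 1),
    ([1, 1, 0, 0, 0, 0, 0, 0, 0], 0),
    ([0, 0, 0, 1, 1, 1, 0, 0, 0], 0),
    ([1, 0, 1, 0, 1, 0, 0, 1, 0], 0),
    ([0, 1, 1, 0, 1, 0, 0, 1, 0], 0),
]


def predict(weights, bias, pixels):
    """Threshold logic: output 1 if the weighted sum exceeds 0, else 0."""
    total = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if total > 0 else 0


def train(examples, max_epochs=1000):
    """Adjust the weights after each wrong answer until every example is right."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for pixels, label in examples:
            error = label - predict(weights, bias, pixels)  # -1, 0, or +1
            if error:
                mistakes += 1
                # Nudge each weight toward the correct answer for this input.
                weights = [w + error * p for w, p in zip(weights, pixels)]
                bias += error
        if mistakes == 0:
            break
    return weights, bias


weights, bias = train(EXAMPLES)
print([predict(weights, bias, pixels) for pixels, _ in EXAMPLES])
```

The learned weights are attached to positions on the grid, so what the classifier ends up encoding is a spatial arrangement of pixels and not a stored copy of any single image.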

The matrix of 20 × 20 photoreceptors in the first Perceptron was the beginning of a silent revolution in computation (which would become a hegemonic paradigm in the early twenty-first century with the advent of “deep learning,” a machine-learning technique). Although inspired by biological neurons, from a strictly logical point of view the Perceptron marked not a biomorphic turn in computation but a topological one; it signified the rise of the paradigm of “computational space” or “self-computing space.” This turn introduced a second spatial dimension into a paradigm of computation that until then had only a linear dimension (see the Turing machine that reads and writes 0 and 1 along a linear memory tape). This topological turn, which is the core of what people perceive today as “AI,” can be described more modestly as the passage from a paradigm of passive information to one of active information. Rather than having a visual matrix processed by a top-down algorithm (like any image edited by a graphics software program today), in the Perceptron the pixels of the visual matrix are computed in a bottom-up fashion according to their spatial disposition. The spatial relations of the visual data shape the operation of the algorithm that computes them.

Because of its spatial logic, the branch of computer science originally dedicated to neural networks was called “computational geometry.” The paradigm of computational space or self-computing space shares common roots with the studies of the principles of self-organization that were at the center of post-WWII cybernetics, such as John von Neumann’s cellular automata (1948) and Konrad Zuse’s Rechnender Raum (1967).13 Von Neumann’s cellular automata are clusters of pixels, perceived as small cells on a grid, that change status and move according to their neighboring cells, composing geometric figures that resemble evolving forms of life. Cellular automata have been used to simulate evolution and to study complexity in biological systems, but they remain finite-state algorithms confined to a rather limited universe. Konrad Zuse (who built the first programmable computer in Berlin in 1938) attempted to extend the logic of cellular automata to physics and to the whole universe. His idea of “rechnender Raum,” or calculating space, is a universe that is composed of discrete units that behave according to the behavior of neighboring units. Alan Turing’s last essay, “The Chemical Basis of Morphogenesis” (published in 1952, two years before his death), also belongs to the tradition of self-computing structures.14 Turing considered molecules in biological systems as self-computing actors capable of explaining complex bottom-up structures, such as tentacle patterns in hydra, whorl arrangement in plants, gastrulation in embryos, dappling in animal skin, and phyllotaxis in flowers.15
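The principle that each cell changes status according to its neighbors can be sketched with a much simpler automaton than von Neumann’s. The example below uses Conway’s Game of Life, a later two-state rule chosen here only because it is compact; it is a stand-in for illustration, not a reconstruction of von Neumann’s or Zuse’s designs, and the function name `life_step` is an assumption.

```python
from collections import Counter


def life_step(live):
    """One generation of Conway's Game of Life on an unbounded grid.

    `live` is a set of (x, y) coordinates of the cells that are switched on.
    """
    # Count, for every position, how many live neighbors it has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is on in the next generation if it has exactly three live
    # neighbors, or if it is already on and has exactly two.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


# A horizontal bar of three cells flips to a vertical bar and back again.
blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(life_step(blinker)))
```

No global rule describes the figure as a whole: the oscillating bar results only from each cell consulting its immediate neighborhood.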

Von Neumann’s cellular automata and Zuse’s computational space are intuitively easy to understand as spatial models, while Rosenblatt’s neural network displays a more complex topology that requires more attention. Indeed, neural networks employ an extremely complex combinatorial structure, which is probably what makes them the most efficient algorithms for machine learning. Neural networks are said to “solve any problem,” meaning they can approximate the function of any pattern according to the Universal Approximation theorem (given enough layers of neurons and computing resources). All systems of machine learning, including support-vector machines, Markov chains, Hopfield networks, Boltzmann machines, and convolutional neural networks, to name just a few, started as models of computational geometry. In this sense they are part of the ancient tradition of ars combinatoria.16

Image from Hans Meinhardt, The Algorithmic Beauty of Sea Shells (Springer Science & Business Media, 2009).

4. The Automation of Visual Labor

Even at the end of the twentieth century, no one would have ever thought to call a truck driver a “cognitive worker,” an intellectual. At the beginning of the twenty-first century, the use of machine learning in the development of self-driving vehicles has led to a new understanding of manual skills such as driving, revealing how the most valuable component of work, generally speaking, has never been merely manual, but also social and cognitive (as well as perceptual, an aspect of labor still waiting to be located somewhere between the manual and the cognitive). What kind of work do drivers perform? Which human task will AI come to record with its sensors, imitate with its statistical models, and replace with automation? The best way to answer this question is to look at what technology has successfully automated, as well as what it hasn’t.

The industrial project to automate driving has made clear (more so than a thousand books on political economy) that the labor of driving is a conscious activity following codified rules and spontaneous social conventions. However, if the skill of driving can be translated into an algorithm, it will be because driving has a logical and inferential structure. Driving is a logical activity just as labor is a logical activity more generally. This postulate helps to resolve the trite dispute about the separation between manual labor and intellectual labor.17 It is a political paradox that the corporate development of AI algorithms for automation has made it possible to recognize in labor a cognitive component that had long been neglected by critical theory. What is the relation between labor and logic? This becomes a crucial philosophical question for the age of AI.

A self-driving vehicle automates all the micro-decisions that a driver must make on a busy road. Its artificial neural networks learn, that is imitate and copy, the human correlations between the visual perception of the road space and the mechanical actions of vehicle control (steering, accelerating, stopping) as ethical decisions taken in a matter of milliseconds when dangers arise (for the safety of persons inside and outside the vehicle). It becomes clear that the job of driving requires high cognitive skills that cannot be left to improvisation and instinct, but also that quick decision-making and problem-solving are possible thanks to habits and training that are not completely conscious. Driving remains essentially also a social activity, which follows both codified rules (with legal constraints) and spontaneous ones, including a tacit “cultural code” that any driver must subscribe to. Driving in Mumbai—it has been said many times—is not the same as driving in Oslo.

Obviously, driving summons an intense labor of perception. Much labor, in fact, appears mostly perceptive in nature, through continuous acts of decision and cognition that take place in the blink of an eye.18 Cognition cannot be completely disentangled from a spatial logic, and often follows a spatial logic in its more abstract constructions. Both observations (that perception is logical and that cognition is spatial) are empirically proven without fanfare by autonomous driving AI algorithms that construct models to statistically infer visual space (encoded as digital video of a 3-D road scenario). Moreover, the driver that AI replaces in self-driving cars and drones is not an individual driver but a collective worker, a social brain that navigates the city and the world.19 Just looking at the corporate project of self-driving vehicles, it is clear that AI is built on collective data that encode a collective production of space, time, labor, and social relations. AI imitates, replaces, and emerges from an organized division of social space (according first to a material algorithm and not the application of mathematical formulas or analysis in the abstract).

Animation from Chris Urmson’s TED talk “How a Driverless Car Sees the Road.” Urmson is the former chief engineer for Google’s Self-Driving Car Project. Animation by ZMScience.

5. The Memory and Intelligence of Space

Paul Virilio, the French philosopher of speed or “dromology,” was also a theorist of space and topology, for he knew that technology accelerates the perception of space as much as it morphs the perception of time. Interestingly, the title of Virilio’s book The Vision Machine was inspired by Rosenblatt’s Perceptron. With the classical erudition of a twentieth-century thinker, Virilio drew a sharp line between ancient techniques of memorization based on spatialization, such as the Method of Loci, and modern computer memory as a spatial matrix:

Cicero and the ancient memory-theorists believed you could consolidate natural memory with the right training. They invented a topographical system, the Method of Loci, an imagery-mnemonics which consisted of selecting a sequence of places, locations, that could easily be ordered in time and space. For example, you might imagine wandering through the house, choosing as loci various tables, a chair seen through a doorway, a windowsill, a mark on a wall. Next, the material to be remembered is coded into discrete images and each of the images is inserted in the appropriate order into the various loci. To memorize a speech, you transform the main points into concrete images and mentally “place” each of the points in order at each successive locus. When it is time to deliver the speech, all you have to do is recall the parts of the house in order.

The transformation of space, of topological coordinates and geometric proportions, into a technique of memory should be considered equal to the more recent transformation of collective space into a source of machine intelligence. At the end of the book, Virilio reflects on the status of the image in the age of “vision machines” such as the Perceptron, sounding a warning about the impending age of artificial intelligence as the “industrialisation of vision”:

“Now objects perceive me,” the painter Paul Klee wrote in his Notebooks. This rather startling assertion has recently become objective fact, the truth. After all, aren’t they talking about producing a “vision machine” in the near future, a machine that would be capable not only of recognizing the contours of shapes, but also of completely interpreting the visual field … ? Aren’t they also talking about the new technology of visionics: the possibility of achieving sightless vision whereby the video camera would be controlled by a computer? … Such technology would be used in industrial production and stock control; in military robotics, too, perhaps.

Now that they are preparing the way for the automation of perception, for the innovation of artificial vision, delegating the analysis of objective reality to a machine, it might be appropriate to have another look at the nature of the virtual image … Today it is impossible to talk about the development of the audiovisual … without pointing to the new industrialization of vision, to the growth of a veritable market in synthetic perception and all the ethical questions this entails … Don’t forget that the whole idea behind the Perceptron would be to encourage the emergence of fifth-generation “expert systems,” in other words an artificial intelligence that could be further enriched only by acquiring organs of perception.20

Ioannis de Sacro Busco, Algorismus Domini, c. 1501. National Central Library of Rome. Photo: Public Domain/Internet Archive.

6. Conclusion

If we consider the ancient geometry of the Agnicayana ritual, the computational matrix of the first neural network Perceptron, and the complex navigational system of self-driving vehicles, perhaps these different spatial logics together can clarify the algorithm as an emergent form rather than a technological a priori. The Agnicayana ritual is an example of an emergent algorithm as it encodes the organization of a social and ritual space. The symbolic function of the ritual is the reconstruction of the god through mundane means; this practice of reconstruction also symbolizes the expression of the many within the One (or the “computation” of the One through the many). The social function of the ritual is to teach basic geometry skills and to construct solid buildings.21 The Agnicayana ritual is a form of algorithmic thinking that follows the logic of a primordial and straightforward computational geometry.

The Perceptron is also an emergent algorithm that encodes according to a division of space, specifically a spatial matrix of visual data. The Perceptron’s matrix of photoreceptors defines a closed field and processes an algorithm that computes data according to their spatial relation. Here too the algorithm appears as an emergent process: the codification and crystallization of a procedure, a pattern, after its repetition. All machine-learning algorithms are emergent processes, in which the repetition of similar patterns “teaches” the machine and causes the pattern to emerge as a statistical distribution.22

Self-driving vehicles are an example of complex emergent algorithms since they grow from a sophisticated construction of space, namely, the road environment as a social institution of traffic codes and spontaneous rules. The algorithms of self-driving vehicles, after registering these spontaneous rules and the traffic codes of a given locale, try to predict unexpected events that may happen on a busy road. In the case of self-driving vehicles, the corporate utopia of automation makes the human driver evaporate, expecting that the visual space of the road scenario alone will dictate how the map will be navigated.

The Agnicayana ritual, the Perceptron, and the AI systems of self-driving vehicles are all, in different ways, forms of self-computing space and emergent algorithms (and probably, all of them, forms of the invisibilization of labor).

The idea of computational space or self-computing space stresses, in particular, that the algorithms of machine learning and AI are emergent systems that are based on a mundane and material division of space, time, labor, and social relations. Machine learning emerges from grids that continue ancient abstractions and rituals concerned with marking territories and bodies, counting people and goods; in this way, machine learning essentially emerges from an extended division of social labor. Despite the way it is often framed and critiqued, artificial intelligence is not really “artificial” or “alien”: in the usual mystification process of ideology, it appears to be a deus ex machina that descends to the world like in ancient theater. But this hides the fact that it actually emerges from the intelligence of this world.

What people call “AI” is actually a long historical process of crystallizing collective behavior, personal data, and individual labor into privatized algorithms that are used for the automation of complex tasks: from driving to translation, from object recognition to music composition. Just as much as the machines of the industrial age grew out of experimentation, know-how, and the labor of skilled workers, engineers, and craftsmen, the statistical models of AI grow out of the data produced by collective intelligence. Which is to say that AI emerges as an enormous imitation engine of collective intelligence. What is the relation between artificial intelligence and human intelligence? It is the social division of labor.


Matteo Pasquinelli (PhD) is Professor in Media Philosophy at the University of Arts and Design, Karlsruhe, where he coordinates the research group KIM (Künstliche Intelligenz und Medienphilosophie / Artificial Intelligence and Media Philosophy). For Verso he is preparing a monograph on the genealogy of artificial intelligence as division of labor, which is titled The Eye of the Master: Capital as Computation and Cognition.
© 2019 e-flux and the author

1. Recomposing a Dismembered God

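In the ancient Vedas there is a fascinating cosmogonic account: the god named Prajapati was dismembered into fragments by the act of creation. After the birth of the world, people found the shattered body of this supreme deity. In the corresponding Sanskrit ritual, the Agnicayana, Hindu devotees symbolically “recompose” the body of the god. Following an elaborate geometric plan, they build a blazing fire altar. The altar is laid out with over a thousand bricks of precise shape and size, which together form the silhouette of a falcon. Each brick is numbered, and the devotees set the bricks in order with full attention while reciting mantras according to fixed steps. Each layer of the altar is built on top of another, forming exactly the same shape and covering the same area. The key to the ritual lies in solving a logical “riddle”: each layer must keep the same shape and area as the one before it, but with a completely different arrangement of bricks. In addition, the falcon-shaped altar must face east, a prelude to the reconstructed god’s symbolic flight toward the rising sun. The whole story reads as a case of divine reincarnation achieved by geometric means.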

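The Agnicayana ritual is described in detail in the Shulba Sutras, which were composed around 800 BCE and record a much older oral tradition. The Shulba Sutras teach how to build altars in specific geometric shapes so that the will of the gods is passed on: for example, they advise that “those who wish to destroy existing and future enemies should construct a fire-altar in the form of a rhombus.”3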


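In 1907 the art historian Wilhelm Worringer argued that primitive art originated in the abstract lines of cave painting. Perhaps we may likewise assume that many features of art also arose from the recomposition of fragments and fractions, and from the forms and geometric techniques introduced in that process, until people were able to create higher degrees of complexity.4 In his studies of Vedic mathematics, the Italian mathematician Paolo Zellini found that the Agnicayana ritual was also used to transmit mathematical techniques, specifically geometric approximation and incremental growth: in other words, “computational” techniques, comparable to the modern calculus of Leibniz and Newton.5 The Agnicayana is perhaps the oldest documented ritual still practiced today, and a trace of “computational culture” in primordial times.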


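The French mathematician Jean-Luc Chabert has said: “Algorithms have been around since the beginning of time and existed well before a special word had been coined to describe them. Algorithms are simply a set of step by step instructions, to be carried out quite mechanically, so as to achieve some desired result.”7 Today, many people may regard the algorithm as a recent technological invention that applies abstract mathematical principles. On the contrary, the algorithm may be among the oldest of practices; it is also “material,” and it long predates many human tools and all modern machines: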



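Before algebra and geometry took their final shape, ancient civilizations were already, in a sense, giant machines that segmented society with precision and marked human bodies and territories through a series of abstract procedures, practices that have endured, and may go on operating, for thousands of years. Drawing on the work of the historian Lewis Mumford, Gilles Deleuze and Félix Guattari proposed a list of such ancient techniques of social abstraction and segmentation, including: “tattooing, excising, incising, carving, scarifying, mutilating, encircling, and initiating.”9 Numbers themselves were components of the “primitive abstract machines” of social segmentation and territorial allocation, and it was these machines that gave rise to human civilization: the earliest recorded census took place around 3800 BCE in Mesopotamia. Logical forms grew out of social forms, and the concept of number became material reality through a series of social processes: labor and ritual, discipline and power, marking and repetition.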

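In the 1970s, the study of “ethnomathematics” emerged in the international mathematical community; it broke the Platonic loops of elite mathematics and began to reveal the historical subjects behind computational concepts.10 The much-debated politics of computation and algorithms may at bottom be very simple, as Diane Nelson has reminded us: “Who counts?”11 “Who computes?”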



2. What Is an Algorithm?

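The term “algorithm” itself derives from the Latinization of the name of the Persian scholar al-Khwarizmi. His treatise On the Calculation with Hindu Numerals, written in Baghdad in the ninth century, is regarded as the first work to introduce the West to Hindu numerals and to the new calculation techniques (algorithms) that came with them. In fact, the medieval Latin word “algorismus” referred precisely to the procedures, and the shorthand, for carrying out the four basic arithmetic operations with Hindu numerals. Later, the term “algorithm” came to denote, metaphorically, any step-by-step logical procedure, and it became the core of computing logic. Broadly speaking, the history of the algorithm can be divided into three stages: in ancient times, the “algorithm” can be recognized in rituals performed according to procedures or codified rules, in order to achieve a very specific goal and to transmit that set of rules; in the Middle Ages, the algorithm was a procedure that aided mathematical operations; in the modern era, the algorithm and algorithmic logical procedures have been fully mechanized and are executed automatically by machines and digital computers.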




3. The Rise of “Machine Learning as Computational Space”

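In 1957, at the Cornell Aeronautical Laboratory in Buffalo, New York, the cognitive scientist Frank Rosenblatt invented and built the Perceptron, the first known operative artificial neural network and, one might say, the grandmother of almost all machine learning models. At the time of its invention it was classified as a military secret.12 The first prototype of the Perceptron was an analog computer: an input device of 20 × 20 photocells (called the “retina”) was connected by wires to a layer of artificial neurons that computed a single output (a light bulb turning on or off to signify 0 or 1). The “retina” recorded simple shapes, such as letters or triangles, and passed electrical signals to a cluster of “neurons” that computed a result according to threshold logic. The Perceptron was something like a camera that could be trained to recognize specific shapes, that is, to make a decision within a margin of error (which is what made it an “intelligent” machine). The Perceptron was the first machine learning algorithm, a basic “binary classifier” able to determine whether a pattern belongs to a specific class (that is, whether the input image is a triangle or not, a square or not, and so on). To do this, the Perceptron progressively adjusted the values of its nodes so as to resolve a large numerical input (a spatial matrix of four hundred numbers) into a simple binary output (0 or 1). If the input pattern matched a given class (a triangle, for example), the Perceptron output 1; otherwise it output 0. Initially a human operator was needed to train the Perceptron to learn the correct answers (“training” meaning manually setting the output node to 0 or 1), with the aim that, through supervised training, the machine would be able to recognize similar shapes in the future. The Perceptron was designed not to memorize one specific shape but to learn to recognize any potential shape.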
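The learning procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of Rosenblatt’s analog hardware: the 20 × 20 grid and the single 0/1 threshold output come from the description above, while the error-correction update rule, the toy task (is the lit square in the upper half of the “retina”?) and every function name are assumptions made for the example.

```python
SIDE = 20                # 20 x 20 photocell "retina"
N_INPUTS = SIDE * SIDE   # a spatial matrix of 400 numbers


def block_image(top, left, size=4):
    """Return a flat list of 400 values with one lit size x size square."""
    image = [0] * N_INPUTS
    for row in range(top, top + size):
        for col in range(left, left + size):
            image[row * SIDE + col] = 1
    return image


def predict(weights, bias, image):
    """Threshold logic: output 1 if the weighted sum is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, image))
    return 1 if total > 0 else 0


def train(samples, max_epochs=100):
    """Supervised training: nudge the weights whenever the answer is wrong."""
    weights = [0.0] * N_INPUTS
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for image, label in samples:
            error = label - predict(weights, bias, image)  # -1, 0 or +1
            if error:
                mistakes += 1
                bias += error
                weights = [w + error * x for w, x in zip(weights, image)]
        if mistakes == 0:  # every training shape is answered correctly
            break
    return weights, bias


# Hypothetical training set: label 1 for squares in the upper half, 0 otherwise.
samples = [(block_image(top, left), 1) for top in (0, 3, 6) for left in (0, 8, 16)]
samples += [(block_image(top, left), 0) for top in (10, 13, 16) for left in (0, 8, 16)]

weights, bias = train(samples)
correct = sum(predict(weights, bias, img) == label for img, label in samples)
print(f"{correct} of {len(samples)} training shapes classified correctly")
```

The toy task is deliberately linearly separable, so this update rule is guaranteed to settle on weights that answer every training shape correctly. Recognizing triangles or letters, as the original machine attempted, is a harder problem that this sketch does not try to reproduce.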

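The 20 × 20 photocell matrix of the first Perceptron was the beginning of a silent revolution in computation (which became a dominant paradigm in the early twenty-first century with the rapid rise of “deep learning,” a type of machine learning). Although its “neurons” were inspired by biological neurons, from a strictly logical point of view the Perceptron marked not a biomorphic turn in computation but a topological one; it signaled the rise of a paradigm of “computational space” or “self-computing space.” This turn introduced a spatial dimension into a paradigm of computation that until then had been linear (the Turing machine, for instance, writes 0s and 1s along a linear memory tape). This topological turn, which is the core of what people today regard as “artificial intelligence,” can be described more carefully as the passage from a paradigm of passive information to one of active information. Rather than processing a visual matrix with a top-down algorithm (as any image-editing software does), the Perceptron computes each pixel of the visual matrix from the bottom up, according to its original spatial disposition. The spatial relations among the visual data shape the operation of the algorithm that computes them.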

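Because of this spatial logic, the branch of computer science first devoted to neural networks was at the time called “computational geometry.” The paradigm of computational space or self-computing space resembles the “self-organization” principles of postwar cybernetics, such as von Neumann’s cellular automata (1948) and Konrad Zuse’s “computing space” (Rechnender Raum, 1967).13 Cellular automata were used to simulate natural evolution and to study complex biological systems, but they remained finite computations confined to a finite space. Konrad Zuse (who built the first programmable computer in Berlin in 1938) attempted to extend the logic of cellular automata to physics and to the whole universe. His concept of “computing space” (rechnender Raum) describes a universe composed of discrete units, each of which behaves according to the units around it. Alan Turing’s last paper, “The Chemical Basis of Morphogenesis” (published in 1952, two years before his death), also studied self-computing structures.14 Turing considered the molecules in biological systems to be self-computing agents that can explain highly complex bottom-up structures, such as the tentacle patterns of the hydra, the whorls of plants, the gastrulation of embryos, the dappling of animal skin, and the phyllotaxis of flowers.15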

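Von Neumann’s cellular automata and Zuse’s “computing space” are intuitively easy to grasp as spatial models, whereas Rosenblatt’s neural network displays a more complex spatial structure. Indeed, neural networks employ an extremely complex combinatorial structure, which is probably why they show the most striking efficiency in machine learning. Neural networks are said to “solve any problem,” meaning that, by the Universal Approximation Theorem, they can approximate the function of any pattern (provided they have enough layers of neurons and enough computing resources). All machine learning systems, including support vector machines, Markov chains, Hopfield networks, Boltzmann machines, and convolutional neural networks, stem from computational geometry. In this sense, they all belong to the historical tradition of the ars combinatoria.16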

Image source: Hans Meinhardt, The Algorithmic Beauty of Sea Shells (Springer Science & Business Media, 2009).

4. The Automation of Visual Labor

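Even at the end of the twentieth century, no one would have called a truck driver a “cognitive worker” or an intellectual. In the early twenty-first century, the application of machine learning to self-driving vehicles has prompted a new understanding of manual labor, driving included. It has also revealed another fact: the most valuable component of human work was never purely manual, but social and cognitive (and also perceptual, an aspect of labor that still needs to be properly located between the manual and the cognitive). What kind of work do drivers perform? Which human task will AI record with its sensors, imitate with its statistical models, and then replace with automation? Perhaps the best way to answer this question is to look at what technology has so far successfully automated, and at which areas have not yet reached the same level of automation.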

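As an industrial project, self-driving shows clearly (perhaps more clearly than a thousand books of political economy) that the labor of driving is a conscious activity that follows a set of codified rules and spontaneous social conventions. Yet the skill of driving can be translated into an algorithm because driving has a logical and inferential structure. Driving is a logical activity, just as labor in general is a logical activity. This postulate also helps to reframe the debate over the stale division between “manual labor” and “intellectual labor.”17 The corporate development of AI algorithms for automation has also made it possible to recognize a cognitive component in labor, one that critical theory long neglected; this is, of course, something of a political paradox. What is the relation between “labor” and “logic”? This has become one of the crucial philosophical questions of the age of AI.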


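Clearly, driving requires highly concentrated perceptual labor. In fact, much labor is by its nature “perceptual,” carried out through continuous, split-second acts of decision and cognition.18 Cognition cannot be completely detached from spatial logic; even in its abstract constructions it follows a certain spatial logic. The two observations, that “perception is logical” and that “cognition is spatial,” have received a measure of empirical proof, and this is more than the self-promotion of self-driving algorithms. These algorithms construct models that statistically infer the visual space (usually encoded as a digital video of a three-dimensional road scenario). Moreover, the “driver” that the AI system in a self-driving vehicle refers to is not an individual but a collective worker, a “social brain” that navigates the city and the world.19 If we look at self-driving projects, we find that AI draws on collective data, and that these data encode a collective production of space, time, labor, and social relations. What AI imitates, replaces, and emerges from is an organized division of social space (it is first of all a computation over material substance, not a mathematical equation or analysis taking place in an abstract world).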


5. The Memory and Intelligence of Space

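The French philosopher Paul Virilio, who proposed the concept of the study of speed, or dromology, also pursued research on space and topology, for he knew that technology accelerates the human perception of space just as it distorts the perception of time. Interestingly, the title of Virilio’s book The Vision Machine was inspired by Rosenblatt’s Perceptron. An erudite, classical twentieth-century thinker, Virilio drew a clear line between ancient techniques of memory based on spatial concepts (such as the Method of Loci) and the spatial-matrix memory of modern computers.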

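Cicero and other ancient “memory theorists” believed that natural memory could be strengthened with the right training. They invented a topological system, the “Method of Loci,” an imagery-mnemonics that consists of selecting a sequence of places and locations and arranging them in space and time. For example, in this method you might imagine walking freely through a house, choosing different tables, seeing a chair through a doorway, a windowsill, and marks drawn on a wall. Next, the material to be remembered is coded into discrete images, and these images are inserted, in a specific order, into the different loci. To memorize a speech, you distill its key points, translate them into images, and mentally “place” them in the successive loci. When you actually need to deliver the speech, you need only recall, in order, the house in which you placed them. Translating space, topological coordinates, and geometric proportions into a technique of memory has much in common with the way we translate collective space into a source of machine intelligence today. At the end of his book, Virilio reflects on the status of the image in the age of “vision machines,” the Perceptron among them, and sounds a warning: the approaching age of artificial intelligence is an “industrialization of vision.”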

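“Now objects perceive me,” the painter Paul Klee wrote in his Notebooks. This rather startling assertion has recently become objective fact, the truth. After all, aren’t they talking about producing a “vision machine” in the near future, a machine that would be capable not only of recognizing the contours of shapes, but also of completely interpreting the visual field? Aren’t they also talking about the new technology of visionics: the possibility of achieving sightless vision whereby the video camera would be controlled by a computer? Such technology would be used in industrial production and stock control; in military robotics, too, perhaps.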


Ioannis de Sacro Busco, Algorismus Domini, c. 1501. National Central Library of Rome. Image from the internet.

6. Conclusion

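When we look back at the ancient geometry of the Agnicayana ritual, the computational matrix of the first neural network, the Perceptron, and the complex navigational systems of self-driving vehicles, perhaps these different spatial logics can together clarify how the algorithm appears as an emergent form rather than a technological a priori. The Agnicayana ritual is an example of an “emergent” algorithm in that it encodes the organization of a social and ritual space. The symbolic function of the ritual is the reconstruction of the god through mundane means; this practice of reconstruction also symbolizes the expression of the many within the One (or the “computation” of the One through the many). The social function of the ritual is also to teach its practitioners basic geometry skills and to construct solid buildings.21 The Agnicayana ritual is likewise a form of algorithmic thinking that follows the logic of a primordial and straightforward computational geometry.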

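The Perceptron is also an emergent algorithm: it encodes according to a division of space, specifically a spatial matrix of visual data. The Perceptron’s matrix of photoreceptors defines a closed field and runs an algorithm that computes data according to their spatial relations. Here too the algorithm appears as an emergent process: a procedure or pattern that is codified and crystallized through its repetition. All machine learning algorithms are emergent processes, in which the repetition of similar patterns “teaches” the machine and causes the pattern to emerge as a statistical distribution.22 Self-driving vehicles are an example of such complex emergent algorithms, since they grow out of a sophisticated construction of space, namely the road environment as a social institution of traffic codes and spontaneous rules. Having registered these spontaneous rules and the traffic codes of a given locale, the algorithms of self-driving vehicles try to predict what may happen on a busy street. In the case of self-driving vehicles, the corporate utopia of automation imagines that the human driver is no longer needed and that the visual space of the road scenario alone will dictate how the map is navigated.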

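The Agnicayana ritual, the Perceptron, and the AI systems of self-driving vehicles are all, in different ways, forms of self-computing space and emergent algorithms (and perhaps all of them are forms of the invisibilization of labor). The idea of computational space or self-computing space stresses in particular that the algorithms of machine learning and AI are emergent systems based on a mundane and material division of space, time, labor, and social relations. Machine learning emerges from grids that continue ancient abstractions and rituals concerned with marking territories and bodies and with counting people and goods. In this sense, machine learning emerges from an extended division of social labor. Despite the way it is often framed and critiqued, artificial intelligence is not purely “artificial” or “alien”: in the usual mystification process of ideology, it appears as a deus ex machina that descends on the world as in ancient theater. But this account hides a fact: artificial intelligence actually emerges from the intelligence of this world.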





Anatomy of an AI System

The Amazon Echo as an anatomical map of human labor, data and planetary resources


Text Kate Crawford & Vladan Joler 文 凯特· 克劳福德 & 瓦拉丹·卓勒

A cylinder sits in a room. It is impassive, smooth, simple and small. It stands 14.8cm high, with a single blue-green circular light that traces around its upper rim. It is silently attending. A woman walks into the room, carrying a sleeping child in her arms, and she addresses the cylinder.

‘Alexa, turn on the hall lights’

The cylinder springs into life. ‘OK.’ The room lights up. The woman makes a faint nodding gesture, and carries the child upstairs.

This is an interaction with Amazon’s Echo device. 3 A brief command and a response is the most common form of engagement with this consumer voice-enabled AI device. But in this fleeting moment of interaction, a vast matrix of capacities is invoked: interlaced chains of resource extraction, human labor and algorithmic processing across networks of mining, logistics, distribution, prediction and optimization. The scale of this system is almost beyond human imagining. How can we begin to see it, to grasp its immensity and complexity as a connected form? We start with an outline: an exploded view of a planetary system across three stages of birth, life and death, accompanied by an essay in 21 parts. Together, this becomes an anatomical map of a single AI system.


The scene of the woman talking to Alexa is drawn from a 2017 promotional video advertising the latest version of the Amazon Echo. The video begins, “Say hello to the all-new Echo” and explains that the Echo will connect to Alexa (the artificial intelligence agent) in order to “play music, call friends and family, control smart home devices, and more.” The device contains seven directional microphones, so the user can be heard at all times even when music is playing. The device comes in several styles, such as gunmetal grey or a basic beige, designed to either “blend in or stand out.” But even the shiny design options maintain a kind of blankness: nothing will alert the owner to the vast network that subtends and drives its interactive capacities. The promotional video simply states that the range of things you can ask Alexa to do is always expanding. “Because Alexa is in the cloud, she is always getting smarter and adding new features.”

How does this happen? Alexa is a disembodied voice that represents the human-AI interaction interface for an extraordinarily complex set of information processing layers. These layers are fed by constant tides: the flows of human voices being translated into text questions, which are used to query databases of potential answers, and the corresponding ebb of Alexa’s replies. For each response that Alexa gives, its effectiveness is inferred by what happens next:

Is the same question uttered again? (Did the user feel heard?)
Was the question reworded? (Did the user feel the question was understood?)
Was there an action following the question? (Did the interaction result in a tracked response: a light turned on, a product purchased, a track played?)

With each interaction, Alexa is training to hear better, to interpret more precisely, to trigger actions that map to the user’s commands more accurately, and to build a more complete model of their preferences, habits and desires. What is required to make this possible? Put simply: each small moment of convenience – be it answering a question, turning on a light, or playing a song – requires a vast planetary network, fueled by the extraction of non-renewable materials, labor, and data. The scale of resources required is many magnitudes greater than the energy and labor it would take a human to operate a household appliance or flick a switch. A full accounting for these costs is almost impossible, but it is increasingly important that we grasp the scale and scope if we are to understand and govern the technical infrastructures that thread through our lives.


The Salar, the world’s largest flat surface, is located in southwest Bolivia at an altitude of 3,656 meters above sea level. It is a high plateau, covered by a few meters of salt crust which are exceptionally rich in lithium, containing 50% to 70% of the world’s lithium reserves. 4 The Salar, alongside the neighboring Atacama regions in Chile and Argentina, is a major site for lithium extraction. This soft, silvery metal is currently used to power mobile connected devices, as a crucial material used for the production of lithium-ion batteries. It is known as ‘grey gold.’ Smartphone batteries, for example, usually have less than eight grams of this material. 5 Each Tesla car needs approximately seven kilograms of lithium for its battery pack. 6 All these batteries have a limited lifespan, and once consumed they are thrown away as waste. Amazon reminds users that they cannot open up and repair their Echo, because this will void the warranty. The Amazon Echo is wall-powered, and also has a mobile battery base. This also has a limited lifespan and then must be thrown away as waste.

According to the Aymara legends about the creation of Bolivia, the volcanic mountains of the Andean plateau were creations of tragedy. 7 Long ago, when the volcanos were alive and roaming the plains freely, Tunupa – the only female volcano – gave birth to a baby. Stricken by jealousy, the male volcanos stole her baby and banished it to a distant location. The gods punished the volcanos by pinning them all to the Earth. Grieving for the child that she could no longer reach, Tunupa wept deeply. Her tears and breast milk combined to create a giant salt lake: Salar de Uyuni. As Liam Young and Kate Davies observe, “your smart-phone runs on the tears and breast milk of a volcano. This landscape is connected to everywhere on the planet via the phones in our pockets; linked to each of us by invisible threads of commerce, science, politics and power.” 8


Our exploded view diagram combines and visualizes three central, extractive processes that are required to run a large-scale artificial intelligence system: material resources, human labor, and data. We consider these three elements across time – represented as a visual description of the birth, life and death of a single Amazon Echo unit. It’s necessary to move beyond a simple analysis of the relationship between an individual human, their data, and any single technology company in order to contend with the truly planetary scale of extraction. Vincent Mosco has shown how the ethereal metaphor of ‘the cloud’ for offsite data management and processing is in complete contradiction with the physical realities of the extraction of minerals from the Earth’s crust and dispossession of human populations that sustain its existence. 9 Sandro Mezzadra and Brett Neilson use the term ‘extractivism’ to name the relationship between different forms of extractive operations in contemporary capitalism, which we see repeated in the context of the AI industry. 10 There are deep interconnections between the literal hollowing out of the materials of the earth and biosphere, and the data capture and monetization of human practices of communication and sociality in AI. Mezzadra and Neilson note that labor is central to this extractive relationship, which has repeated throughout history: from the way European imperialism used slave labor, to the forced work crews on rubber plantations in Malaya, to the Indigenous people of Bolivia being driven to extract the silver that was used in the first global currency. Thinking about extraction requires thinking about labor, resources, and data together. This presents a challenge to critical and popular understandings of artificial intelligence: it is hard to ‘see’ any of these processes individually, let alone collectively. Hence the need for a visualization that can bring these connected, but globally dispersed processes into a single map.


If you read our map from left to right, the story begins and ends with the Earth, and the geological processes of deep time. But read from top to bottom, we see the story as it begins and ends with a human. The top is the human agent, querying the Echo, and supplying Amazon with the valuable training data of verbal questions and responses that they can use to further refine their voice-enabled AI systems. At the bottom of the map is another kind of human resource: the history of human knowledge and capacity, which is also used to train and optimize artificial intelligence systems. This is a key difference between artificial intelligence systems and other forms of consumer technology: they rely on the ingestion, analysis and optimization of vast amounts of human generated images, texts and videos.


When a human engages with an Echo, or another voice-enabled AI device, they are acting as much more than just an end-product consumer. It is difficult to place the human user of an AI system into a single category: rather, they deserve to be considered as a hybrid case. Just as the Greek chimera was a mythological animal that was part lion, goat, snake and monster, the Echo user is simultaneously a consumer, a resource, a worker, and a product. This multiple identity recurs for human users in many technological systems. In the specific case of the Amazon Echo, the user has purchased a consumer device for which they receive a set of convenient affordances. But they are also a resource, as their voice commands are collected, analyzed and retained for the purposes of building an ever-larger corpus of human voices and instructions. And they provide labor, as they continually perform the valuable service of contributing feedback mechanisms regarding the accuracy, usefulness, and overall quality of Alexa’s replies. They are, in essence, helping to train the neural networks within Amazon’s infrastructural stack.


Anything beyond the limited physical and digital interfaces of the device itself is outside of the user’s control. It presents a sleek surface with no ability to open it, repair it or change how it functions. The object itself is a very simple extrusion of plastic representing a collection of sensors – its real power and complexity lies somewhere else, far out of sight. The Echo is but an ‘ear’ in the home: a disembodied listening agent that never shows its deep connections to remote systems.

In 1673, the Jesuit polymath, Athanasius Kircher, invented the statua citofonica – the ‘talking statue.’ Kircher was an extraordinary interdisciplinary scholar and inventor. In his lifetime he published forty major works across the fields of medicine, geology, comparative religion and music. He invented the first magnetic clock, many early automatons, and the megaphone. His talking statue was a very early listening system: essentially a microphone made from a huge spiral tube, which could convey the conversations from a public square and up through the tube, and then piped through the mouth of a statue kept within an aristocrat’s private chambers. As Kircher wrote:

“This statue must be located in a given place, in order to allow the end section of the spiral-shaped tube to precisely correspond to the opening of the mouth. In this manner it will be perfect, and capable to emit clearly any kind of sound: in fact the statue will be able to speak continuously, uttering in either a human or animal voice: it will laugh or sneer; it will seem to really cry or moan; sometimes with great astonishment it will strongly blow. If the opening of the spiral shaped tube is located in correspondence to an open public space, all human words pronounced, focused in the conduit, would be replayed through the mouth of the statue.” 11

The listening system could eavesdrop on everyday conversations in the piazza, and relay them to the 17th century Italian oligarchs. Kircher’s talking statue was an early form of information extraction for the elites – people talking in the street would have no indication that their conversations were being funneled to those who would instrument that knowledge for their own power, entertainment and wealth. People inside the homes of aristocrats would have no idea how a magical statue was speaking and conveying all manner of information. The aim was to obscure how the system worked: an elegant statue was all they could see. Listening systems, even at this early stage, were about power, class, and secrecy. But the infrastructure for Kircher’s system was prohibitively expensive – available only to the very few. And so the question remains, what are the full resource implications of building such systems? This brings us to the materiality of the infrastructure that lies beneath.


In his book A Geology of Media, Jussi Parikka suggests that we try to think of media not from Marshall McLuhan’s point of view – in which media are extensions of human senses 12 – but rather as an extension of Earth. 13 Media technologies should be understood in the context of a geological process, from the creation and the transformation processes, to the movement of natural elements from which media are built. Reflecting upon media and technology as geological processes enables us to consider the profound depletion of non-renewable resources required to drive the technologies of the present moment. Each object in the extended network of an AI system, from network routers to batteries to microphones, is built using elements that required billions of years to be produced. Looking from the perspective of deep time, we are extracting Earth’s history to serve a split second of technological time, in order to build devices that are often designed to be used for no more than a few years. For example, the Consumer Technology Association notes that the average smartphone lifespan is 4.7 years. 14 This obsolescence cycle fuels the purchase of more devices, drives up profits, and increases incentives for the use of unsustainable extraction practices. From a slow process of elemental development, these elements and materials go through an extraordinarily rapid period of excavation, smelting, mixing, and logistical transport – crossing thousands of kilometers in their transformation. Geological processes mark both the beginning and the end of this period, from the mining of ore, to the deposition of material in an electronic waste dump. For that reason, our map starts and ends with the Earth’s crust. However, all the transformations and movements we depict are only the barest anatomical outline: beneath these connections lie many more layers of fractal supply chains, and exploitation of human and natural resources, concentrations of corporate and geopolitical power, and continual energy consumption.


Drawing out the connections between resources, labor and data extraction brings us inevitably back to traditional frameworks of exploitation. But how is value being generated through these systems? A useful conceptual tool can be found in the work of Christian Fuchs and other authors examining and defining digital labor. The notion of digital labor, which was initially linked with different forms of non-material labor, precedes the life of devices and complex systems such as artificial intelligence. Digital labor – the work of building and maintaining the stack of digital systems – is far from ephemeral or virtual, but is deeply embodied in different activities. 15 The scope is overwhelming: from indentured labor in mines for extracting the minerals that form the physical basis of information technologies; to the work of strictly controlled and sometimes dangerous hardware manufacturing and assembly processes in Chinese factories; to exploited outsourced cognitive workers in developing countries labelling AI training data sets; to the informal physical workers cleaning up toxic waste dumps. These processes create new accumulations of wealth and power, which are concentrated in a very thin social layer.


This triangle of value extraction and production represents one of the basic elements of our map, from birth in a geological process, through life as a consumer AI product, and ultimately to death in an electronics dump. Like in Fuchs’ work, our triangles are not isolated, but linked to one another in the production process. They form a cyclic flow in which the product of work is transformed into a resource, which is transformed into a product, which is transformed into a resource and so on. Each triangle represents one phase in the production process. Although this appears on the map as a linear path of transformation, a different visual metaphor better represents the complexity of current extractivism: the fractal structure known as the Sierpinski triangle.

A linear display does not enable us to show that each next step of production and exploitation contains previous phases. If we look at the production and exploitation system through a fractal visual structure, the smallest triangle would represent natural resources and means of labor, i.e. the miner as labor and ore as product. The next larger triangle encompasses the processing of metals, and the next would represent the process of manufacturing components and so on. The ultimate triangle in our map, the production of the Amazon Echo unit itself, includes all of these levels of exploitation – from the bottom to the very top of Amazon Inc, a role inhabited by Jeff Bezos as CEO of Amazon. Like a pharaoh of ancient Egypt, he stands at the top of the largest pyramid of AI value extraction.
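The nesting described above can be made concrete with a short recursive sketch in Python. It is only an illustration of the Sierpinski construction: the phase names are taken from the paragraph above, but the coordinates, the function names, and the pairing of each phase with one level of the fractal are assumptions made for the example, and the branching factor of three is a property of the fractal, not a claim about real supply chains.

```python
def midpoint(p, q):
    """Return the point halfway between two (x, y) points."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def sierpinski(triangle, depth):
    """Return the smallest triangles contained in `triangle` after `depth` subdivisions."""
    if depth == 0:
        return [triangle]
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    # Each triangle contains three half-size copies of itself at its corners.
    corners = [(a, ab, ca), (ab, b, bc), (ca, bc, c)]
    return [t for corner in corners for t in sierpinski(corner, depth - 1)]


# Phases named in the text, from the smallest triangle to the largest.
PHASES = ["mining", "metal processing", "component manufacturing", "device assembly"]
outer = ((0.0, 0.0), (1.0, 0.0), (0.5, 1.0))

for level, phase in enumerate(PHASES):
    count = len(sierpinski(outer, level))
    print(f"{phase}: contains {count} triangle(s) of the smallest kind")
```

Read this way, the largest triangle never replaces the smaller ones; it is composed of them, which is the point a linear diagram cannot show.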


To return to the basic element of this visualization – a variation of Marx’s triangle of production – each triangle creates a surplus of value for creating profits. If we look at the scale of average income for each activity in the production process of one device, which is shown on the left side of our map, we see the dramatic difference in income earned. According to research by Amnesty International, during the excavation of cobalt, which is also used for the lithium batteries of 16 multinational brands, workers are paid the equivalent of one US dollar per day for working in conditions hazardous to life and health, and are often subjected to violence, extortion and intimidation. 16 Amnesty has documented children as young as 7 working in the mines. In contrast, Amazon CEO Jeff Bezos, at the top of our fractal pyramid, made an average of $275 million a day during the first five months of 2018, according to the Bloomberg Billionaires Index. 17 A child working in a mine in the Congo would need more than 700,000 years of non-stop work to earn the same amount as a single day of Bezos’ income.
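The comparison above can be checked with a few lines of arithmetic. The two income figures are the ones quoted in the text; treating “non-stop work” as 365 paid days a year is an assumption made for this sketch.

```python
BEZOS_USD_PER_DAY = 275_000_000  # average daily gain quoted in the text
MINER_USD_PER_DAY = 1            # "the equivalent of one US dollar per day"
DAYS_PER_YEAR = 365              # assumption: paid work on every day of the year

# Days of mine work needed to match one day of Bezos' income.
days_needed = BEZOS_USD_PER_DAY / MINER_USD_PER_DAY
years_needed = days_needed / DAYS_PER_YEAR

print(f"{years_needed:,.0f} years")  # roughly 753,000 years
```

Under these assumptions the result is roughly 753,000 years, consistent with the figure of “more than 700,000 years” given in the text.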

Many of the triangles shown on this map hide different stories of labor exploitation and inhumane working conditions. The ecological price of transformation of elements and income disparities is just one of the possible ways of representing a deep systemic inequality. We have both researched different forms of ‘black boxes’ understood as algorithmic processes, 18 but this map points to another form of opacity: the very process of creating, training and operating a device like an Amazon Echo is itself a kind of black box, very hard to examine and track in toto given the multiple layers of contractors, distributors, and downstream logistical partners around the world. As Mark Graham writes, “contemporary capitalism conceals the histories and geographies of most commodities from consumers. Consumers are usually only able to see commodities in the here and now of time and space, and rarely have any opportunities to gaze backwards through the chains of production in order to gain knowledge about the sites of production, transformation, and distribution.” 19

One illustration of the difficulty of investigating and tracking the contemporary production chain process is that it took Intel more than four years to understand its supply line well enough to ensure that no tantalum from the Congo was in its microprocessor products. As a semiconductor chip manufacturer, Intel supplies Apple with processors. In order to do so, Intel has its own multi-tiered supply chain of more than 19,000 suppliers in over 100 countries providing direct materials for their production processes, tools and machines for their factories, and logistics and packaging services. 20 That it took over four years for a leading technology company just to understand its own supply chain reveals just how hard this process can be to grasp from the inside, let alone for external researchers, journalists and academics. Dutch-based technology company Philips has also claimed that it was working to make its supply chain ‘conflict-free’. Philips, for example, has tens of thousands of different suppliers, each of which provides different components for their manufacturing processes. 21 Those suppliers are themselves linked downstream to tens of thousands of component manufacturers that acquire materials from hundreds of refineries that buy ingredients from different smelters, which are supplied by unknown numbers of traders that deal directly with both legal and illegal mining operations. In The Elements of Power, David S. Abraham describes the invisible networks of rare metals traders in global electronics supply chains: “The network to get rare metals from the mine to your laptop travels through a murky network of traders, processors, and component manufacturers. Traders are the middlemen who do more than buy and sell rare metals: they help to regulate information and are the hidden link that helps in navigating the network between metals plants and the components in our laptops.” 22 According to the computer manufacturing company Dell, complexities of the metal supply chain pose almost insurmountable challenges. 23 The mining of these minerals takes place long before a final product is assembled, making it exceedingly difficult to trace the minerals’ origin. In addition, many of the minerals are smelted together with recycled metals, by which point it becomes all but impossible to trace the minerals to their source. So we see that the attempt to capture the full supply chain is a truly gargantuan task: revealing all the complexity of the 21st century global production of technology products.


Supply chains are often layered on top of one another, in a sprawling network. Apple’s supplier program reveals there are tens of thousands of individual components embedded in their devices, which are in turn supplied by hundreds of different companies. In order for each of those components to arrive on the final assembly line where it will be assembled by workers in Foxconn facilities, different components need to be physically transferred from more than 750 supplier sites across 30 different countries. 24 This becomes a complex structure of supply chains within supply chains, a zooming fractal of tens of thousands of suppliers, millions of kilometers of shipped materials and hundreds of thousands of workers included within the process even before the product is assembled on the line.

Visualizing this process as one global, pancontinental network through which materials, components and products flow, we see an analogy to the global information network. Where there is a single internet packet travelling to an Amazon Echo, here we can imagine a single cargo container. 25 The dizzying spectacle of global logistics and production would not be possible without the invention of this simple, standardized metal object. Standardized cargo containers allowed the explosion of the modern shipping industry, which made it possible to model the planet as a single, massive factory. In 2017, the capacity of container ships in seaborne trade reached nearly 250,000,000 dead-weight tons of cargo, dominated by giant shipping companies like Maersk of Denmark, the Mediterranean Shipping Company of Switzerland, and France’s CMA CGM Group, each owning hundreds of container vessels. 26 For these commercial ventures, cargo shipping is a relatively cheap way to traverse the vascular system of the global factory, yet it disguises much larger external costs.

In recent years, shipping has produced 3.1% of global yearly CO2 emissions, more than the entire country of Germany. 27 In order to minimize their internal costs, most container shipping companies burn very low-grade fuel in enormous quantities, which increases the amount of sulphur in the air, among other toxic substances. It has been estimated that one container ship can emit as much pollution as 50 million cars, and that some 60,000 deaths worldwide each year are attributed indirectly to pollution from the cargo shipping industry. 28 Even industry-friendly sources like the World Shipping Council admit that thousands of containers are lost each year, on the ocean floor or drifting loose. 29 Some carry toxic substances which leak into the oceans. Typically, workers spend 9 to 10 months at sea, often with long working shifts and without access to external communications. Workers from the Philippines represent more than a third of the global shipping workforce. 30 The most severe costs of global logistics are borne by the atmosphere, the oceanic ecosystem and all it contains, and the lowest paid workers.


The increasing complexity and miniaturization of our technology depends on a process that strangely echoes the hopes of early medieval alchemy. Where medieval alchemists aimed to transform base metals into ‘noble’ ones, researchers today use rare earth metals to enhance the performance of other minerals. There are 17 rare earth elements, which are embedded in laptops and smartphones, making them smaller and lighter. They play a role in color displays, loudspeakers, camera lenses, GPS systems, rechargeable batteries, hard drives and many other components. They are key elements in communication systems, from fiber optic cables and signal amplification in mobile communication towers to satellites and GPS technology. But the precise configuration and use of these minerals is hard to ascertain. In the same way that medieval alchemists hid their research behind cyphers and cryptic symbolism, contemporary processes for using minerals in devices are protected behind NDAs and trade secrets.

The unique electronic, optical and magnetic characteristics of rare earth elements cannot be matched by any other metals or synthetic substitutes discovered to date. While they are called ‘rare earth metals’, some are relatively abundant in the Earth’s crust, but extraction is costly and highly polluting. David Abraham describes the mining of dysprosium and terbium, used in a variety of high-tech devices, in Jiangxi, China. He writes, “Only 0.2 percent of the mined clay contains the valuable rare earth elements. This means that 99.8 percent of earth removed in rare earth mining is discarded as waste called ‘tailings’ that are dumped back into the hills and streams,” creating new pollutants like ammonium. 31 In order to refine one ton of rare earth elements, “the Chinese Society of Rare Earths estimates that the process produces 75,000 liters of acidic water and one ton of radioactive residue.” 32 Furthermore, mining and refining activities consume vast amounts of water and generate large quantities of CO2 emissions. In 2009, China produced 95% of the world’s supply of these elements, and it has been estimated that the single mine known as Bayan Obo contains 70% of the world’s reserves. 33
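The per-ton figures quoted above can be scaled with a few lines of arithmetic. The following is only a rough sketch: it assumes the 0.2 percent figure is a mass fraction and that every bit of contained rare earth is recovered, neither of which the text states, and the function and variable names are illustrative.

```python
# Figures quoted in the paragraph above; everything else is an assumption.
REE_PARTS_PER_THOUSAND = 2            # "0.2 percent of the mined clay"
ACIDIC_WATER_L_PER_TON = 75_000       # liters of acidic water per refined ton
RADIOACTIVE_RESIDUE_T_PER_TON = 1.0   # tons of radioactive residue per refined ton


def extraction_footprint(ree_tons: float) -> dict:
    """Scale the quoted figures to a given output of rare earth elements.

    Hypothetical simplification: the 0.2% is treated as a mass fraction
    and recovery is assumed to be complete.
    """
    clay_mined = ree_tons * 1000 / REE_PARTS_PER_THOUSAND
    return {
        "clay_mined_tons": clay_mined,
        "tailings_tons": clay_mined - ree_tons,
        "acidic_water_liters": ree_tons * ACIDIC_WATER_L_PER_TON,
        "radioactive_residue_tons": ree_tons * RADIOACTIVE_RESIDUE_T_PER_TON,
    }


footprint = extraction_footprint(1.0)
for name, value in footprint.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions, a single refined ton implies on the order of 500 tons of clay moved, almost all of it returned to the landscape as tailings.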


A satellite picture of the tiny Indonesian island of Bangka tells a story about the human and environmental toll of semiconductor production. On this tiny island, miners, most of them ‘informal’, work from makeshift pontoons, using bamboo poles to scrape the seabed and then diving underwater to suck tin from the surface through giant, vacuum-like tubes. As a Guardian investigation reports, “tin mining is a lucrative but destructive trade that has scarred the island's landscape, bulldozed its farms and forests, killed off its fish stocks and coral reefs, and dented tourism to its pretty palm-lined beaches. The damage is best seen from the air, as pockets of lush forest huddle amid huge swaths of barren orange earth. Where not dominated by mines, this is pockmarked with graves, many holding the bodies of miners who have died over the centuries digging for tin.” 34 Two small islands, Bangka and Belitung, produce 90% of Indonesia’s tin, and Indonesia is the world’s second-largest exporter of the metal. Indonesia’s national tin corporation, PT Timah, supplies companies such as Samsung directly, as well as solder makers Chernan and Shenmao, which in turn supply Sony, LG and Foxconn. 35


At Amazon distribution centers, vast collections of products are arrayed in a computational order across millions of shelves. The position of every item in this space is precisely determined by complex mathematical functions that process information about orders and create relationships between products. The aim is to optimize the movements of the robots and humans that collaborate in these warehouses. With help from an electronic bracelet, the human worker is directed through warehouses the size of airplane hangars, filled with objects arranged in an opaque algorithmic order. 36

Hidden among the thousands of other publicly available patents owned by Amazon, U.S. patent number 9,280,157 represents an extraordinary illustration of worker alienation, a stark moment in the relationship between humans and machines. 37 It depicts a metal cage intended for the worker, equipped with different cybernetic add-ons, that can be moved through a warehouse by the same motorized system that shifts shelves filled with merchandise. Here, the worker becomes a part of a machinic ballet, held upright in a cage which dictates and constrains their movement.

As we have seen time and time again in the research for our map, dystopian futures are built upon the unevenly distributed dystopian regimes of the past and present, scattered through an array of production chains for modern technical devices. The vanishingly few at the top of the fractal pyramid of value extraction live in extraordinary wealth and comfort. But the majority of the pyramids are made from the dark tunnels of mines, radioactive waste lakes, discarded shipping containers, and corporate factory dormitories.


At the end of the 19th century, a particular Southeast Asian tree called palaquium gutta became the center of a technological boom. These trees, found mainly in Malaysia, produce a milky white natural latex called gutta percha. After English scientist Michael Faraday published a study in The Philosophical Magazine in 1848 about the use of this material as an electrical insulator, gutta percha rapidly became the darling of the engineering world. It was seen as the solution to the problem of insulating telegraphic cables so that they could withstand the conditions of the ocean floor. As the global submarine cable business grew, so did demand for palaquium gutta tree trunks. The historian John Tully describes how local Malay, Chinese and Dayak workers were paid little for the dangerous work of felling the trees and slowly collecting the latex. 38 The latex was processed and then sold through Singapore’s trade markets into the British market, where it was transformed into, among other things, lengths upon lengths of submarine cable sheaths.

A mature palaquium gutta could yield around 300 grams of latex. But in 1857, the first transatlantic cable was around 3000 km long and weighed 2000 tons, requiring around 250 tons of gutta percha. Producing that quantity of the material required around 900,000 tree trunks. The jungles of Malaysia and Singapore were stripped, and by the early 1880s the palaquium gutta had vanished. In a last-ditch effort to save their supply chain, the British passed a ban in 1883 to halt harvesting the latex, but the tree was already extinct. 39
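The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes metric tons and a uniform yield per tree; neither is stated in the text, and the names are illustrative only.

```python
import math

LATEX_PER_TREE_GRAMS = 300        # yield of one mature tree, as stated above
CABLE_GUTTA_PERCHA_TONS = 250     # gutta percha needed for the 1857 cable
GRAMS_PER_TON = 1_000_000         # assumption: metric tons


def trees_needed(tons: float) -> int:
    """Number of trees needed for a given mass of latex, rounded up."""
    return math.ceil(tons * GRAMS_PER_TON / LATEX_PER_TREE_GRAMS)


trees_for_cable = trees_needed(CABLE_GUTTA_PERCHA_TONS)
trees_for_one_ton = trees_needed(1)

print(f"Trees for the whole cable: {trees_for_cable:,}")
print(f"Trees for a single ton:    {trees_for_one_ton:,}")
```

At the stated yield, the cable's 250 tons corresponds to roughly 830,000 trees, the same order of magnitude as the 900,000 trunks cited, while a single ton would need only a few thousand.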

The Victorian environmental disaster of gutta percha, from the early origins of the global information society, shows how the relationships between technology and its materiality, environments, and different forms of exploitation are imbricated. Just as Victorians precipitated ecological disaster for their early cables, so do rare earth mining and global supply chains further imperil the delicate ecological balance of our era. From the material used to build the technology enabling contemporary networked society, to the energy needed for transmitting, analyzing, and storing the data flowing through the massive infrastructure, to the materiality of infrastructure: these deep connections and costs are more significant, and have a far longer history, than is usually represented in the corporate imaginaries of AI. 40


Large-scale AI systems consume enormous amounts of energy. Yet the material details of those costs remain vague in the social imagination. It remains difficult to get precise details about the amount of energy consumed by cloud computing services. A Greenpeace report states: “One of the single biggest obstacles to sector transparency is Amazon Web Services (AWS). The world's biggest cloud computer company remains almost completely non-transparent about the energy footprint of its massive operations. Among the global cloud providers, only AWS still refuses to make public basic details on the energy performance and environmental impact associated with its operations.” 41

As human agents, we are visible in almost every interaction with technological platforms. We are always being tracked, quantified, analyzed and commodified. But in contrast to user visibility, the precise details about the phases of birth, life and death of networked devices are obscured. With emerging devices like the Echo relying on a centralized AI infrastructure far from view, even more of the detail falls into the shadows.

While consumers become accustomed to a small hardware device in their living rooms, or a phone app, or a semi-autonomous car, the real work is being done within machine learning systems that are generally remote from the user and utterly invisible to her. In many cases, transparency wouldn’t help much – without forms of real choice, and corporate accountability, mere transparency won’t shift the weight of the current power asymmetries. 42

The outputs of machine learning systems are predominantly unaccountable and ungoverned, while the inputs are enigmatic. To the casual observer, it looks like it has never been easier to build AI or machine learning-based systems than it is today. Availability of open-source tools for doing so in combination with rentable computation power through cloud superpowers such as Amazon (AWS), Microsoft (Azure), or Google (Google Cloud) is giving rise to a false idea of the ‘democratization’ of AI. While ‘off the shelf’ machine learning tools, like TensorFlow, are becoming more accessible from the point of view of setting up your own system, the underlying logics of those systems, and the datasets for training them are accessible to and controlled by very few entities. In the dynamic of dataset collection through platforms like Facebook, users are feeding and training the neural networks with behavioral data, voice, tagged pictures and videos or medical data. In an era of extractivism, the real value of that data is controlled and exploited by the very few at the top of the pyramid.


When massive data sets are used to train AI systems, the individual images and videos involved are commonly tagged and labeled. 43 There is much to be said about how this labelling process abrogates and crystallizes meaning, and further, how this process is driven by clickworkers being paid fractions of a cent for this digital piecework.

In 1770, Hungarian inventor Wolfgang von Kempelen constructed a chess-playing machine known as the Mechanical Turk. His goal, in part, was to impress Empress Maria Theresa of Austria. This device was capable of playing chess against a human opponent and had spectacular success, winning most of the games played during its demonstrations around Europe and the Americas for almost nine decades. But the Mechanical Turk was an illusion that allowed a human chess master to hide inside the machine and operate it. Some 160 years later, Amazon branded its micropayment-based crowdsourcing platform with the same name. According to Ayhan Aytes, Amazon’s initial motivation to build Mechanical Turk emerged after the failure of its artificial intelligence programs in the task of finding duplicate product pages on its retail website. 44 After a series of futile and expensive attempts, the project engineers turned to humans to work behind computers within a streamlined web-based system. 45 The Amazon Mechanical Turk digital workshop emulates artificial intelligence systems by checking, assessing and correcting machine learning processes with human brainpower. With Amazon Mechanical Turk, it may seem to users that an application is using advanced artificial intelligence to accomplish tasks. But it is closer to a form of ‘artificial artificial intelligence’, driven by a remote, dispersed and poorly paid clickworker workforce that helps a client achieve their business objectives. As observed by Aytes, “in both cases [both the Mechanical Turk from 1770 and the contemporary version of Amazon’s service] the performance of the workers who animate the artifice is obscured by the spectacle of the machine.” 46

This kind of invisible labor, outsourced or crowdsourced, hidden behind interfaces and camouflaged within algorithmic processes, is now commonplace, particularly in the process of tagging and labeling thousands of hours of digital archives for the sake of feeding the neural networks. Sometimes this labor is entirely unpaid, as in the case of Google’s reCAPTCHA. In a paradox that many of us have experienced, in order to prove that you are not an artificial agent, you are forced to train Google’s image recognition AI system for free, by selecting multiple boxes that contain street numbers, or cars, or houses.

As we see repeated throughout the system, contemporary forms of artificial intelligence are not so artificial after all. We can speak of the hard physical labor of mine workers, and the repetitive factory labor on the assembly line, of the cybernetic labor in distribution centers and the cognitive sweatshops full of outsourced programmers around the world, of the low paid crowdsourced labor of Mechanical Turk workers, or the unpaid immaterial work of users. At every level contemporary technology is deeply rooted in and running on the exploitation of human bodies.


In his one-paragraph short story "On Exactitude in Science", Jorge Luis Borges presents us with an imagined empire in which cartographic science became so developed and precise, that it needed a map on the same scale as the empire itself. 47

“...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.”

Current machine learning approaches are characterized by an aspiration to map the world, a full quantification of visual, auditory, and recognition regimes of reality. From cosmological models of the universe to the world of human emotions as interpreted through the tiniest muscle movements in the human face, everything becomes an object of quantification. Jean-François Lyotard introduced the phrase “affinity to infinity” to describe how contemporary art, techno-science and capitalism share the same aspiration to push boundaries towards a potentially infinite horizon. 48 The second half of the 19th century, with its focus on the construction of infrastructure and the uneven transition to industrialized society, generated enormous wealth for the small number of industrial magnates that monopolized the exploitation of natural resources and production processes.

The new infinite horizon is data extraction, machine learning, and reorganizing information through artificial intelligence systems of combined human and machinic processing. The territories are dominated by a few global mega-companies, which are creating new infrastructures and mechanisms for the accumulation of capital and exploitation of human and planetary resources.

Such unrestrained thirst for new resources and fields of cognitive exploitation has driven a search for ever deeper layers of data that can be used to quantify the human psyche, conscious and unconscious, private and public, idiosyncratic and general. In this way, we have seen the emergence of multiple cognitive economies: the attention economy, 49 the surveillance economy, the reputation economy, 50 and the emotion economy, as well as the quantification and commodification of trust and evidence through cryptocurrencies.

Increasingly, the process of quantification is reaching into the human affective, cognitive, and physical worlds. Training sets exist for emotion detection, for family resemblance, for tracking an individual as they age, and for human actions like sitting down, waving, raising a glass, or crying. Every form of biodata, including forensic, biometric, sociometric, and psychometric, is being captured and logged into databases for AI training. That quantification often runs on very limited foundations: datasets like AVA primarily show women in the ‘playing with children’ action category, and men in the ‘kicking a person’ category. The training sets for AI systems claim to be reaching into the fine-grained nature of everyday life, but they repeat the most stereotypical and restricted social patterns, re-inscribing a normative vision of the human past and projecting it into the human future.


"The 'enclosure' of biodiversity and knowledge is the final step in a series of enclosures that began with the rise of colonialism. Land and forests were the first resources to be 'enclosed' and converted from commons to commodities. Later on, water resources were 'enclosed' through dams, groundwater mining and privatization schemes. Now it is the turn of biodiversity and knowledge to be 'enclosed' through intellectual property rights (IPRs),” Vandana Shiva explains. 51 In Shiva’s words, “the destruction of commons was essential for the industrial revolution, to provide a supply of natural resources for raw material to industry. A life-support system can be shared, it cannot be owned as private property or exploited for private profit. The commons, therefore, had to be privatized, and people's sustenance base in these commons had to be appropriated, to feed the engine of industrial progress and capital accumulation." 52

While Shiva is referring to enclosure of nature by intellectual property rights, the same process is now occurring with machine learning – an intensification of quantified nature. The new gold rush in the context of artificial intelligence is to enclose different fields of human knowing, feeling, and action, in order to capture and privatize those fields. When in November 2015 DeepMind Technologies Ltd. got access to the health records of 1.6 million identifiable patients of Royal Free hospital, we witnessed a particular form of privatization: the extraction of knowledge value. 53 A dataset may still be publicly owned, but the meta-value of the data – the model created by it – is privately owned. While there are many good reasons to seek to improve public health, there is a real risk if it comes at the cost of a stealth privatization of public medical services. That is a future where expert local human labor in the public system is augmented and sometimes replaced with centralized, privately-owned corporate AI systems, that are using public data to generate enormous wealth for the very few.


At this moment in the 21st century, we see a new form of extractivism that is well underway: one that reaches into the furthest corners of the biosphere and the deepest layers of human cognitive and affective being. Many of the assumptions about human life made by machine learning systems are narrow, normative and laden with error. Yet they are inscribing and building those assumptions into a new world, and will increasingly play a role in how opportunities, wealth, and knowledge are distributed.

The stack that is required to interact with an Amazon Echo goes well beyond the multi-layered ‘technical stack’ of data modeling, hardware, servers and networks. The full stack reaches much further into capital, labor and nature, and demands an enormous amount of each. The true costs of these systems – social, environmental, economic, and political – remain hidden and may stay that way for some time.

We offer up this map and essay as a way to begin seeing across a wider range of system extractions. The scale required to build artificial intelligence systems is too complex, too obscured by intellectual property law, and too mired in logistical complexity to fully comprehend in the moment. Yet you draw on it every time you issue a simple voice command to a small cylinder in your living room: “Alexa, what time is it?”

And so the cycle continues.


Matteo Pasquinelli (PhD) is Professor in Media Philosophy at the University of Arts and Design, Karlsruhe, where he coordinates the research group KIM (Künstliche Intelligenz und Medienphilosophie / Artificial Intelligence and Media Philosophy). For Verso he is preparing a monograph on the genealogy of artificial intelligence as division of labor, which is titled The Eye of the Master: Capital as Computation and Cognition.
© 2019 e-flux and the author






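How does all of this happen? Alexa is a disembodied, human-like voice that stands for the human-machine interface designed for an extraordinarily complex set of information-processing networks. These networks are fed continuously, as if by a tide: streams of human voices flow into them, are translated into text, and are used to query and match potential answers in databases, with Alexa’s response as the tail end of that wave. For each response Alexa gives, its effectiveness can be inferred from the following questions: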





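Salar de Uyuni is the largest flat surface in the world, located in southwest Bolivia at an altitude of 3,656 meters. It is a high plateau covered by a salt crust several meters thick, and the crust is exceptionally rich in lithium, holding 50% to 70% of the world’s lithium reserves. Together with the neighboring Atacama region in Chile and with Argentina, the Salar is a major site of lithium extraction. This soft, silvery metal is currently used to power mobile connected devices; it is a key material in the production of lithium-ion batteries and is known as ‘grey gold’. A smartphone battery, for example, usually contains less than eight grams of lithium. The battery pack of each Tesla car needs about seven kilograms of lithium. All of these batteries have a limited lifespan, and once used up they are thrown away as waste. Amazon reminds users that they may not open and repair the Echo themselves, because doing so voids the warranty. The Amazon Echo is powered from a wall socket and also has a mobile battery. That too has only a limited lifespan, and must be thrown away as waste once it is used up.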

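According to the Aymara legends about the creation of Bolivia, the volcanic mountains of the Andean plateau originated in a tragedy. Long ago, when the volcanoes were still alive and roamed the plains freely, Tunupa, the only female volcano, gave birth to a baby. Out of jealousy, the male volcanoes stole her baby and banished it to a distant place. The gods punished the volcanoes by pinning them all to the surface of the Earth. Tunupa wept for the child she had lost, and her tears and breast milk combined to form a giant salt lake: Salar de Uyuni. As Liam Young and Kate Davies put it, “your smartphone runs on the tears and breast milk of a volcano. This landscape is connected to everywhere on the planet via the phones in our pockets; linked to each of us by invisible threads of commerce, science, politics and power.”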

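Our exploded view diagram combines and visualizes the three central extractive processes required to run a large-scale artificial intelligence system: material resources, human labor, and data. We consider these three elements across time, represented as a visual description of the birth, life and death of a single Amazon Echo unit. To truly convey the global scale of extraction, it is necessary to move beyond a simple analysis of the relationship between an individual, their data, and any single technology company. Vincent Mosco has shown how the ethereal metaphor of ‘the cloud’ for offsite data management and processing is in complete contradiction with the physical realities of extracting minerals from the Earth’s crust and dispossessing human populations. Sandro Mezzadra and Brett Nielson use the term ‘extractivism’ to name the relationship between different forms of extractive operations in contemporary capitalism, which is exactly what we see repeated in the context of the AI industry. Hollowing out the resources of the Earth and the biosphere is deeply interconnected with data capture in AI and the monetization of human communication and sociality. Mezzadra and Nielson note that labor is central to this extractive relationship, which has repeated throughout history: from the way European imperialism used slave labor, to the forced work crews on rubber plantations in Malaya, to the Indigenous people of Bolivia being driven to extract the silver that was used in the first global currency. Thinking about extraction requires thinking about labor, resources, and data together. This presents a challenge to critical and popular understandings of artificial intelligence: it is hard to ‘see’ any of these processes individually, let alone collectively. Hence the need for a visualization that can bring these connected but dispersed processes into a single picture.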



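Anything beyond the device’s physical and digital interfaces is outside the user’s control. The Echo presents a sleek surface with no ability to open it, repair it, or change how it functions. The cylinder itself is a very simple plastic product housing a set of sensors; its real power and complexity lie somewhere far out of sight. The Echo is but an ‘ear’ in the home: a disembodied listening agent that never shows its deep connections to remote systems.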

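In 1673, the Jesuit polymath Athanasius Kircher invented the statua citofonica, the ‘talking statue’. Kircher was an extraordinary interdisciplinary scholar and inventor. In his lifetime he published as many as forty major works across fields as varied as medicine, geology, comparative religion and music. He also invented the first magnetic clock, many early automatons, and the megaphone. The talking statue was a very early listening system: essentially a microphone made from a huge spiral tube, which picked up conversations from a public square and carried them through the tube to the mouth of a statue kept in an aristocrat’s private chambers. As Kircher wrote: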


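This listening system could eavesdrop on everyday conversations in the piazza and relay them to the 17th-century Italian oligarchs. Kircher’s talking statue was an early form of information extraction for the elites: people talking in the street would have no idea that their conversations were being put to use by others, who turned that knowledge into power, entertainment and wealth. People inside the homes of the aristocrats had no idea how the magical statue was able to speak and convey all manner of information. The aim was to obscure how the whole system worked: all one could see was an elegant statue. Even at this early stage, listening systems served power, class and secrecy. But the infrastructure for Kircher’s system was prohibitively expensive, available only to the very few. The question remains: what is the full extent of the resources needed to build such systems? This is why we need to understand the materiality of the underlying infrastructure.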


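In his book A Geology of Media, Jussi Parikka suggests that we try to think of media not from Marshall McLuhan’s point of view, in which media are extensions of the human senses, but rather as extensions of the Earth. Media technologies should be understood in the context of geological processes, from the processes of formation and transformation to the movement of the natural elements from which media are built. Reflecting on media and technology as geological processes lets us consider carefully the profound depletion of non-renewable resources required by current technologies. Every object in the extended network of an AI system, from routers to batteries to microphones, is built from elements that took billions of years to form. From the perspective of deep time, we are extracting the Earth’s history to serve a split second of technological time, building devices that are used for no more than a few years. The Consumer Technology Association, for example, notes that the average lifespan of a smartphone is 4.7 years. This cycle of obsolescence drives the purchase of more devices, fuels more profit, and increases the incentives for unsustainable extraction practices. Elements and materials come out of an extremely slow process of elemental development, then pass through an extraordinarily rapid period of excavation, smelting, mixing and logistical transport, crossing thousands of kilometers in the course of their transformation. Geological processes mark both the beginning and the end of this period, from the mining of ore to the deposit of material in an electronic waste dump. For this reason, our map begins and ends with the Earth’s crust. Yet all the transformations and movements we depict are only the barest anatomical outline: beneath these connections lie many more layers of fractal supply chains, the exploitation of human and natural resources, concentrations of corporate and geopolitical power, and continual energy consumption.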

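Drawing out the connections between resources, labor and data extraction brings us inevitably back to traditional frameworks of exploitation. But how is value generated through these systems? A useful conceptual tool can be found in the work of Christian Fuchs and other authors who examine and define digital labor. The notion of digital labor, which was initially linked with different forms of immaterial labor, precedes devices and complex systems such as artificial intelligence. Digital labor, the work of building and maintaining the stack of digital systems, is far from virtual or ephemeral, but is deeply embodied in different activities. The scope is overwhelming: indentured laborers in mines extracting the minerals that form the physical basis of information technologies; the strictly controlled and sometimes dangerous hardware manufacturing and assembly processes in Chinese factories; outsourced workers in developing countries labeling AI training data sets; and the informal physical laborers cleaning up toxic waste dumps. These processes create and accumulate new wealth and power, which are concentrated in a very thin social layer.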

Diagram: Marx’s dialectic of subject and object in economy. Labor power (subject); means of production; product of labour (subject-object).


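A linear display cannot show that each step of production and exploitation contains the previous phases. If we look at the system of production and exploitation through a fractal visual structure, the smallest triangle would represent natural resources and labor, that is, the miner as labor and ore as product. The next larger triangle encompasses the processing of metals, the next one the process of manufacturing components, and so on. The final triangle in our map, the production and manufacture of the Amazon Echo unit itself, includes all of these levels of exploitation, from the bottom to the very top of Amazon Inc., a role inhabited by Jeff Bezos as Amazon’s CEO. Like a pharaoh of ancient Egypt, he stands at the top of the largest pyramid of AI value extraction.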



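To return to the basic element of this visualization, a variation of Marx’s triangle of production: each triangle creates surplus value in the making of profit. If we look at the scale of average income for each activity in the production of one device, shown on the left side of our map, we see dramatic differences in income. According to research by Amnesty International, during the excavation of cobalt, which is also used in the lithium batteries of 16 multinational brands, workers are paid the equivalent of one US dollar per day for working in conditions hazardous to life and health, and are often subjected to violence, extortion and intimidation. Amnesty has also documented children as young as seven working in the mines. By contrast, Amazon CEO Jeff Bezos, at the top of this fractal pyramid, made an average of $275 million a day during the first five months of 2018, according to the Bloomberg Billionaires Index. A child working in a mine in the Congo would need more than 700,000 years of non-stop work to earn the same amount as a single day of Bezos’ income.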
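The scale of that comparison follows directly from the two daily figures. A minimal sketch, assuming a 365-day working year with no days off and a constant one-dollar wage (simplifications made here for illustration, with hypothetical names):

```python
MINER_DAILY_WAGE_USD = 1              # daily pay reported for cobalt miners
BEZOS_DAILY_INCOME_USD = 275_000_000  # average daily income, first five months of 2018
DAYS_PER_YEAR = 365                   # assumption: work every day, no leap years


def years_to_match(target_usd: float, daily_wage_usd: float) -> float:
    """Years of uninterrupted work needed to earn target_usd at daily_wage_usd."""
    days = target_usd / daily_wage_usd
    return days / DAYS_PER_YEAR


years = years_to_match(BEZOS_DAILY_INCOME_USD, MINER_DAILY_WAGE_USD)
print(f"{years:,.0f} years")
```

The result is a little over 750,000 years, consistent with the figure of more than 700,000 years given in the text.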

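Many of the triangles shown on this map hide different stories of labor exploitation and inhumane working conditions. The ecological impact of transforming elements and the income disparities are just some of the ways of representing this deep systemic inequality. We have also researched different forms of ‘black boxes’, understood as opaque algorithmic processes, but this map points to another form of opacity: the process of creating, training and operating a device like the Amazon Echo is itself a kind of black box, very hard to examine and track given the multiple layers of contractors, distributors, and downstream logistical partners around the world. As Mark Graham writes, “contemporary capitalism conceals the histories and geographies of most commodities from consumers. Consumers are usually only able to see commodities in the here and now of time and space, and rarely have any opportunities to gaze backwards through the chains of production in order to gain knowledge about the sites of production, transformation, and distribution.”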

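One illustration of the difficulty of investigating and tracking the contemporary production chain is that it took Intel more than four years to understand its supply line well enough to ensure that no tantalum from the Congo was in its microprocessor products. As a semiconductor chip manufacturer, Intel supplies Apple with processors. In order to do so, Intel has its own multi-tiered supply chain of more than 19,000 suppliers in over 100 countries providing direct materials for its production processes, tools and machines for its factories, and logistics and packaging services. That it took a full four years for a leading technology company to understand its own supply chain reveals how hard this process is to grasp from the inside, let alone for external researchers, journalists and academics. Dutch-based technology company Philips has also claimed that it is working to make its supply chain ‘conflict-free’. Philips, for example, has tens of thousands of different suppliers, each of which provides different components for its manufacturing processes. Those suppliers are in turn linked to tens of thousands of component manufacturers that acquire materials from hundreds of refineries that buy ingredients from different smelters, which are supplied by unknown numbers of traders that deal directly with both legal and illegal mining operations. In The Elements of Power, David S. Abraham describes the invisible networks of rare metals traders in global electronics supply chains: “The network to get rare metals from the mine to your laptop travels through a murky network of traders, processors, and component manufacturers. Traders are the middlemen who do more than buy and sell rare metals: they help to regulate information and are the hidden link that helps in navigating the network between metals plants and the components in our laptops.” According to the computer manufacturing company Dell, the complexities of the metal supply chain pose almost insurmountable challenges. The mining of these minerals takes place long before a final product is assembled, making it exceedingly difficult to trace the minerals’ origin. In addition, many of the minerals are smelted together with recycled metals, by which point it becomes all but impossible to trace them to their source. Capturing the full supply chain is thus a truly gargantuan task, one that reveals the full complexity of the 21st-century global production of technology products.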



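Visualizing this process as one global, pancontinental network through which materials, components and products flow, we see an analogy to the global information network. Where there is a single internet packet travelling to an Amazon Echo, here we can imagine a single cargo container. The dizzying spectacle of global logistics and production would not be possible without the invention of this simple, standardized metal object. Standardized cargo containers allowed the explosion of the modern shipping industry, which made it possible to model the planet as a single, massive factory. In 2017, the capacity of container ships in seaborne trade reached nearly 250,000,000 dead-weight tons of cargo, dominated by giant shipping companies like Maersk of Denmark, the Mediterranean Shipping Company of Switzerland, and France’s CMA CGM Group, each owning hundreds of container vessels. For these commercial ventures, cargo shipping is a relatively cheap way to traverse the complex network of the global factory, yet it disguises much larger external costs. In recent years, shipping has produced 3.1% of global yearly CO2 emissions, more than the entire country of Germany. In order to minimize their internal costs, most container shipping companies burn very low-grade fuel in enormous quantities, which increases the amount of sulphur in the air, among other toxic substances. It has been estimated that one container ship can emit as much pollution as 50 million cars, and that some 60,000 deaths worldwide each year are attributed indirectly to pollution from this industry. Even industry-friendly organizations like the World Shipping Council admit that thousands of containers are lost at sea each year, on the ocean floor or drifting loose. Some carry toxic substances which may leak into the oceans. Typically, workers spend 9 to 10 months at sea, often with long working shifts and without contact with the outside world. Workers from the Philippines represent more than a third of the global shipping workforce. The most severe costs of global logistics are borne by the atmosphere, the oceanic ecosystem and all it contains, and the lowest paid workers.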




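The unique electronic, optical and magnetic characteristics of rare earth elements cannot be matched by any other materials or synthetic substitutes discovered to date. While they are called ‘rare earth metals’, some are relatively abundant in the Earth’s crust, but extraction is costly and highly polluting. David Abraham describes the mining of dysprosium (Dy) and terbium (Tb), widely used in high-tech devices, in Jiangxi, China. He writes, “Only 0.2 percent of the mined clay contains the valuable rare earth elements. This means that 99.8 percent of earth removed in rare earth mining is discarded as waste called ‘tailings’ that are dumped back into the hills and streams,” creating new pollutants like ammonium. In order to refine one ton of rare earth elements, “the Chinese Society of Rare Earths estimates that the process produces 75,000 liters of acidic water and one ton of radioactive residue.” Furthermore, mining and refining activities consume vast amounts of water and generate large quantities of CO2 emissions. In 2009, China produced 95% of the world’s supply of these elements, and it has been estimated that the single mine known as Bayan Obo contains 70% of the world’s reserves.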










Amazon-held patent number 20150066283 A1


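At the end of the 19th century, a particular Southeast Asian tree called palaquium gutta became the center of a technological boom. These trees, found mainly in Malaysia, produce a milky white natural latex called gutta percha. After English scientist Michael Faraday published a study in The Philosophical Magazine in 1848 about the use of this material as an electrical insulator, gutta percha rapidly became the darling of the engineering world. It was seen as the solution to the problem of insulating telegraphic cables so that they could withstand the conditions of the ocean floor. As the global submarine cable business grew, so did demand for palaquium gutta. The historian John Tully describes how local Malay, Chinese and Dayak workers were paid little for the dangerous work of felling the trees and collecting the latex. The processed latex was sold through Singapore’s trade markets into Britain, where it was transformed into, among other things, lengths upon lengths of submarine cable sheaths. A mature palaquium gutta could yield around 300 grams of latex. But the first transatlantic cable, made in 1857, was around 3000 km long and weighed 2000 tons, requiring around 250 tons of gutta percha. Producing that quantity of the material required around 900,000 tree trunks. The jungles of Malaysia and Singapore were stripped, and by the early 1880s the palaquium gutta had vanished. In a last-ditch effort to save their supply chain, the British passed a ban in 1883 to halt harvesting the latex, but the tree was already extinct.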





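As human agents, we are visible in almost every interaction with technological platforms. We are always being tracked, quantified, analyzed and commodified. But in contrast to user visibility, the precise details about the phases of birth, life and death of networked devices are obscured. With emerging devices like the Echo relying on a centralized AI infrastructure, even more of the detail falls into the shadows. While consumers become accustomed to a small hardware device in their living rooms, or a phone app, or a semi-autonomous car, the real work is being done within machine learning systems that are generally remote from the user and utterly invisible to her. In many cases, transparency wouldn’t help much: without forms of real choice and corporate accountability, mere transparency won’t shift the weight of the current power asymmetries.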

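The outputs of machine learning systems are predominantly unaccountable and ungoverned, while the inputs are enigmatic. To the casual observer, it looks like it has never been easier to build AI or machine learning-based systems than it is today. The availability of open-source tools, in combination with rentable computation power from cloud superpowers such as Amazon (AWS), Microsoft (Azure), or Google (Google Cloud), is giving rise to a false idea of the ‘democratization’ of AI. While ‘off the shelf’ machine learning tools, like TensorFlow, are becoming more accessible from the point of view of setting up your own system, the underlying logics of those systems, and the datasets for training them, are accessible to and controlled by very few entities. In the dynamic of dataset collection through platforms like Facebook, users are feeding and training the neural networks with behavioral data, voice, tagged pictures and videos, or medical data. In an era of extractivism, the real value of that data is controlled and exploited by the very few at the top of the pyramid.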



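In 1770, Hungarian inventor Wolfgang von Kempelen constructed a chess-playing machine known as the Mechanical Turk. His goal, in part, was to impress Empress Maria Theresa of Austria. This device was capable of playing chess against a human opponent and had spectacular success, winning most of the games played during its demonstrations around Europe and the Americas for almost nine decades. But the Mechanical Turk was an illusion: a human chess master hid inside the machine and operated it. Some 160 years later, Amazon.com branded its micropayment-based crowdsourcing platform with the same name. According to Ayhan Aytes, Amazon had tried to use artificial intelligence programs to find duplicate product pages on its retail website, and set out to build Mechanical Turk after that effort failed. After a series of futile and expensive attempts, the project engineers turned to humans to work behind computers within a streamlined web-based system. The Amazon Mechanical Turk digital workshop emulates artificial intelligence systems by checking, assessing and correcting machine learning processes with human brainpower. To users of Amazon Mechanical Turk, it may seem that an application is using advanced artificial intelligence to accomplish tasks. But it is closer to a form of ‘artificial artificial intelligence’, driven by a remote, dispersed and poorly paid clickworker workforce that helps a client achieve their business objectives. As observed by Aytes, “in both cases [both the Mechanical Turk from 1770 and the contemporary version of Amazon’s service] the performance of the workers who animate the artifice is obscured by the spectacle of the machine.”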





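In his one-paragraph short story “On Exactitude in Science”, Jorge Luis Borges presents us with an imagined empire in which cartographic science became so developed and precise that it needed a map on the same scale as the empire itself.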


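Current machine learning approaches are characterized by an aspiration to map the world, a full quantification of visual, auditory, and recognition regimes of reality. From cosmological models of the universe to the world of human emotions as interpreted through the tiniest muscle movements in the human face, everything becomes an object of quantification. Jean-François Lyotard introduced the phrase “affinity to infinity” to describe how contemporary art, techno-science and capitalism share the same aspiration to push boundaries towards a potentially infinite horizon. The second half of the 19th century, with its focus on the construction of infrastructure and the uneven transition to industrialized society, generated enormous wealth for the small number of industrial magnates that monopolized the exploitation of natural resources and production processes.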



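Increasingly, the process of quantification is reaching into the human affective, cognitive, and physical worlds. Training sets exist for emotion detection, for family resemblance, for tracking an individual as they age, and for human actions like sitting down, waving, raising a glass, or crying. Every form of biodata, including forensic, biometric, sociometric, and psychometric, is being captured and logged into databases for AI training. That quantification often runs on very limited foundations: datasets like AVA primarily show women in the ‘playing with children’ action category, and men in the ‘kicking a person’ category. The training sets for AI systems claim to be reaching into the fine-grained nature of everyday life, but they repeat the most stereotypical and restricted social patterns, re-inscribing a normative vision of the human past and projecting it into the future.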



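“The ‘enclosure’ of biodiversity and knowledge is the final step in a series of enclosures that began with the rise of colonialism. Land and forests were the first resources to be ‘enclosed’ and converted from commons to commodities. Later on, water resources were ‘enclosed’ through dams, groundwater mining and privatization schemes. Now it is the turn of biodiversity and knowledge to be ‘enclosed’ through intellectual property rights (IPRs),” Vandana Shiva explains. In Shiva’s words, “the destruction of commons was essential for the industrial revolution, to provide a supply of natural resources for raw material to industry. A life-support system can be shared, it cannot be owned as private property or exploited for private profit. The commons, therefore, had to be privatized, and people's sustenance base in these commons had to be appropriated, to feed the engine of industrial progress and capital accumulation.”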

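While Shiva is referring to the enclosure of nature by intellectual property rights, the same process is now occurring with machine learning: an intensification of quantified nature. The new gold rush in the context of artificial intelligence is to enclose different fields of human knowing, feeling, and action, in order to capture and privatize those fields. When in November 2015 DeepMind Technologies Ltd. got access to the health records of 1.6 million identifiable patients of the Royal Free hospital, we witnessed a particular form of privatization: the extraction of knowledge value. A dataset may still be publicly owned, but the meta-value of the data, the model created by it, is privately owned. While there are many good reasons to seek to improve public health, there is a real risk if it comes at the cost of a stealth privatization of public medical services. That is a future where expert local human labor in the public system is augmented and sometimes replaced with centralized, privately owned corporate AI systems that use public data to generate enormous wealth for the very few.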






