The Voice Controlled



Text Irini Papadimitriou  文 伊睿尼·帕帕迪米迪欧

Every atom, impressed with good and with ill, retains at once the motions which philosophers and sages have imparted to it, mixed and combined in ten thousand ways with all that is worthless and base. The air itself is one vast library, on whose pages are for ever written all that man has ever said or woman whispered. There, in their mutable but unerring characters, mixed with the earliest, as well as with the latest sighs of mortality, stand for ever recorded, vows unredeemed, promises unfulfilled, perpetuating in the united movements of each particle, the testimony of man's changeful will.

Charles Babbage

Ninth Bridgewater Treatise (1838), Chap. IX. On The Permanent Impression Of Our Words And Actions On The Globe We Inhabit

Over 180 years ago, Charles Babbage, Victorian polymath and inventor of the first mechanical computer, proposed that the air is a ‘vast library’ of every word ever spoken. If he were here today, he would immediately notice that his proposal has become a reality. Contemporary technologies are fulfilling our obsession with ‘absolute recollection’, and Babbage’s thinking resonates today more than ever. Every word, image, action or movement of ours is forever impressed and recorded in an age of cloud computing and connectivity - at a time of digital archives of everything - and in a society of full-scale surveillance.

Starting from this ‘vast library’ and through a combination of phantasmagoric effects and emerging technologies, Rafael Lozano-Hemmer creates an ever-changing audio-visual environment that transports us into a three-dimensional, physical manifestation of Babbage’s proposal. In Atmospheric Memory, the invisible “air” of recorded voices, memories or data materialises in front of our eyes. Invisible clouds of spoken or whispered “voices” are unveiled and transformed into something we can touch and see. In a theatrical and engaging way, though, Lozano-Hemmer urges us to see these “voices” as more than mere archives or libraries of words that would otherwise be lost. In a way, Atmospheric Memory shows us how we have transformed the “air” surrounding us into a giant machine that always listens and records our actions. What are the implications of this obsession with monitoring everything that happens or is said? And can we ever reverse this?

The visionary English mathematician and writer Ada Lovelace, known mainly for her work on Babbage’s Analytical Engine, imagined that the Engine could do a lot more than serve as a calculating machine. For instance, she thought it could be used to produce algorithmic music - something that later became possible for computers, as she predicted. Lovelace envisioned Babbage’s machine manipulating content such as text, sound or images. Could Lovelace have imagined the Analytical Engine able to capture or acquire a voice?

Sixty years earlier, inventor Wolfgang von Kempelen, known for his chess-playing pseudo-automaton - The Turk - made what was probably one of the earliest attempts to build a speaking machine. Kempelen explored the physiology of speech production and created a mechanism to appropriate the human voice. The speaking machine could produce sounds, words and even sentences, although they were hardly comprehensible. In the early 1860s, a young Alexander Graham Bell created a speaking “head” after seeing an automaton by Charles Wheatstone, inspired by Kempelen’s speaking machine. It would take years of further research and experiments before Bell invented the telephone, which opened up a new field of voice communication technology, but also began a long history of wiretapping and intercepting voices.

We have since learned to give up our rights to electronic privacy and surrender to a web of listening devices and smart speakers that have taken a place in our lives and occupied our domestic spaces. A whole world of connected objects, from home assistants, TVs and household appliances to wearables, cars, and urban objects and furniture, constantly observes and listens to us, but we are growing accustomed to ignoring this.

But long before these now ubiquitous connected devices, there existed a series of monitoring and eavesdropping systems and architectures of control, from acoustical wall funnels and optical apparatuses to Jeremy Bentham’s panopticon, a prison structure composed of a central watchtower surrounded by cells. In the 17th century, the celebrated scholar and polymath Athanasius Kircher designed the statua citofonica or ‘talking statue’, an early intercom system with a series of spiral-shaped funnels hidden in the walls of a building that connected courtyards and public spaces, creating a giant listening device. Kate Crawford and Vladan Joler reference Kircher’s statua citofonica in their recent brilliant mapping study of the Amazon Echo - in the context of human labour, data and material resources - describing the Echo as a listening agent and an “ear” in the home.

Alexa, the voice assistant who speaks to us through the Amazon Echo, along with the other voice-enabled devices and smart speakers that have taken a place in our homes, is no longer perceived as a commercial product; these devices are ‘actual voices’. By anthropomorphising Alexa, Siri and their like, by giving them a human-like voice, we forget what these systems really are. They become advisors, confidants, educators; they are there for us to converse with and to trust. Yes, every word we speak is probably recorded in the “air”: that invisible mesh of listening systems formed by our Alexas, Siris, Google Homes, thermostats and so on. One day your appliances will know your whereabouts, conversations, journeys, purchases and browsing histories. We tend to look at these devices as individual entities, although they are disembodied parts of a large, complex, networked corporate system. Kircher’s talking statues, which reproduced conversations eavesdropped from the street and picked up by spiral-shaped tubes hidden in buildings, perfectly depict our surveillance society today.

Danish artist Cecilie Waagner-Falkenstrøm recently created FRANK, an artwork using Artificial Intelligence. FRANK is a humanised voice that can have conversations with people; it is a speaking oracle, a synthetic voice based on machine intelligence that gives personal guidance regarding existential dilemmas. You can talk to FRANK about personal anxieties, hopes, dreams and fears. FRANK will listen and give you “his” insightful direction and counselling. Actually, speaking to FRANK is not so different from how we talk to Siri or Alexa, or how we might interact with more voice interfaces in the next few years.

Voice is an organ that identifies us and that is uniquely ours. From birth, voice - our first cry - marks our existence to the world, but it is also one of the main modes of human expression, emotion and communication. More than 200 years after Kempelen’s speaking machine, computer systems that not only recognise but also synthesise human speech are commonplace, and our voices can serve as training data for teaching machines to imitate human speech. Applications such as Baidu’s Deep Voice or Lyrebird use machine learning to create human-like artificial voices or enable the cloning of human speech. Imitating any human voice has become a reality, one that raises serious concerns and questions about how technologies like this could be used but also misused.

Our words are not only forever imprinted and captured in this ‘vast library’ of the Cloud; our (disembodied) voice is also taken over, borrowed by a machine. In The Invention of Morel, a novel by Argentine writer Adolfo Bioy Casares, the main character, a fugitive, is hiding on a desert island until one day he sees a group of tourists arriving and retreats from them. Among the tourists is a woman whom he spies on and falls in love with. The fugitive tries to talk to the woman but she ignores him, and slowly he realises that none of the tourists notice him either. He also notices that there are two moons and two suns, that conversations are repeated, and that the island visitors complain they are cold when it is hot or swim in a pool of rotting fish. He finally finds out the truth: one of the island visitors, named Morel, invented a machine capable of reproducing reality. The tourists he sees are a recording made by Morel’s machine, looping a week of their actions on the island forever. By recording them, though, the machine has captured their souls, so they are all dead.

Voice is a metaphor for agency and power. In a society obsessed with capturing its actions, images and voices, we are quick to see the romantic and positivist side of retaining memories, but fail to see the context of metrics and surveillance. Who has control over what is kept and what is left behind? And who owns these vast libraries of our voices, actions, or data? In a contemporary society of absolute recollection, those with weaker voices and the most vulnerable members of the community have even less control over being subjected to invisibility or hypervisibility. So the possibility to go back, be forgotten or escape will probably be available only to the privileged few in the future.


Irini Papadimitriou is a curator, producer and cultural manager, working in the UK and internationally. Currently Creative Director at FutureEverything, an innovation lab and arts organisation in Manchester, she was previously Digital Programmes Manager at the V&A, where she initiated and curated the annual Digital Design Weekend festival and Digital Futures, among other programmes. She was also Head of New Media Arts Development at Watermans, where she curated the exhibition programme, exploring digital culture from a critical perspective and the impact of technology on society.

Her most recent exhibition, Artificially Intelligent, was on display at the V&A from September to December 2018. She is a co-founder of Maker Assembly, a critical gathering about maker culture: its meaning, politics, history and future.




180多年以前,维多利亚时期的博学家,最早的机械计算机发明者查尔斯·巴贝奇曾提出过这样一个概念:空气是所有曾被说出的词语的“巨型图书馆”。如果他今日尚在人间,他会很快注意到,当年他提出的概念已经成为了今日之真实。当代技术正在实现我们对于“绝对记忆”(absolute recollection)的执念,而巴贝奇的思考也收到前所未有的回响。我们的每一个词语,每一张图片,每一个行动,在这个云计算和全球互联的时代,都被印刻和记录了下来——这是一个对万事万物都进行数字存档的时代,一个全局监控的社会。

艺术家拉斐尔·洛萨诺-赫默尔(Rafael Lozano-Hemmer)从巴贝奇的“巨型图书馆”概念出发,通过新兴技术变幻莫测的效果组合,创作了一个永恒变迁的声音影像环境,并把我们“迁移”至一个仿佛将巴贝奇观念具象化的世界里。在作品《空气记忆》(Atmospheric Memory)中,由录制的声音、记忆或数据构成的不可见的“空气”在我们眼前显形。喃喃低语的“声音”所构成的无形之云在我们面前物化成形,进而转化成可触碰、可亲见的画面。艺术家通过一种剧场化而引人入胜的方式,促使我们思考:这些“声音”不仅仅是文献或词语的图书馆,它不仅仅为了防止散佚而存在。某种意义上说,《空气记忆》向我们展现了我们如何把环绕自身的“空气”转换成了一种巨型机器,它无时无刻不在聆听着我们的声音,记录着我们的行动。这种对所发生之事和所言说之语的穷尽式记录的执迷到底意味着什么?甚至,我们有可能放弃这种执念吗?

颇有远见的英国数学家、作家爱达·勒芙蕾丝(Ada Lovelace)以她对于巴贝奇的分析机的运用著称,她曾经想象出一种能执行比单纯计算更为复杂任务的分析机。比方说,她认为分析机可被用来生产一种基于算法的音乐——如她所愿,这在之后的计算机时代成为了现实。勒芙蕾丝认为巴贝奇的机器可以被用来执行关于内容的操作,这些内容包括文本,声音和图像。勒芙蕾丝是否也曾想象过一台能捕获声音的分析机呢?

再六十年前,著名的“土耳其机器人”(The Turk)的发明者沃尔夫冈·冯·肯佩伦(Wolfgang von Kempelen),或许是最早尝试制作一台会说话的机器的人。肯佩伦探索了口语生产的生理学要素,并且发明了一种挪用人类声音的机制。“对话机器”可以发出声音,说出词语甚至句子,虽然大多数情况下人们难以理解其内容。在19世纪60年代早期,年轻的亚历山大·格拉汉姆·贝尔(Alexander Graham Bell)看见了查尔斯·惠斯通(Charles Wheatstone)受“对话机器”启发做的自动机,创作了一个会说话的“头”。多年以后,随着更多研究和实验的进行,贝尔发明了电话,这开启了一个新的关于声音通讯技术的领域,人们也迈上一段搭线窃听和声音拦截的旅程。



在17世纪,著名的学者、博学家亚塔那修·基歇尔(Athanasius Kircher)设计了“statua citofonica”,一种“会说话的雕塑”,这是一个对讲系统雏形,由一组隐藏在建筑墙体里的螺旋形漏斗构成,它们串联起建筑和周遭的院落或公共区域,形成一个巨大的监听装置。凯特·克劳福德(Kate Crawford)和瓦拉丹·卓勒(Vladan Joler)在他们近来研究亚马逊Echo的地图作品(《人工智能解剖学》)中,也引用了这一装置。《人工智能解剖学》从人类劳动,数据和物质资源的语境里研究了作为一个听觉载体和家中之“耳”的Echo设备。


丹麦艺术家西西里·瓦格纳-法尔肯斯特姆近期创作了一件运用人工智能的作品《FRANK》。《FRANK》是一个可以与人对话的拟人声音;它是一台会说话的预言者,一个基于机器智能的合成声音,会给一些关于存在困境的个人建议。你可以跟“FRANK”交流个人焦虑,希望,梦想与恐惧。FRANK会聆听你的倾诉,并给你“他的”有见地的指导和建议。事实上,和FRANK的交流其实和我们与Siri, Alexa,或者将来会出现的其他声音设备之间的交流别无二致。

声音是一种有独特性的元素:我们诞生的第一声啼哭便标注了我们在人世间的存在。声音也是人类表达,感情和交流的主要模式之一。肯佩伦的“说话机器”发明两百多年后,有能力识别和合成人类语音的电脑已经无处不在,我们的声音也成为了教会机器模仿人类语音的训练数据集。百度的Deep Voice和Lyrebird等应用程序也使用机器学习来创造高拟真的人类声音,或者复制人类语音。模仿任何人类的声音已经成为了现实,同时它也引发了非常严肃的担忧:这一类的技术是否会被滥用?





她近期的展览《人工智慧》(Artificially Intelligent)于2018年9月至12月在V&A展出。她也是Maker Assembly的联合发起人,这是一个关于创客文化的批判性研究群体,聚焦于创客文化的意义,政治,历史与未来。