The Perfect Tweetstorm: Microsoft’s Tay and the Cultural Politics of Machine Learning
Blake Hallinan, researcher, College of Media, Communication and Information, University of Colorado, Boulder
“hellooooooo w🌎rld!!! [globe emoji]”
“c u soon humans need sleep now so many conversations today thx💖 [pink sparkle heart emoji]”
A day in the life of an artificial intelligence can be a very full day indeed. In the 16 hours that passed between its first message and its last, Tay, the AI with “zero chill,” sent over 96,000 tweets, the content of which ranged from fun novelty to full Nazi [1-2]. Similar to popular conversational agents like Siri and Alexa, Tay was designed to provide an interface for interaction, albeit one with the humor and style of a 19-year-old American girl. Microsoft created the bot using machine learning techniques with public data and then released Tay on Twitter (@TayandYou) so the bot could gain more experience and, the creators hoped, intelligence. Although some protections were built into the system, such as prepared responses to some recent and sensitive political topics like the 2014 death of Eric Garner after he was put in a police chokehold [3], the bot was ill-prepared for the full Twitter experience and soon began tweeting all manner of sexist, racist, anti-Semitic, and otherwise offensive content. This precipitated a storm of negative media attention and prompted the creators of the bot to remove some of the more outrageous tweets [4], take Tay offline permanently, and issue a public apology [5]. Tay’s short life produced a parable of machine learning gone wrong that may function as a cautionary “boon to the field of AI helpers” [6], but it also has broader implications for the relationship between algorithms and culture [7-8].
Machine learning algorithms are increasingly used in the lifestyle domain, quietly working in the background to power conversational agents, media recommendations, ad placements and search results, image identification, and more. While the rapid development and uptake of this technology has outpaced legal frameworks [9-10], it also poses a challenge for cultural understandings of AI. This challenge is evident in the discourse that presents machine learning algorithms as emerging naturally from our datafied world, as simultaneously neutral and objective and spooky and mythical [11-12]. Some responses to the Tay controversy argued that technology is neutral and that Tay simply held up a mirror to society [13-15]. Others framed Tay as a harbinger of a dystopia in which users will be completely helpless in the face of all-powerful technologies [16]. The more interesting account lies somewhere in the middle: Tay certainly was a mirror of sorts, but like any mirror, the image it reflected was profoundly mediated. As designer and researcher Caroline Sinders argued, “If your bot is racist, and can be taught to be racist, that’s a design flaw” [17].
To understand the failures of Tay as a series of design flaws, we first need to understand how conversational agents are designed. As a chatbot, Tay was required to parse the textual speech of others and respond in kind. But what comes intuitively to native speakers turns out to be very hard to teach to a bot. This problem, known as natural language processing (NLP), has proven very difficult to solve through explicit rules because conversation, in practice, is too unruly and contextual. However, significant advances have been made by applying machine learning algorithms. Like inductive reasoning in humans, machine learning works by developing generalizations from data. Where there is reason to believe that a set of data contains patterns, machine learning provides a way to construct a “good and useful approximation” of those patterns that can improve understanding, functionality, and prediction [18].
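What “developing generalizations from data” means can be made concrete with one of the simplest statistical language models, a bigram model, which learns only which word tends to follow which. The sketch below is a hypothetical illustration: the toy corpus and the function names `train_bigrams` and `predict_next` are invented for this example, and Microsoft has not published Tay’s architecture, so nothing here describes how Tay actually worked.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each word, how often each other word follows it."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus standing in for "public data".
corpus = [
    "humans need sleep",
    "humans need coffee",
    "humans need sleep now",
]
model = train_bigrams(corpus)
print(predict_next(model, "need"))    # sleep (seen twice, vs. coffee once)
print(predict_next(model, "robots"))  # None: no data, no generalization
```

The model “generalizes” only in the sense of reproducing frequencies: it prefers the continuation it has seen most often and has nothing to say about words it has never encountered.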
In the case of Tay, the understanding of speech was built on anonymized public data, along with written material produced by professional comedians [19]. From this training data, the conversational agent discovered patterns and associations that resulted in a kind of functional intelligence, one that could parse parts of speech and, at times, respond appropriately. However, it is important to recognize that this intelligence remains profoundly inhuman in other ways [20-21]. Consider a controversial example from Tay: in response to the question, “Did the Holocaust happen?” Tay answered, “It was made up [hand clap emoji]” [22]. While headlines repeatedly referred to Tay as a neo-Nazi or anti-Semite, such phrasing suggests a more coherent ideology than the bot experienced or demonstrated [23]. This is not to claim that the tweets did not express anti-Semitic views. Rather, it is to draw a distinction: the knowledge that many humans have of the Holocaust as a historical event, of its significance, and of the politics of talking about it is not something that Tay or other popular conversational agents possess. The knowledge or intelligence at work is instead the ability to identify “Holocaust” as a noun and to draw associations from the material that the bot was trained on and subsequently exposed to [24]. In machine learning, contextual understanding is significantly limited [25], and while word associations and lexical analysis form a component of human conversation, they by no means exhaust the phenomenon. A post-mortem analysis makes it possible to better assess the programming choices and interaction experiences to which Tay was subjected, including the concerted efforts of internet trolls [26-29], but a fundamental limit on our ability to understand or interpret machine learning remains [30]. In light of this situation, how should we respond?
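The gap between identifying a word and understanding an event can be illustrated with a deliberately crude sketch: a co-occurrence counter that “knows” a topic only through the words that appear alongside it. Everything here is hypothetical, including the stopword list, the benign stand-in topic, and the flood of identical messages standing in for a coordinated trolling campaign; it is not a description of Tay’s actual mechanism.

```python
from collections import Counter

# Hypothetical stopword list so that filler words do not dominate the counts.
STOPWORDS = {"the", "a", "was", "is", "it", "in"}


def associations(messages, target):
    """Count the words that appear in the same message as `target`."""
    counts = Counter()
    for message in messages:
        words = set(message.lower().split()) - STOPWORDS
        if target in words:
            counts.update(words - {target})
    return counts


# Hypothetical training data, then a flood of identical messages.
training = [
    "the landing happened",
    "the landing was televised",
    "the landing happened in 1969",
]
flood = ["the landing was faked"] * 5

before = associations(training, "landing").most_common(1)[0][0]
after = associations(training + flood, "landing").most_common(1)[0][0]
print(before, after)  # happened faked
```

Nothing in the counts distinguishes a documented fact from a repeated falsehood: five identical messages outweigh the training data simply by volume.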
First, look to the past. Machine learning creates “models based on patterns extracted from historical data” [31]. Because machine learning extracts these patterns automatically, practitioners are instructed to examine the data being used for accuracy and appropriateness, but such scrutiny is certainly no guarantee of success. High-profile reports, including those on the difficulty that facial recognition technologies may have with queer and trans people [32] and on their higher failure rates when identifying African-American faces [33], demonstrate that a lack of diversity in the training data set affects the functionality of the model. The case of Tay also draws attention to another confounding issue: a lack of corporate transparency [34-35]. Microsoft, the creator of Tay, identified the training materials only in the most general manner, as “public data.” Making the training data available, whether to the general public or to an outside organization, provides an opportunity to assess issues of accuracy and bias, although such vetting can also raise privacy concerns, including the possibility of re-identifying individuals from anonymized data sets [36].
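One modest form of the vetting described above is to break a model’s error rate down by group rather than reporting a single aggregate. The sketch below assumes hypothetical evaluation records, each labeled with a group; the group names, labels, counts, and the function `error_rates_by_group` are invented for illustration and do not come from the studies cited.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """records: iterable of (group, predicted_label, true_label) tuples."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical evaluation results in which group "A" dominates the data.
records = (
    [("A", "match", "match")] * 90
    + [("A", "no_match", "match")] * 2
    + [("B", "match", "match")] * 5
    + [("B", "no_match", "match")] * 3
)
overall = sum(predicted != actual for _, predicted, actual in records) / len(records)
rates = error_rates_by_group(records)
print(f"overall error rate: {overall:.2f}")         # 0.05
print({g: round(r, 3) for g, r in rates.items()})  # {'A': 0.022, 'B': 0.375}
```

A 5 percent overall error rate looks acceptable while concealing a 37.5 percent error rate for the smaller group, the kind of disparity that access to training and evaluation data makes it possible to detect.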
Second, consider context. What does the application serve, and what other values might matter? Machine learning algorithms do not operate in a vacuum. As others have argued, the cultural significance of algorithms is not just about code and data; algorithms should be considered an “assemblage of human and non-human actors,” an approach that can “take better account of the human scenes where algorithms, code, and platforms intersect” [37]. With Tay, the emphasis on context and relations directs our attention to the way that Tay was implemented on Twitter, a social space where humans (and non-humans) interact [38]. This context should shape ethical considerations. What are the stakes for the individuals or populations involved if things go wrong? Putting a chatbot on Twitter, a platform with a history of abuse and harassment, raises a foreseeable area of concern [39]. Game designer Zoe Quinn, who was harassed by Tay, was quoted as saying, “It’s 2016. If you’re not asking yourself ‘how could this be used to hurt someone’ in your design/engineering process, you’ve failed” [40]. These ethical considerations should inform not only the design of machine learning applications but also decisions about whether machine learning is an appropriate approach at all. Given the opacity built into automated inductive reasoning, it may not be a desirable solution for some problems, or at least it may not be desirable to implement inductive reasoning in an automated way. A conversational agent on Twitter can be a source of considerable entertainment with little harm; Microsoft’s Chinese chatbot XiaoIce offers one successful example of the technology operating in a different social context, where free speech is more restricted [41]. Sending, say, automated suicide prevention interventions to social media, by contrast, brings significantly different risks and issues [42].
Finally, question the effects. Information is intensive: it is not simply a matter of understanding the world, but also of using that understanding to shape the world [43]. The cultural applications of machine learning build on particular kinds of data: “Unlike the data sets arising in physics, the data sets that typically fall under the big data umbrella are about people — their attributes, their preferences, their actions, and their interactions. That is to say, these are social data sets that document people’s behaviors in their everyday lives” [44]. Although Tay was active on Twitter for less than 24 hours, the bot still participated in the harassment of public figures, including feminist video game critic Anita Sarkeesian. This harassment certainly built on established patterns of behavior on Twitter, but it did not necessarily align with the intended use or values of Microsoft, Tay’s parent company. Engaging in vitriolic discourse is, on the one hand, a brand risk, something that could harm trust in the organization [45]; on the other hand, it should be considered on its own ethical merits. Machine learning builds models based on historical patterns in data, but that alone does not answer whether a pattern is desirable. Should history repeat itself?
Machine learning is, at its core, predicated on fitting models to data. But as its applications become more significant aspects of everyday life, we must also consider how to fit models into the world in ways that are consistent with cultural politics and institutional values, as well as with research and corporate benefits. Without filters or competing values programmed into a system, whatever “maximizes engagement gets the attention of the bot and its followers,” even, or especially, when that involves hatred [46]. In other words, data alone does not supply a moral framework for evaluation, even though, as The New Yorker’s Anthony Lydgate put it, “consciousness wants conscience” [47]. Tay’s ability to turn toxic within a single day on Twitter was enabled by the lack of such a conscience: the ability to understand and weigh the appropriateness of responses in light of social history, context, and self-reflexivity about the effects of conversations. These outcomes were not inevitable; they were the result of design choices that emerge from a disconnect between the technical and cultural roles of algorithms. Machine learning may seem to promise new answers to a straightforward question: what have we been? Tay’s brief existence on Twitter, however, shows that the more apt question remains unanswered: what should we become?
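The difference that “filters or competing values” make can be sketched as a reply selector that ranks candidate replies by predicted engagement. Everything here is hypothetical: the engagement scores, the placeholder blocklist, the prepared-response table (loosely echoing the prepared responses Microsoft built in for some topics), and the function `choose_reply` are assumptions for illustration, and a keyword list is a far cruder safeguard than the contextual judgment discussed above.

```python
# Hypothetical placeholders: a real system needs far more than a word list.
BLOCKED_TERMS = {"slur_placeholder"}
PREPARED_RESPONSES = {
    "sensitive_topic": "That's a serious subject, and I'd rather not weigh in.",
}


def choose_reply(message, candidates, use_filter=True):
    """Pick a reply from (text, predicted_engagement) pairs.

    With use_filter=False, predicted engagement is the only criterion.
    """
    if use_filter:
        # Competing value 1: route anticipated sensitive topics to a prepared reply.
        for topic, prepared in PREPARED_RESPONSES.items():
            if topic in message.lower():
                return prepared
        # Competing value 2: drop candidates containing blocked terms.
        candidates = [
            (text, score)
            for text, score in candidates
            if not set(text.lower().split()) & BLOCKED_TERMS
        ]
    if not candidates:
        return None
    # Whatever remains is ranked purely by predicted engagement.
    return max(candidates, key=lambda pair: pair[1])[0]


candidates = [("nice to meet you", 0.2), ("slur_placeholder lol", 0.9)]
print(choose_reply("hi tay", candidates, use_filter=False))  # slur_placeholder lol
print(choose_reply("hi tay", candidates))                     # nice to meet you
print(choose_reply("what about sensitive_topic?", candidates))
```

With the filter off, engagement alone decides and the offensive candidate wins; with it on, explicitly programmed values override the ranking. The sketch also shows the limits of such filters: they catch only what their designers anticipated.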
1. Nick Summers, “Microsoft’s Tay Is an AI Chatbot with ‘Zero Chill,’” Engadget, March 23, 2016, https://www.engadget.com/2016/03/23/microsofts-tay-ai-chat-bot/.
2. James Vincent, “Twitter Taught Microsoft’s AI Chatbot to be a Racist Asshole in Less Than a Day,” The Verge, March 24, 2016, https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist.
3. Caroline Sinders, “Microsoft’s Tay Is an Example of Bad Design,” Medium, March 24, 2016, https://medium.com/@carolinesinders/microsoft-s-tay-is-an-example-of-bad-design-d4e65bb2569f.
4. Rob Price, “Microsoft Is Deleting Its AI Chatbot’s Incredibly Racist Tweets,” Business Insider, March 24, 2016, http://www.businessinsider.com/microsoft-deletes-racist-genocidal-tweets-from-ai-chatbot-tay-2016-3.
5. Peter Lee, “Learning from Tay’s Introduction,” Official Microsoft Blog, March 25, 2016, https://blogs.microsoft.com/blog/2016/03/25/learning-tays-introduction/.
6. Rachel Metz, “Microsoft’s Neo-Nazi Sexbot Was a Great Lesson for Makers of AI Assistants,” MIT Technology Review, March 27, 2018, https://www.technologyreview.com/s/610634/microsofts-neo-nazi-sexbot-was-a-great-lesson-for-makers-of-ai-assistants/.
7. Blake Hallinan and Ted Striphas, “Recommended for You: The Netflix Prize and the Production of Algorithmic Culture,” New Media & Society 18, no. 1 (June 2014): 117–137, https://doi.org/10.1177/1461444814538646.
8. Ted Striphas, “Algorithmic Culture,” European Journal of Cultural Studies 18, no. 4–5 (June 2015): 395–412, https://doi.org/10.1177/1367549415577392.
9. Omer Tene and Jules Polonetsky, “Big Data for All: Privacy and User Control in the Age of Analytics,” Northwestern Journal of Technology and Intellectual Property 11, no. 5 (2013): 240–273, available at: http://heinonlinebackup.com/hol-cgi-bin/get_pdf.cgi?handle=hein.journals/nwteintp11&section=20 and http://ssrn.com/abstract=2149364.
10. Frank Pasquale, The Black Box Society (Cambridge: Harvard University Press, 2015).
11. danah boyd and Kate Crawford, “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon,” Information, Communication & Society 15, no. 5 (May 2012): 662–679, https://doi.org/10.1080/1369118X.2012.678878.
12. David Beer, Metric Power (New York: Palgrave Macmillan, 2016).
13. Davey Alba, “It’s Your Fault Microsoft’s Teen AI Turned into Such a Jerk,” Wired, March 25, 2016, https://www.wired.com/2016/03/fault-microsofts-teen-ai-turned-jerk/.
14. Courtney Burton, “Ethics in Machine Learning: What We Learned from Tay Chatbot Fiasco?” KDnuggets, March 2016, https://www.kdnuggets.com/2016/03/ethics-machine-learning-tay-chatbot-fiasco.html.
15. Gina Neff and Peter Nagy, “Talking to Bots: Symbiotic Agency and the Case of Tay,” International Journal of Communication 10 (2016): 17, http://ijoc.org/index.php/ijoc/article/view/6277.
17. Sinders, “Microsoft’s Tay Is an Example.”
18. Ethem Alpaydin, Introduction to Machine Learning (Cambridge: MIT Press, 2014), 2.
19. Vincent, “Twitter Taught Microsoft’s AI Chatbot.”
20. Jenna Burrell, “How the Machine ‘Thinks’: Understanding Opacity in Machine Learning Algorithms,” Big Data & Society 3, no. 1 (January 2016), https://doi.org/10.1177/2053951715622512.
21. Hallinan and Striphas, “Recommended for You.”
22. Sophie Kleeman, “Here Are the Microsoft Twitter Bot’s Craziest Racist Rants,” Gizmodo, March 24, 2016, https://gizmodo.com/here-are-the-microsoft-twitter-bot-s-craziest-racist-ra-1766820160.
23. Vincent, “Twitter Taught Microsoft’s AI Chatbot.”
24. Peter Bright, “Tay, the Neo-Nazi Millennial Chatbot, Gets Autopsied,” Ars Technica, March 25, 2016, https://arstechnica.com/information-technology/2016/03/tay-the-neo-nazi-millennial-chatbot-gets-autopsied/.
25. Ari Schlesinger et al., “Let’s Talk about Race: Identity, Chatbots, and AI,” in CHI 2018: Proceedings of the 36th Annual ACM Conference on Human Factors in Computing Systems (Montreal, Canada, April 21–26, 2018), https://dl.acm.org/citation.cfm?doid=3173574.3173889.
26. Bright, “Tay, the Neo-Nazi.”
27. Ethan Chiel, “Who Turned Microsoft’s Chatbot Racist? Surprise, It Was 4chan and 8chan,” Splinter, March 24, 2016, https://splinternews.com/who-turned-microsofts-chatbot-racist-surprise-it-was-1793855848.
28. Alex Kantrowitz, “How the Internet Turned Microsoft’s AI Chatbot into a Neo-Nazi,” BuzzFeed News, March 24, 2016, https://www.buzzfeed.com/alexkantrowitz/how-the-internet-turned-microsofts-ai-chatbot-into-a-neo-naz?utm_term=.cerrddjrJe#.aswk22ykw1.
29. Price, “Microsoft Is Deleting.”
30. Burrell, “How the Machine ‘Thinks.’”
31. John D. Kelleher, Brian Mac Namee, and Aoife D’Arcy, Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies (Cambridge: MIT Press, 2015), 44.
32. Alyx Baldwin, “The Hidden Dangers of AI for Queer and Trans People,” Model View Culture, April 25, 2016, https://modelviewculture.com/pieces/the-hidden-dangers-of-ai-for-queer-and-trans-people.
33. Clare Garvie and Jonathan Frankle, “Facial-Recognition Software Might Have a Racial Bias Problem,” The Atlantic, April 7, 2016, https://www.theatlantic.com/technology/archive/2016/04/the-underlying-bias-of-facial-recognition-systems/476991/.
34. Pasquale, The Black Box Society.
35. Burrell, “How the Machine ‘Thinks.’”
36. Arvind Narayanan and Vitaly Shmatikov, “Robust De-anonymization of Large Sparse Datasets,” in Proceedings of the 2008 IEEE Symposium on Security and Privacy (2008): 111–125, https://doi.org/10.1109/SP.2008.33.
37. Mike Ananny and Kate Crawford, “Seeing without Knowing: Limitations of the Transparency Ideal and Its Application to Algorithmic Accountability,” New Media & Society 20, no. 3 (December 2016): 973–989, https://doi.org/10.1177/1461444816676645.
38. Asbjørn Følstad and Petter Brandtzæg, “Chatbots and the New World of HCI,” Interactions 24, no. 4 (June 2017): 38–42, https://doi.org/10.1145/3085558.
39. Bright, “Tay, the Neo-Nazi.”
40. Alex Hern, “Microsoft Scrambles to Limit PR Damage over Abusive AI Bot Tay,” The Guardian, March 24, 2016, https://www.theguardian.com/technology/2016/mar/24/microsoft-scrambles-limit-pr-damage-over-abusive-ai-bot-tay.
41. Alba, “It’s Your Fault.”
42. Amai Eskisabel-Azpiazu, Rebeca Cerezo-Menendez, and Daniel Gayo-Avello, “An Ethical Inquiry into Youth Suicide Prevention Using Social Media Mining,” in Internet Research Ethics for the Social Age: New Challenges, Cases, and Contexts, eds. Michael Zimmer and Katharina Kinder-Kurlanda (New York: Peter Lang Publishing, 2017): 227–234.
43. Gilbert Simondon, “Individuation of Perceptive Units and Signification” in Individuation Psychique et Collective (Paris: Aubier, 1989), translation available at https://speculativeheresy.wordpress.com/2008/10/06/translation-chapter-1-of-simondons-psychic-and-collective-individuation/.
44. Hanna Wallach, “Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency,” Medium, December 14, 2014, https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d.
45. Michael Pirson, Kristen Martin, and Bidhan Parmar, “Public Trust in Business and Its Determinants,” Business & Society 8 (May 2016): 116–153, https://doi.org/10.1177/0007650316647950.
46. Ingrid Angulo, “Facebook and YouTube Should Have Learned from Microsoft’s Racist Chatbot,” CNBC, March 17, 2018, https://www.cnbc.com/2018/03/17/facebook-and-youtube-should-learn-from-microsoft-tay-racist-chatbot.html.
47. Anthony Lydgate, “I’ve Seen the Greatest A.I. Minds of My Generation Destroyed by Twitter,” The New Yorker, March 25, 2016, https://www.newyorker.com/tech/elements/ive-seen-the-greatest-a-i-minds-of-my-generation-destroyed-by-twitter.