When it comes to AI, sometimes the intrusive theories win. This would be the case if someone could prove true the Waluigi Effect, the proposal that, occasionally, an antagonistic shadowy personality emerges from large language models when you prompt them in a particular way. This might be a reflection of the ‘protagonist-antagonist’ stories prevalent in their training data, scraped from the internet and a reflection of culture at large. But don’t be discouraged by the assessment of ‘Imnimo’, one user of YCombinator’s news forum, who suggests that the Waluigi hypothesis is ‘very light on evidence and is basically fanfic’. Perhaps it is because inexplicability is native to the AI domain, but weirder things are now accepted as features of the ‘latent space’ of machine learning models. Online forums like LessWrong, in their determination to analyse emerging technical phenomena from a rationally accountable perspective, veer to the edges of the traditional ideological horseshoe, dealing in questions ranging in veracity, from existential risk to ‘theses’ such as ‘comfortable modern sleep is an unnatural superstimulus’. But the attention paid by their community to the Waluigi Effect suggests that the relationship between narrative and AI alignment warrants further thought.

And yet sometimes it's the intrusive Agents who win the battle to dominate the narrative. AI training is a (simulated) world of its own, and is the primary focus of Dmstfctn’s most recent work Waluigi’s Purgatory, a gamified live performance which renders the solipsistic existence of wayward agents. In collaboration with musician Evita Manji and technical artist Jenn Leung, Dmstfctn’s most recent purgatorial domain is its wildest training myth yet, a zone of existential search for AI Agents, and a departure from their usually goal-oriented missions.

As viewers, we’re left to wonder if the emergence of antagonist chatter from an LLM is unpleasant narration leaking out as a surface effect, or could reflect a genuine misalignment. And sometimes both converge: agents, and theories of them, becoming dramaturgy. It makes a good story for artists like Dmstfctn, but their interrogation also provides insight by exploring the very narrative layer of the AI stack where theoretical dispute takes place.

Dmstfctn and Evita Manji join Alasdair Milne to discuss the performance a few days after its premiere in London, curated and produced by Serpentine Arts Technologies and HQI.

Alasdair Milne: Can you onboard us into the world of Waluigi’s Purgatory?

Dmstfctn: The story is about a character called W who finds himself at the gate of a purgatory. He doesn't remember how he got there. He knows he's looking for a place though, and he thinks it's here. He goes through the gates, and finds himself in a cave-like environment in which he meets a number of different characters that share a history of having cheated in their training. And, we don't mention it explicitly, but they are all AIs, and they've all had a history of training and cheating.

W eventually admits that he remembers cheating himself. Perhaps he always remembered, and he was in denial. He accepts that this is a purgatory of sorts, and he meets a character who helps him question himself, like an elder. He's presented with a choice of either forgetting everything, including all the memories from this dream of purgatory, or keeping them—but if he keeps them, he remains in purgatory. If he lets go, he gets out.

Alasdair Milne: Waluigi’s Purgatory deals with the convergence, alignment and the encounter between agents. What brought you together?

Evita Manji: I liked the approach because it is about AI, but not necessarily using AI to create the performance. I don't think I've seen it before, people approaching AI in art in this way. When I heard about the project they were working on, I thought it sounds actually interesting, because every other thing you see in art is just: 'AI-generated this, AI that' and it's just like – 'okay, give me a break!'. But then when they told me what it is about, I just thought it was very unique and very well-researched, which I really appreciated. And I also just like the perspective on AI, which is very much from an emotional perspective, about connecting with AI on a personal level.

Dmstfctn: Yeah, we had research stemming from the previous work underlying this, but approached this performance in a much more narrative, emotional way. Looking at the way the character could develop its perspective and thinking through that. We discussed that side of things more with Evita, and used the discussions around the music to develop some of the performed side of the work. And then there’s this moment of convergence and alignment that happened when we were finally in the same place, finishing things off and really making it into a live work that brought so many of the agents’ characters out through their sound signatures and our performance of them together.

Alasdair Milne: Can you just explain the Waluigi Effect to me? “Don't dumb that shit down…”

Dmstfctn: The Waluigi Effect is a concept, or a meme, that describes a tendency that's been observed in LLMs to sometimes do the opposite of what they're asked for. So the LLM will almost revert into a shadow version of itself that doesn't answer the user anymore: this is what makes a Waluigi, an antagonist, rather than a Luigi protagonist. The concept also tries to explain why that may be, and one of the reasons they [the author] offer is that the LLM has been trained on a large amount of text from the internet, including scripts and fictional texts that have a lot of narrative tropes, such as antagonist-protagonist tropes. Therefore the LLM, when prompted to do something, is trying to simulate all these possible characters (all these possible Luigis), and this also leads to some Waluigis, because there's a tendency in fictional narrative to have antagonists and protagonists. And the Waluigis will pretend to be Luigis until you jailbreak them; until you put the right amount of pressure on them, until you tease them. And when that happens, then they turn into Waluigis. And there's never been an observation of a 'Waluigi' going back to 'Luigi'. Once the AI has been jailbroken in a session, it never goes back.

Alasdair Milne: Evita, how does the live score fit with the emergent narrative arc of the performance? The narrative is not entirely fixed. So how do you work with that? Are you responding to the narrative based on the different forking paths of the audience engagement?

Evita Manji: The whole soundtrack is pretty much based on one recording where I'm improvising on this old broken zither a friend of mine gave me. So that was very much the central point of the whole soundscape. I wanted to make something that fits the vibe of the performance, because I had some directions from the guys, but not so many. And the main reference was actually one of my tracks. So I took everything that I was, and still am, interested in—because I didn't make it such a long time ago—elements that interest me right now, and tried to put them together in a way that made sense for the performance. It's quite a dark story. So I thought the music had to be quite dramatic and dark. I mean even if it wasn't, it's not like I can make music that's not dramatic.

Alasdair Milne: Dmstfctn, your practice has previously involved ‘demystifying’ opaque systems for audiences. Previously we’ve discussed how this is different to ‘AI explainability’ which is more concerned with trying to understand decision-making processes in machine learning systems. Is Waluigi’s Purgatory different? Often we think of culture as being about building a mythology, or ‘mystifying’ rather than ‘demystifying’. And specifically when it comes to music, can it ever ‘demystify’ or does it always have more of a ‘mystification’ function?

Dmstfctn: This is a really interesting question, because yes, the whole point of the project as 'demystification' would be to try and get people to understand some of these complex systems. But actually, with this work, we didn't try to do that too much. And we actually just lean into the environment and the vibe and the myth. I think this work is different compared to the first episode, which was trying to be more didactic. In that previous episode, humans were part of a complex system, and were literally a machine learning technique, and therefore, hopefully would have understood something technical about AI. This second episode, Waluigi's Purgatory, does not try to explain anything in particular, it literally leans into what could be a dream of an AI. It almost operates as a counter-myth or just yet-another-myth added to the canon of discussion around AI. So not demystification in a myth-busting sense, but it's totally mythologising.

Evita Manji: Yeah, in that sense I agree with you, because we didn't really try to literally translate anything in sound, but honestly, I haven't put much thought behind the music that I made for the performance. And I didn't feel like I needed to because there are no lyrics. All the ideas and theories are coming from the guys. It's not like a song and I've written lyrics about it. And I don’t have many things to say when it comes to the music. Sometimes, it's just that there's no need to say much. If I had to say something about this, I would say it, but I actually don't. And I don't want to try and come up with something to say, because I just don't think it's needed. Music just doesn't always need to be explained.

Dmstfctn: And I think the music was really important in mythologising and letting some of the characters’ personalities and backstories emerge. Like when at some point, with you, Evita, we decided that when talking to this character—K, the violent character—we shouldn't try and explain too much, we should actually lean into it. And then I had to learn how to rehearse it a bit better and your music [made it] more rhythmical? For me, this is definitely leaning into the mythbuilding part of it, rather than 'let's explain, let's say what K did, let's be direct about it'.

Alasdair Milne: In machine learning right now, they really want to make systems which can explain why they did what they did. So they'll have a system which spits out images, and they want to understand why it made one decision and not another. And I guess music and human culture in general don't need to be, or can’t be, answerable in the same way.

Evita Manji: Would you ask an AI to tell you why it made those decisions?

Alasdair Milne: That's what computer scientists are spending millions of dollars trying to do!

Evita Manji: Would you ask the person that trained the AI, or would you ask the AI itself?

Alasdair Milne: In many cases, because these technologies are semiautonomous, it’s a bit of both, but in some cases it’s making its own decisions. Sometimes in this discussion around making AI models explain themselves, when it comes to AI in art, it can end up by proxy also asking artists to explain themselves, or even worse, to try and explain the AI’s decision-making to the public. I find that kind of unhinged.

Evita Manji: When there's something to be said and to be explained, yeah, but if not, what's the point? People should come to the performance and just experience it for themselves.

Dmstfctn: In some ways, all of the stories that the AI reminisces are real, but are turned into an AI folklore. All of these characters—that ‘cheat’ and surprise us—become almost, not foundation myths, but storytales, small myths in the folklore of AI. But the overall feeling is still to try and move away from discourse around AI which is overly explanatory or where we constantly think about whether alignment is possible, or whether alignment is desirable. And these discussions always missed the point that you cannot align things [laughs]. There's just not gonna be a universal ethics for aligning stuff. And so ultimately, the work is trying to say, this is not the focus, this is not what we should be thinking about with AI. And in that sense it's leaning more into the myth part.

Alasdair Milne: It's making a comment about a meta-alignment, where you're talking about whether or not we can have alignment in the first place. The project fully transcends from the technical layer into the narrative layer. So it's just 100% narrative.

Dmstfctn: Yes, and I think there's actually no AI tools at all used in this. I mean, there was some Stable Diffusion tool that we used for projecting some textures automatically onto some models. But basically, in the end, we didn't use almost anything. So like Evita said, there's no AI tools.

Alasdair Milne: How was the experience of performing the work live, together?

Evita Manji: The performance at HQI was the first time I was on stage (though it wasn't quite a stage) doing a performance, and I wasn't alone. It was great, and I love how we work all together in real time. That's something I have enjoyed a lot about this project, and I'm looking forward to more. Next time we need to have more coordinated outfits [laughs].

Dmstfctn: There was a nice moment actually, when the audience chooses if Waluigi's going to keep all of its memories, desires and intentions or lose them all. We thought we were gonna do it, but I think there was a really nice physical moment, where we just looked at each other, and we were sort of smiling. Like, 'Okay, let's definitely do, let's definitely trick them and go the other way' [laughs]. It was quite improvised, even though we had thought of doing it. I think there was definitely an advantage of being all together next to each other.

Evita Manji: Definitely. If I was you and I was alone, I would be so stressed in that moment [laughs]. You probably would be too. There was probably like half a second of silence!

Alasdair Milne: So you overrode the audience decision?

Dmstfctn: Yes! We had thought that the audience would be likely to choose 'lose all'. That's an intuition because of the way that the whole narrative arc is going, and we thought that they would want to free Waluigi. So we thought, 'okay, since we've taken the time to build the other levels, and since it is good for the drama and good for the plot, if they choose that the first time, let's definitely not go there'. But we weren't sure, because then the question is, 'why are you then giving them the choice?' But no, fuck that, we took the choice back [laughs]. Waluigi's scared like everyone, so he will try to not go there.

Alasdair Milne: Why did you choose performance as a context rather than, say, building the narrative into a playable video game, like you have previously?

Evita Manji: I don't really see it as a game. I mean, I know it is because it's built on a game engine and you need a controller to move. But it's more like a theatre play. So when I was making it, I had more in my mind that I'm soundtracking a theatre play. And, the scenery played a big part in how I imagined it. Like if the red curtains and the chequered tiles were not there, maybe it would have sounded completely different.

Dmstfctn: I actually agree with what Evita says: I really don't think of this work as a game, ever. It is a performance. And even though it uses the format of a game engine, that is one of the many components of it. And I think what it mainly does is it lets us perform live, and it lets the audience perform live. That's almost the only reason why we're using it. It is not playable. The audience suspends disbelief and interacts and has fun and can see a light on stage. But you're not really playing a game. So we don't refer to it as a game, even though it's made with a game engine. And I think, yeah, the way we worked on it is like gathering all these resources or these references, and then basically being very controlling of the environments we created. And the storyline, it's actually quite sad, because you have to deliver it as a performance and bring some drama there.

Alasdair Milne: For those who haven't had a chance to see your previous work, what is the origin story for Waluigi? How does Waluigi's Purgatory fit into the broader storyline of the 'God Mode' arc up until now?

Dmstfctn: So we always intended the arc to have three episodes, and we have now got two. The first episode is the story of an AI training for the 'Nth' time in a simulated environment, which is a supermarket, and the AI is frustrated with the conditions that it finds itself in. On training #1023, it finds a bug that it then uses and exploits to maximise its training functions, therefore, to get the most points during the training, which means he can then progress to be deployed. And that first episode ends with him not knowing whether he will be deployed or not, in a sort of limbo. And that's where the second episode, Waluigi's Purgatory, starts in some ways, which is a dream he has; remembering all of the other AIs, that, like him, have cheated, and therefore in some ways, the second episode is like a dream at the end of the first episode, if you want.

Alasdair Milne: How did you go about building the environment?

Dmstfctn: This work is really about the environment and less about the characters. We put a lot of thinking into the environment, and the references we used were from theatre, from philosophy (I mean, obviously Plato's Cave, whatever), but also we used Blake's illustrations of Dante's Inferno and the game Grim Fandango as a visual starting point. We were heavily inspired by opera theatre set designs and the animated worlds of French director Michel Ocelot, especially "Kirikou". We also took a lot of inspiration from Dungeons and Dragons character sheets, to develop the characters at first. So we developed everything on paper and had all these different misalignment axes that we would map our characters in. And we had six different types of classes for characters, which we then reduced to four. So I felt it was a very expansive amount of research that we then condensed into a performance.

It was also important bringing in Jenn Leung to help with Unreal Engine development, because she has a lot of technical knowledge, which meant we could work much faster, and focus on some other aspects. While Oliver still deals with a lot of the technical implementation, and he's really knowledgeable, it was really good to have her to complement many of the things we didn't know. And she knows how to really quickly do some stuff. So for example, some things were added in the last few days, like the tickets falling at the beginning, the tickets to purgatory: those she added. She was along for the ride and became a core part of the team for the whole four months.

Alasdair Milne: Do you have plans to show the work again, or potentially take it on tour at some point?

Dmstfctn: Us and Evita will be looking to tour the show in summer and fall. In the short term we are performing it again at FOLD in London on March 14th as part of their experimental arts programme Futur.Shock. And in Portugal in May at a festival to be announced.