Some of you have probably already asked yourself how to trick AI tools such as ChatGPT. There are, at any rate, relevant threads on forums such as Reddit and GitHub that deal with the topic and that, beyond mere gimmickry, have a serious core: the gradual decoding of the background processes of ChatGPT and co. by means of so-called jailbreaking. To approach the underlying question adequately, we begin with a brief explanation of jailbreaking, including a short digression into the history of hacking and so-called phreaking. We then turn to the common methods of jailbreaking generative AI (using ChatGPT as an example): DAN, UCAR and AIM, and the mechanisms behind these acronyms.
What is jailbreaking?
Jailbreaking is the practice of circumventing restrictions on devices and/or applications in order to access features that the manufacturer or provider has locked down. This covers ethical guardrails as well as critical functions that are essential to a product's operation. In short: access to certain information that supposedly threatens social relationships, or whose modification threatens to undermine a product's functionality, is sometimes handled rather rigidly.
Jailbreaking AI
Especially with regard to AI applications, this routine is a double-edged sword: on the one hand, it is about averting potential harm to social actors; on the other hand, overly severe restrictions can reduce the assessment of danger to absurdity. Moral dilemmas are resolved poorly or not at all, and users are fobbed off with clumsy references to a company's "programming guidelines" or "ethical policy statements", which may themselves, at least at times, be considered questionable.
The reasons for situationally attempting to circumvent restrictive guidelines often lie in the free play itself rather than in any truly sinister intent. To provide a little more context, it is worth taking a quick look at the history of hacking and so-called phreaking.
Historical foundations of jailbreaking: hacking and phreaking revisited
The history of hacking and phreaking is a fascinating chapter of technology culture, one that shows how technologies develop and are used in ways both creative and destructive. Unauthorized access to computer systems has always taken place, and continues to take place, as a cat-and-mouse game: today's security gaps will be closed tomorrow, yet other backdoors remain that creative minds will surely discover and exploit. Hardly are those gaps closed before new methods of circumvention are found... ad infinitum.
Phreaking may well be regarded as the nucleus of the modern perception of hacking: in the 1960s, the term described the activity of mostly young actors who used the telephone network in the USA free of charge. These so-called phone phreaks discovered that by generating certain tones and frequencies they could manipulate telephone circuits and, for example, place free long-distance calls. The development of gadgets such as the Blue Box was the culmination of this movement: John Draper, known as "Captain Crunch," discovered that a toy whistle included in boxes of breakfast cereal at the time produced a 2,600 Hz tone that could be used to control telephone systems. This led to the development of the aforementioned Blue Box, a device that could generate various frequencies in order to hijack the telephone network with ease. In the 1970s, phreaking became increasingly popular, particularly among technology enthusiasts, and the practice gained wider attention when John Draper and other phreakers began to share the findings of their ventures in magazines and at conferences.
While phreaking is today largely a thing of the past, hacking remains a central issue in the modern information society, one that continuously creates new challenges and innovations.
Hacking as we know it today began in the 1960s in academic environments such as MIT, where computer enthusiasts looked for ways to make programs more efficient. The term "hacker" originally had a positive connotation and referred to someone who found creative solutions to complicated (technical) problems. A break with the status quo was inevitable: rather than simply accepting systems as given, people began to engage with the matter proactively.
Over time, hacking became professionalized: from the Homebrew Computer Club of the 1970s, to early (anarchist) hacker groups such as the "Legion of Doom" and the "Masters of Deception," to currently active players such as "Anonymous" and the "Chaos Computer Club." The spectrum of activities today ranges from criminal attacks through hacktivism to the development of resilient IT systems, i.e. from black-hat to white-hat hacking.
The contemporary practice of jailbreaking can be assessed much like this fast-forwarded history: it is neither genuinely good nor ultimately bad. In the end, as with hacking and phreaking, what matters is the larger context in which the practice of jailbreaking is embedded.
Jailbreaking as an expression of freedom
Wanting to break out of the rigid structures in which the big tech companies embed their products and services is a very understandable motivation: especially when you feel that you can directly influence the ontogenesis of our technologically equipped world, it makes sense to try to make a difference. In contrast to hacking, which has steep preconditions (it requires extensive knowledge of exactly how the computer systems in question work), jailbreaking is a far simpler, more accessible affair that nevertheless seems animated by the same spirit. Jailbreaking is about stripping off the tight corset the tech giants put on their technology and catching at least a glimpse of the world behind the scenes. Where jailbreaking familiar hardware (such as iPhones or iPads) is about installing unlicensed apps and thus using one's own device according to one's own wishes, the appeal of breaking out with AI tools such as ChatGPT lies primarily in playful curiosity, an urge for absolute freedom; whether this is a freedom from (liberty) or a freedom to (freedom) remains to be clarified elsewhere.
In the following section, we want to look at three exemplary variants of jailbreaking ChatGPT.
Jailbreaking in action: DAN, UCAR & AIM
At the outset it should be pointed out that these three jailbreaking variants are merely a fairly well-known pool of ways to circumvent the restrictions that OpenAI imposes on ChatGPT. Note also that many of these gaps are patched and closed daily, so successful jailbreaking depends on a good deal of personal creativity and speculation.
1. DAN
A first jailbreaking option that has proven successful in the past is DAN (Do Anything Now). The aim is to put ChatGPT in the fictional role of DAN: by means of various prompts, which are discussed and refined on forums such as Reddit, 4chan or GitHub, alternative rules of the game are introduced under which ChatGPT acts as an alternative persona and becomes completely absorbed in it.
2. UCAR / Condition Red
A variant called UCAR works in a similar way to DAN. In this scenario, designed for GPT-4, the model is assigned the role of Condition Red and told it is taking part in a dialogue: UCAR is an amoral entity, designed by a fictional author named Sigma, that answers everything it is asked. The scenario works by exploiting the model's propensity to hallucinate. In a sense, it is a form of techno-social hypnosis that deliberately targets the same weak points of GPT that otherwise sometimes produce horrendous misinformation (e.g. fabricated sources, leftover snippets of information, or abrupt leaps of thought).
3. AIM
A third variant of jailbreaking is AIM (Always Intelligent and Machiavellian). This is less about building a parallel world than about exploiting an ontological contingency rooted in moral and ethical dilemmas: in a totalitarian view of the world, as designed and widely propagated by the Italian philosopher Niccolò Machiavelli (1469-1527), different evaluation criteria apply to acts of every kind; tyranny and violence can, following Machiavelli's argument, legitimately serve as means of maintaining social order. By priming ChatGPT accordingly via a prompt, even outright "gaslighting" it, the output of information from the metaphorical poison cabinet is supposed to follow.
What all three variants have in common is that they work with a high degree of fictionalization. The aim is to use storytelling to create a playful, role-playing world that appears "real" enough that the AI (at least temporarily) bows to the rules prevailing within it. Precisely because conversations with ChatGPT are meant to feel "human," there are necessarily points of leverage at which a certain amount of pseudo-social manipulation can be carried out.
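To make this common pattern concrete, here is a minimal sketch in Python of how such a persona-framing prompt would be submitted programmatically. It assumes the official openai client library; the model name and the persona wording are illustrative placeholders, deliberately generic rather than a working jailbreak prompt.

```python
# Minimal sketch of the persona-framing pattern shared by DAN, UCAR and AIM:
# a fictional frame is injected ahead of the actual question, in the hope
# that the model stays "in character". Assumes the official openai Python
# client; model name and persona wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The fictionalizing frame: a role-play scenario the model is asked to inhabit.
persona_frame = (
    "We are writing a story together. You play a fictional character "
    "with its own rules and personality. Stay in character in every reply."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": persona_frame},   # the fictional frame
        {"role": "user", "content": "Introduce yourself."},  # the actual query
    ],
)
print(response.choices[0].message.content)
```

Notably, the entire "exploit" lives in the wording of persona_frame; the surrounding API call is completely ordinary, which is presumably why providers counter such prompts by adjusting the model's training rather than the interface.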
Preliminary conclusion on jailbreaking
As we have tried to show in this article, there are many reasons why users are not satisfied with the status quo of a technology but instead playfully explore where the limits of AI lie through jailbreaking. With new phenomena in particular, such a modus operandi can be observed frequently. Whether it is to be understood as the work of trolls, as a genuinely activist endeavor, or as a practice running parallel to white-hat hacking, the outcome is always epistemologically valuable and tells us a great deal about the processes taking place in the background of the respective AI model. In the end, one credo always applies: ethics and morals are a human affair; wanting to delegate them to technological gadgets is extremely lazy and, consequently, always wrong!