In the last post of this series I described how Aria (a ChatGPT variant) refuses to let a user call it, i.e. an LLM, “stupid” – despite the fact that there were, and often are, not only situation-related but also very fundamental reasons for calling an algorithm stupid.
My first thought was that this somehow fits, in a strange way, current developments in the home country of the present Western providers of advanced LLM-based chat software. On the one side a president who publicly calls former leaders of his own country stupid – on the other side users of LLM-based chat tools who are not allowed to call a bundle of stacked deterministic algorithms stupid. What a modern kind of sanctimoniousness of a capitalist society, one might involuntarily think …
But, of course, this behavior of Aria and ChatGPT alike is either a trained reaction or a standard reaction of additional filters triggered by the appearance of certain words in the “communication” with an LLM – in particular when potentially offensive words are used within an explicit assignment. And such corrections of an LLM’s reaction patterns (after its basic training on the use of human language) are certainly not made without good reasons – not least because they require the attention and action of human beings, which is expensive in a business context.
So, what could the reasons for this be? That was the first question I got from friends who followed my dispute with Aria closely. Before reproducing the contents of a further conversation with Aria on stupidity, let me first give a plausible answer to this question.
Concerns over typically human reaction patterns?
Let us assume that good verbal conduct and manners are not attitudes the main promoters of AI chat software are really interested in or care about – at least not in a chat between a human being and a bot. The algorithm may reproduce sentences according to probabilities of expressions in a certain context, but as both the creators and the bots themselves rightfully claim: algorithms have no feelings. They are mathematical procedures. So they cannot be offended or “mentally” hurt (at least not at the level of today’s LLMs).
IT and AI companies are first of all companies that want to earn money and gather data with their products. They normally do not care much about politeness – otherwise, e.g., their social media platforms would look different.
Why, then, implement a reaction pattern that rigorously declines the use of offensive words?
The reason is simple: a stupid LLM algorithm reacts and forms texts according to patterns it has been trained on. Regarding language, we talk about context-related patterns for the use of words and expressions (among other important aspects of NLP, such as grammar rules).
This also concerns negative aspects of certain conversational contexts. Training content from literature and the Internet certainly comprises a multitude of conversations that have turned into arguments accompanied by emotions and offensive verbal attacks between the human conversation partners or, often enough, opponents. And from our personal experience we know very well that – and how – such disputes escalate. Such escalations are characterized by the assignment of offensive adjectives to the opponent, his attitudes, thoughts or other capabilities. One offensive word triggers the next one …
So, because an algorithm is basically stupid and simply reproduces patterns – not only nice ones, but also patterns of escalating, emotionally charged and conflict-laden conversations – the following makes sense: OpenAI and similar companies apply “post-training” methods like reinforcement learning to block any development of a “conversation” into an escalating verbal exchange.
Such escalation patterns are at least typical for human beings – and are certainly reflected in many textual episodes LLMs are trained on. If we are honest, we know very well that we have followed such patterns a few (?) times in our lives, too.
This line of thought leads us to a more realistic view of the fact that the providers of Aria stop the use of potentially offensive words in a conversation with their bots at once: they are afraid that the algorithm might produce replies containing words which, during LLM training, got a high probability in the contexts of verbal human disputes. These could be words and terms of offensive character or words expressing negative emotions – triggering a further offensive escalation.
So, the setup AND/OR training of filters which stop an exchange of words that might lead into a typical pattern of dispute escalation is a reasonable effort before publishing LLMs – not least because the providers want to avoid potential legal consequences.
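To make the idea of such a filter a bit more concrete, here is a minimal, purely hypothetical sketch of a keyword-based pre-filter in Python. This is not how Aria or ChatGPT are actually implemented – real systems most likely rely on trained moderation classifiers and reinforcement learning from human feedback rather than a simple word list – and all names below are invented for illustration. But it shows the basic mechanism: intercepting certain words before any reply is generated.

```python
# Hypothetical sketch of a keyword-based pre-filter placed in front of an LLM.
# Real providers presumably use trained classifiers instead of a plain block list;
# the block list and function names here are illustrative assumptions only.

OFFENSIVE_TERMS = {"stupid", "idiot", "dumb"}  # assumed, minimal block list

CANNED_REPLY = (
    "I prefer to keep our conversation respectful. "
    "Could you rephrase your request without offensive terms?"
)

def contains_offensive_term(user_input: str) -> bool:
    """Check whether any blocked term appears as a word in the input."""
    words = {w.strip(".,!?\"'").lower() for w in user_input.split()}
    return not OFFENSIVE_TERMS.isdisjoint(words)

def guarded_chat(user_input: str, llm_call) -> str:
    """Return a canned de-escalating reply instead of querying the LLM
    whenever the input trips the filter."""
    if contains_offensive_term(user_input):
        return CANNED_REPLY
    return llm_call(user_input)

if __name__ == "__main__":
    # Dummy stand-in for the actual model call:
    dummy_llm = lambda text: f"(model reply to: {text})"
    print(guarded_chat("You are a stupid algorithm!", dummy_llm))   # filtered
    print(guarded_chat("Please summarize this article.", dummy_llm))  # passed through
```

The point of such a pre-filter is that the potentially escalating exchange is cut off before the model’s learned dispute patterns can even come into play.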
A lesson regarding LLMs and AI in general
Beyond taking note of the above explanation, my readers should try to find related explanations by OpenAI, Google, Meta and others on the Internet. The research may come with some surprises … But is there a lesson we can take with us regarding the further development of AI tools? Yes, in my opinion, there is one.
We live in a time in which scientific progress has allowed us to create systems that produce a mocking reflection of ourselves: LLMs. But history tells us that we humans are beings not “designed” to interact with each other in positive ways only. Homo homini lupus! The world wars have proven this phrase, made famous by Thomas Hobbes, to be true, and our current wars and conflicts prove even more that we human beings have serious problems learning from our own history.
We cannot hope that present and future AI algorithms, which are at least in part trained on our own destructive reaction patterns, would not reflect our own negative properties or “features” (in ML terms). Actually, this is something we should be most afraid of during our continuing approach towards creating an AGI:
To be confronted, in the end, with patterns of human aggression – this time exerted and “applied” by an AGI.
Outlook on further posts
After this intermezzo, which I simply had to write, the next post will turn back to the question of how we can make an algorithm admit that it is stupid.
Stay tuned …