Misleading information in a Bavarian TV talk about aspects of LLM-based AI assistants


In times when a relatively new technology like AI evolves rapidly, publicly financed TV in democratic societies has an important role: It should keep the public well informed about

  • how the new technology and derived applications work and what flows of information are involved (in particular from the user to systems outside the user's control),
  • what options and opportunities the new technology offers to users and/or to the owners of the knowledge and technology,
  • and, most importantly, what risks and concrete dangers may come both with standard usage and with misuse of the technology.

The information provided should, of course, be both precise and yet understandable to non-specialists. And it should not be misleading … This is a difficult task, but taking on and solving such challenges is what we pay the staff of publicly financed TV for in Germany.

Two weeks ago (19.06.2024), there was a talk show on Bavarian public TV (Bayerischer Rundfunk [BR], “Münchner Runde”). The topic was AI. In the discussion panel sat two professors. One of them teaches at a technical faculty of a German university (this professor had been an active astronaut before his academic career), the other had until 04/2024 been the head of the so-called “Ethikrat”, a council advising the German federal government and parliament. The other participants were the head of an HR company that uses AI for interviews with applicants, a representative of a labor organization and, last but not least, the moderator.

I myself am interested in Machine Learning [ML] and test ML algorithms privately on Linux PCs. I was eager to learn something new from the TV broadcast. But I was disappointed; some of the information presented was, in my opinion, really misleading – to say the least.

My criticism below is meant to give the responsible persons at the “Bayerischer Rundfunk” – should they ever read this blog post – something to think about and to help improve the quality of future broadcasts on the topic.

First impressions and missing information

The BR presented an Avatar on a huge screen in the background as a human-looking interface to an AI. Some members of the panel assumed that the AI itself (presumably an artificial neural network [ANN] for a Large Language Model [LLM]) ran on some server in the cloud or in the US. Actually, the AI’s core ANN model could, at least in principle, also have run on a suitably equipped local system at the BR studio – with or without access to the Internet.

Unfortunately, it was never specified which kind of chatbot (ChatGPT, Gemini, Claude, …) or which generative Large Language Model [LLM] was used and what the general infrastructure of the AI application looked like. Why would such information have been helpful? The answer is two-fold:

  • If the AI application had no access to the Internet, its ability to retrieve supplemental up-to-date information during question and answer processes would have been limited. Knowing this would have helped to clarify some potentially misleading claims regarding an AI application’s capabilities.
  • If the LLM had been known and had been part of one of the publicly accessible services of the big AI companies, the TV audience could have directly used other available interfaces to the respective AI to verify or falsify statements made during the broadcast.

In particular, one could have sent the same questions that panel members presented to the Avatar as prompts to the respective AI service oneself. But no information about the technical setup and specifics of the LLM was given. Was the moderator not prepared concerning such issues? Or did he think such information would not be of interest? I would have been interested – and I bet many of my readers, too.

It was not the best Avatar presentation I have seen so far on TV. Unfortunately, the connection to the backend system of the Avatar apparently got interrupted multiple times during the first part of the show. Moreover, depending on the analysis and search processes performed in the background, the Avatar did not always react with simulated lip and mouth movements when the speech generators at the end of the AI tool chain produced an audible answer to a question. The technical reasons remained unclear …

More importantly: It quickly became obvious that none of the participants (including the moderator) was used to clearly structured and sequential prompting for getting precise answers. The Avatar often was not addressed directly with a key expression or term, but from within the ongoing discussion. Very typical, in my opinion, was how the questioning person turned his or her head to the screen in the background and tried to get “eye contact” with the Avatar when asking a question. Due to the general lack of information about the technical capabilities of both the Avatar and the AI implementation, it remained totally unclear whether and by which means the AI could have detected whether and when its Avatar was being addressed with a question.

To illustrate this problem, let us assume that no face and eye tracking camera had been installed. Now, imagine a blind person hearing an ongoing discussion leading to a “you”-question. How would such a person know that this question, coming in the middle of an active conversation between four persons, with the text “May I ask you a question? …” (“Darf ich Ihnen eine Frage stellen? …”) is directed to him or her? The blind person could perhaps only guess it from a pause in the conversation that goes unanswered for some seconds. And we sometimes saw this pattern in the reaction of the (possibly blind) Avatar. To avoid such unclear situations in the communication with a machine, questions addressed to an Avatar should contain a distinct keyword to notify the AI frontend that it is addressed and that it has to become active.
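To make this concrete: below is a minimal sketch (in Python, with purely hypothetical keywords and function names of my own) of how an avatar frontend could decide whether it is being addressed at all before forwarding anything to the AI backend. Without such an explicit trigger, the system can only guess from pauses in the conversation – which is roughly the behavior we saw.

```python
# Minimal sketch (hypothetical names): a frontend filter that only forwards a
# transcribed utterance to the AI backend when a distinct wake keyword is present.

WAKE_KEYWORDS = ("avatar", "assistant")  # assumed trigger words, chosen for illustration

def is_addressed(utterance: str) -> bool:
    """Return True if the utterance explicitly addresses the avatar."""
    text = utterance.lower()
    return any(keyword in text for keyword in WAKE_KEYWORDS)

def handle_utterance(utterance: str) -> str | None:
    # Only utterances that explicitly address the avatar are sent to the backend;
    # everything else is treated as conversation among the human panelists.
    if is_addressed(utterance):
        return f"FORWARD TO AI BACKEND: {utterance}"
    return None

print(handle_utterance("May I ask you a question?"))          # None - ambiguous
print(handle_utterance("Avatar, may I ask you a question?"))  # forwarded
```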

All in all, the whole panel setup gave the impression that the BR team was not well prepared to handle concrete Q&A situations coming up during the ongoing discussion in the panel. At least not regarding the AI and its Avatar as a communication partner. But I had expected more from the invited AI specialists … At least one could have asked how to address the Avatar in a clear and unambiguous way. But, no …

An important topic was not discussed deeply enough

There were a lot of typical questions about whether we should be afraid of AI or not. Then the representative of the labor organization came to a critical point, namely that we (human beings) in many cases do not understand why and how exactly an AI comes to a decision between alternatives or to a specific answer to a question. And whether we can assume that the answer is true and not made up. Neither typical AI users nor even AI developers can rely on a complete chain of reasoning and proofs for the behavior and specific reactions of an LLM after training. Meaning: Most complex AI algorithms basically work like a black box, and they do not perform their work flawlessly or error-free. Neural networks often hallucinate – and LLMs are no exception. A fact which could become critical when the advice of an AI is taken into account in serious matters.

However and unfortunately, this interesting topic, namely how well we really understand the processing and internal states of Large Language Models [LLMs] with billions of parameters, was not explored systematically afterward. In my opinion, this was mainly the fault of the moderator, who quickly moved on to other topics. Again: How well was the moderator prepared for sensitive and central questions about the use of AI?

It got fuzzy …

In the course of the developing discussion the tech professor wanted to make a point, namely that AI does not understand anything, with a classic example of a trick question: Why does a mirror switch left to right?

I do not want to enter into a discussion in this post about what understanding really means. But the chosen example was questionable insofar as it is answered rather correctly by major LLMs these days. Excerpts from the Avatar’s answer (freely translated): “A mirror does not interchange left and right. But a mirror reflects light rays. … The resulting image from the (outgoing) light rays gives a left-right mirrored impression.” The tech prof did not accept this answer because “a mirror switches rear to front“. ????? I would say: At least half a point earned by the AI in this dispute.

But let us see what Aria, an interface provided by the company behind the Opera browser to a mainly GPT-3.5 based AI, answers:

rm: Why does a mirror interchange left and right?

Aria: Mirrors reverse images from left to right due to the way light rays reflect off the mirror’s surface. When light hits the mirror, it reflects back at the same angle it came in – creating a flipped image. This phenomenon is known as lateral inversion.

Not so bad. I have heard worse answers from my grandchildren when they were young. Now, a physicist could start arguing with the AI about the sign of the angles and their definition with respect to the normal to the surface. I leave this and suitable prompts as an exercise to those of my readers who have access to ChatGPT, Aria, Gemini or other AI frontends. The answers in my case (Aria, ChatGPT) actually got more and more precise – and well founded by more and more references to physical laws.
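For reference – and as a possible starting point for such an argument – the law of reflection reads, with both angles measured between the respective light ray and the normal to the mirror's surface:

\[ \theta_{\text{incidence}} \;=\; \theta_{\text{reflection}} \]

This is only the geometric anchor; working out with the AI how the apparent left-right swap follows from it remains the reader's exercise.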

What kind of AI can we “trust“?

Then came a statement of the tech professor that really made me nervous. He first distinguished between a “general AI” and a “specialized AI“. And then he came to the conclusion: “In case of a specialized and specifically trained AI you can trust it”. The German phrase “der kann man sich wirklich anvertrauen” was even stronger. The statement came without any qualification or reservation.

As if an algorithm were reliable just by having been trained on problem-specific data! My own neural network automatically posed a question: Has this professor ever really worked with Machine Learning algorithms, has he ever personally trained and tested such algorithms?

I had expected an answer in the sense that a specialized ML algorithm, e.g. a classifying algorithm, can be helpful if and only if

  • it has been trained on sufficiently large, varied and unbiased sets of data samples that cover the variation in the real data it will later be confronted with well enough,
  • it generalizes sufficiently well to yet unknown data,
  • it has been tested thoroughly on very many realistic test data sets and has produced errors within acceptable margins,
  • it has been quality-assured by independent and neutral teams.

In particular, the point about biases in the training data should have been emphasized. You can train an AI or an ML algorithm with many data samples specific to a problem – and the algorithm may still interpret or do things very wrongly for new and/or particular sets of input data.

You stumble across the topic of error rates in very many educational and academic examples when studying Machine Learning – and you fight to reduce error contributions (in classification tasks e.g. the false positive rate or the false negative rate). But the error rates in many real-world examples are even higher than in academic examples. Even in well-studied environments you see errors during the use of AI – depending on the amount and quality of training data. An example is given by the errors of AI algorithms for the prediction of protein folding. Although marking a huge breakthrough in this field of research, these highly specialized algorithms are, of course, not error-free. See this interesting article for more information.
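For readers who have not yet computed such quantities themselves, here is a minimal sketch (in Python, with made-up, purely illustrative counts) of how the false positive and false negative rates of a binary classifier are obtained from its confusion matrix on a test set:

```python
# Minimal sketch with made-up illustrative numbers: computing false positive and
# false negative rates of a binary classifier from its confusion matrix.

# Assumed evaluation counts on a test set (purely illustrative, not real data):
true_positives  = 912
false_negatives = 88    # positive cases the model missed
true_negatives  = 9450
false_positives = 550   # negative cases the model wrongly flagged

false_positive_rate = false_positives / (false_positives + true_negatives)
false_negative_rate = false_negatives / (false_negatives + true_positives)

print(f"False positive rate: {false_positive_rate:.3f}")  # 0.055
print(f"False negative rate: {false_negative_rate:.3f}")  # 0.088
```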

In any case, the quality of the training data is a decisive factor – besides a problem-specific tuning of the so-called hyperparameters of the neural network models: Garbage in, garbage out – this general IT-related statement is particularly true for ML algorithms and their training.

I had expected that a professor who was introduced as an AI researcher would have directly pointed out that even specialized algorithms can and do produce errors. (And some specialists may add: in particular speech-analyzing algorithms.) But he did so only later, in a side remark on general-purpose LLMs.

The professor then gave an example of a specialized field in which you could likely trust a trained AI. He named the field of jurisprudence and claimed – “you train, you get correct answers”. Again without qualification – or the helpful words “probably” and “maybe”. I am not a lawyer – but from newspaper articles on the topic I have multiple times got the impression that legal judgment sometimes depends on weighing the overall consistency of plausible pieces of information – even if not every piece can be proven 100%. From my personal experience I would add: One should be very careful about making statements about the reliability of tools in a field in which one is not an expert.

A further example of the tech professor was the following: Because you have never seen a robot repairing a car, we will not see reasonable solutions in the environment of (autonomous) AI-controlled robotics in the next 10 years. Oh, man – ask a car mechanic these days: Most of the work of finding the cause of problems in modern cars is done by electronic tools and software. And deterministic software results can always be coupled to an AI – and specialized AI variants are or will be coupled to robots at a rapidly improving pace. Actually, this is one of the most rapidly evolving fields related to AI these days. One can already conclude this from articles in the business sections of newspapers.

Then the tech professor got into deep trouble, as the other professor (the former head of the German Ethikrat – the council for ethics advising the German parliament) obviously had better and more up-to-date knowledge about current combinations of AI and robotics in medicine. She said: “I do not like it but I have to contradict …”. A sign of hope for the rest of the show …? …

It got worse …

One of the next key statements of the tech professor was that the knowledge of something like a trained network used for ChatGPT can be and “is” downloaded and distributed onto smartphones. He claimed that you do not need a server somewhere when you use an AI assistant on your smartphone.

Freely translated quote from the conversation:
“A [neural network for an AI] system gets trained (over months for huge amounts of money…) … and now it comes: This knowledge can be transferred to any smartphone. … When you talk [to an AI on your phone] you just talk to your smartphone and do not interact with a server. …. Changed knowledge, e.g. of ChatGPT, after new trainings of the LLMs can be downloaded (to the smartphone)”.

These statements might, in my opinion, imply that private information would remain private during a conversation with an AI-supported chatbot on a person’s smartphone. Let us, therefore, look at this bold claim in some detail.

Actually, it is, at least in principle, true that

  • specialized Machine Learning algorithms with low requirements regarding CPU power and RAM size
  • and significantly downscaled, specific and highly optimized language models (based on transformer architectures),

can be run on PCs and in a few cases even on smartphones. But regarding present-day LLM services offered by OpenAI, Microsoft, Facebook or Google, and regarding the basic structure of most other AI assistant applications on the market, it is not done this way, for a variety of reasons. Instead, a multi-layer web application architecture based on networks (the Internet and other networks on the backend side) and some high-performance server infrastructure is used.

Typically, only a speech-analyzing or prompt-reading UI application runs on your phone or tablet device. The content is then transferred by this frontend client to backend systems residing somewhere in one or more data centers. The central LLM model runs on servers owned by Tech companies, with a phalanx of attached GPUs. The tool pipeline around the LLM may in addition access various sources of information (e.g. the Internet) in real time from the backend side.

Again: The questions asked by the user are transferred to the central AI model on some servers; the answers are afterwards returned to the user interface on the smartphone. Actually, the AI infrastructure used by e.g. Microsoft to serve millions of user requests to the AI behind MS Copilot is a huge one. And I am talking here about the infrastructure used to serve user inference – not the one used for training an LLM model.
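To illustrate the principle, here is a minimal sketch of what the client side of such an assistant essentially does – the endpoint URL and field names are hypothetical and do not correspond to any vendor's real API:

```python
# Minimal client-side sketch (hypothetical endpoint and field names, not any
# vendor's real API): the prompt leaves the device and is processed remotely.
import json
import urllib.request

BACKEND_URL = "https://api.example-ai-provider.com/v1/chat"  # hypothetical endpoint

def ask_assistant(prompt: str, api_key: str) -> str:
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        BACKEND_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # the request is tied to a personal account
        },
    )
    # The prompt (and any context) is transmitted to servers outside the user's
    # control; only the generated answer comes back to the device.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["answer"]
```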

Local and privacy-respecting AI implementations …?

Still, in principle, one could run a private AI-based infrastructure on one’s own PCs. I personally do this in parts – but for very limited example models. But this is not something most standard Copilot and ChatGPT/Dall-E users would do. This has to do with the availability and the quality of the service offered by e.g. OpenAI, Google or Microsoft for general purposes. Anything beyond these services requires knowledge most standard users do not have.

I agree that you can get GPT-1 and GPT-2 and a variety of pre-trained language models with a reduced number of neural parameters running on a PC and, after some optimization effort and parameter reduction, maybe even on a smartphone. You find models you can play around with e.g. here.

But believe me: you would be satisfied neither with the response times nor with the quality of the answers if you do not have very solid equipment and do not build a real-time information retrieval tool chain around it. To get the required quality even for specific private purposes – like an AI-based assistant for maintaining and searching cooking recipes – you may in addition need to retrain and tune your LLM models on specific data. And this in turn requires special knowledge.

Anyway, with ChatGPT 4 we are speaking of much bigger Large Language Models these days. You would not run them on a smartphone, for performance reasons: a lack of fast (V)RAM and GPU power and a lack of fast and optimized information retrieval from external or specialized internal resources.

So, the bold statement of our professor is at least partially wrong for the time being, because really big LLMs need one or more high-performance GPUs (sometimes just to get them installed). Even very small versions of e.g. Mistral (pretrained) require between 4 and 8 GB of (V)RAM and are best run on a GPU or on a latest-generation CPU at user inference. Almost the same is true for a very downscaled version of a GPT-based speech analysis program combined with image generators which I use on the GPU of one of my PCs (a very limited imitation of what ChatGPT 4.0 with Dall-E can do). On the GPU the implementation consumes at least about 6 GByte of VRAM, and the GPU processor (a 4060 Ti) works above 90% load for image generation.

So, I would agree with a statement that some limited models (regarding the number of parameters and model depth) do run in a well-equipped PC environment. Have a look at the GPT4All ecosystem, which hosts multiple models that can be run on a PC; see this link. Still, the overall quality is not at all comparable to what e.g. ChatGPT with GPT-4 or GPT-4o delivers today.
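As a concrete example of such a local setup, here is a minimal sketch assuming the GPT4All Python bindings (installed via pip install gpt4all) and one of the small quantized models from the GPT4All catalog – the exact model file name may differ:

```python
# Minimal sketch of a purely local chat, assuming the GPT4All Python bindings.
# The model file name is an example from the GPT4All catalog and may differ;
# even small quantized models need several GB of RAM.
from gpt4all import GPT4All

MODEL_NAME = "orca-mini-3b-gguf2-q4_0.gguf"  # small quantized example model

model = GPT4All(MODEL_NAME)  # downloads the model file on first use

with model.chat_session():
    # Everything runs on the local machine: no prompt leaves the device, but the
    # model also has no access to up-to-date external information.
    answer = model.generate(
        "Why does a mirror seem to switch left and right?", max_tokens=200
    )
    print(answer)
```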

But a regular installation of e.g. GPT 4 on smartphones? I doubt whether our professor has ever downloaded a reasonably pre-trained, really big LLM via Hugging Face onto a PC and optimized it there. And there is an additional point which I discuss in more detail below:

For a general-purpose, AI-empowered, chatbot-like question and answer machinery you need, in addition, to build up your own tool chain for information retrieval.

All in all, it is much more probable that our tech professor confused the chatbot frontend client (the user interface of the AI assistant) with the real neural network model of the LLM running on backend systems. Or at least that he expressed his thoughts in a way prone to misunderstandings. The GPT-3.5 and GPT-4 models have, by the way, never been released for download … But let us pass over this point.

Let us instead ask Aria (Opera’s freely offered AI assistant, mainly GPT-3.5 based) how it works (highlighting done by me):

rm: Does your core LLM model run on this smartphone or is it only a user-friendly interface with an API to a server running the model there?

Aria: I am an AI assistant running on servers, and you interact with me through an interface. When you ask a question, the input is sent to the servers where the AI model processes it and generates a response. The response is then sent back to your device through the interface.

rm: But would your core model fit into this smartphone, or only smaller GPT-models with fewer parameters?

Aria: Aria utilizes various Large Language Models (LLMs) based on the tasks and settings, ensuring the optimal experience for our users. This means that the core model used by Aria may not fit into a smartphone due to its size and complexity. Smaller models like GPT-2 with fewer parameters are more suitable for running on smartphones.

See also similar excerpts from AI conversations here. So much for the basics. But, in case you do not want to rely on the information provided by an AI:

From the agreements with OpenAI, Google and others, which you have to accept before getting access to almost any AI assistant on the market, it becomes clear that all the information you deliver to the AI assistant interface (questions, context information etc.) indeed moves to a server of the company providing the AI service and that this information is linked to your personal account there.

So, at inference, the data presented by a user to the chatbot’s UI frontend moves to server machines outside the user’s control. And some AI companies claim a right to use the transferred information for further AI training and improvements.

You can find this bit of information even on Wikipedia. To get a better idea of what a multilayered, server- and network-based application like ChatGPT requires and looks like, see e.g. this link, this link and, for the frontend, this link.

Conclusion: The real problem with the statement of the tech professor was that it was misleading regarding information security and privacy protection.

Any user of publicly available AI chatbots should be made aware of the fact that the data and information used in the conversation will in the end be transferred to some server controlled by a Tech company. But the statements in the panel’s discussion strongly indicated the opposite.

A related question, whether the information provided by the user stays on the smartphone, came directly from another participant of the panel. We got no clear answer.

And it got worse …

Beyond security aspects, another important question is: What did our tech professor mean by the word “knowledge”?

Well, his respective statements in my opinion revealed a basic misunderstanding about AI-empowered question & answer [Q/A] algorithms and related tool chains – for special or for general purposes. Unfortunately, it became more and more obvious that probably none of the panel’s participants had ever worked hands-on with a Q/A-pipeline built around Machine Learning algorithms – including algorithms and tools for prompt analysis, data retrieval from data resources (e.g. the Internet and/or databases with pre-evaluated documents) and data post-processing including a relevance evaluation ahead of answer submission to a chatbot frontend.

The different elements of a full Q/A-pipeline, the “knowledge” contained in the central AI’s neural network model and the knowledge included in the information retrieved from external resources were not distinguished. And the relevance of such a distinction seemingly was not even understood. One could conclude this in particular from the following statements – again freely translated:

“ChatGPT was trained in April 2023 and when you ask it when Queen Elizabeth of England has died, it will not be able to answer”.

(Well, the Queen died in 2022, but let us again pass over such minor aspects.) The professor wanted to make his point with an even better example. His question, which he presented to the Avatar, was:

“How did Germany play against Hungary today in the European Championship?”

The tech professor was referring to Germany’s soccer match against Hungary during the European Championship, which had taken place on the very afternoon of the BR broadcast. The Avatar and the AI behind it gave the answer “The match ended in a draw” – obviously referring to the group phase of the Championship in 2021. So, the AI gave a wrong answer – as the professor had expected.

Everybody laughed. What a mess … AI specialists? The answer was taken as proof that the AI was not able to use up-to-date information in a Q/A scenario, because such information – according to the professor – was, of course, not part of the “knowledge” acquired during the AI’s training in 2023 or 2022.

However and unfortunately, the fact that the AI used by the Bayerischer Rundfunk confused the Championships and soccer matches does not make the assumptions of the tech professor about AI applications in general true. The reason is that a modern AI assistant is a complex multi-layer application using the central LLM’s capabilities to combine and evaluate information from multiple resources in the context of a posed question or task.

Aria’s answer to the example question testing a (wrongly) assumed inability of an AI assistant application to answer questions concerning recent events

Well, I directly asked Aria (based on GPT-3.5) in parallel to the ongoing TV discussion:

rm: Find out, e.g from the Internet, when Queen Elizabeth II (the second) of England died.

Aria: Queen Elizabeth II of England died on September 8, 2022 at Balmoral Castle in Scotland at the age of 96.

rm: Find out what the result of today’s football match Germany vs. Hungary was.

Aria: The result of the football match between Germany and Hungary on June 19, 2024 was: Germany won with a score of 2-0 against Hungary. Goals were scored by Jamal Musiala and Ilkay Gundogan for Germany. This victory secured Germany’s passage to the round of 16 at Euro 2024.

I got something similar from ChatGPT 4. Do I need to say more?

The conversation with Aria above not only shows that a basic implication of the professor’s statement was wrong – despite the wrong answer of the AI application used by the BR. It also raises doubts about whether the members of the panel had fully grasped an essential point about the different kinds of “knowledge” involved in AI-empowered Q/A-applications and the respective tool chains. Otherwise these experts would have tested and falsified the claims directly on their smartphones – as I myself did. If they had dared to contradict a professor …

AI training vs. later information retrieval from up-to-date resources

So, how could it happen that Aria and ChatGPT 4 got it right – in contrast to the BR AI application and despite the fact that the soccer match had happened on the very afternoon of the TV broadcast, i.e. a very long time after the training of the LLM? I hope that at least my readers guess the elements of the answer.

If you know a bit about the internals of LLMs, you also know about the significance of single words in their relation to others during the reaction of an LLM to questions. This reaction would indeed be based on the “knowledge” acquired via the adaptation of its weight parameters during training. The “today” keyword in the question will probably not find a good internal match in relation to the general information context of the question alone. Therefore, depending on details of the transformer implementation and its training, the importance of “today” might become suppressed when other relations (soccer <> Germany <> Hungary <> European Championship) can be fulfilled. This might partly explain the wrong answer of the undisclosed AI setup of the BR.

But this line of argumentation (like the professor’s) depends on the assumption that the LLM in Q/A scenarios has to work with its internal knowledge and the information presented by the user only (!).

Again: This is not how modern chatbot applications based on general LLMs and offered by major Tech companies actually work!

The big LLMs – when used as backends of the UIs of general-purpose chatbots or AI assistants – are, of course, linked to information retrieval engines scanning other information resources in real time. The coupling of the core AI models to information retrieval tools is a major and decisive building block in the high-performance infrastructures and architectures which enabled the worldwide success of ChatGPT and other AI-empowered chatbot applications. The additional information must, however, be evaluated by the LLM in the context of the question posed by the user.

The point is: Yes, you do implant knowledge into a neural network like an LLM via the parameter adjustment during training. But this does not prevent a trained AI from accessing other up-to-date information resources via additional interfaces during question and answer processes at the time of user inference.

If you think about it yourself you would certainly come to the conclusion that it would be stupid of e.g. Microsoft not to couple its LLMs with its Bing search engine.

Any suitably set up Q/A tool chain supporting an AI assistant would, on the backend side, couple the AI’s core LLM to further information resources – e.g. the Internet or the “Elasticsearch” engine retrieving information from databases and other information systems. In particular, the coupling to “Elasticsearch” is described in most modern books about building transformer-enabled AI applications.

The members of the panel obviously fell into a trap when one of the participants set the focus on the LLM training and the information and respective reaction patterns “implanted” in the neural network. As important as the training is for building up the general abilities of the LLM for interpreting prompt information and reacting correctly to it, it is not the only information playing a role when using AI-based Q/A tools. A reasonably set up Q/A tool chain with a trained LLM at its center will, of course, also retrieve information from other general or specialized resources and let the LLM evaluate the relevance of the various resulting pieces of information.
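To make the principle tangible, here is a minimal sketch of such a retrieval-augmented Q/A step – the function names and the wiring are placeholders of my own, not the implementation of any specific vendor, and the hardcoded search result merely stands in for a real web or Elasticsearch query:

```python
# Minimal sketch of a retrieval-augmented Q/A step (placeholder functions, not a
# vendor's real API): fresh search results are handed to the trained LLM together
# with the question, so the answer can reflect events far more recent than the
# model's training data.
from datetime import date

def retrieve_snippets(query: str, max_hits: int = 3) -> list[str]:
    """Placeholder for the search step, e.g. a web search or an Elasticsearch
    query against an index of up-to-date documents."""
    # Hardcoded stand-in result so the sketch runs; a real implementation
    # would query the search backend here.
    return ["Euro 2024, 19 June 2024: Germany beat Hungary 2-0 "
            "(goals by Musiala and Gundogan)."][:max_hits]

def llm_generate(prompt: str) -> str:
    """Placeholder for the call to the trained core LLM on the backend."""
    return "LLM answer, generated from the prompt below:\n" + prompt

def answer(question: str) -> str:
    # Resolve deictic words like "today" explicitly before searching.
    query = f"{question} (current date: {date.today().isoformat()})"
    snippets = retrieve_snippets(query)
    prompt = (
        "Answer the question using the retrieved context and judge the "
        "relevance of each snippet yourself.\n\n"
        f"Question: {question}\n\nContext:\n" + "\n".join(snippets)
    )
    return llm_generate(prompt)

print(answer("How did Germany play against Hungary today?"))
```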

One wrong answer of an AI whose setup was not disclosed cannot and does not prove anything regarding the information sources used for the production of the answer expressed by an Avatar. Experts on information systems would furthermore know about the complex structure of modern web applications – and an AI assistant is no exception. They would also know about the inclusion of information from other information resources into AI-based Q/A processes. But nobody in the panel expressed any doubts regarding the assumption of a completely local and isolated implementation of an LLM.

The AI application used during the TV show of the BR may indeed have been based on a somewhat isolated system with no connection of its backends to the Internet and without any attached information retrieval engine. For such a peculiar case the ideas of the tech professor would have had some relevance. But – as said – we got no information about the AI installation presented to the audience.

However, and more importantly: Any general claim indicating that an AI application based on a central trained LLM has no access to information other than what the LLM has seen during its training, and that such applications therefore cannot correctly answer questions referring to newer events or facts, would be wrong for most modern AI-empowered chatbots. It would also be wrong for many other AI-empowered applications for more specific purposes.

A trained AI model within a properly designed Q/A tool chain (1) can and will use information retrieved from searches in external resources for up-to-date information, (2) can interpret such search results in the context of a question, (3) can evaluate the relevance of the retrieved information and (4) can include the additionally retrieved and presented information in its answers. This is true for most of the popular AI assistants of the US Tech giants. But it is often also true for highly specialized AI applications and resources containing problem-specific information. When exactly and how exactly a search in external resources is triggered and how the retrieved additional information is presented to the LLM may vary between different AI assistant applications. But such information is used e.g. by ChatGPT.

Unfortunately, no one in the panel gave the audience a hint in this direction. And that is why I call the claims made and underpinned by the soccer-related example question “misleading”.

Aria/ChatGPT’s answer regarding information retrieval from external, up-to-date resources

Both Aria and ChatGPT – and their underlying trained LLMs – have real-time access to multiple information resources during user inference and perform information retrieval and evaluation in the context of posed questions. And, of course, some of the information resources (e.g. the Internet) contain relatively up-to-date information or are refreshed on a regular basis – long after the time when the training of the AI’s core model was done!

Let us again ask Aria directly:

rm: What information resources are available to you for information retrieval during question and answer processes?

Aria: I get my knowledge from data that’s currently available on the web and a large quantity of publicly available resources that the AI model was trained on. …

rm: Do you analyze web-site content in real time during a Q&A process?

Aria: Yes, I do analyze web-site content in real time during the Q&A process. I search the web to gather the information only in order to respond to users’ specific questions. …..

Therefore, once again from someone who has trained ML algorithms coupled to some external information resources for information retrieval:

The “knowledge” acquired during training basically determines the algorithm’s reaction to questions at user inference, i.e. the analysis of the question and the respective conversation context (based on internal knowledge) and the creation of an answer. If combined with information retrieval engines, the model also evaluates the relevance of the collected information for the answer. But the LLM training does not determine or restrict the variety of information resources available to the complete tool chain of a modern AI application infrastructure, or the up-to-dateness of the information stored in these resources. Some external resources, such as the Internet, can contribute up-to-date information in a concrete AI-controlled Q/A situation independently of when the training of the LLM happened.

Conclusion

The TV discussion show of the BR cast a bright and very uncomfortable light on the state of understanding of how ML algorithms in general and AI assistant applications based on Large Language Models in particular work – even among rather well-educated people in Germany.

The results of the training of a neural network on the one hand, and the later handling of speech or text information provided by a user as well as the evaluation of additional information retrieved from accessible external resources by the trained AI on the other hand, were not distinguished sufficiently. The inclusion of retrieved up-to-date information (e.g. from the Internet) during Q/A interactions with AI assistants and their construction of answers was almost completely ignored.

Thus, the audience could get a wrong impression about the capabilities of the modern high-performance infrastructures supporting major AI-empowered chatbots. The respective applications and the tool chains attached at their backend sides are well able to access up-to-date information resources long after the AI’s training and can therefore also provide current information during Q/A processes.

In addition, some statements regarding the assumed “local” handling of information during an interaction with an AI assistant on smartphones were misleading, as it was not explicitly pointed out that most publicly available AI services do transfer (private) information to remote infrastructures and servers where the core neural network models are run by Tech companies. This has major implications for privacy protection, which unfortunately were not discussed sufficiently.

We expect and deserve better quality from publicly financed TV in Bavaria.