Over the last months one could read multiple articles about using LLMs to solve problems in mathematics. This morning, for example, I read an article in Quanta Magazine (“quantamagazine.org”) about a famous mathematician who uses AI and supporting programs to verify mathematical statements and suggested proofs. His idea is to feed all steps of a proof into a computer program which translates everything into a formal logical language – and thereby detects errors. Well, the interesting point is that this mathematician has programmed parts of the AI himself.
But what about available LLM-based AI and math? In particular, what about the freely available versions of ChatGPT 5.x? Are these tools as clever as their producers claim them to be? Or are the chatbots as stupid as they themselves admitted in the last posts of this series?
Over the last month I tried to use the free version of ChatGPT on some standard math problems.
A simple test problem
I am in the process of writing a little book about Multivariate Normal Distributions and the geometry of their companions – multidimensional ellipsoids. I sometimes use the free version of ChatGPT 5.x to verify claims, to collect literature links and to evaluate different proofs of certain statements. Most of the material has been published, as my topic is not a new one – and when it can rely on references, GPT reproduces or applies what it has learned correctly most of the time.
However, it happens that you come across things which are not identical to what you would find by a regular search on the Internet. You know a solution, but you would like to prove it along a non-standard, but not really complex path. To save time and to avoid (elementary) derivations by yourself, you may think: Delegate some tasks to ChatGPT. I tried it multiple times – and had mixed experiences. This is an area where the free versions of ChatGPT may fail. And the failure may astonishingly happen at very simple steps.
In my case this has happened relatively often for mathematical problems in physics and it has happened also for relatively basic stuff in pure math. The last example, which made me write this post, was a problem a student at high school would be able to solve:
Take a general rotated ellipse centered in your coordinate system. Write down the quadratic form condition and its coefficients. Take a graph function (depending on the coefficients) that describes an upper part of the curve. Use the graph function to derive the length of position vectors of points on the ellipse as a new function of coordinates r(x,y). Differentiate this function (or its square). Search for extrema by setting the derivative to zero. The answer would be the lengths of the ellipse’s semi-axes.
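To make the task concrete, here is a minimal numerical sketch of it. It is not the analytic derivation discussed in this post. I assume the ellipse is written as A x^2 + 2 B x y + C y^2 = 1; the coefficient names A, B, C, this normalization and all function names are my choices for the illustration only. The sketch builds the upper graph function, scans the length r of the position vector on a dense grid instead of differentiating, and uses the eigenvalues of the quadratic form only as an independent cross-check.

```python
import math

def upper_branch(x, A, B, C):
    """Upper graph function y(x) of the ellipse A x^2 + 2 B x y + C y^2 = 1."""
    # Solve C y^2 + 2 B x y + (A x^2 - 1) = 0 for y and take the '+' root.
    disc = C - (A * C - B * B) * x * x
    return (-B * x + math.sqrt(max(disc, 0.0))) / C

def radius_extrema(A, B, C, n=200_001):
    """Scan r(x) = sqrt(x^2 + y(x)^2) along the upper branch on a dense grid."""
    x_max = math.sqrt(C / (A * C - B * B))  # the graph function exists for |x| <= x_max
    radii = []
    for i in range(n):
        x = -x_max + 2.0 * x_max * i / (n - 1)
        radii.append(math.hypot(x, upper_branch(x, A, B, C)))
    return min(radii), max(radii)

def semi_axes_from_eigenvalues(A, B, C):
    """Cross-check only: semi-axes are 1/sqrt(eigenvalue) of [[A, B], [B, C]]."""
    mean = 0.5 * (A + C)
    dev = math.hypot(0.5 * (A - C), B)
    return 1.0 / math.sqrt(mean + dev), 1.0 / math.sqrt(mean - dev)

if __name__ == "__main__":
    A, B, C = 2.0, 0.5, 1.0  # arbitrary example with A*C - B^2 > 0 (an ellipse)
    print("extrema of r on the upper branch:", radius_extrema(A, B, C))
    print("semi-axes from eigenvalues:      ", semi_axes_from_eigenvalues(A, B, C))
```

Because the ellipse is centered, the upper branch contains one end point of each principal axis, so the minimum and maximum of r on this branch should agree with the semi-axes up to the resolution of the grid.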
The present free version of ChatGPT 5.5 obviously did not like this problem. It called the task “messy” because of the nested square roots that appear. It tried to circumvent expanding the terms and suggested a method working with gradients, tangents and slopes. It presented this approach. (The suggested method would have been similar to using a Lagrange multiplier for finding extrema of functions under constraints.) Basically, GPT would have preferred to use eigenvalues of the quadratic form matrix of the ellipse to find a solution, but the setting of the task excluded this. And I did not want a solution based on Lagrange multipliers either.
In the end, I had to explicitly force ChatGPT to follow the “messy” path, which it had itself qualified as messy, but “walkable”. I told it explicitly NOT to use the suggested and fully presented approach with tangents and slopes, with which I was already familiar. It thought about the “messy” problem for around 30 secs. But it failed – and the way it failed was remarkable:
During the “walk” it tried multiple times to avoid an expansion of all required terms. Nor did it want to solve a (relatively simple) resulting equation for x^4 and eliminate spurious solutions. It remarked that an early factorization would be necessary (good idea, but in this particular case fruitless). Then it claimed: The important step comes next. Well, I show you a screenshot of the protocol – including a central wrong step of GPT5.5 (free version). You do not need to understand the origin of the involved expressions, just their transformations:
Screenshot of blatant mistake of GPT5.5

Well, my readers have of course detected the error: The third equation is wrong, actually blatantly wrong. Asked for a proof of this step, GPT gave the following answers:
ChatGPT:5.5: You are absolutely right to challenge that step — because that identity is false. … I incorrectly claimed it. [Remark eremo: 3rd eq. above] … So that step was simply wrong. Your consistency check is completely correct. What is actually true? The honest situation is: The direct differentiation route leads to a genuinely messy algebraic equation. Squaring does introduce spurious solutions [Remark eremo: which can be analyzed and eliminated!]. .. But the elimination of those spurious solutions is not achieved by the fake factorization I wrote. The clean quadratic does not emerge from the previous polynomial by a simple cancellation trick. So, I overcompressed the derivation and introduced an invalid algebraic identity in the process.” … “The deeper lesson: Your check illustrates something very important mathematically: When manipulating equations with square roots, one must be extremely careful about: squaring, factorization, cancellation, branch structure. A single unjustified algebraic rewriting can silently change the solution set. You caught exactly such a mistake [Remark eremo: of GPT5.5].“
Remarks in square brackets were added by me. What an insight at the end … Really remarkable, isn’t it? And this is really no rocket science. Would you want to pay for a GPT version which maybe (!) follows the most elementary rules when dealing with algebraic expressions? No, from any GPT version you expect correct basic mathematical steps and a verification of each step – or a clear hint that the task may be beyond the version’s capabilities.
After telling GPT that we were now back to square one and it had not at all solved the given task, the following answer came:
ChatGPT:5.5: The answer is yes — but the cleanest rigorous execution of the “messy route” requires organizing the algebra much more intelligently than brute-force expansion. The key insight is: the differentiation itself is messy, but after obtaining the extremality condition, one should not fully expand everything [Remark eremo: Wrong again!]. Instead one should introduce a slope parameter early enough to preserve structure.
Why did I characterize this new “insight” as wrong again? Because a full expansion, followed by straightforward collecting and cancelling of terms, actually provides the desired solution! No dirty tricks required. You can find the solution in the math section of my blog on Machine Learning.
And what did ChatGPT 5.5 eventually do after the last above answer? It returned to the previous, already presented solution which was pretty close to applying a Lagrange multiplier. And presented it again – in full length. Showing its limited capacity to analyze the contexts of a chat. So, it repeated an approach which I had explicitly asked NOT to follow.
The problem is that this example of a misguided derivation is not the only one I have seen. ChatGPT even praises “astonishing term cancellations” and other types of simplifications which are based on mistakes at some intermediate step. At least the free version is sloppy regarding reasoning and verification of mathematical transformations.
Be careful …
What to make of such kinds of experience? The results above and of multiple other experiments indicate the following:
- ChatGPT 5.x tries to avoid longer expansions of terms which it regards as being complicated [“messy”].
- ChatGPT 5.x “proudly” invents steps – with errors – if these steps help to overcome a hurdle. You cannot trust simplifications or “remarkable cancellations” it claims to have detected.
- You cannot trust a mathematical derivation of ChatGPT in its present free version when GPT has to do multiple steps on its own.
- You have to verify a “proof” presented by ChatGPT step by step – even if it’s simple math.
- Be particularly suspicious when you do not find a direct reference for your problem on the Internet.
I dare to say this because this is not the first time that ChatGPT offered me wrong “solutions”. I have experienced this for problems in physics, complex programming, math and Python code suggested for complex Autoencoders in Machine Learning. It often cost me a substantial amount of wasted time – as in the end I had to solve the problem myself. So, if you want to work with GPT on complex problems, check each of the suggested steps and do not let GPT cover a whole sequence of steps without intermediate control.
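One cheap form of such intermediate control is a numerical spot check: before believing a claimed algebraic identity, evaluate both sides at a few random points. The following is only a sketch of this idea; the helper name spot_check and the two sample identities are made up for illustration and are not the expressions from my chat with GPT. A passed check proves nothing, but a single failing point is enough to expose an invented factorization or cancellation.

```python
import math
import random

def spot_check(lhs, rhs, n_vars, trials=200, seed=0, tol=1e-9):
    """Return a point where lhs and rhs differ, or None if none was found."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        point = tuple(rng.uniform(0.1, 3.0) for _ in range(n_vars))
        left, right = lhs(*point), rhs(*point)
        if not math.isclose(left, right, rel_tol=tol, abs_tol=tol):
            return point  # counterexample: the claimed identity is false
    return None  # no counterexample found (this is NOT a proof)

if __name__ == "__main__":
    # A correct expansion: (x + 1)^2 = x^2 + 2x + 1
    print(spot_check(lambda x: (x + 1) ** 2, lambda x: x * x + 2 * x + 1, 1))
    # A false "simplification": sqrt(a^2 + b^2) = a + b
    print(spot_check(lambda a, b: math.sqrt(a * a + b * b), lambda a, b: a + b, 2))
```

For expressions with square roots, the sampling range has to be chosen so that all radicands stay non-negative; the interval used here is just a placeholder.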
Math should not be a matter of money
How complex is the problem I had confronted ChatGPT with? At the center of it we need a proof of the following relation
with
(after carefully selecting a sign) and
Now, I already hear fans of ChatGPT saying something like: If you want an answer to the problem of proving an identity such as the one above, then you should absolutely use the expensive professional version (103 € per month). Maybe … But there is no guarantee that the results would get better.
And think about it. The real consequence would be: Math in combination with AI would be something accessible only for the rich.
Consequences
The example shows one thing: Advanced forms of AI should not be developed only by private companies. European states must quickly establish funds and resources to build our own AI models to be used in science. And such advanced models should also be freely accessible for ordinary people – not only for institutions.
Aside from some frustrating experiences with the free version of ChatGPT, I nevertheless have to admit that even wrong approaches of an AI can point you in the right direction. And sometimes a discussion with ChatGPT may give you an idea which you did not have before.
But the rule is to be careful and not to trust the answers of what profit-oriented companies call an “AI”. Once again: It may be much more stupid than you think or expect it to be.