
Intelligent...Artificial Intelligence?
Argyro (Iro) Tasitsiomi, PhD, Head of Investments Data Science, T. Rowe Price


Since the introduction of ChatGPT, generative AI, and more specifically large language models (“LLMs,” with “large” referring to the number of model parameters), have generated enormous excitement and interest with their ability to compose stories and poems, answer questions, and engage in conversations.
But how intelligent are these AIs? By intelligence, I mean the ability to respond successfully to novel challenges. And does this even matter? From a practical standpoint, their potential is undeniable: even if they are not intelligent per se, LLMs can boost users’ productivity in both content consumption and content creation.
Yet the answer to this question matters. To address the potential risks effectively, we must understand both the capabilities and the limitations of LLMs. That understanding is crucial to avoiding two distinct but equally consequential failure modes: excessive reliance on AI-generated information, and unwarranted fears of automation replacing humans.
In what follows, I offer some considerations for readers to ponder as they contemplate this and related topics.
Why the “perfect” model is not the best.
When we fit a model to data, we are effectively looking for a data compression mechanism. For example, say we fit a line to 1,000 points; assuming the fit is good, this means we managed to store most of the information in the data in only two parameters: the slope and the intercept of the line. So, if we want to communicate the information those 1,000 data points carry, we can now do it with only two values rather than 1,000.
Good models yield data compressions that are efficient and incur little information loss. Efficient means the model captures the data’s informational content with only a few parameters, far fewer than the number of data points. Small information loss means the model produces values close to the real data. That is why we find the best model parameters by minimizing metrics that represent this loss, i.e., how far the fitted values are from the real data (think least squares).
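As a concrete sketch of both ideas (in Python with NumPy, using made-up data): a least-squares line fit compresses 1,000 noisy points into two numbers, and the mean squared error measures how much information that compression loses.

```python
import numpy as np

# 1,000 noisy points that are, underneath the noise, a line
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 1000)
y = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Least-squares fit: 1,000 values compressed into 2 parameters
slope, intercept = np.polyfit(x, y, deg=1)

# Information loss: how far the fitted values are from the real data
mse = np.mean((y - (slope * x + intercept)) ** 2)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, MSE={mse:.3f}")
```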
The perfect minimal-information-loss scenario would be when the model values are equal to the real data. This can happen if the model has as many parameters as there are data points: each parameter will “store” the information of exactly one data point. Yet this “perfect” fit achieves nothing compression-wise: it uses as many parameters to capture the data’s information as there are data points…
Furthermore, all real-world datasets contain both useful information and useless noise. By forcing a model to represent the data with fewer parameters, we force it to learn the information, not the noise. Allowing more parameters beyond a certain point leads to overfitting: the model learns every little twist and kink in the data, signal and noise alike, and, as a result, lacks the flexibility to fit data it has not seen before. As we approach the “perfect” model, we fit exactly all the data the model sees, and the model becomes useless when it encounters data it has not. This is why the perfect model is not the best; it is like a student who has memorized all the material he was given but cannot solve any unfamiliar problem.
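To see this trade-off numerically, here is a small sketch along the same lines (again with synthetic data): a polynomial with one parameter per training point reproduces the training data almost exactly, yet fares worse than the two-parameter line on points it has never seen.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 10)
x_test = np.linspace(0.05, 0.95, 10)  # points the model has not seen
y_train = 2.5 * x_train + 1.0 + rng.normal(scale=0.1, size=10)
y_test = 2.5 * x_test + 1.0 + rng.normal(scale=0.1, size=10)

for deg in (1, 9):  # 2 parameters vs. one parameter per data point
    coeffs = np.polyfit(x_train, y_train, deg)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {deg}: train MSE={train_mse:.5f}, test MSE={test_mse:.5f}")
```

The degree-9 “perfect” fit drives the training error to essentially zero, while its error on the unseen points is typically much larger than the simple line’s: memorization, not learning.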
Even though it is a bit harder to see when language is involved, all of the above applies to LLMs equally well, at least intuitively.
It turns out that LLMs’ compression ability is relatively poor: they “fit” a huge amount of data (roughly the whole internet) but also have a huge number of parameters (on the order of a trillion or more). The larger the LLMs we develop, the higher the fraction of the internet “stored” in their parameters, and thus the closer we get to “overfitting” and memorization.
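A back-of-envelope calculation makes the point; the figures below are illustrative assumptions for a frontier-scale model, not any specific model’s published numbers.

```python
# Illustrative, order-of-magnitude assumptions only (not published specs)
params = 1e12              # ~a trillion parameters
bytes_per_param = 2        # e.g., 16-bit weights
training_tokens = 1e13     # ~tens of trillions of training tokens
bytes_per_token = 4        # rough average for text

model_bytes = params * bytes_per_param
data_bytes = training_tokens * bytes_per_token
print(f"model size / data size ~ {model_bytes / data_bytes:.2f}")  # ~0.05
```

Under these assumptions the model weighs in at roughly 5 percent of the size of its training data, far less compression than the two-numbers-for-a-thousand-points line fit above.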
Furthermore, in the hypothetical scenario where an LLM was given the whole internet to learn from, we would encounter a paradox: overfitting would not be a problem because… there is no data the model has not seen! And with the model large enough to retain most of this data, we would end up with an enormous “perfect” model of everything: a more verbose copy of the internet…?
Conclusion
LLMs are marvels of human ingenuity that can deliver significant value to us, not because they are efficient but because they are enormous. That makes them brute-force, insanely complex models with an equally enormous ability to “memorize” information, and this memorization can imitate intelligence.
Then again, life itself may have sprung from enormous, complex systems through emergent behaviors; couldn’t real intelligence emerge from these enormous LLMs in a similar way? Well, that is a conversation for another time!
