AI Musings
AI Musings and Thoughts
Since my last post I’ve been thinking about why LLMs work as well, or as poorly, as they do, and I think I’ve come up with a working hypothesis. Admittedly, it’s probably obvious, but hey! I never said I was quick!
What Were These Systems Trained On?
Programming
So, in the case of programming, these LLM systems have been trained on things like code in GitHub or posts on Stack Overflow. What are these posts, typically? I would assume they are mostly amateur programmers asking entry-level questions. Further, the questions will likely be about smaller systems or short code snippets, not entire applications. Given that sort of training bias, I would expect an LLM to be well versed in simple programming tasks and less so in more advanced ones. Following that line of reasoning, the kind of output I saw in the reports I reviewed in my last post is about what you would expect.
Also, consider that an LLM works on the idea of “advanced autocomplete”, for lack of a better description. So, even if the LLM were to ingest a complex program, how do you prompt an LLM to produce the works of the Linux Foundation when there is no real starting point? Nobody has asked “how do you make Linux” where someone else provided the result. Nobody has said “And here is my magnum opus: a web browser” and followed that up with Konqueror. But there have been tons of posts beginning with “how do I write a script to update my Arch box on a regular basis” followed by a SystemD unit to do just that. So if I ask Gemini “how do I make a chat server?” it won’t really know how, because that’s just not in the training data, or at least not there in high frequency. But if I ask it “how do I make a bash script to show me the size of directories in .?” it can pop that out all day long because everyone asks that.
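To illustrate the kind of answer that question practically autocompletes to (the exact command an LLM returns will vary; this sketch assumes GNU coreutils), it is something like:

    du -sh ./*/ | sort -h    # human-readable size of each directory here, smallest first

That one-liner, or some minor variation of it, shows up all over Stack Overflow and forum posts, which is exactly why it is easy for an LLM to reproduce on demand.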
I may need to check this thought against actual data. It could be informative!
Health Care and Other Languagey Things
Following on that idea and expanding a bit, other disciplines may run into this issue less because their discussions and publications are more widely available or have been around longer. Additionally, those discussions tend to run at length and provide more context leading to a conclusion, whereas programming posts mostly don’t. Take medicine, for example. There are multiple journals online with these discussions. If I ask Copilot about chickenpox, it will probably be able to answer, because those questions, prompts, and patterns are in the training dataset, and it can use agents to find other information. It’s not new or novel; these are things the LLM was specifically trained on.
So if I ask an LLM to summarize something, it can do it pretty well because we see the “summary” header everywhere. Or we see a “TL;DR” or whatever. The LLM can handle that sort of thing easily because that’s what it has been explicitly trained on.
So Where Do I Go from Here?
I might need to investigate this more and see what information I can find. Where do LLMs do better? Why? Where do they do worse? Why? If I remain in the LLM’s wheelhouse, are there benefits or value? Even if I leave the wheelhouse and move out to the driveway, is there any value? Maybe more to come!
An aside
I’ve seen that Microsoft recently released a new Copilot Agent Mode. I can’t tell you how much I want to try it, but that feeling is heavily tempered by every other interaction I’ve had with one of these systems, which is that they aren’t great for anything but simple scripts. Plus I’m already paying for Gemini and getting mileage out of it for this research, and I don’t want to pay $10 per month for another AI thing that’s very mid, even if it’s better than every other tool I’ve used so far. If I hadn’t burned through my Copilot trial over Christmas, I might be more inclined, but alas.