=================
== Snappyl.com ==
=================
Welcome to my corner of the internet!

“Improving Accuracy Tips”

AI

Tips from the DoD!

I was just reading an article from Carnegie Mellon on the concerns the DoD has in software procurement and, more relevant to this topic, AI/LLM usage. The whole thing is interesting, but for my purposes here I’m most interested in LLM accuracy. There wasn’t much on that topic in the article, but what was there was good.

Source: Perspectives on Generative AI in Software Engineering and Acquisition

Prompt Engineering

So “prompt engineering” is kind of a dumb term, but whatever. It’s now the lingua franca. Here are a few suggestions and recommendations:

Chain of Thought prompting

In this technique you’re basically asking your model to behave like those new-fangled “thinking” models: when you prompt it, you also ask it to describe its process step by step. I know from other sources that “thinking” models have improved accuracy due to the “thought” output, so it makes sense that simulating that would help in a similar way. A minimal sketch follows.
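To make that concrete, here’s a minimal sketch in Python. The call_llm helper is hypothetical, a stand-in for whatever chat API you actually use, and the question is just an example:

    # A minimal sketch of chain-of-thought prompting. The call_llm helper is
    # hypothetical -- swap in whatever chat API you actually use.
    def call_llm(prompt: str) -> str:
        """Stand-in for a real LLM call; echoes the prompt so the script runs."""
        return f"[model response to: {prompt[:40]}...]"

    question = "A train leaves at 3:15 pm and arrives at 6:40 pm. How long is the trip?"

    # The whole technique is one added instruction: ask the model to show its
    # work step by step before committing to a final answer.
    cot_prompt = (
        question
        + "\n\nThink through this step by step, describing each step of your "
        + "reasoning, then give the final answer on its own line."
    )

    print(call_llm(cot_prompt))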

Tree of Thought prompting

In this technique you do something similar to chain-of-thought prompting, but instead of asking the model to explain the whole process in one go, you ask it to explain only the next step. In its explanation, you also ask for multiple possible ways to accomplish that step.

One more thing you can do at this point, depending on your API budget, is start up multiple chat sessions and ask each one the same question, with the intent of collecting several different answers.

Then, when you have a selection of possible next steps, feed those possibilities back into the model and ask it to evaluate how effective each one might be. Take the highest-rated possibility and start the process again, this time looking for the step after that. A sketch of the whole loop follows.
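Here’s a rough Python sketch of that loop, under the same assumptions as before: call_llm is a hypothetical stand-in for your real chat API, and the example problem is made up. Like the description above, it greedily keeps only the single highest-rated candidate at each level rather than keeping multiple branches alive:

    # A minimal sketch of the tree-of-thought loop: propose a few candidate
    # next steps, have the model rate each one, keep the best, and repeat.
    def call_llm(prompt: str) -> str:
        """Stand-in for a real LLM call; returns a dummy rating so the script runs."""
        return "3"

    def propose_steps(problem: str, steps_so_far: list[str], n: int = 3) -> list[str]:
        # One session asked for n alternatives; budget permitting, n separate
        # chat sessions would give you more diverse candidates.
        return [
            call_llm(
                f"Problem: {problem}\n"
                f"Steps so far: {steps_so_far}\n"
                f"Propose ONE possible next step (alternative #{i + 1})."
            )
            for i in range(n)
        ]

    def score_step(problem: str, steps_so_far: list[str], candidate: str) -> int:
        rating = call_llm(
            f"Problem: {problem}\n"
            f"Steps so far: {steps_so_far}\n"
            f"Candidate next step: {candidate}\n"
            "Rate how promising this step is, 1-10. Reply with the number only."
        )
        try:
            return int(rating.strip())
        except ValueError:
            return 0  # treat an unparseable rating as worthless

    problem = "Plan a migration of a legacy app to containers."
    steps: list[str] = []
    for _ in range(3):  # explore three levels deep
        candidates = propose_steps(problem, steps)
        best = max(candidates, key=lambda c: score_step(problem, steps, c))
        steps.append(best)
    print(steps)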

Source: Chain of Thought (CoT)

Implement processes to identify errors

One thing that’s harder for me to do, as I’m a one-man band here, is to implement processes that can identify erroneous output and reduce the harm it causes. Unfortunately the CMU article doesn’t go too in-depth here, but one could imagine that if you were doing software development, for example, you could include a vulnerability scanning tool in your CI/CD system. Similar checks presumably exist in other disciplines I’m not familiar with. A sketch of that idea follows.
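For instance, here’s a minimal sketch of such a gate, assuming a Python project and Bandit (a real static-analysis security scanner for Python); the src/ path and the fail-on-any-finding policy are just illustrative choices:

    # A minimal sketch of an automated error-catching gate for a CI pipeline,
    # assuming Bandit is installed (pip install bandit).
    import subprocess
    import sys

    def security_gate(source_dir: str = "src/") -> None:
        # Bandit scans Python source for common vulnerability patterns and
        # exits nonzero when it finds issues.
        result = subprocess.run(
            ["bandit", "-r", source_dir],
            capture_output=True,
            text=True,
        )
        print(result.stdout)
        if result.returncode != 0:
            print("Security scan found issues; failing the build.", file=sys.stderr)
            sys.exit(result.returncode)

    if __name__ == "__main__":
        security_gate()

A CI/CD system would run this script as a build step, so erroneous (or at least insecure) LLM-generated code gets flagged before it ships.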

Thinking further myself

Adding these suggestions to what I discovered earlier about locating higher-quality sources should give LLM output accuracy a decent boost for me.

Next Steps?

So this doesn’t really help determine whether the current crop of AI reduces or increases harms. It’s just good info for my continuing quest, since I’m using these systems to search for good articles and papers, like this article from CMU (which also, incidentally, led me to this honey on ACM). The research will continue, but it’s good to arm myself with helpful techniques for finding what I’m looking for!