Getting generative AI to review and improve internal chains of thought toward providing better logic and better answers.
In today’s column, I examine a newly published approach to improving generative AI and large language models (LLMs) that relies on the longstanding adage that it is best to think before you act. Notably, humans are not only urged to think before they act but are also encouraged to continually improve their thinking processes, so that each round of thinking-before-acting goes better than the last.
You might say we aim to optimize the thought process.
The premise in an AI context is that if you have generative AI do sufficient pre-processing to logic out a potential response, there is a heightened chance that the generated answer will be better. This technique of pre-processing to garner better answers is something that I’ve covered extensively and is widely known as chain-of-thought (CoT) reasoning for AI; see the link here and the link here.
An intriguing novel twist is depicted in a new AI research paper that entails having generative AI do what is essentially supplemental data training on internally devised chains of thought, aiming to improve the strength of the CoT capability. Envision this as a method of keeping track of the logic used to produce answers and then collectively using those instances to try to improve the logic production overall. A human might do likewise by reviewing their reasoning over and over again, gradually and ultimately bolstering their reasoning capacity.
Let’s talk about it.
This analysis of an innovative proposition is part of my ongoing Forbes.com column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here).
Schooldays And Showing Your Work
Do you remember being in school and your teacher insisting that you show your work when solving test problems?
I’m sure you do, perhaps anxiously so.
A notable reason to show your work was so that the teacher could see the logic that you used to arrive at your answer. If you got the answer wrong, the teacher might at least give you partial credit based on whether your work demonstrated that you partially knew how to solve the problem at hand. Of course, this also helped in catching cheaters who weren’t actually solving problems and instead were sneakily copying from their seated neighbors. I’ll put aside that cheating consideration and focus solely on the hoped-for learning outcomes.
A bonus basis for showing your work is that it might aid you in learning how to get better at employing logic and thinking through a problem. The belief is that the more you write down the steps you’ve undertaken, the better your chances of coming up with the right steps the next time. Generally, you can improve your overarching problem-solving prowess by repeatedly inspecting your work and refining your use of logical reasoning.
Keep this in mind as we shift into leveraging the notion for purposes of advancing AI.
Getting Generative AI To Show Its Work
When using generative AI, you can get the AI to showcase its work by telling the AI to do stepwise processing and identify how an answer is being derived. This is customarily referred to as chain-of-thought processing or CoT. In a sense, the logical steps for reasoning about a problem can be specified as a chain or series of thoughts that are taking place.
I am leery of the now-common catchphrase “chain-of-thought” in the AI field because it includes the word “thought,” as though generative AI were actually thinking. Those kinds of fundamental words are best reserved for human mental endeavors. Parlaying them into the AI realm is, lamentably, an insidious form of anthropomorphizing AI. It gives the impression that AI has thoughts and thinks on par with humans. That’s not the case, and it is sad and misleading that these phrases are being used in an AI context (see my detailed discussion at the link here).
Anyway, it has become the norm to use them, and I reluctantly go along, but I ask that you keep in mind the distinction between how these words apply to AI versus human cognition, thanks.
Let’s look at an example that illustrates the idea of showing work while AI is solving a problem. I want generative AI to help me with an upcoming trip, so I logged in and asked about potential travel plans. To get the presumed chain-of-thought, I mentioned in my prompt that I want to see the logic employed.
Here we go.
My entered prompt: “What is the best way for me to travel from San Francisco to New York City? Show your logic that is used to solve the problem.”
Generative AI response: “To travel from San Francisco to New York City, the best mode of travel is a train. My logic is as follows. Trains are more environmentally friendly than planes or cars. Plus, trains are comfortable, and you will have a scenic journey. By taking a train, you won’t have to do any driving. For those reasons, I recommend taking a train.”
The answer from the generative AI was that I should take the train from San Francisco to New York City. Well, that might be fun if I had plenty of time and relished train travel, but the answer doesn’t seem very good if I’m under time pressure or have other requirements for the journey.
I’m glad that I asked to see the logic or chain-of-thought. You can inspect the logic and see some key assumptions made by the AI. Rather questionable, I say.
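For readers who like to tinker, the same request can be issued programmatically rather than typed into a chat window. Below is a minimal sketch, assuming the OpenAI Python SDK (v1-style) with an API key configured in the environment; the model name is merely illustrative and any comparable chat API would do.

```python
# A minimal sketch of eliciting chain-of-thought via a prompt.
# Assumes the OpenAI Python SDK (v1-style); the model name is
# illustrative, not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{
        "role": "user",
        "content": (
            "What is the best way for me to travel from San Francisco "
            "to New York City? Show your logic that is used to solve "
            "the problem."
        ),
    }],
)

print(response.choices[0].message.content)
```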
Getting Generative AI To Find Better Logic
One way around the somewhat off-target or semi-flawed answer would be for me to tell the AI that the logic portrayed is not very solid. I could then give the AI the logic that I want it to use. Presumably, I would end up with a better answer.
Let’s instead lean into the classic bit of wisdom that it is often better to teach someone how to fish than to do the fishing for them. I will tell the AI to review its answer and assess the logic used.
I tried this.
My entered prompt: “I want you to review the answer you gave me about my taking a train. Carefully assess the logic used. Tell me what you find.”
Generative AI response: “The answer that I gave is logically weak. Weaknesses include that it is an overgeneralization, misses crucial factors, and appears to have an undue bias toward train travel rather than being open to other modes of travel.”
Aha, nicely, the AI identified weaknesses in the logic that had been used. I will prod the AI into redoing the travel planning and ask for better logic based on having discovered that the prior logic was weak.
Here it is.
My entered prompt: “Go ahead and use better logic and improve the answer. Show the logic that you are using.”
Generative AI response: “To determine the best way to travel from San Francisco to New York City, I will evaluate three primary factors: time, cost, and convenience. Flying is the fastest, train travel is second, and driving a car is the longest. On cost, flying can be affordable if you can book in advance and are flexible in your schedule, train travel is more restrictive and potentially as costly or more so than flying, and driving would be likely more expensive overall based on wear-and-tear, hotel stays, gas consumption, and the like. For convenience, flying is generally the most convenient since you get on a flight and arrive at your destination, while a train is going to likely require some changes between trains, and though driving is convenient for choosing where you go on your route it could be quite tiring and involve many hours of stress while at the wheel. Given these factors, if you value speed, fly. If you value cost, fly or take a train. If you value convenience, fly.”
I would say that the new answer is better since it brings up the importance of several factors including time, cost, and convenience.
The logic is much better too.
We have done something of grand significance. A better answer was derived by the AI, and by all appearances this was due to bolstering the underlying logic that was used. I didn’t have to lift a finger to redo the logic myself. Instead, I merely prodded the AI into revisiting and redoing the logic.
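If you wanted to script this entire review-and-redo pattern rather than typing each prompt by hand, the follow-up requests simply extend the running conversation. Here is a rough sketch, again assuming the OpenAI Python SDK; the prompts paraphrase the ones shown above.

```python
# A rough sketch of the critique-and-revise loop: ask, review, redo.
# Assumes the OpenAI Python SDK (v1-style) with a configured API key;
# the model name is illustrative.
from openai import OpenAI

client = OpenAI()
history = []

def ask(prompt: str) -> str:
    """Append a user turn, get the reply, and keep both in history."""
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(
        model="gpt-4o", messages=history
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

answer = ask("What is the best way for me to travel from San Francisco "
             "to New York City? Show your logic.")
critique = ask("Review the answer you gave me. Carefully assess the "
               "logic used. Tell me what you find.")
revised = ask("Go ahead and use better logic and improve the answer. "
              "Show the logic that you are using.")
print(revised)
```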
Persistently Improving Generative AI Logic
I’m reassured and excited that the answer to my travel question was definitely improved.
The thing is, I want the AI to always employ better logic, not just for the one question about traveling from San Francisco to New York City.
Here’s what I will tell the AI to do.
My entered prompt: “From now on, whenever I ask a question about travel, I want you to use more robust logic similar to what was showcased in your second answer. The logic in the second answer was better overall and can be a pattern of how to solve travel-related problems.”
Generative AI response: “I understand and will comply by using such sounder logic for travel-related problems. Please know that doing so will likely take slightly longer for me to respond to those questions and the added time might increase your costs if you are paying to use this AI.”
You can see that I opted to focus on just travel-related problems.
Expand that scope and imagine that we want generative AI to inspect the logic or chain-of-thought being used and always try to improve upon it, across all kinds of problems. To get this to happen on a lasting basis, we could exercise the AI with lots and lots of problems and get the AI to review its logic again and again. The aim would be for the AI to persistently get better at devising underlying logic.
A related facet is whether the AI will be able to sufficiently judge or assess the logic that it is using. There is a chance that the AI won’t admit to having weak logic or might not be able to detect when the logic is poor. We could craft a separate component that would somewhat independently assess or judge the logic. Those assessments could then be fed into the AI to guide which logic is better or worse than other logic.
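To give a flavor of how such a separate judging component might be wired up, here is a conceptual sketch. Both helper functions are hypothetical placeholders standing in for real model calls, not a real library’s API.

```python
# A conceptual sketch of a separate "judge" component that somewhat
# independently scores the logic an AI produced. Both helpers are
# hypothetical placeholders, not calls to any real library.

def generate_with_logic(problem: str) -> str:
    # Placeholder: in practice, call a generative model and ask it to
    # show its chain-of-thought along with the answer.
    return f"answer and stated reasoning for: {problem}"

def judge_logic(problem: str, output: str) -> float:
    # Placeholder: in practice, a separate judge model rates the
    # quality of the reasoning on a 0-to-1 scale.
    return 0.5

problems = ["plan a cross-country trip", "budget a kitchen remodel"]
scored = [(judge_logic(p, generate_with_logic(p)), p) for p in problems]

# The scores can then be fed back to the AI to guide which logic is
# better or worse, so it persistently improves at devising logic.
best_first = sorted(scored, reverse=True)
print(best_first)
```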
Not Much Data On The Logic Of Solutions
There is an additional and quite interesting angle that bears a reflective moment.
Generative AI is data-trained by scanning the Internet and examining lots and lots of text. During this data training, mathematical and computational pattern-matching is performed on human writing. When you use generative AI, the pattern-matching computationally mimics how humans write. Voila, you get the amazing semblance of fluency that occurs while using generative AI and large language models.
I’ll ask you a provocative question of a somewhat mind-bending nature.
Does all that data scanned from across the Internet tend to contain the logic underlying whatever is stated, or does the logic not necessarily accompany the content that is found?
Give that a moment of reflection.
I would wager that much, if not most, of what you might find online would almost certainly not be accompanied by the logic or logical basis for whatever is being stated. Unless you perchance come across an online textbook of mathematical proofs, you aren’t likely to see the logic employed. Furthermore, as an aside, even when people do show their logic, we might be suspicious as to whether the logic they show is coherent or complete.
The gist is this.
There is a low likelihood of being able to data-train generative AI on human logic from the get-go because the Internet as a data source tends to omit the logic that might have been employed. As such, you might have to find another way to get the logic, other than hoping it will simply be sitting out there on the Internet, tied to whatever problems or answers are here or there.
You can proceed, in a sense, to create synthetic logic, meaning after-the-fact logic that presumably underlies how something was solved or figured out. The chain-of-thought that you get generative AI to showcase could be construed as just that, namely synthetic logic. It isn’t the logic per se that a human necessarily used or patterned on; instead, it is derived logic that comes after the fact.
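As a quick illustration of producing such synthetic logic, here is a minimal sketch that asks a model to write an after-the-fact rationale for an existing question-and-answer pair. It assumes the OpenAI Python SDK, and the prompt wording is my own, not a prescribed recipe.

```python
# A minimal sketch of generating "synthetic logic": a post-hoc
# rationale for an existing question-answer pair. Assumes the OpenAI
# Python SDK (v1-style); model name and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()

def synthesize_rationale(question: str, answer: str) -> str:
    prompt = (f"Question: {question}\nAnswer: {answer}\n"
              "Write the step-by-step logic that would lead from the "
              "question to this answer.")
    return client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

rationale = synthesize_rationale(
    "What is the best way to travel from San Francisco to New York City?",
    "Fly, given the factors of time, cost, and convenience.",
)
print(rationale)
```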
AI Research Moving The Needle Forward
A recent AI research study by researchers at Meta, the University of California, Berkeley, and New York University came up with a novel methodology that they refer to as Thought Preference Optimization (TPO) to do something along the lines of what I have been noting.
The study is entitled “Thinking LLMs: General Instruction Following With Thought Generation” by Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, and Sainbayar Sukhbaatar, arXiv, October 14, 2024, and made these key points (excerpts):
“We argue that ‘thinking’ should have broad utility. For example, in a creative writing task, internal thoughts can be used to plan overall structure and characters. In other tasks, internal thoughts can be used for understanding the user instruction better.”
“However, it is challenging to train a model to think due to the lack of supervised training data. Although pre-training data does contain valuable information, coverage can be limited in certain domains as internal thoughts are often omitted in human writing. Existing post-training datasets typically consist of human responses, or preferences over responses, with no information on thought processes.”
“Thought Preference Optimization: We start by prompting the LLM to generate thoughts before its response. After sampling different outputs, we feed the response parts to the judge model which determines the best and worst ones. Then we use the corresponding full outputs as chosen and rejected pairs for DPO optimization. We perform multiple iterations of this training.”
“We optimize this thought and response output through iterative Reinforcement Learning from AI Feedback (RLAIF) training. We rely on a standard judge model that is trained to evaluate responses only and implicitly judge the quality of the thoughts via the induced responses. This has the advantage of not requiring human-curated thoughts or a special judge model capable of evaluating thoughts.”
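To make the quoted recipe a bit more concrete, here is a highly simplified sketch of one TPO iteration as I read the excerpts. Every helper below is a hypothetical placeholder; the real method samples from an LLM, scores responses with a trained judge model, and optimizes with DPO over many prompts and multiple iterations.

```python
# A highly simplified sketch of one Thought Preference Optimization
# (TPO) iteration, per the paper's description. All helpers are
# hypothetical placeholders; the actual pipeline uses an LLM, a judge
# model, and a DPO trainer.

def sample_outputs(prompt: str, k: int) -> list[str]:
    # Placeholder: sample k full outputs, each "<thought> + <response>".
    return [f"thought-{i} + response-{i}" for i in range(k)]

def response_part(output: str) -> str:
    # Placeholder: strip the internal thought, keep only the response,
    # since the judge never sees the thought directly.
    return output.split(" + ")[1]

def judge_score(prompt: str, response: str) -> float:
    # Placeholder: the judge evaluates responses only, implicitly
    # judging thought quality via the induced responses.
    return float(len(response))  # dummy scoring rule

prompt = "Plan a trip from San Francisco to New York City."
outputs = sample_outputs(prompt, k=4)
ranked = sorted(outputs, key=lambda o: judge_score(prompt, response_part(o)))
rejected, chosen = ranked[0], ranked[-1]

# The (chosen, rejected) FULL outputs -- thoughts included -- become a
# preference pair for DPO optimization; repeat over many prompts and
# across multiple training iterations.
dpo_pairs = [(prompt, chosen, rejected)]
print(dpo_pairs)
```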
This is an insightful study that seeks to explore and implement many of the facets that I mentioned here.
Their initial results suggest increased performance on selected benchmarks. The beauty too is that the heightened performance seems to occur across numerous domains. I mention this because there are CoT-related studies that have focused on specific domains, such as finance, medicine, the law, and other niches, which is great, but having an approach that appears to provide across-the-board improvements is equally vital, if not more so.
As they say, a rising tide lifts all boats.
I’ll be eager to see whether other AI researchers are able to replicate these results, plus make use of additional benchmarks to gauge the gamut of what these improvements might provide. Beyond trying this on Meta’s Llama, it would be worthwhile to apply the approach to other generative AI models such as ChatGPT, GPT-4o, o1, Claude, Gemini, and so on.
Lots of work yet to do, and lots of exciting opportunities awaiting.
Thinking About Thinking Enriches Thinking
A few final thoughts for now.
Warren Buffett famously said this about thinking: “There is nothing like writing to force you to think and get your thoughts straight.”
Returning to my point about showing your work during your schooldays, you must admit that writing down your logic was a means of forcing you to get your mind straight. Maybe it was painful and maybe you got dinged at times for making mistakes, but I dare say you are better off for it.
One twist is whether we truly think in the explicitly noted logic-based terms that we write down. Do you really think based on A leads to B, and B leads to C? Or is that a made-up rationalization that we are taught to abide by? Perhaps our brains work in some totally different way. Society is stridently forcing us to pretend that we think in a logical way, even though maybe we don’t, or we use some other logic entirely.
The reason that matters is that we seem to be imposing the same forcefulness on generative AI. Yes, we are forcing AI to abide by the logic that we believe humans are rationally supposed to use. What if that’s not what will ultimately get us to full AI, referred to as artificial general intelligence (AGI)?
Makes you think.
The last word goes to Warren Buffett: “I insist on a lot of time being spent, almost every day, to just sit and think.”
Yes, indeed, let’s make sure we give plenty of thinking time toward AI and advances in AI. Go find a quiet place to think about it. Your thoughts might make all the difference in the world.