New Study Shows LLMs Are Good at Generalizing on Their Own Without Human Input
According to a new study by the University of Hong Kong and the University of
California, large language models can generalize and find better solutions
when they are left to work through tasks on their own. The study challenges
the belief that large language models need carefully curated training
examples before they can start generalizing. Most large language models
undergo supervised fine-tuning (SFT), in which a model is trained on a large
set of handcrafted examples after its initial training on raw data. After
SFT, the model typically goes through reinforcement learning from human
feedback (RLHF), where it learns which of its responses humans prefer.
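To make the difference between the two kinds of training data concrete, here is a rough Python sketch of what each typically looks like; the prompt and responses are illustrative examples, not data from the study.

    # Supervised fine-tuning (SFT): handcrafted prompt -> reference response pairs.
    sft_example = {
        "prompt": "Combine the cards 4, 7, 8, 8 to reach 24.",
        "response": "(7 - 8 / 8) * 4 = 24",
    }

    # RLHF: human preference data ranking two model responses to the same prompt.
    rlhf_example = {
        "prompt": "Combine the cards 4, 7, 8, 8 to reach 24.",
        "chosen": "(7 - 8 / 8) * 4 = 24",
        "rejected": "4 + 7 + 8 + 8 = 24",  # incorrect: this actually equals 27
    }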
SFT guides a model's behavior, but gathering the data for it is costly and
time-consuming, so developers have begun applying reinforcement learning
approaches in which a large language model is given a task and learns to
solve it without handcrafted examples. One of the most prominent examples of
this is DeepSeek-R1, which uses reinforcement learning to learn complex
reasoning tasks.
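In setups like this, a model's response is scored automatically against the task itself rather than against a handcrafted reference answer. A minimal sketch of such a rule-based reward, assuming the model is prompted to end with "Answer: <value>" (the function name and answer format are illustrative assumptions, not details from the paper):

    def outcome_reward(model_output: str, ground_truth: str) -> float:
        """Score a response by its final answer alone; no reference reasoning needed."""
        answer = model_output.rsplit("Answer:", 1)[-1].strip()
        return 1.0 if answer == ground_truth.strip() else 0.0

    print(outcome_reward("8/8 = 1, 7 - 1 = 6, 6 * 4 = 24. Answer: 24", "24"))  # 1.0
    print(outcome_reward("I think it is 25. Answer: 25", "24"))                # 0.0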
In their experiments, the researchers used two tasks. The first, GeneralPoints, assesses an LLM's arithmetic reasoning: the model is given four cards and asked to combine them to reach a specific target number. The researchers trained the models on one set of rules and then tested them under a different rule to measure rule-based generalization. They also evaluated the models on cards of different colors to assess visual generalization.
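For intuition, the task resembles the classic 24 game. A brute-force check of whether a hand of cards can reach the target at all, written as an illustrative sketch rather than the study's implementation, could look like this:

    def reaches_target(cards, target=24, eps=1e-6):
        """Can the card values be combined with +, -, *, / (each used once) to hit the target?"""
        ops = [
            lambda a, b: a + b,
            lambda a, b: a - b,
            lambda a, b: a * b,
            lambda a, b: a / b if b != 0 else None,  # skip division by zero
        ]
        if len(cards) == 1:
            return abs(cards[0] - target) < eps
        # pick any ordered pair of values, combine them, and recurse on the rest
        for i in range(len(cards)):
            for j in range(len(cards)):
                if i == j:
                    continue
                rest = [cards[k] for k in range(len(cards)) if k not in (i, j)]
                for op in ops:
                    result = op(cards[i], cards[j])
                    if result is not None and reaches_target(rest + [result], target):
                        return True
        return False

    print(reaches_target([4, 7, 8, 8]))   # True: (7 - 8 / 8) * 4 = 24
    print(reaches_target([1, 1, 1, 1]))   # False: no combination reaches 24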
The second task, V-IRL, tests a model's spatial reasoning using realistic visual input. The tests were run on Llama 3.2 Vision 11B, and the results showed that reinforcement learning consistently improved performance on examples that differed substantially from the training data. This suggests that RL generalizes better than SFT, although an initial round of SFT is still important for RL training to achieve good results.