New Study Shows LLMs Are Good at Generalizing on Their Own Without Human Input

According to a new study from the University of Hong Kong and the University of California, large language models generalize better and find stronger solutions when they are left to work through tasks on their own. The study challenges the belief that large language models need carefully handcrafted training examples before they can generalize. Many large language models go through supervised fine-tuning (SFT), in which a model that has already been trained on raw data is further trained on a large set of handcrafted examples. After SFT, the model typically moves on to reinforcement learning from human feedback (RLHF), where it learns human preferences and which of its responses humans like best.
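
To make the two stages concrete, here is a deliberately tiny sketch that stands in a three-response softmax "policy" for a real LLM. The canned responses, reward values, and learning rates are all invented for illustration and are not from the study or any real training setup:

```python
import math

logits = [0.0, 0.0, 0.0]          # toy "policy" over three canned responses

def probs():
    z = [math.exp(x) for x in logits]
    s = sum(z)
    return [p / s for p in z]

# Stage 1: SFT -- push probability mass toward the handcrafted "gold" response.
GOLD, LR = 1, 0.5
for _ in range(20):
    p = probs()
    for i in range(3):
        # Gradient of cross-entropy w.r.t. the logits: p_i - onehot(GOLD)_i
        logits[i] -= LR * (p[i] - (1.0 if i == GOLD else 0.0))

# Stage 2: RLHF -- reinforce responses in proportion to a human-preference reward.
reward = [0.1, 0.5, 1.0]          # pretend human raters prefer response 2
for _ in range(200):
    p = probs()
    baseline = sum(r * q for r, q in zip(reward, p))
    for i in range(3):
        # Softmax policy gradient of expected reward: p_i * (r_i - baseline)
        logits[i] += 0.1 * p[i] * (reward[i] - baseline)

print([round(x, 2) for x in probs()])  # mass shifts from the SFT answer toward 2
```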

SFT guides a model's behavior, but gathering the data for it is costly and takes a lot of time and effort, so developers have been applying reinforcement learning approaches in which a model is simply given a task and learns to solve it without handcrafted examples. One of the most prominent examples of this is DeepSeek-R1, which uses reinforcement learning to learn complex reasoning tasks.
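
The core mechanic behind this approach is easy to sketch: instead of imitating a labeled demonstration, the model only receives a programmatic reward when its final answer checks out. The checker below is a hedged illustration of that idea, not DeepSeek-R1's actual reward code:

```python
import re

def verifiable_reward(ground_truth: int, model_output: str) -> float:
    """Reward 1.0 if the last number in the model's output matches the answer."""
    numbers = re.findall(r"-?\d+", model_output)
    return 1.0 if numbers and int(numbers[-1]) == ground_truth else 0.0

# A training loop would sample responses and reinforce high-reward ones;
# no handcrafted demonstration is needed, only this automatic check.
print(verifiable_reward(24, "Multiply 2*3=6, then 6*4 = 24"))  # -> 1.0
print(verifiable_reward(24, "The answer is 25"))               # -> 0.0
```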


One of the biggest problems in training LLMs is overfitting: a model performs well on the training data but cannot generalize when it is given unseen examples. During training, the model gives the impression that it has learned the task completely, when it has really only memorized the training set. Because complex AI models make it hard to tell memorization apart from genuine generalization, this new study compared RL and SFT training of large language models on textual and visual reasoning tasks.
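
The memorization-versus-generalization gap the study probes can be seen in miniature. In the toy example below (invented for illustration, not taken from the study), a lookup-table "model" looks perfect on its training data but collapses on unseen inputs, while a model that actually learned the underlying rule holds up:

```python
import random

random.seed(0)

def rule(x):
    return x % 3 == 0                            # the true underlying task

train = [(x, rule(x)) for x in random.sample(range(100), 30)]
test = [(x, rule(x)) for x in range(100, 200)]   # unseen "out-of-distribution" inputs

memorizer = dict(train)                          # pure memorization of training pairs

def memorizer_predict(x):
    return memorizer.get(x, random.choice([True, False]))  # guesses off-distribution

def generalizer_predict(x):
    return x % 3 == 0                            # actually learned the rule

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

for name, predict in [("memorizer", memorizer_predict),
                      ("generalizer", generalizer_predict)]:
    print(name, "train:", accuracy(predict, train),
          "test:", round(accuracy(predict, test), 2))
```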

In their experiments, the researchers used two tasks. The first, GeneralPoints, assesses an LLM's arithmetic reasoning: the model is given four cards and asked to combine them to reach a specific target number. The researchers trained the models on one set of rules and then tested them under a different rule to measure rule-based generalization. They also evaluated the LLMs on differently colored cards to assess their visual generalization.
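
GeneralPoints is essentially a "make the target number" puzzle, so a small checker conveys the idea. The sketch below is a hypothetical reconstruction that tries only one bracketing for brevity; the study's actual environment and rule variants are more elaborate:

```python
from itertools import permutations, product
from operator import add, sub, mul, truediv

OPS = {"+": add, "-": sub, "*": mul, "/": truediv}

def reaches_target(cards, target, eps=1e-6):
    """Return a solution string if the four card values can reach the target."""
    for a, b, c, d in permutations(cards):
        for o1, o2, o3 in product(OPS, repeat=3):
            try:
                # Only the left-nested bracketing ((a o1 b) o2 c) o3 d is tried here.
                value = OPS[o3](OPS[o2](OPS[o1](a, b), c), d)
            except ZeroDivisionError:
                continue
            if abs(value - target) < eps:
                return f"(({a} {o1} {b}) {o2} {c}) {o3} {d} = {target}"
    return None

print(reaches_target([2, 3, 4, 1], 24))  # e.g. "((2 * 3) * 4) * 1 = 24"
```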

The second task, V-IRL, tested the models' spatial reasoning capabilities using realistic visual input. The tests were run on Llama 3.2 Vision 11B, and the results showed that reinforcement learning consistently improved performance on examples that were very different from the training data. This suggests that RL generalizes better than SFT, although an initial round of SFT training is still important for RL training to achieve desirable results.
