Study Reveals ChatGPT-4's Remarkable 'Theory of Mind' Abilities, Outperforming Previous Models
A new study published in Proceedings of the National Academy of Sciences
reports that large language models (LLMs) such as ChatGPT are showing
“theory of mind” abilities resembling those seen in humans. When testing
ChatGPT-4, the researcher found that it solved 75% of the tasks, a level
comparable to that of a six-year-old child. This suggests that the
reasoning abilities of LLMs are improving. Theory of mind refers to the
human ability to understand the beliefs, emotions and mental states of
other people and to interact with them on that basis. In humans, this
ability emerges in early childhood and continues to develop throughout
life.
The researcher, Michal Kosinski, said
that LLMs can predict users' preferences based on the websites they
visit, the products they purchase, their music choices and other
behavioral data. When predicting behavior, it is also important to
understand the psychological processes of the individuals involved. For
the study, Kosinski used false-belief tasks, a type of psychological
test, to assess how well LLMs can predict a person's responses.
Two types of
tasks, the Unexpected Contents task and the Unexpected Transfer task,
were used for the false-belief test. In the Unexpected Contents task, a
subject sees an object with a misleading label and assumes the label to
be accurate. In the Unexpected Transfer task, an object is moved without
the subject knowing, and the subject then searches for it in its
original location. The LLMs tested had to predict what a person would do
in each of these two situations.
Kosinski evaluated 11 LLMs and created 40 false-belief tasks to test
them. Each false-belief scenario probed the model's understanding of the
real-world situation it described.
The results showed that GPT-1 and GPT-2 were unable to solve any false-belief tasks, indicating that these earlier GPT models lack this ability. ChatGPT-3, by contrast, solved 20% of the tasks, a performance comparable to that of a three-year-old. The best-performing LLM was ChatGPT-4, which completed 75% of the tasks accurately: it solved 90% of the Unexpected Contents tasks and 60% of the Unexpected Transfer tasks. The results also suggested that ChatGPT-4 adjusted its predictions based on context and reasoning rather than on simple patterns.
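
As a quick check on how the two per-task figures relate to the overall score, the short Python sketch below computes a weighted average. It assumes the 40 tasks were split evenly, 20 of each type; the article does not state the split, so the counts here are an illustrative assumption, and the function name is hypothetical.

```python
def overall_accuracy(accuracy_by_type, count_by_type):
    """Return the task-weighted mean accuracy across task types."""
    total_tasks = sum(count_by_type.values())
    solved = sum(accuracy_by_type[name] * count_by_type[name]
                 for name in count_by_type)
    return solved / total_tasks


# Per-type accuracy reported for ChatGPT-4 in the article.
accuracy = {"unexpected_contents": 0.90, "unexpected_transfer": 0.60}

# Assumption (not stated in the article): the 40 tasks are split evenly.
counts = {"unexpected_contents": 20, "unexpected_transfer": 20}

overall = overall_accuracy(accuracy, counts)
print(f"Overall accuracy: {overall:.0%}")
```

Under that even-split assumption, 90% and 60% average out to the reported 75%; a different split would give a different overall figure.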