Did You Know? Meta AI Researchers Have Proposed Scalable Memory Layers to Improve Factual Memory in LLMs
Even though many companies are integrating LLMs into their systems, the biggest challenges they face are gaps in factual knowledge and hallucinations. LLMs hallucinate often and sometimes make up information, which can have major consequences. New research from Meta AI suggests that this problem can be reduced through scalable memory layers.
In simple words, scalable memory layers add more parameters to an LLM without requiring extra compute at inference time. This increases the learning capacity of the model, especially in applications that benefit from extra memory for factual knowledge.
In traditional large language models, dense layers do the memorising. A dense layer has to squeeze large amounts of information into a limited set of parameters, and during inference all of those parameters are activated at the same time. Dense layers can grow larger to learn more, but that requires additional compute and energy. For factual knowledge, however, a much simpler mechanism is enough, and memory layers handle it well: they encode and retrieve information through simple key-value lookups, and they can hold more memory than dense layers while using far less compute.
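To make the idea concrete, here is a minimal sketch of such a key-value memory layer in PyTorch. The names and sizes are illustrative and not taken from the Meta AI paper, and for clarity the lookup scores every key directly, whereas memory-layer designs in the literature typically use a product-key scheme so that only a small subset of keys is ever scored.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMemoryLayer(nn.Module):
    """Minimal sketch of a trainable key-value memory layer.

    The hidden state is projected to a query, matched against a large
    table of keys, and only the top-k matching values are read back.
    Sizes and names here are illustrative, not from the Meta AI paper.
    """
    def __init__(self, hidden_dim=256, num_slots=16384, top_k=4):
        super().__init__()
        self.query_proj = nn.Linear(hidden_dim, hidden_dim)
        self.keys = nn.Parameter(torch.randn(num_slots, hidden_dim) * 0.02)
        self.values = nn.Parameter(torch.randn(num_slots, hidden_dim) * 0.02)
        self.top_k = top_k

    def forward(self, x):                            # x: (batch, hidden_dim)
        q = self.query_proj(x)                       # (batch, hidden_dim)
        scores = q @ self.keys.t()                   # score every memory slot
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)      # (batch, top_k)
        selected = self.values[top_idx]              # (batch, top_k, hidden_dim)
        # Weighted sum of the few retrieved values; all other slots stay idle.
        return (weights.unsqueeze(-1) * selected).sum(dim=1)
```

The key point of the design is that the parameter count grows with the number of memory slots, but each forward pass only touches the handful of slots that were retrieved.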
Memory layers have existed for a long time, but they are rarely used in modern architectures because they are not optimized for the hardware accelerators in use today. Many current LLMs instead use a mixture-of-experts (MoE) architecture, which is similar in spirit to memory layers: the model is split into many smaller expert components. Google DeepMind recently developed an architecture known as PEER that pushes this further, scaling MoE to a very large number of tiny experts. At inference time, a routing mechanism in these architectures determines which experts should handle the current input, as sketched below.
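As a rough illustration of that routing step, the sketch below shows a simple top-k router: a small gate scores all experts for each token and only the best-scoring ones are executed. The expert shapes, sizes, and the naive per-token loop are hypothetical simplifications, not the PEER or Meta AI design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Illustrative top-k mixture-of-experts routing.

    A small linear gate scores every expert for each token, and only the
    k best-scoring experts are actually run. All sizes are hypothetical.
    """
    def __init__(self, hidden_dim=256, num_experts=16, top_k=2):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (batch, hidden_dim)
        logits = self.gate(x)                          # (batch, num_experts)
        top_logits, top_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(top_logits, dim=-1)        # (batch, top_k)
        out = torch.zeros_like(x)
        # Naive per-token loop for clarity; real systems batch tokens by expert.
        for b in range(x.size(0)):
            for slot in range(self.top_k):
                expert = self.experts[int(top_idx[b, slot])]
                out[b] += weights[b, slot] * expert(x[b])
        return out
```

As with memory layers, the total parameter count can grow with the number of experts while the compute per token stays roughly constant, because only a few experts run for any given input.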
Because memory layers hold a lot of parameters while needing little computation, they also pose challenges for the hardware and software stacks that run LLMs. In the paper, the Meta AI researchers propose ideas that make memory layers practical in current LLMs. They show that memory layers can be parallelised, with the memory parameters distributed across several GPUs, and they propose implementing custom CUDA kernels to handle the memory-intensive operations. With these two modifications, memory layers become feasible in modern LLMs.
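As a loose sketch of the parallelisation idea, the snippet below shards a memory value table across two GPUs so that each device stores only part of the memory. The device list, table size, and lookup interface are assumptions for illustration; the paper's actual distributed implementation and its custom CUDA kernels are not reproduced here.

```python
import torch
import torch.nn as nn

class ShardedMemoryValues(nn.Module):
    """Loose sketch of sharding a memory value table across GPUs.

    Each device owns an equal slice of the slots, so no single GPU has to
    hold the full memory. Device names, sizes, and the lookup interface
    are assumptions for illustration only (two GPUs are assumed).
    """
    def __init__(self, num_slots=1_000_000, dim=256,
                 devices=("cuda:0", "cuda:1")):
        super().__init__()
        self.devices = list(devices)
        self.slots_per_shard = num_slots // len(self.devices)
        self.shards = nn.ParameterList(
            nn.Parameter(torch.randn(self.slots_per_shard, dim, device=d) * 0.02)
            for d in self.devices
        )

    def lookup(self, indices):          # indices: 1-D tensor of global slot ids
        rows = []
        for i in indices.tolist():
            shard = i // self.slots_per_shard   # which GPU owns this slot
            local = i % self.slots_per_shard    # row inside that shard
            rows.append(self.shards[shard][local].to("cpu"))
        return torch.stack(rows)        # (len(indices), dim), gathered on CPU
```

In a real system the retrieved rows would stay on-device and the transfers would overlap with computation; the point of the sketch is only that the memory parameters themselves can live on different GPUs.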
The researchers also tested memory layers in Llama models on common-sense world knowledge, factual question answering, scientific knowledge, and coding. The results showed that memory-augmented models improved substantially compared with dense models trained with more compute. The researchers also found that the benefits of memory layers remained consistent across model sizes, from 134 million to 8 billion parameters.
