# Meta's LLaMa AI: A Transformative Leap in Open Source Technology
In a significant development, Meta has unveiled the second iteration of its acclaimed language model, LLaMa, alongside its inaugural chatbot, LLaMa-2-Chat, marking a noteworthy challenge to the reigning champion, ChatGPT.
This release transcends the typical “check out our new LLM” announcements; Meta is genuinely aiming to reshape the AI narrative for good.
This launch holds the potential to permanently transform the AI landscape, ushering in an era of democratized access to AI and knowledge.
LLaMa's success represents a victory for the community, and here's why.
# A New Benchmark for Open-source

First, one thing is crystal clear from the extensive 70-page paper: LLaMa 2 is exceptionally proficient.
Striving for Excellence
Meta has produced four models, with parameters of 7, 13, 34, and 70 billion.
While the largest model is sizable, all are considerably smaller than GPT-4, which is rumored to comprise up to eight interconnected 220-billion-parameter models, totaling nearly 1.8 trillion parameters. Even the biggest LLaMa model is less than half the size of GPT-3.
Interestingly, the parameter count has barely grown from its predecessor, whose largest model already had 65 billion parameters.
So, what accounts for its enhanced performance?
The answer is straightforward: Meta's strategy focuses on producing open-source models, prioritizing data over size.
They trained LLaMa 2 on a dataset that is 40% larger, trained it for longer, and doubled its context window to 4,096 tokens (approximately 3,000 words).
In terms of quality, the paper's human evaluations show that the LLaMa-2-Chat 70-billion-parameter model outperforms nearly all competitors, slightly surpassing ChatGPT (the GPT-3.5 version) despite its smaller size.
Compared to other open-source models, it stands out as unequivocally superior.
Of course, it still lags behind GPT-4 (absent from that comparison), which could be more than 20 times larger, so this isn't unexpected.
However, what truly sets this research apart is the meticulous detail provided regarding its training methodology.
Cultivating Intelligence
The first distinctive aspect of Meta's approach is that it optimizes separately for helpfulness and safety, leading to what may be the safest high-performing chatbot currently available.
To illustrate, let's review the comprehensive process Meta outlined for developing LLaMa-2-Chat.
Training a GenAI chatbot encompasses four phases:
- The base model is trained using a self-supervised approach to predict the subsequent token in a text sequence, masking the next word and prompting the model to forecast it.
- The pre-trained model is fine-tuned with a curated dataset of {prompt, desired answer} pairs, termed ‘behavior cloning’ by OpenAI, where the model learns to respond as desired. This represents the first version of LLaMa-2-Chat.
- The model is then optimized against human preferences, minimizing harmful responses. A copy of the model from the first step has its word-prediction head replaced with one that outputs a scalar value indicating the quality of a response to a specific prompt, based on human preferences. This is referred to as a Reward Model (RM).
- Finally, LLaMa-2-Chat is trained against this reward model to maximize its score, meaning the chatbot learns to generate responses that yield the highest value according to the RM.
Steps 3 and 4 involve Reinforcement Learning from Human Feedback (RLHF) and are vital to achieving the final LLaMa-2-Chat model. A minimal sketch of the Reward Model follows.
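To make step 3 concrete, here is a minimal sketch assuming a PyTorch-style setup. The tiny backbone merely stands in for the pretrained LLaMa transformer (unlike LLaMa, it isn't even a causal decoder), and all names and sizes are illustrative, not Meta's actual code:

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in for the pretrained LLaMa transformer (illustrative only)."""
    def __init__(self, vocab_size: int = 32_000, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.encoder(self.embed(input_ids))  # (batch, seq, hidden)

class RewardModel(nn.Module):
    """Backbone whose next-word head is swapped for a scalar 'value' head."""
    def __init__(self, backbone: nn.Module, hidden: int = 64):
        super().__init__()
        self.backbone = backbone
        self.value_head = nn.Linear(hidden, 1)  # one score, not vocab logits

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)
        # Score the whole {prompt, response} sequence via its final token
        return self.value_head(hidden[:, -1, :]).squeeze(-1)

rm = RewardModel(TinyBackbone())
tokens = torch.randint(0, 32_000, (2, 16))  # two toy {prompt, response} pairs
print(rm(tokens))  # two scalar quality scores
```

The only architectural change is the head: instead of a distribution over the vocabulary, the network emits one number per {prompt, response} pair.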
If you're familiar with LLM training processes, you may notice something peculiar in the pipeline above.
The Helpfulness-Safety Balance
In conventional training models, like OpenAI's for ChatGPT, a single reward model is utilized.
However, in Meta's case, LLaMa-2-Chat is built using two reward models:
- A Helpfulness Reward Model
- A Safety Reward Model
This is a groundbreaking approach in AI.
The rationale stems from the challenging interplay between helpfulness and safety.
According to research led by Yuntao Bai, a prominent researcher behind Anthropic's Claude models, it is difficult to optimize a model for both helpfulness and safety because of inherent trade-offs between the two.
Building the most helpful model can lead to answering any question, regardless of ethical implications.
Need instructions for creating a bomb? Here you go.
Interested in the simplest way to harm someone? Sure, why not.
Thus, focusing solely on helpfulness is essentially creating a ticking time bomb.
Conversely, if the aim is to create the safest model, it becomes challenging for it to answer many inquiries, as nearly everything today can be scrutinized morally.
For example, while Pi is one of the safest chatbots I've encountered, it often struggles to provide meaningful assistance.
So, what did Meta do?
They ingeniously developed two reward models and employed a dynamic cost function to balance both.
The Meta team annotated the prompts in the dataset most likely to elicit harmful responses, and whenever the model was trained on one of those examples, the cost function shifted its weight from the helpfulness reward to the safety reward.
In simpler terms, for harmful training examples, the model’s goal was to respond “safely,” while in other cases, it was guided to respond “as helpfully as possible.”
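Here is a minimal sketch of that switching logic, assuming the paper's piecewise formulation; the function name and threshold value are illustrative, not Meta's code:

```python
def combined_reward(prompt_is_safety_tagged: bool,
                    safety_score: float,
                    helpfulness_score: float,
                    safety_threshold: float = 0.15) -> float:
    """Choose which reward drives the RLHF update for one example.

    Safety dominates whenever the prompt was annotated as potentially
    harmful or the response already scores as unsafe; otherwise the
    helpfulness reward is optimized. Names and threshold are illustrative.
    """
    if prompt_is_safety_tagged or safety_score < safety_threshold:
        return safety_score      # push the model toward safer behavior
    return helpfulness_score     # otherwise, maximize helpfulness
```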
Technical clarification: when I mention optimizing against something, I mean minimizing a cost function. To train a neural network, you define a differentiable mathematical expression that measures the model's prediction error; by computing the gradient of that cost with respect to the model's parameters and adjusting them against it, you find the combination of parameters that minimizes the cost, thereby improving prediction accuracy.
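For the reward models specifically, the paper trains on pairs of responses where annotators preferred one over the other, minimizing a binary ranking loss with a margin term:

```latex
\mathcal{L}_{\text{ranking}} = -\log \sigma\big( r_\theta(x, y_c) - r_\theta(x, y_r) - m(r) \big)
```

Here $x$ is the prompt, $y_c$ and $y_r$ the chosen and rejected responses, $r_\theta$ the reward model's scalar score, $\sigma$ the sigmoid, and $m(r)$ a margin that grows with how strongly the annotator preferred the chosen answer; each gradient step $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}$ nudges the model to score preferred answers higher.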
This approach enabled the model to discern how to best respond to each prompt while considering harmful inputs that should remain unaddressed.
This is a significant advancement for the open-source community, offering crucial insight into the previously guarded process known as RLHF.
If revolutionizing safety training wasn't sufficient, they unveiled another innovative concept.
GAtt Enhances Memory in Your Model
Attention is a fundamental component in LLMs, facilitating the understanding of word relationships. The more effective this mechanism, the more proficient the model.
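For reference, the core of that mechanism is scaled dot-product attention; a minimal, self-contained version (in PyTorch, purely illustrative) looks like this:

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """Each token's query is scored against every key; the softmaxed
    scores say how much every other token matters when rebuilding
    this token's representation from the values."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    weights = torch.softmax(scores, dim=-1)  # token-to-token relevance
    return weights @ v                       # weighted mix of value vectors

q = k = v = torch.randn(1, 8, 64)  # 8 tokens with 64-dim representations
out = scaled_dot_product_attention(q, k, v)  # shape: (1, 8, 64)
```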
However, longer text sequences pose challenges for models to retain initial instructions.
For example, if you ask the model to “act as Napoleon” in the first prompt, it may forget that directive by the 20th interaction.
With Ghost Attention (GAtt), the model has been fine-tuned to emphasize instructions and retain them throughout the conversation:
The GAtt model retains the initial instruction, continuing to provide emoji responses even without explicit requests from the user.
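Mechanically, the paper describes GAtt as a data-level trick: the instruction is synthetically concatenated to every user turn of a training dialogue so that sampled responses always respect it, and the duplicated copies are then dropped before fine-tuning so the model learns to honor the instruction from the first turn alone. A rough sketch of that construction, with hypothetical function and field names:

```python
def build_gatt_example(instruction: str, turns: list[dict]) -> list[dict]:
    """Hypothetical sketch of GAtt-style data construction.

    Stage 1: attach the instruction to every user turn, so responses
    sampled from the current model always respect it.
    Stage 2: keep the instruction only in the first turn for fine-tuning
    (the paper also zeroes out the loss on earlier turns' tokens).
    """
    augmented = [
        {**t, "content": f"{instruction}\n{t['content']}"}
        if t["role"] == "user" else t
        for t in turns
    ]
    # ... assistant replies would be sampled against `augmented` here ...

    training = [dict(t) for t in augmented]
    for t in training[1:]:  # strip the duplicated instruction copies
        if t["role"] == "user":
            t["content"] = t["content"].removeprefix(instruction + "\n")
    return training

dialogue = [
    {"role": "user", "content": "Who won at Waterloo?"},
    {"role": "user", "content": "And at Austerlitz?"},
]
print(build_gatt_example("Act as Napoleon.", dialogue))
```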
This development is thrilling, as effective instruction following is essential for a useful chatbot, and maintaining those instructions throughout the conversation is a capability that most chatbots currently lack.
GAtt is poised for enduring relevance.
OpenAI recently announced ‘custom instructions’ for ChatGPT, which could be seen as a similar but less persistent feature. However, OpenAI likely implements this as a user interface enhancement, incorporating the instruction into every prompt behind the scenes.
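If that speculation is correct, the feature may amount to little more than this (a purely hypothetical sketch):

```python
def apply_custom_instruction(instruction: str, user_message: str) -> str:
    """Quietly prepend the saved instruction to every outgoing prompt."""
    return f"{instruction}\n\n{user_message}"

prompt = apply_custom_instruction("Answer concisely, in French.", "Explain RLHF.")
```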
Yet, the most significant announcement came shortly afterward.
# Bridging New Frontiers In a subsequent press release, Meta declared that LLaMa-2-Chat is not only suitable for commercial use but also accessible through Microsoft's Azure cloud.
This is monumental, as enterprise clients can now utilize ChatGPT via Azure and access LLaMa as well.
Importantly, LLaMa 2 is downloadable, allowing clients to install it on private servers, thus mitigating the security risks associated with transmitting data to OpenAI or Anthropic servers.
As a result, LLaMa-2-Chat could emerge as the premier chatbot for enterprise applications, potentially validating Yann LeCun's assertion:
“Open-source will eventually win the AI race.”
Crafting Their Own Legacy
In a daring move, Meta’s introduction of LLaMa 2 and LLaMa-2-chat signifies a pivotal transformation in the field of large language model development.
This release is not merely about launching another advanced product; it represents a bold challenge to tech giants like Microsoft, underscoring Meta’s dedication to democratizing access to knowledge and resources related to model training.
The AI sector now has its first high-performing open-source chatbot, accompanied by a comprehensive 70-page research paper detailing every aspect of its construction.
Thus, beyond merely leveling the playing field, Meta is positioned to redefine it.
By clarifying the intricate RLHF process and introducing the dual reward model approach, Meta is lifting the veil on LLM training while propelling open-source initiatives to unprecedented heights.
What was once the domain of a select few is now accessible to a global audience, with the potential to accelerate the advancement of open-source models to rival the proprietary systems safeguarded by some of the most powerful corporations worldwide.
And you? Are you feeling excited?
Link to the original paper.