Exploring VeRA: A Revolutionary Approach to LoRA Efficiency
Chapter 1: Introduction to LoRA and Its Innovations
LoRA (Low-Rank Adaptation) was introduced in 2021 to make model fine-tuning more efficient. By adding small low-rank tensors on top of the base model's weights, it trains only those tensors while keeping the original parameters frozen. This drastically reduces the number of trainable parameters compared with standard fine-tuning.
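Concretely, for a frozen weight matrix W, LoRA learns an update of the form W + (α/r)·BA, where A and B are small low-rank matrices. Here is a minimal PyTorch sketch of that idea for a single linear layer; the class name and initialization choices are illustrative, not taken from any official implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal sketch of a LoRA-augmented linear layer: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base_linear: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained weights stay frozen
        # A is small and random, B starts at zero so the layer is unchanged at step 0.
        self.lora_A = nn.Parameter(torch.randn(r, base_linear.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base_linear.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

Only lora_A and lora_B receive gradients, which is why the trainable parameter count stays so small.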
For example, with the Llama 2 7B model, LoRA typically trains between 4 and 50 million parameters, versus the full 7 billion updated by standard fine-tuning. LoRA can also be used to fine-tune quantized models, which is what QLoRA does:
This tutorial demonstrates how to fine-tune the Llama 2 model on your own machine using QLoRA, illustrating its practical applications.
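For reference, a typical QLoRA setup with Hugging Face Transformers and PEFT looks roughly like the sketch below; the model name, target modules, and hyperparameters are illustrative choices, not requirements:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model quantized to 4-bit (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach LoRA adapters; only these parameters will be trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```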
Section 1.1: Challenges with LoRA Adapters
When fine-tuning LoRA adapters with QLoRA, naively merging the adapter back into the base model can degrade performance. This calls for a more careful strategy when merging LoRA's parameters.
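As a reference point, the common merging pattern with PEFT reloads the base model in 16-bit precision and merges the adapter into it, as sketched below (the adapter path is hypothetical). This is the baseline workflow, not necessarily the more careful strategy the paragraph above calls for:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in 16-bit precision rather than 4-bit.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the QLoRA adapter on top and merge it into the weights.
model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")
model = model.merge_and_unload()
model.save_pretrained("llama-2-7b-merged")
```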
Subsection 1.1.1: The Impact of Rank on Trainable Parameters
While LoRA greatly reduces the number of trainable parameters, the count still grows with the rank of the adapters (denoted r) and the number of target modules. For optimal performance, you may need to target all of the model's modules with a rank greater than 64, which can mean training several hundred million parameters and undercuts much of the efficiency gain.
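A quick back-of-the-envelope estimate makes this concrete: each adapted linear layer of shape d_out × d_in adds r × (d_in + d_out) trainable parameters. The sketch below uses the published layer dimensions of Llama 2 7B; treat the totals as rough estimates:

```python
# Rough estimate of LoRA trainable parameters for a Llama-2-7B-sized model.
r = 64
hidden, intermediate, n_layers = 4096, 11008, 32

def lora_params(d_in, d_out, rank):
    # One adapter adds A (rank x d_in) and B (d_out x rank).
    return rank * (d_in + d_out)

# Attention projections: q, k, v, and o are all hidden x hidden in the 7B model.
attn = 4 * lora_params(hidden, hidden, r)
# MLP projections: gate and up (hidden -> intermediate), down (intermediate -> hidden).
mlp = 2 * lora_params(hidden, intermediate, r) + lora_params(intermediate, hidden, r)

total = n_layers * (attn + mlp)
print(f"{total / 1e6:.0f}M trainable parameters")  # roughly 160M at r=64, twice that at r=128
```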
Section 1.2: Introducing VeRA
This week saw the introduction of VeRA (Vector-based Random Matrix Adaptation), a method designed to further reduce the number of parameters trained with LoRA.
VeRA works by adding trainable scaling vectors on top of frozen low-rank tensors that play the same role as LoRA's A and B. As a result, VeRA typically requires training about 10 times fewer parameters than the original LoRA.
But what about the initial low-rank tensors, labeled A and B in the illustrations? These tensors are randomly initialized and subsequently frozen. Although they may appear redundant, they play a crucial role in the training process. Previous studies have demonstrated that even random tensors can significantly contribute to model fine-tuning.
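To make the idea concrete, here is a minimal PyTorch sketch of a VeRA-style linear layer based on my reading of the paper; since no official code is available, the names and initialization details are my own assumptions. A and B are frozen random projections (shared across layers in the paper), and only the vectors d and b are trained:

```python
import torch
import torch.nn as nn

class VeRALinear(nn.Module):
    """Sketch of a VeRA-style update: y = Wx + diag(b) B diag(d) A x,
    with A and B frozen random matrices and only the vectors b and d trainable."""
    def __init__(self, base_linear: nn.Linear, r: int = 256, d_init: float = 0.1):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen

        in_f, out_f = base_linear.in_features, base_linear.out_features
        # Frozen, randomly initialized projections (buffers, so they get no gradients).
        self.register_buffer("vera_A", torch.randn(r, in_f) / in_f ** 0.5)
        self.register_buffer("vera_B", torch.randn(out_f, r) / r ** 0.5)
        # Trainable scaling vectors: these are the only new parameters.
        self.vera_d = nn.Parameter(torch.full((r,), d_init))
        self.vera_b = nn.Parameter(torch.zeros(out_f))  # zero init => no change at step 0

    def forward(self, x):
        delta = ((x @ self.vera_A.T) * self.vera_d) @ self.vera_B.T * self.vera_b
        return self.base(x) + delta
```

With only b and d trainable, each layer contributes just out_features + r new parameters, independent of in_features, which is where the large reduction relative to LoRA comes from.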
The authors of the VeRA paper conclude that:
"Collectively, these works create a compelling case for the utilization of frozen random matrices in finetuning methods, providing both a theoretical and an empirical foundation for the approach taken in this paper."
As of now, the authors have not yet released their implementation.
This article is adapted from The Weekly Kaitchup, my newsletter dedicated to providing insights, analyses, and tutorials on the latest developments in AI. To stay updated with news and tips on fine-tuning large language models, subscribe to The Kaitchup:
The Kaitchup - AI on a Budget | Benjamin Marie, PhD | Substack
Weekly news, tips, and tutorials on fine-tuning, running, and serving large language models on your computer.