Choosing Between LoRA and QLoRA for Fine-Tuning LLMs
When exploring how to fine-tune large language models (LLMs), I came across two prevalent techniques: LoRA and QLoRA. Both are at the forefront of current fine-tuning methodology, which compelled me to dig deeper into how they work and what they would mean for my project. As a newcomer to the realm of LLMs, the journey to understand them has been anything but straightforward.

LoRA, introduced in https://arxiv.org/abs/2106.09685, and QLoRA, detailed in https://arxiv.org/abs/2305.14314, each propose a distinct approach to parameter-efficient fine-tuning: LoRA freezes the pretrained weights and trains small low-rank adapter matrices alongside them, while QLoRA applies the same idea on top of a base model quantized to 4-bit precision. To decide which framework might better serve my needs, I consulted a guidance page from Google Cloud that discusses the trade-offs between LoRA and QLoRA across various metrics, accessible at https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/lora-qlora?hl=ko.

According to the guidance, the choice between LoRA and QLoRA hinges on specific needs, summarized as follows:

- GPU memory efficiency: QLoRA is recommended for better utilization of GPU memory.
- Speed: LoRA trains faster.
- Cost efficiency: LoRA is more cost-effective, largely due to its speed advantage.
- Higher max sequence length: QLoRA supports longer sequence lengths, beneficial for tasks requiring extensive context.
- Accuracy improvement: both frameworks offer similar improvements in accuracy.
- Higher batch size: QLoRA accommodates larger batch sizes, which can improve training throughput.

The guidance also notes a practical constraint: the 7B model variant (specifically openLLaMA-7b, not Gemma-7b) fails to train even with a batch size of 1 on L4 and V100 GPUs, whereas an A100 GPU supports a batch size of 2.

Ultimately, the choice between LoRA and QLoRA should align with your specific project requirements.
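To make the comparison concrete, here is a minimal NumPy sketch of the core LoRA idea (the dimensions, rank, and scaling factor below are illustrative choices of mine, not values from either paper): the frozen weight matrix W is augmented with a trainable low-rank product B @ A, so only a tiny fraction of parameters needs gradients.

```python
import numpy as np

# Illustrative LoRA sketch: instead of updating a full d x d weight matrix W,
# LoRA learns a low-rank update B @ A with rank r << d, so the effective
# weight is W + (alpha / r) * (B @ A). Dimensions here are hypothetical.
d, r, alpha = 4096, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)).astype(np.float32)   # frozen base weight
A = rng.standard_normal((r, d)).astype(np.float32)   # trainable, shape (r, d)
B = np.zeros((d, r), dtype=np.float32)               # trainable, zero-initialized

def lora_forward(x):
    # Base path plus the scaled low-rank adapter path.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

full_params = W.size            # 16,777,216 params if W itself were trainable
lora_params = A.size + B.size   # 65,536 trainable adapter params
print(f"trainable fraction: {lora_params / full_params:.4%}")
# → trainable fraction: 0.3906%
```

Because B is zero-initialized, the adapter starts as a no-op and the model's initial outputs match the frozen base model exactly; the trainable-parameter count is what makes both LoRA and QLoRA so much cheaper than full fine-tuning.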
For those seeking further insights, the following resources provide implementation examples and additional context. Thanks to their authors:

- Fine-tuning Gemma using QLoRA: https://medium.com/@samvardhan777/fine-tune-gemma-using-qlora-️-6b2f2e76dc55
- The differences between QLoRA and LoRA for fine-tuning LLMs: https://medium.com/@sujathamudadla1213/difference-between-qlora-and-lora-for-fine-tuning-llms-0ea35a195535