peft

2025-12-10 0 330

PEFT

State-of-the-art Parameter-Efficient Fine-Tuning (PEFT) methods

Fine-tuning large pretrained models is often prohibitively costly due to their scale. Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model\’s parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.

PEFT is integrated with Transformers for easy model training and inference, Diffusers for conveniently managing different adapters, and Accelerate for distributed training and inference for really big models.

Tip

Visit the PEFT organization to read about the PEFT methods implemented in the library and to see notebooks demonstrating how to apply these methods to a variety of downstream tasks. Click the \”Watch repos\” button on the organization page to be notified of newly implemented methods and notebooks!

Check the PEFT Adapters API Reference section for a list of supported PEFT methods, and read the Adapters, Soft prompts, and IA3 conceptual guides to learn more about how these methods work.

Quickstart

Install PEFT from pip:

pip install peft

Prepare a model for training with a PEFT method such as LoRA by wrapping the base model and PEFT configuration with get_peft_model. For the bigscience/mt0-large model, you\’re only training 0.19% of the parameters!

from transformers import AutoModelForCausalLMfrom peft import LoraConfig, TaskType, get_peft_modeldevice = \"cuda\"model_id = \"Qwen/Qwen2.5-3B-Instruct\"model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)peft_config = LoraConfig(r=16,lora_alpha=32,task_type=TaskType.CAUSAL_LM,# target_modules=[\"q_proj\", \"v_proj\", ...]  # optionally indicate target modules)model = get_peft_model(model, peft_config)model.print_trainable_parameters()# prints: trainable params: 3,686,400 || all params: 3,089,625,088 || trainable%: 0.1193# now perform training on your dataset, e.g. using transformers Trainer, then save the modelmodel.save_pretrained(\"qwen2.5-3b-lora\")

To load a PEFT model for inference:

from transformers import AutoModelForCausalLM, AutoTokenizerfrom peft import PeftModeldevice = \"cuda\"model_id = \"Qwen/Qwen2.5-3B-Instruct\"tokenizer = AutoTokenizer.from_pretrained(model_id)model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)model = PeftModel.from_pretrained(model, \"qwen2.5-3b-lora\")inputs = tokenizer(\"Preheat the oven to 350 degrees and place the cookie dough\", return_tensors=\"pt\")outputs = model.generate(**inputs.to(device), max_new_tokens=50)print(tokenizer.decode(outputs[0], skip_special_tokens=True))# prints something like: Preheat the oven to 350 degrees and place the cookie dough in a baking dish [...]

Why you should use PEFT

There are many benefits of using PEFT but the main one is the huge savings in compute and storage, making PEFT applicable to many different use cases.

High performance on consumer hardware

Consider the memory requirements for training the following models on the ought/raft/twitter_complaints dataset with an A100 80GB GPU with more than 64GB of CPU RAM.

Model Full Finetuning PEFT-LoRA PyTorch PEFT-LoRA DeepSpeed with CPU Offloading
bigscience/T0_3B (3B params) 47.14GB GPU / 2.96GB CPU 14.4GB GPU / 2.96GB CPU 9.8GB GPU / 17.8GB CPU
bigscience/mt0-xxl (12B params) OOM GPU 56GB GPU / 3GB CPU 22GB GPU / 52GB CPU
bigscience/bloomz-7b1 (7B params) OOM GPU 32GB GPU / 3.8GB CPU 18.1GB GPU / 35GB CPU

With LoRA you can fully finetune a 12B parameter model that would\’ve otherwise run out of memory on the 80GB GPU, and comfortably fit and train a 3B parameter model. When you look at the 3B parameter model\’s performance, it is comparable to a fully finetuned model at a fraction of the GPU memory.

Submission Name Accuracy
Human baseline (crowdsourced) 0.897
Flan-T5 0.892
lora-t0-3b 0.863

Tip

The bigscience/T0_3B model performance isn\’t optimized in the table above. You can squeeze even more performance out of it by playing around with the input instruction templates, LoRA hyperparameters, and other training related hyperparameters. The final checkpoint size of this model is just 19MB compared to 11GB of the full bigscience/T0_3B model. Learn more about the advantages of finetuning with PEFT in this blog post.

Quantization

Quantization is another method for reducing the memory requirements of a model by representing the data in a lower precision. It can be combined with PEFT methods to make it even easier to train and load LLMs for inference.

  • Learn how to finetune meta-llama/Llama-2-7b-hf with QLoRA and the TRL library on a 16GB GPU in the Finetune LLMs on your own consumer hardware using tools from PyTorch and Hugging Face ecosystem blog post.

  • Learn how to finetune a openai/whisper-large-v2 model for multilingual automatic speech recognition with LoRA and 8-bit quantization in this notebook (see this notebook instead for an example of streaming a dataset).

Save compute and storage

PEFT can help you save storage by avoiding full finetuning of models on each of downstream task or dataset. In many cases, you\’re only finetuning a very small fraction of a model\’s parameters and each checkpoint is only a few MBs in size (instead of GBs). These smaller PEFT adapters demonstrate performance comparable to a fully finetuned model. If you have many datasets, you can save a lot of storage with a PEFT model and not have to worry about catastrophic forgetting or overfitting the backbone or base model.

PEFT integrations

PEFT is widely supported across the Hugging Face ecosystem because of the massive efficiency it brings to training and inference.

Diffusers

The iterative diffusion process consumes a lot of memory which can make it difficult to train. PEFT can help reduce the memory requirements and reduce the storage size of the final model checkpoint. For example, consider the memory required for training a Stable Diffusion model with LoRA on an A100 80GB GPU with more than 64GB of CPU RAM. The final model checkpoint size is only 8.8MB!

Model Full Finetuning PEFT-LoRA PEFT-LoRA with Gradient Checkpointing
CompVis/stable-diffusion-v1-4 27.5GB GPU / 3.97GB CPU 15.5GB GPU / 3.84GB CPU 8.12GB GPU / 3.77GB CPU

Tip

Take a look at the examples/lora_dreambooth/train_dreambooth.py training script to try training your own Stable Diffusion model with LoRA, and play around with the smangrul/peft-lora-sd-dreambooth Space which is running on a T4 instance. Learn more about the PEFT integration in Diffusers in this tutorial.

Transformers

PEFT is directly integrated with Transformers. After loading a model, call add_adapter to add a new PEFT adapter to the model:

from peft import LoraConfigmodel = ...  # transformers modelpeft_config = LoraConfig(...)model.add_adapter(lora_config, adapter_name=\"lora_1\")

To load a trained PEFT adapter, call load_adapter:

model = ...  # transformers modelmodel.load_adapter(<path-to-adapter>, adapter_name=\"lora_1\")

And to switch between different adapters, call set_adapter:

model.set_adapter(\"lora_2\")

The Transformers integration doesn\’t include all the functionalities offered in PEFT, such as methods for merging the adapter into the base model.

Accelerate

Accelerate is a library for distributed training and inference on various training setups and hardware (GPUs, TPUs, Apple Silicon, etc.). PEFT models work with Accelerate out of the box, making it really convenient to train really large models or use them for inference on consumer hardware with limited resources.

TRL

PEFT can also be applied to training LLMs with RLHF components such as the ranker and policy. Get started by reading:

  • Fine-tune a Mistral-7b model with Direct Preference Optimization with PEFT and the TRL library to learn more about the Direct Preference Optimization (DPO) method and how to apply it to a LLM.

  • Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU with PEFT and the TRL library, and then try out the gpt2-sentiment_peft.ipynb notebook to optimize GPT2 to generate positive movie reviews.

  • StackLLaMA: A hands-on guide to train LLaMA with RLHF with PEFT, and then try out the stack_llama/scripts for supervised finetuning, reward modeling, and RL finetuning.

Model support

Use this Space or check out the docs to find which models officially support a PEFT method out of the box. Even if you don\’t see a model listed below, you can manually configure the model config to enable PEFT for a model. Read the New transformers architecture guide to learn how.

Contribute

If you would like to contribute to PEFT, please check out our contribution guide.

Citing ? PEFT

To use ? PEFT in your publication, please cite it by using the following BibTeX entry.

@Misc{peft,  title =        {{PEFT}: State-of-the-art Parameter-Efficient Fine-Tuning methods},  author =       {Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul and Benjamin Bossan},  howpublished = {url{https://*g*ithu*b.com/huggingface/peft}},  year =         {2022}}

下载源码

通过命令行克隆项目:

git clone https://github.com/huggingface/peft.git

收藏 (0) 打赏

感谢您的支持,我会继续努力的!

打开微信/支付宝扫一扫,即可进行扫码打赏哦,分享从这里开始,精彩与您同在
点赞 (0)

申明:本文由第三方发布,内容仅代表作者观点,与本网站无关。对本文以及其中全部或者部分内容的真实性、完整性、及时性本站不作任何保证或承诺,请读者仅作参考,并请自行核实相关内容。本网发布或转载文章出于传递更多信息之目的,并不意味着赞同其观点或证实其描述,也不代表本网对其真实性负责。

左子网 编程相关 peft https://www.zuozi.net/33094.html

techniques
上一篇: techniques
stable diffusion webui
下一篇: stable diffusion webui
常见问题
  • 1、自动:拍下后,点击(下载)链接即可下载;2、手动:拍下后,联系卖家发放即可或者联系官方找开发者发货。
查看详情
  • 1、源码默认交易周期:手动发货商品为1-3天,并且用户付款金额将会进入平台担保直到交易完成或者3-7天即可发放,如遇纠纷无限期延长收款金额直至纠纷解决或者退款!;
查看详情
  • 1、描述:源码描述(含标题)与实际源码不一致的(例:货不对板); 2、演示:有演示站时,与实际源码小于95%一致的(但描述中有”不保证完全一样、有变化的可能性”类似显著声明的除外); 3、发货:不发货可无理由退款; 4、安装:免费提供安装服务的源码但卖家不履行的; 5、收费:价格虚标,额外收取其他费用的(但描述中有显著声明或双方交易前有商定的除外); 6、其他:如质量方面的硬性常规问题BUG等。 注:经核实符合上述任一,均支持退款,但卖家予以积极解决问题则除外。
查看详情
  • 1、左子会对双方交易的过程及交易商品的快照进行永久存档,以确保交易的真实、有效、安全! 2、左子无法对如“永久包更新”、“永久技术支持”等类似交易之后的商家承诺做担保,请买家自行鉴别; 3、在源码同时有网站演示与图片演示,且站演与图演不一致时,默认按图演作为纠纷评判依据(特别声明或有商定除外); 4、在没有”无任何正当退款依据”的前提下,商品写有”一旦售出,概不支持退款”等类似的声明,视为无效声明; 5、在未拍下前,双方在QQ上所商定的交易内容,亦可成为纠纷评判依据(商定与描述冲突时,商定为准); 6、因聊天记录可作为纠纷评判依据,故双方联系时,只与对方在左子上所留的QQ、手机号沟通,以防对方不承认自我承诺。 7、虽然交易产生纠纷的几率很小,但一定要保留如聊天记录、手机短信等这样的重要信息,以防产生纠纷时便于左子介入快速处理。
查看详情

相关文章

猜你喜欢
发表评论
暂无评论
官方客服团队

为您解决烦忧 - 24小时在线 专业服务