LoRA for code fine-tuning
Quick answer
Use LoRA (Low-Rank Adaptation) to fine-tune large language models on code tasks by injecting trainable low-rank matrices into model weights, drastically reducing training costs and memory. Combine LoRA with 4-bit quantization (QLoRA) for efficient fine-tuning on code datasets using frameworks like transformers and peft.
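For intuition, here is a minimal sketch (plain PyTorch, not part of the tutorial script) of what a LoRA layer computes: the pretrained weight stays frozen and only the two low-rank factors are trained, scaled by alpha / r just like the config used later in this guide.
import torch.nn as nn

class LoRALinear(nn.Module):
    # Toy illustration: y = W0(x) + (alpha / r) * B(A(x)), with W0 frozen
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)       # adapters start as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))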
Prerequisites
- Python 3.8+
- pip install "transformers>=4.30" peft bitsandbytes datasets torch
- Basic knowledge of PyTorch and Hugging Face Transformers
Setup
Install required libraries for LoRA and QLoRA fine-tuning with code models. Set up environment variables for API keys if using hosted models.
pip install "transformers>=4.30" peft bitsandbytes datasets torch
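To illustrate the API-key note above, a minimal sketch of setting credentials from Python. HF_TOKEN is the variable huggingface_hub reads for gated checkpoints such as CodeLlama; OPENAI_API_KEY only matters for the hosted-model variations mentioned later. The values shown are placeholders, not real keys.
import os

os.environ["HF_TOKEN"] = "hf_..."         # token for gated Hugging Face models (placeholder)
os.environ["OPENAI_API_KEY"] = "sk-..."   # only needed for hosted-model workflows (placeholder)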
Step by step
This example fine-tunes a code generation model using LoRA with 4-bit quantization (QLoRA) on a small code dataset. It shows loading a pretrained model, applying LoRA, preparing data, and training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset
# Load pretrained code model (e.g., CodeLlama 7B)
model_name = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # CodeLlama has no pad token; reuse EOS so the collator can pad batches
# Load model with 4-bit quantization for efficiency
from transformers import BitsAndBytesConfig
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto"
)
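# Not in the original recipe, but commonly recommended for QLoRA:
# prepare_model_for_kbit_training() enables input gradients and upcasts
# layer norms so 4-bit training stays numerically stable.
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)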
# Configure LoRA for code fine-tuning
lora_config = LoraConfig(
    r=16,  # Low-rank dimension
    lora_alpha=32,  # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Common for causal LM
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM
)
# Apply LoRA to the model
model = get_peft_model(model, lora_config)
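# Optional sanity check: only the LoRA adapter weights should be trainable
model.print_trainable_parameters()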
# Load a small code dataset (e.g., CodeSearchNet subset)
dataset = load_dataset("code_search_net", "python", split="train[:1%]")
# Tokenize function
max_length = 512
def tokenize_function(examples):
    # CodeSearchNet stores each function's source under "whole_func_string" (there is no "code" column)
    return tokenizer(examples["whole_func_string"], truncation=True, max_length=max_length)
# Prepare dataset
tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=dataset.column_names)  # keep only tokenized fields
# Data collator for causal LM
from transformers import DataCollatorForLanguageModeling
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
# Trainer setup
from transformers import Trainer, TrainingArguments
training_args = TrainingArguments(
    output_dir="./lora-code-finetune",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=10,
    save_steps=100,
    evaluation_strategy="no",
    save_total_limit=1,
    fp16=True
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=collator
)
# Train
trainer.train()
# Save LoRA adapters
model.save_pretrained("./lora-code-finetuned")
Output
***** Running training *****
  Num examples = 1000
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 250
...
Training completed. Saving model to ./lora-code-finetuned
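A likely next step, not shown in the log above: reattach the saved adapters for inference. PeftModel.from_pretrained and merge_and_unload are standard peft calls; the paths below simply mirror the ones used in this example.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "./lora-code-finetuned")  # attach the trained adapters
model = model.merge_and_unload()  # optional: fold the adapters into the base weights for deployment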
Common variations
- Use gpt-4o-mini or claude-sonnet-4-5 through provider fine-tuning APIs when local LoRA training is not practical.
- Apply QLoRA by combining BitsAndBytesConfig with LoraConfig for memory-efficient training, as in the example above.
- Use Hugging Face Accelerate (or a custom training loop) for multi-GPU and distributed setups.
Troubleshooting
- If you get CUDA out-of-memory errors, reduce the batch size, enable gradient accumulation, or use gradient checkpointing (see the sketch after this list).
- Ensure bitsandbytes is installed correctly for 4-bit quantization support.
- Verify that the target modules in LoraConfig match your model architecture (e.g., q_proj and v_proj for Llama-style causal LMs); the sketch below shows how to list candidate module names.
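A short sketch for the two checks above, reusing model from the example. gradient_checkpointing and gradient_accumulation_steps are standard TrainingArguments fields, and the named_modules() scan simply lists projection layers you could pass as target_modules.
from transformers import TrainingArguments

# Tighter memory budget: smaller per-device batch, compensated by accumulation
training_args = TrainingArguments(
    output_dir="./lora-code-finetune",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,  # trades extra compute for lower activation memory
    fp16=True
)

# List candidate LoRA target module names present in the loaded model
proj_names = sorted({name.split(".")[-1] for name, _ in model.named_modules() if "proj" in name})
print(proj_names)  # e.g. ['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']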
Key Takeaways
- Use LoRA to fine-tune large code models efficiently by training low-rank adapters instead of full weights.
- Combine LoRA with 4-bit quantization (QLoRA) to drastically reduce GPU memory usage.
- Target q_proj and v_proj modules for causal language models during LoRA fine-tuning.
- Use the Hugging Face transformers and peft libraries for streamlined LoRA integration.
- Adjust batch size and enable mixed precision to avoid out-of-memory errors during fine-tuning.