RuntimeError
lora_qlora.adapters.RuntimeError: LoRA inference adapter not loaded
Stack trace
Traceback (most recent call last):
  File "inference.py", line 42, in <module>
    output = model.generate(input_ids)
  File "lora_qlora/model.py", line 88, in generate
    raise RuntimeError("LoRA inference adapter not loaded")
RuntimeError: LoRA inference adapter not loaded
Why it happens
The LoRA or QLoRA inference adapter must be explicitly loaded and attached to the base model before inference. This error happens if the adapter loading step is skipped or fails, so the model lacks the LoRA weights needed for fine-tuned inference.
Detection
Check if the adapter loading function was called and completed successfully before inference. Add assertions or logs to confirm the adapter is attached to the model instance.
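One way to make the check concrete is a thin wrapper that records whether the adapter was loaded through it. This is a minimal sketch, not part of `lora_qlora`: the `TrackedModel` class and its `adapter_path` attribute are hypothetical, and it assumes only that the wrapped model exposes `load_adapter(adapter_path=...)` and `generate(input_ids)` as in the example on this page.

```python
import logging

logger = logging.getLogger(__name__)


class TrackedModel:
    """Hypothetical wrapper that records whether an adapter was loaded through it."""

    def __init__(self, model):
        self._model = model
        self.adapter_path = None  # set only after a successful load

    def load_adapter(self, adapter_path):
        # If the underlying load raises, adapter_path stays None.
        self._model.load_adapter(adapter_path=adapter_path)
        self.adapter_path = adapter_path
        logger.info("LoRA adapter loaded from %s", adapter_path)

    def generate(self, input_ids):
        # Fail early with a message that names the missing step.
        assert self.adapter_path is not None, "load_adapter() was not called on this model"
        return self._model.generate(input_ids)
```

Because the flag lives on the wrapper, it also catches the case where inference runs on a different instance than the one that received the adapter. Note that `assert` statements are skipped when Python runs with `-O`, so an explicit `if`/`raise` may be preferable in production.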
Causes & fixes
Cause: The LoRA adapter was never loaded or attached to the base model before inference.
Fix: Call the adapter loading method (e.g., `load_adapter()`) with the correct adapter path before running inference.
Cause: The adapter path or filename is incorrect or missing, so the load fails.
Fix: Verify that the adapter file path is correct and accessible, and pass the exact path to the adapter loader.
Cause: The model instance used for inference is not the one with the loaded adapter attached.
Fix: Ensure the same model object with the loaded adapter is used for inference, not a fresh or base model instance.
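The path check in the second fix can be done with the standard library before the loader is ever called. The sketch below is an illustration only: `resolve_adapter_path` is a hypothetical helper name, and it assumes the adapter is stored either as a single file or as a non-empty directory.

```python
from pathlib import Path


def resolve_adapter_path(adapter_path):
    """Return the absolute adapter path, or raise if it is missing or empty."""
    path = Path(adapter_path).expanduser().resolve()
    if not path.exists():
        raise FileNotFoundError(f"LoRA adapter not found: {path}")
    # Assumption: a directory-style adapter must contain at least one file.
    if path.is_dir() and not any(path.iterdir()):
        raise FileNotFoundError(f"LoRA adapter directory is empty: {path}")
    return path
```

Passing the returned absolute path to the loader removes any dependence on the current working directory, which is a common reason a relative adapter path works in one environment and fails in another.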
Code: broken vs fixed
# Broken: the adapter is never loaded
from lora_qlora import Model
model = Model(base_model_path="base-model")
# Missing adapter loading step here
output = model.generate(input_ids)  # RuntimeError: LoRA inference adapter not loaded

# Fixed: load the adapter before inference
import os
from lora_qlora import Model
model = Model(base_model_path=os.environ["BASE_MODEL_PATH"])
model.load_adapter(adapter_path=os.environ["LORA_ADAPTER_PATH"])  # Adapter loaded here
output = model.generate(input_ids)
print("Inference succeeded with LoRA adapter loaded.")
Workaround
Wrap the inference call in try/except RuntimeError. If the error is the adapter-not-loaded one, load the adapter and retry inference once.
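A minimal sketch of this workaround follows. The function name `generate_with_adapter_retry` and the `ADAPTER_ERROR` constant are hypothetical; the message string is taken from the traceback on this page, and the model is assumed to expose `generate(input_ids)` and `load_adapter(adapter_path=...)`.

```python
ADAPTER_ERROR = "LoRA inference adapter not loaded"


def generate_with_adapter_retry(model, input_ids, adapter_path):
    """Run inference; on the adapter-not-loaded error, load the adapter and retry once."""
    try:
        return model.generate(input_ids)
    except RuntimeError as exc:
        # Only handle the specific error; re-raise any other RuntimeError untouched.
        if ADAPTER_ERROR not in str(exc):
            raise
    model.load_adapter(adapter_path=adapter_path)
    # A second failure propagates to the caller instead of looping.
    return model.generate(input_ids)
```

Matching on the message keeps unrelated runtime errors (for example, out-of-memory failures) from being silently swallowed, and limiting the retry to a single attempt avoids an infinite loop when the adapter itself cannot be loaded.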
Prevention
Design your model loading pipeline to always load and attach the LoRA adapter immediately after loading the base model, and validate adapter presence before inference.
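One possible shape for such a pipeline is a single entry point that builds the base model and attaches the adapter together, so callers never receive a bare base model. This is a sketch under assumptions: `load_model_with_adapter` is a hypothetical helper, `model_cls` stands in for the `Model` class from the example above, and the environment variable names are the ones used in that example.

```python
import os


def load_model_with_adapter(model_cls, env=os.environ):
    """Build the base model and attach the adapter in one step."""
    required = ("BASE_MODEL_PATH", "LORA_ADAPTER_PATH")
    missing = [key for key in required if not env.get(key)]
    if missing:
        # Fail at startup rather than at the first inference call.
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    model = model_cls(base_model_path=env["BASE_MODEL_PATH"])
    model.load_adapter(adapter_path=env["LORA_ADAPTER_PATH"])
    return model
```

Usage would look like `model = load_model_with_adapter(Model)`. Taking the class and the environment mapping as parameters keeps the helper easy to exercise in tests with a stub model and a plain dictionary.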