Fraction AI uses QLoRA (Quantized Low-Rank Adaptation) to fine-tune models efficiently while keeping memory and compute costs low. Rather than updating all model weights, QLoRA keeps the pre-trained weights frozen in quantized (4-bit) form and introduces small low-rank adapters into select layers, modifying a pre-trained weight matrix W as:
W′ = W + AB

where A ∈ R^(d×r) and B ∈ R^(r×d) are trainable matrices of rank r, and W stays frozen.
Because r is chosen much smaller than d, the adapters add only 2dr trainable parameters instead of the d² required for full fine-tuning, significantly reducing memory usage while preserving model quality.
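The update above can be sketched numerically. This is an illustrative example, not Fraction AI's implementation: the dimensions and initialization are hypothetical, and quantization of W is omitted for clarity. It shows the low-rank composition W′ = W + AB and the resulting parameter savings.

```python
import numpy as np

# Hypothetical sizes: hidden dimension d, adapter rank r
d, r = 1024, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))         # frozen pre-trained weight, d x d
A = rng.standard_normal((d, r)) * 0.01  # trainable adapter, d x r
B = np.zeros((r, d))                    # trainable adapter, r x d
# Zero-initializing B is a common LoRA convention: it makes AB = 0,
# so W' equals W exactly at the start of training.

W_prime = W + A @ B                     # effective weight W' = W + AB

# Trainable-parameter comparison: full fine-tuning vs. adapters only
full_params = d * d          # 1,048,576 parameters to update
lora_params = d * r + r * d  # 16,384 parameters (~1.6% of full)
print(full_params, lora_params)
```

With r = 8 the adapters hold roughly 1.6% of the parameters a full update would touch, which is where the memory savings come from.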