Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation

cs.CV cs.AI cs.LG Donald Shenaj, Federico Errica, Antonio Carta · Mar 23, 2026
AI Review
Plain-language introduction

Personalized image generation with diffusion models relies on Low-Rank Adaptation (LoRA) to fine-tune models efficiently, but current practice uses a fixed rank across all layers regardless of subject complexity. This paper proposes LoRA2, which learns adaptive ranks per LoRA component via a variational framework that imposes an importance ordering over rank indices using a discretized exponential distribution. The method achieves better subject fidelity and prompt alignment while using significantly less memory than high-rank baselines, addressing the combinatorial explosion of searching $S K^L$ architectural configurations.

Critical review
Verdict
Bottom line

The paper presents a compelling solution to the combinatorial challenge of rank selection in LoRA-based personalization. The variational formulation using learnable parameters $\nu_\ell$ to control effective rank $D_\ell$ via quantile functions is theoretically elegant and practically effective, allowing dynamic parameter allocation where "as $\nu_\ell$ changes, we dynamically recompute the rank of each LoRA component". The evidence across 29 subjects and two backbones (SDXL and KOALA-700m) supports the claim that adaptive ranks outperform fixed-rank heuristics, with LoRA2 achieving comparable DINO and CLIP-I scores to rank-512 LoRA while using only 0.40 GB versus 2.80 GB of parameters.

“as $\nu_{\ell}$ changes, we dynamically recompute the rank of each LoRA component”
paper · Section 3.2
“LoRA2 has similar scores with a much lower memory occupation (0.40 GB for LoRA2 against 2.80 GB for rank 512)”
paper · Section 5.3
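
The link between $\nu_\ell$ and the effective rank $D_\ell$ can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the rank is the $q$-quantile of a continuous exponential with rate $\nu_\ell$, rounded up, which is the inverse of the $\nu_{\mathrm{target}}$ formula the paper gives as Equation 12. The function name `effective_rank`, the default `q = 0.9`, and the cap `max_rank` are hypothetical.

```python
import math


def effective_rank(nu: float, q: float = 0.9, max_rank: int = 512) -> int:
    """Smallest rank index covering a fraction q of the exponential's mass.

    Assumption: D = ceil(-log(1 - q) / nu), i.e. the q-quantile of an
    exponential distribution with rate nu. The default q and the cap
    max_rank are illustrative values, not taken from the paper.
    """
    if nu <= 0 or not 0 < q < 1:
        raise ValueError("need nu > 0 and 0 < q < 1")
    rank = math.ceil(-math.log(1.0 - q) / nu)
    # Keep the result inside [1, max_rank].
    return max(1, min(rank, max_rank))


# A smaller nu spreads importance over more indices, so the rank grows.
for nu in (0.5, 0.1, 0.02):
    print(f"nu={nu}: rank={effective_rank(nu)}")
```

Under this assumption the rank is a step function of $\nu_\ell$, which is consistent with the review's remark that the rank is recomputed as $\nu_\ell$ changes during training.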
What holds up

The central thesis that fixed ranks are suboptimal is empirically well-supported: the per-subject analysis reveals that "rank 64 is never optimal in any of the metrics for SDXL" despite being the community default. The technical implementation is sound, using a diagonal scaling matrix $\Lambda_\ell = \mathrm{diag}(f(1;\nu_\ell), \dots, f(D_\ell;\nu_\ell))$ to impose decreasing importance across rank indices. The ablation studies rigorously validate the design choices, demonstrating that rank regularization is essential to prevent uncontrolled growth: without it, the average file size grows from 406 MB to 2.7 GB. This confirms that "regularizing both the rank parameters and LoRA weights allows LoRA2 to produce compact models with minimal degradation in generation quality".

“rank 64 is never optimal in any of the metrics for SDXL”
paper · Section 5.3
“removing the rank regularization increases the file size from an average of 406 MB to 2.7 GB”
paper · Section 5.5
“regularizing both the rank parameters and LoRA weights allows LoRA2 to produce compact models with minimal degradation in generation quality”
paper · Section 5.5
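
To make the importance ordering concrete, here is a hedged sketch of how the diagonal entries of $\Lambda_\ell$ could be computed. The review does not give the exact form of $f$, so the snippet assumes $f(k;\nu) = (1 - e^{-\nu})\,e^{-\nu(k-1)}$, the mass an exponential with rate $\nu$ places on the interval $[k-1, k)$. It also assumes the update has the form $B \Lambda_\ell A$, realized by scaling the rows of the down-projection. The names `rank_weights` and `scale_rows` are hypothetical.

```python
import math


def rank_weights(nu: float, rank: int) -> list[float]:
    """Diagonal entries f(1; nu), ..., f(rank; nu) of the scaling matrix.

    Assumption: f(k; nu) = (1 - exp(-nu)) * exp(-nu * (k - 1)), which is
    strictly decreasing in k for any nu > 0.
    """
    return [(1.0 - math.exp(-nu)) * math.exp(-nu * (k - 1))
            for k in range(1, rank + 1)]


def scale_rows(lora_down: list[list[float]],
               weights: list[float]) -> list[list[float]]:
    """Multiply row k of the down-projection by weights[k]."""
    return [[w * x for x in row] for row, w in zip(lora_down, weights)]


weights = rank_weights(nu=0.5, rank=4)
print([round(w, 3) for w in weights])  # later rank indices get smaller weights
```

Because later indices receive geometrically smaller weights, a rank index contributes little unless training pushes $\nu_\ell$ down, which is one way to read the review's "decreasing importance across rank indices".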
Main concerns

The method introduces several hyperparameters ($\lambda_r$, $\lambda_e$, $r_{target}$, quantile $q$) that require tuning, yet the paper uses only one subject (vase) for hyperparameter selection before testing on 29 subjects, raising questions about sensitivity and generalization of these choices. While motivated by variational inference, the final loss $\mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{MSE}}+\lambda_{r}\mathcal{L}_{\mathrm{reg}}+\lambda_{e}\mathcal{L}_{\mathrm{entropy}}$ combines the MSE objective with L1 regularization on ranks and an entropy term on attention maps; the entropy term appears somewhat ad-hoc rather than derived from the variational framework, and the interaction between these terms is not deeply analyzed. The paper acknowledges but does not resolve a significant practical limitation: "LoRA2 produces LoRA adapters of different ranks across subjects," complicating model merging since "the lower-rank LoRA must be expanded to match the rank of the larger one prior to merging."

“We select one random subject (vase) for hyper-parameter tuning”
paper · Section 4
“$\mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{MSE}}+\lambda_{r}\mathcal{L}_{\mathrm{reg}}+\lambda_{e}\mathcal{L}_{\mathrm{entropy}}$”
paper · Equation 14
“LoRA2 produces LoRA adapters of different ranks across subjects”
paper · Section A6 (Limitations)
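
The rank expansion mentioned in the limitation can be done by zero-padding, which leaves the adapter's effect unchanged. The sketch below uses plain Python lists so that it runs without dependencies; the helper names `matmul` and `pad_rank` are hypothetical, and the paper does not specify how the expansion is performed.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def pad_rank(up, down, new_rank):
    """Zero-pad a LoRA pair (up: d_out x r, down: r x d_in) to new_rank.

    The extra columns of `up` and rows of `down` are zero, so the
    product up @ down is unchanged by the padding.
    """
    rank, d_in = len(down), len(down[0])
    if new_rank < rank:
        raise ValueError("new_rank must be at least the current rank")
    extra = new_rank - rank
    padded_up = [row + [0.0] * extra for row in up]
    padded_down = down + [[0.0] * d_in for _ in range(extra)]
    return padded_up, padded_down


# Rank-1 adapter for a 2x3 weight, expanded to rank 3.
up = [[1.0], [2.0]]
down = [[0.5, -1.0, 2.0]]
padded_up, padded_down = pad_rank(up, down, new_rank=3)
assert matmul(padded_up, padded_down) == matmul(up, down)
```

The padding costs extra storage for the smaller adapter, which is the practical overhead the limitation points to.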
Evidence and comparison

The comparison against fixed-rank baselines (ranks 8–512) is thorough across two backbones. However, other adaptive methods such as AdaLoRA, DoRA, and ARD-LoRA are explicitly excluded from the quantitative evaluation, which weakens the evidence: the authors suggest that these possibly "do not trivially transfer to computer vision models", but offer no empirical validation. The qualitative results effectively illustrate LoRA2's superior preservation of fine details (e.g., the numeral "3" on a clock face that rank-512 LoRA fails to render). The evaluation relies solely on automated metrics (DINO, CLIP-I, CLIP-T); human evaluation or user studies are absent despite the subjective nature of personalization quality. The analysis of per-layer rank distributions, while insightful, is limited to self-attention and cross-attention modules and does not explore the full UNet architecture.

“To the best of our knowledge, the effectiveness of adaptive LoRA has not been validated for personalized diffusion models, possibly because these techniques do not trivially transfer to computer vision models”
paper · Section 2.3
“the numeral "3" on the clock face is preserved exclusively in our result; rank 512 fails to render it”
paper · Section 5.1
Reproducibility

The authors provide code at a public GitHub repository and detail the DreamBooth protocol, optimizer settings (Adam with learning rate $5 \times 10^{-5}$), and training steps (500 for SDXL, 800 for KOALA). However, critical hyperparameters for the rank regularization—specifically the target rank $r_{target}$ and quantile $q$ used to compute $\nu_{\mathrm{target}} = -\frac{\log(1-q)}{r_{\mathrm{target}}}$—are not explicitly stated in the main paper, making exact reproduction of the reported 0.40 GB model size difficult. The dynamic rank computation during training (recomputing $D_\ell$ based on updated $\nu_\ell$) introduces non-determinism in parameter counts that may lead to slight variations across runs. While the supplementary material contains prompts and per-subject splits, the single-subject hyperparameter tuning protocol (using "vase") provides limited guidance for practitioners applying the method to new subjects.

“$\nu_{\mathrm{target}}=-\frac{\log(1-q)}{r_{\mathrm{target}}}$”
paper · Equation 12
“hyper-parameter tuning process selected 500 training steps for SDXL and 800 steps for KOALA”
paper · Section 4
“fixed weights $\lambda_{r}=\lambda_{e}=1e^{-4}$”
paper · Section 4
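
The expression for $\nu_{\mathrm{target}}$ in Equation 12 is what one obtains by inverting an exponential CDF. A short derivation, assuming the continuous exponential with rate $\nu$ is the distribution being matched (the review states only the final formula):

$$
F(r;\nu) = 1 - e^{-\nu r}, \qquad
F(r_{\mathrm{target}};\nu_{\mathrm{target}}) = q
\;\Longrightarrow\;
e^{-\nu_{\mathrm{target}}\, r_{\mathrm{target}}} = 1 - q
\;\Longrightarrow\;
\nu_{\mathrm{target}} = -\frac{\log(1-q)}{r_{\mathrm{target}}}.
$$

As an illustration with hypothetical values (not the paper's settings), $q = 0.9$ and $r_{\mathrm{target}} = 32$ give $\nu_{\mathrm{target}} = -\log(0.1)/32 \approx 0.072$. Because the result depends directly on both $q$ and $r_{\mathrm{target}}$, leaving them unstated makes the reported model size hard to reproduce.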
Abstract

Low Rank Adaptation (LoRA) is the de facto fine-tuning strategy to generate personalized images from pre-trained diffusion models. Choosing a good rank is extremely critical, since it trades off performance and memory consumption, but today the decision is often left to the community's consensus, regardless of the personalized subject's complexity. The reason is evident: the cost of selecting a good rank for each LoRA component is combinatorial, so we opt for practical shortcuts such as fixing the same rank for all components. In this paper, we take a first step to overcome this challenge. Inspired by variational methods that learn an adaptive width of neural networks, we let the ranks of each layer freely adapt during fine-tuning on a subject. We achieve it by imposing an ordering of importance on the rank's positions, effectively encouraging the creation of higher ranks when strictly needed. Qualitatively and quantitatively, our approach, LoRA$^2$, achieves a competitive trade-off between DINO, CLIP-I, and CLIP-T across 29 subjects while requiring much less memory and lower rank than high rank LoRA versions. Code: https://github.com/donaldssh/NotAllLayersAreCreatedEqual.
