Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation
Personalized image generation with diffusion models relies on Low-Rank Adaptation (LoRA) to fine-tune models efficiently, but current practice uses a fixed rank across all layers regardless of subject complexity. This paper proposes LoRA$^2$, which learns adaptive ranks per LoRA component via a variational framework that imposes an importance ordering over rank indices using a discretized exponential distribution. The method achieves better subject fidelity and prompt alignment while using significantly less memory than high-rank baselines, and it avoids the combinatorial cost of searching over $S K^L$ architectural configurations.
The paper presents a compelling solution to the combinatorial challenge of rank selection in LoRA-based personalization. The variational formulation using learnable parameters $\nu_\ell$ to control effective rank $D_\ell$ via quantile functions is theoretically elegant and practically effective, allowing dynamic parameter allocation where "as $\nu_\ell$ changes, we dynamically recompute the rank of each LoRA component". The evidence across 29 subjects and two backbones (SDXL and KOALA-700m) supports the claim that adaptive ranks outperform fixed-rank heuristics, with LoRA$^2$ achieving comparable DINO and CLIP-I scores to rank-512 LoRA while using only 0.40 GB versus 2.80 GB of parameters.
The central thesis that fixed ranks are suboptimal is empirically well-supported: the per-subject analysis reveals that "rank 64 is never optimal in any of the metrics for SDXL" despite being the community default. The technical implementation is sound, using a diagonal scaling matrix $\Lambda_\ell = \mathrm{diag}(f(1;\nu_\ell), \dots, f(D_\ell;\nu_\ell))$ to impose decreasing importance across rank indices. The ablation studies rigorously validate the design choices, demonstrating that rank regularization is essential to prevent uncontrolled growth: without it, models expand to 2.7 GB compared to 406 MB with regularization, confirming that "regularizing both the rank parameters and LoRA weights allows LoRA$^2$ to produce compact models with minimal degradation in generation quality".
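To make the rank mechanism concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes that $f(i;\nu)$ is the probability mass an exponential distribution with rate $\nu$ places on the interval $[i-1, i)$, and that $D_\ell$ is the smallest index whose cumulative mass reaches a quantile $q$. The function names and the default $q = 0.9$ are illustrative choices, not values taken from the paper.

```python
import math
import numpy as np

def discretized_exponential_pmf(i: int, nu: float) -> float:
    """Mass of Exp(nu) on [i-1, i) for a 1-based rank index i (assumed form of f)."""
    return math.exp(-nu * (i - 1)) - math.exp(-nu * i)

def effective_rank(nu: float, q: float = 0.9) -> int:
    """Smallest integer D with cumulative mass 1 - exp(-nu * D) >= q."""
    return max(1, math.ceil(-math.log(1.0 - q) / nu))

def scaling_matrix(nu: float, q: float = 0.9) -> np.ndarray:
    """Diagonal matrix diag(f(1; nu), ..., f(D; nu)) with decreasing entries."""
    d = effective_rank(nu, q)
    weights = [discretized_exponential_pmf(i, nu) for i in range(1, d + 1)]
    return np.diag(weights)

if __name__ == "__main__":
    for nu in (0.2, 0.1, 0.05):
        lam = scaling_matrix(nu)
        print(f"nu={nu}: rank={lam.shape[0]}, first weight={lam[0, 0]:.4f}")
```

Under these assumptions a smaller $\nu_\ell$ spreads the mass over more indices and therefore yields a larger effective rank, which is why a regularizer on $\nu_\ell$ acts as a regularizer on model size.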
The method introduces several hyperparameters ($\lambda_r$, $\lambda_e$, $r_{\mathrm{target}}$, quantile $q$) that require tuning, yet the paper uses only one subject (vase) for hyperparameter selection before testing on 29 subjects, raising questions about sensitivity and generalization of these choices. While motivated by variational inference, the final loss $\mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{MSE}}+\lambda_{r}\mathcal{L}_{\mathrm{reg}}+\lambda_{e}\mathcal{L}_{\mathrm{entropy}}$ combines the MSE objective with L1 regularization on ranks and an entropy term on attention maps; the entropy term appears somewhat ad-hoc rather than derived from the variational framework, and the interaction between these terms is not deeply analyzed. The paper acknowledges but does not resolve a significant practical limitation: "LoRA$^2$ produces LoRA adapters of different ranks across subjects," complicating model merging since "the lower-rank LoRA must be expanded to match the rank of the larger one prior to merging."
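The rank-expansion step mentioned in the merging limitation can be illustrated with a short sketch. It assumes the standard LoRA parameterization $\Delta W = BA$ with $B$ of shape (d_out, r) and $A$ of shape (r, d_in), and it uses zero-padding as one simple way to expand the rank; the paper may expand adapters differently, and `pad_lora_rank` is a hypothetical helper name.

```python
import numpy as np

def pad_lora_rank(B: np.ndarray, A: np.ndarray, target_rank: int):
    """Zero-pad LoRA factors to target_rank without changing the product B @ A."""
    r = B.shape[1]
    if A.shape[0] != r:
        raise ValueError("B and A must share the same rank dimension")
    if target_rank < r:
        raise ValueError("target_rank must be at least the current rank")
    extra = target_rank - r
    B_pad = np.pad(B, ((0, 0), (0, extra)))  # append zero columns
    A_pad = np.pad(A, ((0, extra), (0, 0)))  # append zero rows
    return B_pad, A_pad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B_small, A_small = rng.normal(size=(8, 2)), rng.normal(size=(2, 6))
    B_large, A_large = rng.normal(size=(8, 5)), rng.normal(size=(5, 6))
    B_exp, A_exp = pad_lora_rank(B_small, A_small, B_large.shape[1])
    # After expansion the factor shapes match, and the update is unchanged.
    print(B_exp.shape == B_large.shape, A_exp.shape == A_large.shape)
    print(np.allclose(B_exp @ A_exp, B_small @ A_small))
```

Zero-padding leaves the lower-rank adapter's update intact, but it still forces every merged adapter to carry the memory footprint of the largest rank, which is the practical cost the limitation points to.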
The comparison against fixed-rank baselines (ranks 8–512) is thorough across two backbones, though the explicit exclusion of other adaptive methods like AdaLoRA, DoRA, or ARD-LoRA from quantitative evaluation weakens the evidence: the authors claim these "do not trivially transfer to computer vision models" without empirical validation. The qualitative results effectively illustrate LoRA$^2$'s superior preservation of fine details (e.g., the numeral "3" on a clock face that rank-512 LoRA fails to render). However, the evaluation relies solely on automated metrics (DINO, CLIP-I, CLIP-T); human evaluation or user studies are absent despite the subjective nature of personalization quality. The analysis of per-layer rank distributions, while insightful, is limited to self-attention and cross-attention modules and does not explore the full UNet architecture.
The authors provide code at a public GitHub repository and detail the DreamBooth protocol, optimizer settings (Adam with learning rate $5 \times 10^{-5}$), and training steps (500 for SDXL, 800 for KOALA). However, critical hyperparameters for the rank regularization, specifically the target rank $r_{\mathrm{target}}$ and quantile $q$ used to compute $\nu_{\mathrm{target}} = -\frac{\log(1-q)}{r_{\mathrm{target}}}$, are not explicitly stated in the main paper, making exact reproduction of the reported 0.40 GB model size difficult. The dynamic rank computation during training (recomputing $D_\ell$ based on updated $\nu_\ell$) introduces non-determinism in parameter counts that may lead to slight variations across runs. While the supplementary material contains prompts and per-subject splits, the single-subject hyperparameter tuning protocol (using "vase") provides limited guidance for practitioners applying the method to new subjects.
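A small numerical sketch shows why the unstated $q$ and $r_{\mathrm{target}}$ matter for reproduction. It evaluates only the formula quoted above; the $(q, r_{\mathrm{target}})$ pairs are hypothetical, and `implied_rank` assumes the rank is read off as the continuous $q$-quantile of an exponential distribution with rate $\nu$.

```python
import math

def nu_target(q: float, r_target: float) -> float:
    """nu_target = -log(1 - q) / r_target, as quoted in the review."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if r_target <= 0:
        raise ValueError("r_target must be positive")
    return -math.log(1.0 - q) / r_target

def implied_rank(nu: float, q: float) -> float:
    """Continuous q-quantile of Exp(nu); inverts nu_target for a fixed q."""
    return -math.log(1.0 - q) / nu

if __name__ == "__main__":
    # Hypothetical settings; the paper's actual values are not given in the main text.
    for q, r in [(0.9, 16), (0.9, 64), (0.99, 64)]:
        nu = nu_target(q, r)
        print(f"q={q}, r_target={r}: nu_target={nu:.4f}, implied rank={implied_rank(nu, q):.1f}")
```

Different pairs give noticeably different $\nu_{\mathrm{target}}$ values, so the regularizer pulls toward different model sizes unless both quantities are reported.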
Low Rank Adaptation (LoRA) is the de facto fine-tuning strategy to generate personalized images from pre-trained diffusion models. Choosing a good rank is extremely critical, since it trades off performance and memory consumption, but today the decision is often left to the community's consensus, regardless of the personalized subject's complexity. The reason is evident: the cost of selecting a good rank for each LoRA component is combinatorial, so we opt for practical shortcuts such as fixing the same rank for all components. In this paper, we take a first step to overcome this challenge. Inspired by variational methods that learn an adaptive width of neural networks, we let the ranks of each layer freely adapt during fine-tuning on a subject. We achieve it by imposing an ordering of importance on the rank's positions, effectively encouraging the creation of higher ranks when strictly needed. Qualitatively and quantitatively, our approach, LoRA$^2$, achieves a competitive trade-off between DINO, CLIP-I, and CLIP-T across 29 subjects while requiring much less memory and lower rank than high rank LoRA versions. Code: https://github.com/donaldssh/NotAllLayersAreCreatedEqual.