A Generalised Exponentiated Gradient Approach to Enhance Fairness in Binary and Multi-class Classification Tasks

cs.LG stat.ML Maryam Boubekraoui, Giordano d'Aloisio, Antinisca Di Marco · Mar 22, 2026
AI Review
Plain-language introduction

While most bias mitigation research targets binary classification, multi-class fairness remains under-explored. This paper proposes Generalised Exponentiated Gradient (GEG), an in-processing method that extends the Exponentiated Gradient framework to multi-class settings and enables simultaneous optimization of multiple fairness constraints via positive-label moment conditions. Evaluated on ten datasets against six baselines, GEG achieves fairness improvements up to 92% with moderate accuracy trade-offs, filling a critical gap in fair machine learning toolboxes.

Critical review
Verdict
Bottom line

The paper delivers a technically sound extension of the Exponentiated Gradient (EG) framework to multi-class classification and to combined fairness constraints, supported by extensive experiments. Because the fairness requirements are expressed as linear moment constraints, the formulation stays convex and can be solved by saddle-point optimization. However, claims that conflicting fairness criteria improve simultaneously should be read cautiously in light of established impossibility results, although the authors do acknowledge the performance trade-offs.

“GEG differs from the original EG approach in two aspects: first, it can mitigate bias in both binary and multi-class classification tasks; second, it mitigates bias across multiple fairness constraints simultaneously.”
Boubekraoui et al. · Section 1.1
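For orientation, the saddle-point problem behind the original EG reduction is usually written as below. This is the standard form from the reductions literature, not necessarily the paper's own notation: the symbols $Q$ (a randomized classifier), $\hat{\mu}(Q)$ (a vector of conditional moments), $M$ and $\hat{c}$ (the matrix and vector encoding the linear constraints $M\hat{\mu}(Q) \le \hat{c}$), $\lambda$ (the dual variables) and $B$ (a bound on their norm) are assumptions made here for illustration.

$$
\min_{Q \in \Delta(\mathcal{H})} \;\; \max_{\lambda \ge 0,\; \|\lambda\|_1 \le B} \;\; L(Q, \lambda) \;=\; \widehat{\mathrm{err}}(Q) \;+\; \lambda^{\top}\bigl(M\hat{\mu}(Q) - \hat{c}\bigr)
$$

Both $\widehat{\mathrm{err}}(Q)$ and $\hat{\mu}(Q)$ are linear in $Q$, so $L$ is linear in each argument separately. That is what makes the problem a convex-concave game; adding further linear fairness constraints only adds rows to $M$ and $\hat{c}$ and does not change this structure.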
What holds up

The theoretical generalization to multi-class settings through positive-label definitions of Demographic Parity and Equalized Odds is elegant and stays compatible with the reduction-based framework. The empirical rigor is notable: the evaluation spans seven multi-class and three binary datasets and compares GEG against six baselines using four effectiveness metrics and three fairness definitions, which gives robust evidence that GEG can mitigate bias across diverse scenarios.

“We conduct an extensive empirical evaluation of GEG against six baselines across seven multi-class and three binary datasets, using four widely adopted effectiveness metrics and three fairness definitions.”
Boubekraoui et al. · Abstract
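The positive-label idea can be illustrated with a short sketch that measures group gaps with respect to one chosen favorable class. This is one plausible reading, not the paper's exact definitions (which may, for example, condition on every true class separately). The function names, the toy labels and the group values are all hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(y_pred, groups, positive_label):
    """Fraction of samples in each group that are predicted as the positive label."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        counts[group] += 1
        hits[group] += int(pred == positive_label)
    return {group: hits[group] / counts[group] for group in counts}


def statistical_parity_difference(y_pred, groups, positive_label):
    """Largest gap between groups in the positive-label prediction rate."""
    rates = positive_rate_by_group(y_pred, groups, positive_label)
    return max(rates.values()) - min(rates.values())


def equalized_odds_difference(y_true, y_pred, groups, positive_label):
    """Largest between-group gap in positive-label rate, computed separately
    for samples whose true label is / is not the positive label.
    (Assumption: the truth is binarized against the positive label.)"""
    gaps = []
    for truth_is_positive in (True, False):
        idx = [i for i, y in enumerate(y_true)
               if (y == positive_label) == truth_is_positive]
        if not idx:
            continue
        rates = positive_rate_by_group(
            [y_pred[i] for i in idx], [groups[i] for i in idx], positive_label
        )
        gaps.append(max(rates.values()) - min(rates.values()))
    return max(gaps) if gaps else 0.0


# Toy three-class data (made up); class 2 is treated as the favorable outcome.
y_true = [2, 2, 0, 1, 2, 0, 1, 2]
y_pred = [2, 2, 0, 2, 2, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

spd = statistical_parity_difference(y_pred, groups, positive_label=2)
eod = equalized_odds_difference(y_true, y_pred, groups, positive_label=2)
print(spd, eod)
```

Both quantities depend on which class is designated as positive, so a different choice of favorable outcome can change the measured bias.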
Main concerns

While GEG demonstrates strong fairness improvements, the simultaneous optimization of Demographic Parity and Equalized Odds appears to be at odds with known impossibility results. The authors note this contrast themselves and acknowledge a "systematic trade-off" in which combined optimization yields slightly worse individual metrics than the specialized versions. The method's reliance on defining a single positive class $y_p \in \{0, 1, \ldots, K\}$ may limit its applicability in scenarios without a clear favorable outcome. Additionally, effectiveness drops significantly on highly imbalanced datasets (e.g., a 14% accuracy decrease on Crime), which suggests sensitivity to class distribution.

“GEG-EO and, partially, GEG-CP tend to provide statistically significantly lower Precision, Recall and, consequently, F1 Score, under specific datasets (namely Crime, Drug, Obesity, and, partially, Wine).”
Boubekraoui et al. · Section 5.3
“These results contrast with previous studies highlighting the impossibility of improving different fairness definitions simultaneously [41, 17].”
Boubekraoui et al. · Section 6.1
Evidence and comparison

The results support the claim that GEG outperforms existing multi-class baselines such as DEMV and Blackbox on fairness metrics, with improvements of up to 92% on Statistical Parity Difference. The trade-off is nevertheless evident: on datasets with high class imbalance or many classes (Crime, Obesity), GEG sacrifices more effectiveness than on balanced datasets (Law, Park). The comparison also shows that pre-processing methods such as DEMV sometimes reach a better Pareto trade-off on specific metrics, though GEG dominates in 75% of combinations.

“GEG overcomes existing baselines, with fairness improvements up to 92% and a decrease in accuracy up to 14%.”
Boubekraoui et al. · Abstract
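The notion of one method dominating another in an effectiveness/fairness comparison can be made concrete with a minimal sketch. It assumes each method is summarized by an (accuracy, unfairness) pair, with higher accuracy and lower unfairness being better; the paper's actual comparison procedure may differ. The method names "A", "B", "C" and all numbers are made up for illustration and are not results from the paper.

```python
def dominates(a, b):
    """True if point a Pareto-dominates point b.

    Each point is (accuracy, unfairness): higher accuracy is better,
    lower unfairness is better.
    """
    acc_a, unf_a = a
    acc_b, unf_b = b
    no_worse = acc_a >= acc_b and unf_a <= unf_b
    strictly_better = acc_a > acc_b or unf_a < unf_b
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the methods that no other method dominates."""
    return {
        name: p
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    }


# Hypothetical (accuracy, unfairness) summaries for three unnamed methods.
points = {"A": (0.80, 0.05), "B": (0.85, 0.20), "C": (0.78, 0.10)}

front = pareto_front(points)
print(sorted(front))
```

Here "A" dominates "C" because it is better on both axes, while "A" and "B" are incomparable: each wins on one axis, so both remain on the front.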
Reproducibility

The authors provide comprehensive reproducibility resources: a public GitHub repository, an implementation built on the established Fairlearn library, and detailed hyperparameters ($\eta = 10^{-5}$, $\delta = 0.05$). The experimental protocol uses standard 10-fold cross-validation with fixed seeds. However, the paper lacks a sensitivity analysis for the hyperparameters and does not report computational costs beyond a brief note on the increase in training time (~10s vs ~2s for the baseline).

“We release a replication package including a Python implementation of GEG and the results of our empirical evaluation to foster future research.”
Boubekraoui et al. · Section 1.2
“We implemented GEG in Python 3.9 by extending the EG implementation provided by the Fairlearn Python library [28].”
Boubekraoui et al. · Section 3.5
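A seeded 10-fold protocol of the kind described above can be sketched in plain Python. This is only an illustration of why fixing the seed makes the splits repeatable; it is not the authors' replication code, it does not stratify by class, and the function name `k_fold_indices` and its parameters are hypothetical.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    The shuffle uses a private, seeded generator so the same seed
    always produces the same folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


splits = list(k_fold_indices(25, n_folds=10, seed=42))
for train, test in splits:
    print(len(train), len(test))
```

Calling the function again with the same seed returns identical folds, which is the property a fixed-seed protocol relies on when results are compared across methods.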
Abstract

The widespread use of AI and ML models in sensitive areas raises significant concerns about fairness. While the research community has introduced various methods for bias mitigation in binary classification tasks, the issue remains under-explored in multi-class classification settings. To address this limitation, in this paper, we first formulate the problem of fair learning in multi-class classification as a multi-objective problem between effectiveness (i.e., prediction correctness) and multiple linear fairness constraints. Next, we propose a Generalised Exponentiated Gradient (GEG) algorithm to solve this task. GEG is an in-processing algorithm that enhances fairness in binary and multi-class classification settings under multiple fairness definitions. We conduct an extensive empirical evaluation of GEG against six baselines across seven multi-class and three binary datasets, using four widely adopted effectiveness metrics and three fairness definitions. GEG overcomes existing baselines, with fairness improvements up to 92% and a decrease in accuracy up to 14%.
