Spawrious 🐾 is a challenging out-of-distribution (OOD) image classification benchmark built around spurious correlations. It consists of six challenges of two types, one-to-one and many-to-many spurious correlations, each offered at three difficulty levels: Easy, Medium, and Hard.
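
The six challenges can be read as every pairing of a correlation type with a difficulty level (Easy, Medium, Hard, as in the result tables). The sketch below only enumerates those pairings. The identifier strings are hypothetical labels chosen for illustration and are not necessarily the names used by the benchmark's code.

```python
from itertools import product

# Hypothetical labels for illustration; the benchmark's own identifiers may differ.
CHALLENGE_TYPES = ("one_to_one", "many_to_many")
DIFFICULTIES = ("easy", "medium", "hard")

# Two correlation types x three difficulty levels = six challenges.
challenges = [f"{ctype}_{level}" for ctype, level in product(CHALLENGE_TYPES, DIFFICULTIES)]

for name in challenges:
    print(name)
```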
One-to-One
Method | Easy | Medium | Hard | Average |
---|---|---|---|---|
ERM | \(72.15\%_{\pm 0.03}\) | \(69.85\%_{\pm 0.01}\) | \(65.65\%_{\pm 0.04}\) | \(69.22\%\) |
GroupDRO | \(68.72\%_{\pm 0.02}\) | \(71.87\%_{\pm 0.01}\) | \(60.90\%_{\pm 0.03}\) | \(67.16\%\) |
IRM | \(71.26\%_{\pm 0.02}\) | \(68.18\%_{\pm 0.02}\) | \(63.78\%_{\pm 0.03}\) | \(67.74\%\) |
CORAL | \(83.85\%_{\pm 0.01}\) | \(73.96\%_{\pm 0.01}\) | \(72.18\%_{\pm 0.03}\) | \(\boldsymbol{76.66\%}\) |
CausIRL | \(\boldsymbol{84.21\%_{\pm 0.01}}\) | \(73.45\%_{\pm 0.02}\) | \(71.20\%_{\pm 0.02}\) | \(76.29\%\) |
MMD-AAE | \(82.92\%_{\pm 0.03}\) | \(\boldsymbol{74.09\%_{\pm 0.03}}\) | \(\boldsymbol{72.60\%_{\pm 0.05}}\) | \(76.54\%\) |
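
The Average column in the table above matches the unweighted mean of the Easy, Medium, and Hard accuracies. The following minimal sketch reproduces it from the reported numbers; the dictionary layout and function name are illustrative assumptions, not part of the benchmark's code.

```python
# Accuracies (%) copied from the One-to-One table above: (Easy, Medium, Hard).
one_to_one = {
    "ERM": (72.15, 69.85, 65.65),
    "GroupDRO": (68.72, 71.87, 60.90),
    "IRM": (71.26, 68.18, 63.78),
    "CORAL": (83.85, 73.96, 72.18),
    "CausIRL": (84.21, 73.45, 71.20),
    "MMD-AAE": (82.92, 74.09, 72.60),
}


def average_accuracy(scores):
    """Unweighted mean of the per-difficulty accuracies, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


averages = {method: average_accuracy(scores) for method, scores in one_to_one.items()}
best_method = max(averages, key=averages.get)

for method, avg in averages.items():
    print(f"{method:10s} {avg:.2f}%")
print("Best average:", best_method)
```

The same function applies unchanged to the other result tables, since each reports the same three difficulty columns.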
Many-to-Many
Method | Easy | Medium | Hard | Average |
---|---|---|---|---|
ERM | \(72.51\%_{\pm 0.05}\) | \(51.36\%_{\pm 0.04}\) | \(47.02\%_{\pm 0.01}\) | \(56.96\%\) |
GroupDRO | \(74.82\%_{\pm 0.04}\) | \(52.06\%_{\pm 0.03}\) | \(52.79\%_{\pm 0.03}\) | \(59.89\%\) |
IRM | \(73.28\%_{\pm 0.04}\) | \(42.43\%_{\pm 0.07}\) | \(44.51\%_{\pm 0.06}\) | \(53.41\%\) |
CORAL | \(79.91\%_{\pm 0.00}\) | \(58.09\%_{\pm 0.01}\) | \(56.51\%_{\pm 0.03}\) | \(64.84\%\) |
CausIRL | \(81.21\%_{\pm 0.01}\) | \(56.79\%_{\pm 0.01}\) | \(56.31\%_{\pm 0.03}\) | \(64.77\%\) |
MMD-AAE | \(\boldsymbol{83.45\%_{\pm 0.01}}\) | \(\boldsymbol{60.27\%_{\pm 0.03}}\) | \(\boldsymbol{58.26\%_{\pm 0.00}}\) | \(\boldsymbol{67.33\%}\) |
One-to-One
Method | Easy | Medium | Hard | Average |
---|---|---|---|---|
ERM | \(77.49\%_{\pm 0.05}\) | \(76.60\%_{\pm 0.02}\) | \(71.32\%_{\pm 0.09}\) | \(75.14\%\) |
GroupDRO | \(80.58\%_{\pm 0.01}\) | \(75.96\%_{\pm 0.02}\) | \(76.99\%_{\pm 0.03}\) | \(77.84\%\) |
IRM | \(75.45\%_{\pm 0.03}\) | \(76.39\%_{\pm 0.02}\) | \(74.90\%_{\pm 0.01}\) | \(75.58\%\) |
CORAL | \(\boldsymbol{89.66\%_{\pm 0.01}}\) | \(\boldsymbol{81.05\%_{\pm 0.01}}\) | \(79.65\%_{\pm 0.02}\) | \(\boldsymbol{83.45\%}\) |
CausIRL | \(89.32\%_{\pm 0.01}\) | \(78.64\%_{\pm 0.01}\) | \(\boldsymbol{80.40\%_{\pm 0.01}}\) | \(82.79\%\) |
MMD-AAE | \(78.81\%_{\pm 0.02}\) | \(75.33\%_{\pm 0.03}\) | \(72.66\%_{\pm 0.01}\) | \(75.60\%\) |
Many-to-Many
Method | Easy | Medium | Hard | Average |
---|---|---|---|---|
ERM | \(83.80\%_{\pm 0.01}\) | \(53.05\%_{\pm 0.03}\) | \(58.70\%_{\pm 0.04}\) | \(65.18\%\) |
GroupDRO | \(79.96\%_{\pm 0.03}\) | \(61.01\%_{\pm 0.05}\) | \(60.86\%_{\pm 0.02}\) | \(67.28\%\) |
IRM | \(76.15\%_{\pm 0.03}\) | \(\boldsymbol{67.82\%_{\pm 0.04}}\) | \(60.93\%_{\pm 0.01}\) | \(68.30\%\) |
CORAL | \(81.26\%_{\pm 0.02}\) | \(65.18\%_{\pm 0.05}\) | \(67.97\%_{\pm 0.01}\) | \(71.47\%\) |
CausIRL | \(\boldsymbol{86.44\%_{\pm 0.01}}\) | \(66.11\%_{\pm 0.01}\) | \(\boldsymbol{71.36\%_{\pm 0.02}}\) | \(\boldsymbol{74.64\%}\) |
MMD-AAE | \(78.91\%_{\pm 0.02}\) | \(64.21\%_{\pm 0.03}\) | \(66.86\%_{\pm 0.01}\) | \(69.99\%\) |
@article{lynch2023spawrious,
title={Spawrious: A benchmark for fine control of spurious correlation biases},
author={Lynch, Aengus and Dovonon, Gb{\`e}tondji JS and Kaddour, Jean and Silva, Ricardo},
journal={arXiv preprint arXiv:2303.05470},
year={2023}
}