About

Leaderboard | Paper | GitHub | DomainBed

Spawrious 🐾 is a challenging out-of-distribution (OOD) image classification benchmark built around spurious correlations. It comprises six challenges of two types: one-to-one (O2O) and many-to-many (M2M) spurious correlation challenges.
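As a rough sketch of how the six challenges break down, the snippet below enumerates the two correlation types against the three difficulty levels reported in the leaderboard (Easy, Medium, Hard). The string labels are hypothetical and chosen only for illustration; they are not necessarily the identifiers used in the official code.

```python
from itertools import product

# Hypothetical labels for illustration; the official code may name these differently.
CORRELATION_TYPES = ("one-to-one", "many-to-many")
DIFFICULTIES = ("easy", "medium", "hard")

# Two correlation types x three difficulty levels = six challenges.
challenges = [
    f"{correlation}/{difficulty}"
    for correlation, difficulty in product(CORRELATION_TYPES, DIFFICULTIES)
]

for name in challenges:
    print(name)
```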

One-to-One

O2O Illustration

Many-to-Many

M2M Illustration

Example Pictures

[Six example images]

Leaderboard

Results Submission Form

ResNet18 pre-trained on ImageNet

One-to-One

| Method | Easy | Medium | Hard | Average |
| --- | --- | --- | --- | --- |
| ERM | \(72.15\%_{\pm 0.03}\) | \(69.85\%_{\pm 0.01}\) | \(65.65\%_{\pm 0.04}\) | \(69.22\%\) |
| GroupDRO | \(68.72\%_{\pm 0.02}\) | \(71.87\%_{\pm 0.01}\) | \(60.90\%_{\pm 0.03}\) | \(67.16\%\) |
| IRM | \(71.26\%_{\pm 0.02}\) | \(68.18\%_{\pm 0.02}\) | \(63.78\%_{\pm 0.03}\) | \(67.74\%\) |
| CORAL | \(83.85\%_{\pm 0.01}\) | \(73.96\%_{\pm 0.01}\) | \(72.18\%_{\pm 0.03}\) | \(\boldsymbol{76.66\%}\) |
| CausIRL | \(\boldsymbol{84.21\%_{\pm 0.01}}\) | \(73.45\%_{\pm 0.02}\) | \(71.20\%_{\pm 0.02}\) | \(76.29\%\) |
| MMD-AAE | \(82.92\%_{\pm 0.03}\) | \(\boldsymbol{74.09\%_{\pm 0.03}}\) | \(\boldsymbol{72.60\%_{\pm 0.05}}\) | \(76.54\%\) |
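
The Average column appears to be the unweighted mean of the Easy, Medium, and Hard accuracies, rounded to two decimals. The sketch below checks this for the ResNet18 One-to-One results. The accuracies are copied from the table; the dictionary layout and the function name `average_accuracy` are assumptions made for illustration.

```python
from statistics import mean

# Accuracies (%) for (Easy, Medium, Hard), copied from the ResNet18 One-to-One table.
o2o_resnet18 = {
    "ERM": (72.15, 69.85, 65.65),
    "GroupDRO": (68.72, 71.87, 60.90),
    "IRM": (71.26, 68.18, 63.78),
    "CORAL": (83.85, 73.96, 72.18),
    "CausIRL": (84.21, 73.45, 71.20),
    "MMD-AAE": (82.92, 74.09, 72.60),
}


def average_accuracy(easy, medium, hard):
    """Unweighted mean over the three difficulty levels, rounded to two decimals."""
    return round(mean((easy, medium, hard)), 2)


averages = {
    method: average_accuracy(*scores) for method, scores in o2o_resnet18.items()
}

# The method with the highest average is the one shown in bold in the Average column.
best_method = max(averages, key=averages.get)

for method, avg in averages.items():
    print(f"{method}: {avg:.2f}%")
print("Best average:", best_method)
```

The same check can be repeated for the other tables by swapping in their rows.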

Many-to-Many

| Method | Easy | Medium | Hard | Average |
| --- | --- | --- | --- | --- |
| ERM | \(72.51\%_{\pm 0.05}\) | \(51.36\%_{\pm 0.04}\) | \(47.02\%_{\pm 0.01}\) | \(56.96\%\) |
| GroupDRO | \(74.82\%_{\pm 0.04}\) | \(52.06\%_{\pm 0.03}\) | \(52.79\%_{\pm 0.03}\) | \(59.89\%\) |
| IRM | \(73.28\%_{\pm 0.04}\) | \(42.43\%_{\pm 0.07}\) | \(44.51\%_{\pm 0.06}\) | \(53.41\%\) |
| CORAL | \(79.91\%_{\pm 0.00}\) | \(58.09\%_{\pm 0.01}\) | \(56.51\%_{\pm 0.03}\) | \(64.84\%\) |
| CausIRL | \(81.21\%_{\pm 0.01}\) | \(56.79\%_{\pm 0.01}\) | \(56.31\%_{\pm 0.03}\) | \(64.77\%\) |
| MMD-AAE | \(\boldsymbol{83.45\%_{\pm 0.01}}\) | \(\boldsymbol{60.27\%_{\pm 0.03}}\) | \(\boldsymbol{58.26\%_{\pm 0.00}}\) | \(\boldsymbol{67.33\%}\) |

ResNet50 pre-trained on ImageNet

One-to-One

| Method | Easy | Medium | Hard | Average |
| --- | --- | --- | --- | --- |
| ERM | \(77.49\%_{\pm 0.05}\) | \(76.60\%_{\pm 0.02}\) | \(71.32\%_{\pm 0.09}\) | \(75.14\%\) |
| GroupDRO | \(80.58\%_{\pm 0.01}\) | \(75.96\%_{\pm 0.02}\) | \(76.99\%_{\pm 0.03}\) | \(77.84\%\) |
| IRM | \(75.45\%_{\pm 0.03}\) | \(76.39\%_{\pm 0.02}\) | \(74.90\%_{\pm 0.01}\) | \(75.58\%\) |
| CORAL | \(\boldsymbol{89.66\%_{\pm 0.01}}\) | \(\boldsymbol{81.05\%_{\pm 0.01}}\) | \(79.65\%_{\pm 0.02}\) | \(\boldsymbol{83.45\%}\) |
| CausIRL | \(89.32\%_{\pm 0.01}\) | \(78.64\%_{\pm 0.01}\) | \(\boldsymbol{80.40\%_{\pm 0.01}}\) | \(82.79\%\) |
| MMD-AAE | \(78.81\%_{\pm 0.02}\) | \(75.33\%_{\pm 0.03}\) | \(72.66\%_{\pm 0.01}\) | \(75.60\%\) |

Many-to-Many

| Method | Easy | Medium | Hard | Average |
| --- | --- | --- | --- | --- |
| ERM | \(83.80\%_{\pm 0.01}\) | \(53.05\%_{\pm 0.03}\) | \(58.70\%_{\pm 0.04}\) | \(65.18\%\) |
| GroupDRO | \(79.96\%_{\pm 0.03}\) | \(61.01\%_{\pm 0.05}\) | \(60.86\%_{\pm 0.02}\) | \(67.28\%\) |
| IRM | \(76.15\%_{\pm 0.03}\) | \(\boldsymbol{67.82\%_{\pm 0.04}}\) | \(60.93\%_{\pm 0.01}\) | \(68.30\%\) |
| CORAL | \(81.26\%_{\pm 0.02}\) | \(65.18\%_{\pm 0.05}\) | \(67.97\%_{\pm 0.01}\) | \(71.47\%\) |
| CausIRL | \(\boldsymbol{86.44\%_{\pm 0.01}}\) | \(66.11\%_{\pm 0.01}\) | \(\boldsymbol{71.36\%_{\pm 0.02}}\) | \(\boldsymbol{74.64\%}\) |
| MMD-AAE | \(78.91\%_{\pm 0.02}\) | \(64.21\%_{\pm 0.03}\) | \(66.86\%_{\pm 0.01}\) | \(69.99\%\) |

Citation

@article{lynch2023spawrious,
  title={Spawrious: A benchmark for fine control of spurious correlation biases},
  author={Lynch, Aengus and Dovonon, Gb{\`e}tondji JS and Kaddour, Jean and Silva, Ricardo},
  journal={arXiv preprint arXiv:2303.05470},
  year={2023}
}