🪜 LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers

📌 Summary

LADDER is a general framework that automatically discovers subpopulations (or "slices") of data on which a vision classifier underperforms, without requiring group annotations. It combines vision-language representations with the reasoning capabilities of large language models (LLMs) to detect and rectify bias-inducing features in both natural and medical imaging domains.

🧠 Architecture & Components

🔍 Slice Discovery using:
- CLIP, Mammo-CLIP, and CXR-CLIP features
- BLIP and GPT-4o-generated captions
🧠 Hypothesis Generation using:
- GPT-4o, Claude, Gemini, LLaMA
✅ Bias Mitigation via reweighting & pseudo-labeling
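The discovery and mitigation steps can be illustrated with a minimal sketch. It assumes image and caption embeddings already live in one shared vision-language space (as with CLIP-style encoders). It scores each caption by cosine similarity to the difference between the mean embedding of misclassified images and the mean embedding of correctly classified ones. It then computes inverse-frequency sample weights from hypothetical pseudo-labelled groups. The function names, the mean-difference scoring rule, and the weighting formula are illustrative assumptions, not LADDER's published implementation, and the LLM hypothesis-generation step is omitted.

```python
import math
from typing import Dict, List, Sequence


def _mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_captions_by_error_direction(
    image_emb: Sequence[Sequence[float]],
    correct: Sequence[bool],
    caption_emb: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[str]:
    """Hypothetical helper: rank captions by how well they align with the
    direction from correctly to incorrectly classified images."""
    wrong = [e for e, ok in zip(image_emb, correct) if not ok]
    right = [e for e, ok in zip(image_emb, correct) if ok]
    if not wrong or not right:
        raise ValueError("need both correct and misclassified samples")
    # Direction pointing from the 'correct' centroid to the 'error' centroid.
    direction = [w - r for w, r in zip(_mean(wrong), _mean(right))]
    scored = sorted(
        caption_emb.items(),
        key=lambda kv: _cosine(kv[1], direction),
        reverse=True,
    )
    return [text for text, _ in scored[:top_k]]


def balanced_sample_weights(pseudo_groups: Sequence[int]) -> List[float]:
    """Hypothetical helper: inverse-frequency weights so that every
    pseudo-labelled group receives the same total weight."""
    counts: Dict[int, int] = {}
    for g in pseudo_groups:
        counts[g] = counts.get(g, 0) + 1
    total, n_groups = len(pseudo_groups), len(counts)
    # Weights sum to `total`; each group contributes total / n_groups.
    return [total / (n_groups * counts[g]) for g in pseudo_groups]
```

Averaging the embeddings before taking the difference keeps the sketch simple; a real pipeline would typically work on many cached features (for example from `feature_cache.pkl`) and pass the top-ranked captions to an LLM to phrase hypotheses.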

📊 Datasets Used

Natural Images: Waterbirds, CelebA, MetaShift
Medical Images: NIH ChestX-ray, RSNA Mammograms, VinDr Mammograms

📦 Files Included

| File | Description |
|---|---|
| `model.pt` | Pretrained model checkpoint |
| `feature_cache.pkl` | Cached representations (CLIP/Mammo-CLIP/CXR-CLIP) |
| `metadata.csv` | Metadata with discovered slice labels |
| `caption_blip.json` | BLIP-generated captions |
| `caption_gpt4o.json` | GPT-4o-generated captions |
| `predictions.json` | Model predictions on the test set |
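A quick way to see where the classifier underperforms is to join the test-set predictions with the slice labels and compute accuracy per slice. The sketch below uses only the standard library. The column names (`image_id`, `label`, `slice`) and the JSON layout (a mapping from image ID to predicted label) are hypothetical assumptions, so adjust them to the actual schema of `metadata.csv` and `predictions.json`.

```python
import csv
import json
from collections import defaultdict
from typing import Dict, List, Tuple


def per_slice_accuracy(metadata_csv: str, predictions_json: str) -> List[Tuple[str, float, int]]:
    """Return (slice, accuracy, count) tuples sorted from worst to best accuracy.

    Assumes (hypothetically) that metadata.csv has columns image_id, label, slice
    and that predictions.json maps image_id -> predicted label.
    """
    with open(predictions_json) as f:
        preds: Dict[str, object] = json.load(f)
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    with open(metadata_csv, newline="") as f:
        for row in csv.DictReader(f):
            pred = preds.get(row["image_id"])
            if pred is None:
                continue  # skip images that have no prediction
            totals[row["slice"]] += 1
            # csv yields strings, so compare predictions as strings too.
            hits[row["slice"]] += int(str(pred) == row["label"])
    results = [(s, hits[s] / totals[s], totals[s]) for s in totals]
    return sorted(results, key=lambda r: r[1])
```

Slices at the top of the returned list are the first candidates to inspect.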

🧪 Benchmarks

LADDER outperforms existing slice discovery methods (Domino, FACTS) across 6 datasets and more than 200 classifiers. It is especially effective at:

- Discovering hidden biases without explicit attribute labels
- Reasoning about non-visual factors (e.g., preprocessing artifacts)
- Operating without human-written captions
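On datasets with annotated attributes (for example, the background label in Waterbirds), a discovered slice can be checked against the known group. A simple score often used to evaluate slice discovery is precision@k: the fraction of the k samples ranked most likely to belong to the discovered slice that truly belong to the annotated group. The sketch below assumes each sample has a scalar membership score; the metric is an illustration and may not match the exact protocol used to benchmark LADDER against Domino and FACTS.

```python
from typing import Sequence


def precision_at_k(slice_scores: Sequence[float], in_true_group: Sequence[bool], k: int) -> float:
    """Fraction of the k highest-scoring samples that belong to the annotated group."""
    if not 0 < k <= len(slice_scores):
        raise ValueError("k must be between 1 and the number of samples")
    # Indices of samples sorted by descending slice-membership score.
    ranked = sorted(range(len(slice_scores)), key=lambda i: slice_scores[i], reverse=True)
    return sum(bool(in_true_group[i]) for i in ranked[:k]) / k
```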

📜 Citation

```bibtex
@article{ghosh2024ladder,
  title={LADDER: Language Driven Slice Discovery and Error Rectification},
  author={Ghosh, Shantanu and Syed, Rayan and Wang, Chenyu and Poynton, Clare B and Visweswaran, Shyam and Batmanghelich, Kayhan},
  journal={arXiv preprint arXiv:2408.07832},
  year={2024}
}
```

🤝 Acknowledgements

Boston University, Stanford University, BUMC, and the University of Pittsburgh.