---
base_model:
- laion/CLIP-ViT-H-14-laion2B-s32B-b79K
datasets:
- ILSVRC/imagenet-1k
- mlfoundations/datacomp_small
license: mit
pipeline_tag: feature-extraction
library_name: transformers
---
[[Paper]](https://www.arxiv.org/abs/2506.03355) [[Code]](https://github.com/LIONS-EPFL/LEAF)
The model is initialized from `laion/CLIP-ViT-H-14-laion2B-s32B-b79K`. The image encoder is fine-tuned with FARE at $\epsilon = 2/255$. The text encoder is fine-tuned with LEAF at $k=1$ with $\rho = 50$ and semantic constraints.

To load this model, use:
```python
from transformers import CLIPProcessor, CLIPModel

model_name = "LEAF-CLIP/OpenCLIP-ViT-H-rho50-k1-constrained-FARE2"
processor_name = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"

# The model weights come from this repository; the processor (image
# transforms and tokenizer) is reused from the original OpenCLIP checkpoint.
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)
```
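
As a minimal usage sketch for feature extraction (the example image URL and text prompts below are placeholders, not part of this repository), embeddings can be computed with the standard `get_image_features` / `get_text_features` API:

```python
import torch
import requests
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model_name = "LEAF-CLIP/OpenCLIP-ViT-H-rho50-k1-constrained-FARE2"
processor_name = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)

# Placeholder inputs: substitute your own image and prompts.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    image_features = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_features = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# L2-normalize and compare with cosine similarity.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = image_features @ text_features.T
print(similarity)
```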