---
base_model:
- laion/CLIP-ViT-H-14-laion2B-s32B-b79K
datasets:
- ILSVRC/imagenet-1k
- mlfoundations/datacomp_small
license: mit
pipeline_tag: feature-extraction
library_name: transformers
---

[[Paper]](https://www.arxiv.org/abs/2506.03355)   [[Code]](https://github.com/LIONS-EPFL/LEAF)

Model initialized from `laion/CLIP-ViT-H-14-laion2B-s32B-b79K`. The image encoder is fine-tuned with FARE at $\epsilon=2/255$; the text encoder is fine-tuned with LEAF at $k=1$, with $\rho=50$ and semantic constraints.

To load this model, use:

```python
from transformers import CLIPProcessor, CLIPModel

model_name = "LEAF-CLIP/OpenCLIP-ViT-H-rho50-k1-constrained-FARE2"
processor_name = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)
```
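
With the model and processor loaded as above, a minimal feature-extraction sketch (the image path and text prompts below are placeholders, not part of the original card):

```python
import torch
from PIL import Image

# Placeholder inputs for illustration.
image = Image.open("example.jpg")
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    image_features = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_features = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# Normalize embeddings and compute cosine similarities between image and texts.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = image_features @ text_features.T
print(similarity)
```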