---
library_name: keras-hub
---
### Model Overview
SigLIP is a multimodal image-text model pre-trained on WebLI at a resolution of 224x224. It was introduced in the paper [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343) by Zhai et al. and first released in this [repository](https://github.com/google-research/big_vision).
SigLIP follows the design of [CLIP](https://huggingface.co/docs/transformers/model_doc/clip), a multimodal model, but replaces CLIP's softmax-based contrastive loss with a pairwise sigmoid loss. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows the batch size to be scaled up further, while also performing better at smaller batch sizes.
A TLDR of SigLIP by one of the authors can be found [here](https://twitter.com/giffmana/status/1692641733459267713).
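
To make the difference from a softmax loss concrete, here is a minimal NumPy sketch of a pairwise sigmoid loss in the spirit of the paper's description. It is not the KerasHub or big_vision implementation: the function name, the fixed `temperature` and `bias` values, and the random embeddings are assumptions chosen for illustration (in training, temperature and bias are learned).

```python
import numpy as np


def sigmoid_pairwise_loss(image_emb, text_emb, temperature=10.0, bias=-10.0):
    """Illustrative pairwise sigmoid loss over a batch of embeddings.

    `temperature` and `bias` are fixed here for simplicity (assumption);
    they are learnable parameters during training.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)

    # One logit per (image, text) pair in the batch.
    logits = temperature * (image_emb @ text_emb.T) + bias

    # +1 for matching pairs (diagonal), -1 for every other pair.
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0

    # Each pair is an independent binary problem: no softmax
    # normalization across the batch is needed.
    log_sigmoid = -np.logaddexp(0.0, -labels * logits)
    return -log_sigmoid.sum() / n


rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))  # hypothetical embeddings

aligned = sigmoid_pairwise_loss(emb, emb)          # every pair matches
shuffled = sigmoid_pairwise_loss(emb, emb[::-1])   # every pair mismatched
print(aligned < shuffled)
```

Because each entry of the logit matrix contributes its own term, the loss can be computed on chunks of the batch independently, which is what makes larger batch sizes easier to reach.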

Weights are released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE). Keras model code is released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).

## Links

* [SigLIP Quickstart Notebook](https://www.kaggle.com/code/laxmareddypatlolla/siglip-quickstart-notebook-with-hub)
* SigLIP API Documentation (coming soon)
* [SigLIP Paper](https://arxiv.org/abs/2303.15343)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Installation

Keras and KerasHub can be installed with:

```
pip install -U -q keras-hub
pip install -U -q keras
```

JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
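
Since Keras 3 can run on any of these backends, it may help to choose one explicitly. The sketch below uses the `KERAS_BACKEND` environment variable; the choice of `"jax"` is only an example, and the variable must be set before `keras` is first imported.

```python
import os

# Choose the backend before Keras is imported.
# Valid values include "jax", "tensorflow", and "torch".
os.environ["KERAS_BACKEND"] = "jax"

# Import keras (and keras_hub) only after this point, for example:
# import keras
# import keras_hub
```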

## Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

| Preset name                            | Parameters | Description                                                                                                  |
|---------------------------------------|------------|--------------------------------------------------------------------------------------------------------------|
|   |   |  |

## Example Usage

```Python
import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# Instantiate the model and its preprocessing tools from the same preset.
preset = "siglip2_large_patch16_384"
siglip = SigLIPBackbone.from_preset(preset)
tokenizer = SigLIPTokenizer.from_preset(preset, sequence_length=64)
image_converter = SigLIPImageConverter.from_preset(preset)

# Tokenize the candidate captions.
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# Load the image, add a batch dimension, and preprocess it.
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype(float))

# Query the model for image-text similarities.
outputs = siglip({
    "images": image,
    "token_ids": tokens,
})
```
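
Because SigLIP is trained with a sigmoid loss, the similarity logits for each image-text pair are usually read through a sigmoid rather than a softmax across captions. The sketch below shows that post-processing step on a placeholder array. The logit values are made up, and how the logits are exposed in the result of the `siglip(...)` call is an assumption to check against your KerasHub version.

```python
import numpy as np


def pair_probabilities(logits):
    """Map raw image-text logits to independent match probabilities."""
    logits = np.asarray(logits, dtype="float64")
    return 1.0 / (1.0 + np.exp(-logits))


captions = ["mountains", "cat on tortoise", "house"]

# Hypothetical logits for one image against the three captions;
# replace these with the logits returned by the model.
example_logits = np.array([[-6.2, 1.3, -8.0]])

probs = pair_probabilities(example_logits)
best = int(np.argmax(probs, axis=-1)[0])
print(captions[best])
```

Unlike softmax scores, these probabilities are independent per pair and need not sum to one, so an image can match several captions or none of them.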

## Example Usage with Hugging Face URI

```Python
import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# Instantiate the model and its preprocessing tools from the same preset.
preset = "hf://keras/siglip2_large_patch16_384"
siglip = SigLIPBackbone.from_preset(preset)
tokenizer = SigLIPTokenizer.from_preset(preset, sequence_length=64)
image_converter = SigLIPImageConverter.from_preset(preset)

# Tokenize the candidate captions.
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# Load the image, add a batch dimension, and preprocess it.
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype(float))

# Query the model for image-text similarities.
outputs = siglip({
    "images": image,
    "token_ids": tokens,
})
```