SigLIP2 Content Filters - Models v.2
Collection
Moderation, Balance, Classifiers
•
4 items
•
Updated
•
2
Multilabel-GeoSceneNet is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-label image classification. It is designed to recognize and label multiple geographic or environmental elements in a single image using the SiglipForImageClassification architecture.
Classification Report:
precision recall f1-score support
Buildings and Structures 0.8881 0.9498 0.9179 2190
Desert 0.9649 0.9480 0.9564 2000
Forest Area 0.9807 0.9855 0.9831 2271
Hill or Mountain 0.8616 0.8993 0.8800 2512
Ice Glacier 0.9114 0.8382 0.8732 2404
Sea or Ocean 0.9328 0.9525 0.9426 2274
Street View 0.9476 0.9106 0.9287 2382
accuracy 0.9245 16033
macro avg 0.9267 0.9263 0.9260 16033
weighted avg 0.9253 0.9245 0.9244 16033
The model predicts the presence of one or more of the following 7 geographic scene categories:
Class 0: "Buildings and Structures"
Class 1: "Desert"
Class 2: "Forest Area"
Class 3: "Hill or Mountain"
Class 4: "Ice Glacier"
Class 5: "Sea or Ocean"
Class 6: "Street View"
!pip install -q transformers torch pillow gradio
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch
# Load model and processor
model_name = "prithivMLmods/Multilabel-GeoSceneNet" # Updated model name
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)
def classify_geoscene_image(image):
"""Predicts geographic scene labels for an input image."""
image = Image.fromarray(image).convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
logits = outputs.logits
probs = torch.sigmoid(logits).squeeze().tolist() # Sigmoid for multilabel
labels = {
"0": "Buildings and Structures",
"1": "Desert",
"2": "Forest Area",
"3": "Hill or Mountain",
"4": "Ice Glacier",
"5": "Sea or Ocean",
"6": "Street View"
}
threshold = 0.5
predictions = {
labels[str(i)]: round(probs[i], 3)
for i in range(len(probs)) if probs[i] >= threshold
}
return predictions or {"None Detected": 0.0}
# Create Gradio interface
iface = gr.Interface(
fn=classify_geoscene_image,
inputs=gr.Image(type="numpy"),
outputs=gr.Label(label="Predicted Scene Categories"),
title="Multilabel-GeoSceneNet",
description="Upload an image to detect multiple geographic scene elements (e.g., forest, ocean, buildings)."
)
if __name__ == "__main__":
iface.launch()
The Multilabel-GeoSceneNet model is suitable for recognizing multiple geographic and structural elements in a single image. Use cases include:
Base model
google/siglip2-base-patch16-224