File size: 3,707 Bytes
8e4529a
 
1460cb8
 
 
 
 
 
 
 
8e4529a
965d1cb
 
 
 
 
 
 
8e4529a
 
 
 
 
 
 
 
 
 
 
 
 
 
965d1cb
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
---
license: apache-2.0
datasets:
- prithivMLmods/Multilabel-Portrait-18K
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
---

![zdfghgdftg.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/N8f6dbYHMmE02vq5wseAE.png)

# **Multilabel-Portrait-SigLIP2**

> **Multilabel-Portrait-SigLIP2** is a vision-language model fine-tuned from [**google/siglip2-base-patch16-224**](https://huggingface.co/google/siglip2-base-patch16-224) using the `SiglipForImageClassification` architecture. It classifies portrait-style images into one of the following **visual portrait categories**:

```py
 Classification Report:
                  precision    recall  f1-score   support

  Anime Portrait     0.9989    0.9991    0.9990      4444
Cartoon Portrait     0.9964    0.9926    0.9945      4444
   Real Portrait     0.9964    0.9971    0.9967      4444
 Sketch Portrait     0.9971    1.0000    0.9985      4444

        accuracy                         0.9972     17776
       macro avg     0.9972    0.9972    0.9972     17776
    weighted avg     0.9972    0.9972    0.9972     17776
```

![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/vOq9EHhJGLzRJSQJ5_liQ.png)

---

# **Model Objective**

The model is designed to **analyze portrait images** and categorize them into **one of four distinct portrait types**:

- **0:** Anime Portrait  
- **1:** Cartoon Portrait  
- **2:** Real Portrait  
- **3:** Sketch Portrait

---

# **Try it with Transformers πŸ€—**

Install dependencies:

```bash
pip install -q transformers torch pillow gradio
```

Run the model with the following script:

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Multilabel-Portrait-SigLIP2"  # Replace with actual HF model path
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping
id2label = {
    0: "Anime Portrait",
    1: "Cartoon Portrait",
    2: "Real Portrait",
    3: "Sketch Portrait"
}

def classify_portrait(image):
    """Predict the type of portrait style from an image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    predictions = {id2label[i]: round(probs[i], 3) for i in range(len(probs))}
    predictions = dict(sorted(predictions.items(), key=lambda item: item[1], reverse=True))
    return predictions

# Gradio interface
iface = gr.Interface(
    fn=classify_portrait,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Portrait Type Prediction Scores"),
    title="Multilabel-Portrait-SigLIP2",
    description="Upload a portrait-style image (anime, cartoon, real, or sketch) to predict its most likely visual category."
)

if __name__ == "__main__":
    iface.launch()
```

---

# **Intended Use Cases**

-  **AI Art Curation** β€” Automatically organize large-scale datasets of artistic portraits.  
-  **Style-based Portrait Analysis** β€” Determine artistic style in user-uploaded or curated portrait datasets.  
-  **Content Filtering for Platforms** β€” Group and recommend based on visual aesthetics.  
-  **Dataset Pre-labeling** β€” Helps reduce manual effort in annotation tasks.  
-  **User Avatar Classification** β€” Profile categorization in social or gaming platforms.