metadata
library_name: transformers
tags:
  - emotions
license: openrail++
datasets:
  - ukr-detect/ukr-emotions-binary
language:
  - uk
metrics:
  - accuracy
  - f1
base_model:
  - intfloat/multilingual-e5-large
pipeline_tag: text-classification

EmoBench-UA: Emotion Detection in Ukrainian Texts

Model Details

We provide a first-of-its-kind emotion detector for Ukrainian texts. It covers six basic emotions (Joy, Anger, Fear, Disgust, Surprise, Sadness) plus None. A text can express a single emotion, several emotions, or none at all. Texts labeled None are those where the label for every emotion class is 0.

The base model intfloat/multilingual-e5-large was fine-tuned for the multi-label emotion classification task on the train split of the ukr-emotions-binary dataset.

Usage: For each emotion, the model produces a score that, after thresholding, becomes 0 or 1, indicating whether that emotion is present in the text. The model can therefore be used to detect the presence or absence of any basic emotion in Ukrainian texts.
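As a minimal sketch of how the 0/1 outputs can be read, the snippet below maps a binary vector to emotion names. The label order and the helper name `decode_labels` are assumptions made for illustration; the actual order is defined by the model's `id2label` config. Following the description above, a vector with no emotion flagged is reported as None.

```python
# Hypothetical label order, for illustration only; the real order comes
# from the model's id2label mapping.
EMOTIONS = ["Joy", "Fear", "Anger", "Sadness", "Disgust", "Surprise"]


def decode_labels(binary_vector):
    """Map a 0/1 vector over the six emotions to a list of label names."""
    if len(binary_vector) != len(EMOTIONS):
        raise ValueError("expected one 0/1 value per emotion")
    present = [name for name, flag in zip(EMOTIONS, binary_vector) if flag == 1]
    # A text with every emotion label set to 0 is treated as "None".
    return present or ["None"]


print(decode_labels([1, 0, 0, 0, 0, 1]))  # ['Joy', 'Surprise']
print(decode_labels([0, 0, 0, 0, 0, 0]))  # ['None']
```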

Evaluation Results

General classification report of our model on the test part of ukr-emotions-binary:

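| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Joy | 0.69 | 0.78 | 0.73 | 368 |
| Fear | 0.80 | 0.81 | 0.81 | 151 |
| Anger | 0.41 | 0.25 | 0.31 | 99 |
| Sadness | 0.67 | 0.71 | 0.69 | 298 |
| Disgust | 0.63 | 0.24 | 0.35 | 79 |
| Surprise | 0.52 | 0.72 | 0.60 | 175 |
| None | 0.82 | 0.80 | 0.81 | 1108 |
| micro avg | 0.73 | 0.74 | 0.73 | 2278 |
| macro avg | 0.65 | 0.62 | 0.62 | 2278 |
| weighted avg | 0.73 | 0.74 | 0.73 | 2278 |
| samples avg | 0.72 | 0.74 | 0.72 | 2278 |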
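
The aggregate rows can be sanity-checked against the per-class rows. The sketch below recomputes the macro and support-weighted F1 averages from the rounded values in the report above. Because the inputs are already rounded to two decimals, the recomputed figures may differ from the reported ones in the last digit. The variable and function names are illustrative.

```python
# Per-class (f1-score, support) pairs copied from the report above.
report = {
    "Joy": (0.73, 368),
    "Fear": (0.81, 151),
    "Anger": (0.31, 99),
    "Sadness": (0.69, 298),
    "Disgust": (0.35, 79),
    "Surprise": (0.60, 175),
    "None": (0.81, 1108),
}


def macro_average(rows):
    """Unweighted mean of the per-class scores."""
    return sum(score for score, _ in rows.values()) / len(rows)


def weighted_average(rows):
    """Mean of the per-class scores weighted by class support."""
    total = sum(support for _, support in rows.values())
    return sum(score * support for score, support in rows.values()) / total


total_support = sum(support for _, support in report.values())
print(f"total support: {total_support}")
print(f"macro F1:      {macro_average(report):.3f}")
print(f"weighted F1:   {weighted_average(report):.3f}")
```

The macro average treats all seven classes equally, so the weaker Anger and Disgust scores pull it below the weighted average, which is dominated by the large None class.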

How to Get Started with the Model

We provide the code to get started with the model:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# 1. Set up model & tokenizer (update model_name as needed)
model_name = "ukr-detect/emotions_classifier"  # Or path to local dir
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # For inference

# 2. Define per-label thresholds (keys must match the label names in the model's id2label)
thresholds = {
    "Joy": 0.35,
    "Fear": 0.5,
    "Anger": 0.25,
    "Sadness": 0.5,
    "Disgust": 0.3,
    "Surprise": 0.25,
    "None": 0.35
}

# 3. Prepare a function for prediction
def predict_emotions(texts):
    # Tokenize
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**enc)
        logits = outputs.logits
        probs = torch.sigmoid(logits)
    
    # Get label mapping in correct order
    id2label = model.config.id2label
    label_order = [id2label[i] for i in range(len(id2label))]
    
    # Threshold and get predictions
    predictions = []
    for prob_row in probs:
        single_pred = [
            label for label, prob in zip(label_order, prob_row.tolist())
            if prob > thresholds[label]
        ]
        predictions.append(single_pred)
    return predictions

# 4. Example usage
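texts = [
    "Я щойно отримав підвищення на роботі!",  # "I just got a promotion at work!"
    "Я хвилююся за майбутнє.",  # "I am worried about the future."
    "я не буду заради тебе, Гані і Каті терпіти того пйоса",  # "I won't put up with that mutt for the sake of you, Hania and Katia"
    "Сьогодні нічого не сталося.",  # "Nothing happened today."
    "всі чомусь плутають, а мені то дивно так )))",  # "everyone mixes it up for some reason, and I find that so strange )))"
    " ого, то зовсім не приємно(("  # "wow, that is not pleasant at all(("
]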

results = predict_emotions(texts)
for t, r in zip(texts, results):
    print(f"Text: {t}\nEmotions: {r}\n")

# Expected output

Text: Я щойно отримав підвищення на роботі!
Emotions: ['Joy', 'Surprise']

Text: Я хвилююся за майбутнє.
Emotions: ['Fear']

Text: я не буду заради тебе, Гані і Каті терпіти того пйоса
Emotions: ['Anger', 'Disgust']

Text: Сьогодні нічого не сталося.
Emotions: ['None']

Text: всі чомусь плутають, а мені то дивно так )))
Emotions: ['Surprise']

Text:  ого, то зовсім не приємно((
Emotions: ['Sadness']
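
The per-label thresholds in step 2 mean that the same probability can be accepted for one emotion and rejected for another. The sketch below illustrates this in plain Python with made-up logits (not real model outputs); only the threshold values are taken from the snippet above, and the helper names are illustrative.

```python
import math

# Threshold values copied from the snippet above.
thresholds = {
    "Joy": 0.35, "Fear": 0.5, "Anger": 0.25, "Sadness": 0.5,
    "Disgust": 0.3, "Surprise": 0.25, "None": 0.35,
}


def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_thresholds(logits_by_label):
    """Return labels whose sigmoid probability exceeds their own threshold."""
    return [
        label
        for label, logit in logits_by_label.items()
        if sigmoid(logit) > thresholds[label]
    ]


# Made-up logits for illustration; sigmoid(-0.8) is roughly 0.31.
example_logits = {
    "Joy": -3.0, "Fear": -0.8, "Anger": -0.8, "Sadness": -2.5,
    "Disgust": -4.0, "Surprise": -3.5, "None": -2.0,
}
print(apply_thresholds(example_logits))  # ['Anger']
```

Here Fear and Anger receive the same probability, about 0.31, but only Anger clears its lower threshold of 0.25.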

Citation

If you would like to acknowledge our work, please cite the following manuscript:

@article{dementieva2025emobench,
  title={EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian},
  author={Dementieva, Daryna and Babakov, Nikolay and Fraser, Alexander},
  journal={arXiv preprint arXiv:2505.23297},
  year={2025}
}

Contacts

Nikolay Babakov, Daryna Dementieva