
roberta-aegyptustranslit-classifier

A fine-tuned RoBERTa-base model for classifying Ancient Egyptian transliterations into their respective historical time periods ('Predynastic & Early Dynastic', 'Old Kingdom & First Intermediate', 'Middle Kingdom & Second Intermediate', 'New Kingdom & Third Intermediate', 'Late Period & Greco-Roman Egypt').

Model Details

  • Base model: roberta-base
  • Task: Text classification
  • Classes: 5 historical time periods
  • Fine-tuned on: Custom labeled dataset (transliteration, period)
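
For reference, a minimal sketch of how the 5-way classification head might be configured on top of roberta-base. The label order here is an assumption based on the periods listed above; the exact id-to-label mapping in the released checkpoint may differ.

# Sketch: configuring a 5-way sequence classification head on roberta-base.
# Label order is an assumption taken from the period list above.
from transformers import AutoModelForSequenceClassification

labels = [
    "Predynastic & Early Dynastic",
    "Old Kingdom & First Intermediate",
    "Middle Kingdom & Second Intermediate",
    "New Kingdom & Third Intermediate",
    "Late Period & Greco-Roman Egypt",
]
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)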

Training Results

Metric           Value
F1 Score         ~0.562
Weighted F1      ~0.567
Validation Loss  ~1.5295
Epochs           20
Learning Rate    2e-5
Batch Size       64
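
As a rough illustration, the hyperparameters above map onto a Hugging Face Trainer configuration along these lines. This is a sketch, not the exact training script; the output directory is an assumption.

# Sketch: TrainingArguments matching the reported hyperparameters.
# The output directory is an illustrative assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-aegyptustranslit-classifier",
    num_train_epochs=20,
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
)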

Per-class F1 scores:

  • Predynastic & Early Dynastic: F1 = 0.576
  • Old Kingdom & First Intermediate: F1 = 0.432
  • Middle Kingdom & Second Intermediate: F1 = 0.468
  • New Kingdom & Third Intermediate: F1 = 0.713
  • Late Period & Greco-Roman Egypt: F1 = 0.608
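
Per-class and aggregate F1 scores like these can be computed from validation predictions with scikit-learn. A sketch, where the label and prediction arrays are hypothetical placeholders:

# Sketch: per-class vs. macro vs. weighted F1.
# y_true / y_pred are hypothetical class indices (0-4), one per sample.
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 3, 4, 2, 3]
y_pred = [0, 1, 1, 3, 4, 2, 3]

print(f1_score(y_true, y_pred, average=None))        # one F1 per period
print(f1_score(y_true, y_pred, average="macro"))     # unweighted mean
print(f1_score(y_true, y_pred, average="weighted"))  # weighted by class support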

Intended Use & Limitations

This model is designed for historical text classification and is intended for exploratory research and as a performance baseline. Current constraints include:

  • Data limitations: ~10k balanced samples may not represent all orthographic variations.
  • Period bias: Middle Kingdom classification (F1 ≈ 0.47) underperforms due to orthographic overlap with neighboring periods; the Old Kingdom class (F1 ≈ 0.43) shows a similar weakness.
  • Best practices: always verify critical classifications against primary sources.

Roadmap:

  • v2.0: Expand training to full-corpus samples (balanced & unbalanced)

Data Used For Training

  • Thesaurus Linguae Aegyptiae, Late Egyptian sentences, corpus v19, premium, https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-late_egyptian-v19-premium, v1.0, 1/19/2025, ed. by Tonio Sebastian Richter & Daniel A. Werning on behalf of the Berlin-Brandenburgische Akademie der Wissenschaften and Hans-Werner Fischer-Elfert & Peter Dils on behalf of the Sächsische Akademie der Wissenschaften zu Leipzig.
  • Thesaurus Linguae Aegyptiae, Original Earlier Egyptian sentences, corpus v18, premium, https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-Earlier_Egyptian_original-v18-premium, v1.1, 2/16/2024, ed. by Tonio Sebastian Richter & Daniel A. Werning on behalf of the Berlin-Brandenburgische Akademie der Wissenschaften and Hans-Werner Fischer-Elfert & Peter Dils on behalf of the Sächsische Akademie der Wissenschaften zu Leipzig.
  • Thesaurus Linguae Aegyptiae, Demotic sentences, corpus v18, premium, https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-demotic-v18-premium, v1.1, 2/16/2024, ed. by Tonio Sebastian Richter & Daniel A. Werning on behalf of the Berlin-Brandenburgische Akademie der Wissenschaften and Hans-Werner Fischer-Elfert & Peter Dils on behalf of the Sächsische Akademie der Wissenschaften zu Leipzig.
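
These corpora are published as Hugging Face datasets and can be pulled directly. A sketch; the default configurations and split names are assumptions, so check each dataset card:

# Sketch: loading the TLA source corpora from the Hugging Face Hub.
# Configuration/split names are assumptions; see each dataset card.
from datasets import load_dataset

late_egyptian = load_dataset("thesaurus-linguae-aegyptiae/tla-late_egyptian-v19-premium")
earlier_egyptian = load_dataset("thesaurus-linguae-aegyptiae/tla-Earlier_Egyptian_original-v18-premium")
demotic = load_dataset("thesaurus-linguae-aegyptiae/tla-demotic-v18-premium")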

Usage

# Using the pipeline API
from transformers import pipeline

classifier = pipeline("text-classification", model="RamzyBakir/roberta-aegyptustranslit-classifier")
print(classifier("bn-ꞽw n rmṯ-ḫm ꞽn ꜥn pꜣ nty ḫꜣꜥ pꜣ myṱ r-ḏbꜣ swg"))  # [{'label': ..., 'score': ...}]

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("RamzyBakir/roberta-aegyptustranslit-classifier")
tokenizer = AutoTokenizer.from_pretrained("RamzyBakir/roberta-aegyptustranslit-classifier")
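
With the model and tokenizer loaded directly, inference is a tokenize, forward pass, argmax loop. A minimal sketch continuing from the block above (torch is assumed to be installed, as Transformers requires it for this model):

import torch

# Tokenize a transliteration and run a forward pass without gradients.
inputs = tokenizer("bn-ꞽw n rmṯ-ḫm ꞽn ꜥn pꜣ nty ḫꜣꜥ pꜣ myṱ r-ḏbꜣ swg", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index back to its period label.
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])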