metadata
language: en
datasets: yahoo_answers_topics
tags:
- text-classification
- topic-classification
- yahoo-answers
- distilbert
- transformers
- pytorch
license: apache-2.0
model-index:
- name: DistilBERT Yahoo Answers Classifier
  results:
  - task:
      name: Topic Classification
      type: text-classification
    dataset:
      name: Yahoo Answers Topics
      type: yahoo_answers_topics
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.71
DistilBERT Fine-Tuned on Yahoo Answers Topics
This is a fine-tuned DistilBERT model for topic classification on the Yahoo Answers Topics dataset. It assigns each question to one of 10 predefined categories, such as "Science & Mathematics", "Health", and "Business & Finance".
Model Details
- Base model: distilbert-base-uncased
- Task: Multi-class Text Classification (10 classes)
- Dataset: Yahoo Answers Topics
- Training samples: 50,000 (subset)
- Evaluation samples: 5,000 (subset)
- Metrics: Accuracy
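Accuracy here is the fraction of evaluation questions whose predicted class matches the reference class. A minimal sketch of that computation follows; the function name `accuracy` and the toy class ids are illustrative assumptions, not the evaluation script used for this model.

```python
def accuracy(predictions, references):
    """Fraction of positions where the predicted class id equals the reference id."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy class ids (0-9), made up for illustration only
toy_predictions = [1, 2, 4, 4, 9]
toy_references = [1, 2, 4, 5, 9]
toy_accuracy = accuracy(toy_predictions, toy_references)
print(f"Accuracy: {toy_accuracy:.2f}")
```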
How to Use
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Koushim/distilbert-yahoo-answers")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/distilbert-yahoo-answers")

text = "How do I improve my math skills for competitive exams?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
# Index of the highest-scoring class
predicted_class = outputs.logits.argmax(dim=1).item()
print("Predicted class:", predicted_class)
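The snippet above reports only the index of the top class. To get a confidence score for every class, the logits can be passed through a softmax. A minimal plain-Python sketch follows; `softmax` and `top_k` are hypothetical helper names, and the logits are made-up values standing in for `outputs.logits[0].tolist()`.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, k=3):
    """Return (class_index, probability) pairs for the k most likely classes."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for the 10 classes; in practice use outputs.logits[0].tolist()
example_logits = [0.1, 3.2, -0.5, 1.8, 0.0, -1.2, 0.3, -0.7, -0.9, -0.4]
probs = softmax(example_logits)
for index, prob in top_k(probs):
    print(f"class {index}: {prob:.3f}")
```

Working on a plain list keeps the example runnable without downloading the model; with PyTorch available, `torch.softmax(outputs.logits, dim=1)` gives the same probabilities directly.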
Classes (Labels)
- Society & Culture
- Science & Mathematics
- Health
- Education & Reference
- Computers & Internet
- Sports
- Business & Finance
- Entertainment & Music
- Family & Relationships
- Politics & Government
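To turn a predicted index into a readable topic, the list above can serve as a lookup table. This sketch assumes that class ids 0 to 9 follow the order in which the classes are listed here; that ordering is an assumption, so it is worth checking against `model.config.id2label` after loading the model. `LABELS` and `label_for` are hypothetical names.

```python
# Assumption: class ids 0-9 follow the order of the list above
LABELS = [
    "Society & Culture",
    "Science & Mathematics",
    "Health",
    "Education & Reference",
    "Computers & Internet",
    "Sports",
    "Business & Finance",
    "Entertainment & Music",
    "Family & Relationships",
    "Politics & Government",
]


def label_for(class_id):
    """Map a predicted class id to its topic name."""
    if not 0 <= class_id < len(LABELS):
        raise ValueError(f"class id must be in 0..{len(LABELS) - 1}, got {class_id}")
    return LABELS[class_id]


print(label_for(1))  # prints "Science & Mathematics"
```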
Training Details
- Optimizer: AdamW
- Learning rate: 2e-5
- Batch size: 16 (train), 32 (eval)
- Epochs: 3
- Weight decay: 0.01
- Framework: PyTorch + 🤗 Transformers
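The settings above correspond to keyword arguments of `transformers.TrainingArguments`. The sketch below collects them in a plain dictionary and estimates the number of optimizer steps for the 50,000-sample training subset. It assumes a single device and no gradient accumulation, neither of which is stated in this card; `HYPERPARAMS`, `TRAIN_SAMPLES`, and `total_steps` are hypothetical names.

```python
import math

# Hyperparameters from the list above, keyed by TrainingArguments parameter names
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}

TRAIN_SAMPLES = 50_000  # training subset size stated in this card


def total_steps(num_samples, batch_size, epochs):
    """Optimizer steps, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


steps = total_steps(
    TRAIN_SAMPLES,
    HYPERPARAMS["per_device_train_batch_size"],
    HYPERPARAMS["num_train_epochs"],
)
print("Estimated optimizer steps:", steps)

# With transformers installed, the dictionary can be unpacked like this
# ("./results" is a placeholder output directory):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="./results", **HYPERPARAMS)
```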
Repository Structure
- config.json: Model config
- pytorch_model.bin: Trained model weights
- tokenizer.json, vocab.txt: Tokenizer files
Author
- Hugging Face Hub: Koushim
- Model trained using the transformers.Trainer API
License
Apache 2.0