---
language: en
datasets: yahoo_answers_topics
tags:
  - text-classification
  - topic-classification
  - yahoo-answers
  - distilbert
  - transformers
  - pytorch
license: apache-2.0
model-index:
  - name: DistilBERT Yahoo Answers Classifier
    results:
      - task:
          name: Topic Classification
          type: text-classification
        dataset:
          name: Yahoo Answers Topics
          type: yahoo_answers_topics
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.71  
---

# DistilBERT Fine-Tuned on Yahoo Answers Topics

This is a fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for **topic classification** on the [Yahoo Answers Topics dataset](https://huggingface.co/datasets/yahoo_answers_topics). It classifies a question into one of 10 predefined categories, such as "Science & Mathematics", "Health", or "Business & Finance".

## 🧠 Model Details

- **Base model**: `distilbert-base-uncased`
- **Task**: Multi-class Text Classification (10 classes)
- **Dataset**: Yahoo Answers Topics
- **Training samples**: 50,000 (subset)
- **Evaluation samples**: 5,000 (subset)
- **Metrics**: Accuracy 

## πŸ§ͺ How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Koushim/distilbert-yahoo-answers")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/distilbert-yahoo-answers")

text = "How do I improve my math skills for competitive exams?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(dim=-1).item()
print("Predicted class:", predicted_class)
```

## πŸ“Š Classes (Labels)

0. Society & Culture
1. Science & Mathematics
2. Health
3. Education & Reference
4. Computers & Internet
5. Sports
6. Business & Finance
7. Entertainment & Music
8. Family & Relationships
9. Politics & Government

## πŸ“¦ Training Details

* Optimizer: AdamW
* Learning rate: 2e-5
* Batch size: 16 (train), 32 (eval)
* Epochs: 3
* Weight decay: 0.01
* Framework: PyTorch + πŸ€— Transformers
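
The hyperparameters above correspond roughly to the following `transformers.TrainingArguments` configuration. This is a sketch of the implied setup, not the exact training script; `output_dir` is an illustrative assumption.

```python
from transformers import TrainingArguments

# Training configuration implied by the hyperparameters above.
# AdamW is the Trainer's default optimizer.
training_args = TrainingArguments(
    output_dir="distilbert-yahoo-answers",  # assumed, not from the original script
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,
)
```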

## πŸ“ Repository Structure

* `config.json` – Model config
* `pytorch_model.bin` – Trained model weights
* `tokenizer.json`, `vocab.txt` – Tokenizer files

## ✍️ Author

* Hugging Face Hub: [Koushim](https://huggingface.co/Koushim)
* Model trained using `transformers.Trainer` API

## πŸ“„ License

Apache 2.0
