---
language: en
datasets: yahoo_answers_topics
tags:
  - text-classification
  - topic-classification
  - yahoo-answers
  - distilbert
  - transformers
  - pytorch
license: apache-2.0
model-index:
  - name: DistilBERT Yahoo Answers Classifier
    results:
      - task:
          name: Topic Classification
          type: text-classification
        dataset:
          name: Yahoo Answers Topics
          type: yahoo_answers_topics
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.71  
---

# DistilBERT Fine-Tuned on Yahoo Answers Topics

This is a fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for **topic classification** on the [Yahoo Answers Topics dataset](https://huggingface.co/datasets/yahoo_answers_topics). It classifies a question into one of 10 predefined categories, such as "Science & Mathematics", "Health", or "Business & Finance".

## 🧠 Model Details

- **Base model**: `distilbert-base-uncased`
- **Task**: Multi-class Text Classification (10 classes)
- **Dataset**: Yahoo Answers Topics
- **Training samples**: 50,000 (subset)
- **Evaluation samples**: 5,000 (subset)
- **Metrics**: Accuracy 

## πŸ§ͺ How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Koushim/distilbert-yahoo-answers")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/distilbert-yahoo-answers")

text = "How do I improve my math skills for competitive exams?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(dim=-1).item()
print("Predicted class:", predicted_class)
```

## πŸ“Š Classes (Labels)

0. Society & Culture
1. Science & Mathematics
2. Health
3. Education & Reference
4. Computers & Internet
5. Sports
6. Business & Finance
7. Entertainment & Music
8. Family & Relationships
9. Politics & Government

## πŸ“¦ Training Details

* Optimizer: AdamW
* Learning rate: 2e-5
* Batch size: 16 (train), 32 (eval)
* Epochs: 3
* Weight decay: 0.01
* Framework: PyTorch + πŸ€— Transformers
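
The hyperparameters above correspond roughly to the following `transformers.TrainingArguments` configuration. This is a sketch of the implied setup, not the exact training script; `output_dir` is an illustrative assumption.

```python
from transformers import TrainingArguments

# Training configuration implied by the hyperparameters above.
# AdamW is the Trainer's default optimizer.
training_args = TrainingArguments(
    output_dir="distilbert-yahoo-answers",  # assumed, not from the original script
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,
)
```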

## πŸ“ Repository Structure

* `config.json` – Model config
* `pytorch_model.bin` – Trained model weights
* `tokenizer.json`, `vocab.txt` – Tokenizer files

## ✍️ Author

* Hugging Face Hub: [Koushim](https://huggingface.co/Koushim)
* Model trained using `transformers.Trainer` API

## πŸ“„ License

Apache 2.0
