---
language: en
datasets: yahoo_answers_topics
tags:
- text-classification
- topic-classification
- yahoo-answers
- distilbert
- transformers
- pytorch
license: apache-2.0
model-index:
- name: DistilBERT Yahoo Answers Classifier
  results:
  - task:
      name: Topic Classification
      type: text-classification
    dataset:
      name: Yahoo Answers Topics
      type: yahoo_answers_topics
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.71
---
# DistilBERT Fine-Tuned on Yahoo Answers Topics
This is a fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for **topic classification** on the [Yahoo Answers Topics dataset](https://huggingface.co/datasets/yahoo_answers_topics). It classifies a question into one of 10 predefined categories, such as "Science & Mathematics", "Health", and "Business & Finance".
## 🧠 Model Details
- **Base model**: `distilbert-base-uncased`
- **Task**: Multi-class Text Classification (10 classes)
- **Dataset**: Yahoo Answers Topics
- **Training samples**: 50,000 (subset)
- **Evaluation samples**: 5,000 (subset)
- **Metrics**: Accuracy
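Accuracy here means the fraction of evaluation questions whose predicted class matches the reference label. A minimal sketch of that calculation follows; the function name is illustrative, and this is not necessarily how the author computed the reported value.

```python
def accuracy(predictions, references):
    """Fraction of positions where the predicted class id equals the reference id."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("cannot compute accuracy on an empty list")
    # Booleans sum as 0/1, so this counts the matching pairs.
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(predictions)
```

For scale, an accuracy of 0.71 on 5,000 evaluation samples corresponds to roughly 3,550 correct predictions.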
## 🧪 How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hub
tokenizer = AutoTokenizer.from_pretrained("Koushim/distilbert-yahoo-answers")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/distilbert-yahoo-answers")

text = "How do I improve my math skills for competitive exams?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class (0-9, see the label list below)
predicted_class = outputs.logits.argmax(dim=1).item()
print("Predicted class:", predicted_class)
```
## 📊 Classes (Labels)
0. Society & Culture
1. Science & Mathematics
2. Health
3. Education & Reference
4. Computers & Internet
5. Sports
6. Business & Finance
7. Entertainment & Music
8. Family & Relationships
9. Politics & Government
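
To turn the integer returned by the model into a readable topic, the list above can serve as a lookup table. The sketch below assumes the model's output indices follow this list's order. `ID2LABEL` and `top_k_topics` are illustrative names, not part of the published model. The softmax is written in plain Python so the snippet runs without PyTorch.

```python
import math

# Index-to-topic table, copied from the class list above.
ID2LABEL = {
    0: "Society & Culture",
    1: "Science & Mathematics",
    2: "Health",
    3: "Education & Reference",
    4: "Computers & Internet",
    5: "Sports",
    6: "Business & Finance",
    7: "Entertainment & Music",
    8: "Family & Relationships",
    9: "Politics & Government",
}


def top_k_topics(logits, k=3):
    """Return the k most likely (topic, probability) pairs for one example."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(ID2LABEL[i], probs[i]) for i in ranked[:k]]
```

With the inference snippet from this card, `top_k_topics(outputs.logits[0].tolist())` would list the three most probable topics for the input question.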
## 📦 Training Details
* Optimizer: AdamW
* Learning rate: 2e-5
* Batch size: 16 (train), 32 (eval)
* Epochs: 3
* Weight decay: 0.01
* Framework: PyTorch + 🤗 Transformers
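
The hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the author's training script. The output directory name is hypothetical, and the batch size of 16 is assumed to be the per-device value on a single device with no gradient accumulation. AdamW is the `Trainer` default optimizer, so it needs no explicit setting.

```python
import math

# Values taken from the Training Details list; the dict name is illustrative.
HPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}

TRAIN_SAMPLES = 50_000  # size of the training subset reported in this card


def build_training_args(output_dir="distilbert-yahoo-answers-out"):
    """Create TrainingArguments from HPARAMS (output_dir is a hypothetical path)."""
    # Imported lazily so the step arithmetic below works without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HPARAMS)


def optimizer_steps(num_samples=TRAIN_SAMPLES):
    """Steps per epoch and in total, assuming one device and no gradient accumulation."""
    per_epoch = math.ceil(num_samples / HPARAMS["per_device_train_batch_size"])
    return per_epoch, per_epoch * HPARAMS["num_train_epochs"]
```

Under those assumptions, `optimizer_steps()` gives 3,125 steps per epoch and 9,375 steps over the three epochs.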
## 📁 Repository Structure
* `config.json` – Model config
* `pytorch_model.bin` – Trained model weights
* `tokenizer.json`, `vocab.txt` – Tokenizer files
## ✍️ Author
* Hugging Face Hub: [Koushim](https://huggingface.co/Koushim)
* Model trained using the `transformers.Trainer` API
## 📄 License
Apache 2.0