---
language: en
datasets:
- yahoo_answers_topics
tags:
- text-classification
- topic-classification
- yahoo-answers
- distilbert
- transformers
- pytorch
license: apache-2.0
model-index:
- name: DistilBERT Yahoo Answers Classifier
  results:
  - task:
      name: Topic Classification
      type: text-classification
    dataset:
      name: Yahoo Answers Topics
      type: yahoo_answers_topics
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.71
---
# DistilBERT Fine-Tuned on Yahoo Answers Topics
This is a fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for **topic classification** on the [Yahoo Answers Topics dataset](https://huggingface.co/datasets/yahoo_answers_topics). It classifies questions into one of 10 predefined categories, such as "Science & Mathematics", "Health", and "Business & Finance".
## 🧠 Model Details
- **Base model**: `distilbert-base-uncased`
- **Task**: Multi-class Text Classification (10 classes)
- **Dataset**: Yahoo Answers Topics
- **Training samples**: 50,000 (subset)
- **Evaluation samples**: 5,000 (subset)
- **Metrics**: Accuracy
## 🧪 How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("Koushim/distilbert-yahoo-answers")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/distilbert-yahoo-answers")
model.eval()

text = "How do I improve my math skills for competitive exams?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Inference only, so no gradients are needed
with torch.no_grad():
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(dim=1).item()
print("Predicted class:", predicted_class)
```
## 📊 Classes (Labels)
0. Society & Culture
1. Science & Mathematics
2. Health
3. Education & Reference
4. Computers & Internet
5. Sports
6. Business & Finance
7. Entertainment & Music
8. Family & Relationships
9. Politics & Government
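To turn the integer returned by `argmax` into a readable topic name, you can keep a small lookup table. This is a convenience sketch that assumes the model's output indices follow the dataset's label order listed above; the `ID2LABEL` dict and `label_name` helper are illustrative, not part of the model repo.

```python
# Label names in Yahoo Answers Topics order (assumed to match model output indices)
ID2LABEL = {
    0: "Society & Culture",
    1: "Science & Mathematics",
    2: "Health",
    3: "Education & Reference",
    4: "Computers & Internet",
    5: "Sports",
    6: "Business & Finance",
    7: "Entertainment & Music",
    8: "Family & Relationships",
    9: "Politics & Government",
}

def label_name(class_id: int) -> str:
    """Translate a predicted class index into its topic name."""
    return ID2LABEL[class_id]
```

For example, `label_name(predicted_class)` turns the printed index from the snippet above into its topic string.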
## 📦 Training Details
* Optimizer: AdamW
* Learning rate: 2e-5
* Batch size: 16 (train), 32 (eval)
* Epochs: 3
* Weight decay: 0.01
* Framework: PyTorch + 🤗 Transformers
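The hyperparameters above correspond roughly to a `TrainingArguments` setup like the following. This is a sketch, not the exact training script; the output path is illustrative, and any settings not listed above are left at their defaults.

```python
from transformers import TrainingArguments

# Hyperparameters from the Training Details list above;
# output_dir is a hypothetical placeholder.
training_args = TrainingArguments(
    output_dir="./distilbert-yahoo-answers",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,
)
```

These arguments would then be passed to `transformers.Trainer` along with the model, tokenizer, and tokenized dataset splits.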
## 📁 Repository Structure
* `config.json` β Model config
* `pytorch_model.bin` β Trained model weights
* `tokenizer.json`, `vocab.txt` β Tokenizer files
## ✍️ Author
* Hugging Face Hub: [Koushim](https://huggingface.co/Koushim)
* Model trained using `transformers.Trainer` API
## 📄 License
Apache 2.0