Koushim/distilbert-yahoo-answers-topic-classifier

Text Classification · topic-classification


Koushim committed 17 days ago · Commit 70ef163 (verified) · 1 parent: 7dc0f5e

Update README.md

Files changed (1)

README.md +93 -3

@@ -1,3 +1,93 @@
----
-license: mit
----

+---
+language: en
+datasets: yahoo_answers_topics
+tags:
+  - text-classification
+  - topic-classification
+  - yahoo-answers
+  - distilbert
+  - transformers
+  - pytorch
+license: apache-2.0
+model-index:
+  - name: DistilBERT Yahoo Answers Classifier
+    results:
+      - task:
+          name: Topic Classification
+          type: text-classification
+        dataset:
+          name: Yahoo Answers Topics
+          type: yahoo_answers_topics
+        metrics:
+          - name: Accuracy
+            type: accuracy
+            value: 0.71
+---
+# DistilBERT Fine-Tuned on Yahoo Answers Topics
+This is a fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for **topic classification** on the [Yahoo Answers Topics dataset](https://huggingface.co/datasets/yahoo_answers_topics). It assigns each question to one of 10 predefined categories, such as "Science & Mathematics", "Health", and "Business & Finance".
+## 🧠 Model Details
+- **Base model**: `distilbert-base-uncased`
+- **Task**: Multi-class Text Classification (10 classes)
+- **Dataset**: Yahoo Answers Topics
+- **Training samples**: 50,000 (subset)
+- **Evaluation samples**: 5,000 (subset)
+- **Metrics**: Accuracy
+## 🧪 How to Use
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+# Repository ID of this model on the Hugging Face Hub
+model_id = "Koushim/distilbert-yahoo-answers-topic-classifier"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+text = "How do I improve my math skills for competitive exams?"
+inputs = tokenizer(text, return_tensors="pt", truncation=True)
+# Inference only: skip gradient tracking
+with torch.no_grad():
+    outputs = model(**inputs)
+# Index (0-9) of the highest-scoring topic; see the label list below
+predicted_class = outputs.logits.argmax(dim=-1).item()
+print("Predicted class:", predicted_class)
+```
+## 📊 Classes (Labels)
+0. Society & Culture
+1. Science & Mathematics
+2. Health
+3. Education & Reference
+4. Computers & Internet
+5. Sports
+6. Business & Finance
+7. Entertainment & Music
+8. Family & Relationships
+9. Politics & Government
+## 📦 Training Details
+* Optimizer: AdamW
+* Learning rate: 2e-5
+* Batch size: 16 (train), 32 (eval)
+* Epochs: 3
+* Weight decay: 0.01
+* Framework: PyTorch + 🤗 Transformers
+## 📁 Repository Structure
+* `config.json` – Model config
+* `pytorch_model.bin` – Trained model weights
+* `tokenizer.json`, `vocab.txt` – Tokenizer files
+## ✍️ Author
+* Hugging Face Hub: [Koushim](https://huggingface.co/Koushim)
+* Model trained using `transformers.Trainer` API
+## 📄 License
+Apache 2.0
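The README's usage snippet prints only an integer class index. Below is a minimal sketch that converts one example's logits into topic names from the **Classes (Labels)** list, along with their softmax probabilities. It assumes that logit index `i` in the checkpoint matches entry `i` in that list, which is the topic order of the `yahoo_answers_topics` dataset. The card does not say whether the model's `id2label` config holds these names. The helper `top_topics` is hypothetical and is not part of the model card.

```python
import math

# Topic names in the index order given by the model card's label list.
# Assumption: the checkpoint's logit index i corresponds to LABELS[i].
LABELS = [
    "Society & Culture",
    "Science & Mathematics",
    "Health",
    "Education & Reference",
    "Computers & Internet",
    "Sports",
    "Business & Finance",
    "Entertainment & Music",
    "Family & Relationships",
    "Politics & Government",
]


def top_topics(logits, k=3):
    """Hypothetical helper: return the k most likely (topic, probability) pairs.

    `logits` is a sequence of 10 raw scores for a single example,
    e.g. outputs.logits[0].tolist() from the README's usage snippet.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    # Numerically stable softmax: subtract the max score before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(LABELS[i], probs[i]) for i in ranked[:k]]
```

To use it with the README snippet, call `top_topics(outputs.logits[0].tolist())`. Because the helper works on a plain list of floats, the name mapping can be checked without downloading the model.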
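The card names accuracy as its only metric (0.71 in the `model-index` metadata) and says training used the `transformers.Trainer` API, but it does not include the evaluation code. Below is a typical `compute_metrics` callback that would make `Trainer` report an `accuracy` value on the evaluation subset. This is a sketch under that assumption, not the author's script.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy callback in the shape Trainer expects (assumed setup, not the author's code)."""
    # Trainer passes an EvalPrediction, which unpacks as (predictions, label_ids)
    logits, labels = eval_pred
    # Predicted class = index of the largest logit for each example
    preds = np.argmax(logits, axis=-1)
    # Fraction of examples whose predicted class equals the gold topic id
    return {"accuracy": float((preds == labels).mean())}
```

You would pass it as `Trainer(..., compute_metrics=compute_metrics)`. Every call to `trainer.evaluate()` would then report `eval_accuracy`.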
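The **Training Details** list gives the training batch size, epoch count and peak learning rate. Together with the 50,000-example training subset, these determine how many optimizer updates the run made. The arithmetic is sketched below under three assumptions the card does not state: a single device, no gradient accumulation, and the `Trainer` default schedule (linear decay, zero warmup). The function names are illustrative only.

```python
import math


def total_update_steps(num_examples, batch_size, epochs):
    """Optimizer steps on one device with no gradient accumulation (assumption).

    The last partial batch is kept, so batches per epoch is a ceiling division.
    """
    return math.ceil(num_examples / batch_size) * epochs


def linear_lr(step, total_steps, peak_lr=2e-5, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr during warmup
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr at the end of warmup to 0 at total_steps
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

With the card's numbers, `total_update_steps(50_000, 16, 3)` gives 3,125 steps per epoch and 9,375 in total. Under the assumed schedule, the learning rate would start at 2e-5 and reach 0 at step 9,375.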