Koushim committed · Commit 70ef163 · verified · 1 Parent(s): 7dc0f5e

Update README.md

Files changed (1): README.md (+93 -3)
Removed the previous front matter, which contained only `license: mit`.
---
language: en
datasets:
- yahoo_answers_topics
tags:
- text-classification
- topic-classification
- yahoo-answers
- distilbert
- transformers
- pytorch
license: apache-2.0
model-index:
- name: DistilBERT Yahoo Answers Classifier
  results:
  - task:
      name: Topic Classification
      type: text-classification
    dataset:
      name: Yahoo Answers Topics
      type: yahoo_answers_topics
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.71
---

# DistilBERT Fine-Tuned on Yahoo Answers Topics

This is a fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for **topic classification** on the [Yahoo Answers Topics dataset](https://huggingface.co/datasets/yahoo_answers_topics). It classifies a question into one of 10 predefined categories, such as "Science & Mathematics", "Health", or "Business & Finance".

## 🧠 Model Details

- **Base model**: `distilbert-base-uncased`
- **Task**: Multi-class text classification (10 classes)
- **Dataset**: Yahoo Answers Topics
- **Training samples**: 50,000 (subset)
- **Evaluation samples**: 5,000 (subset)
- **Metric**: Accuracy

## 🧪 How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Koushim/distilbert-yahoo-answers")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/distilbert-yahoo-answers")

text = "How do I improve my math skills for competitive exams?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(dim=1).item()
print("Predicted class:", predicted_class)
```
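
The argmax above returns only an index. To turn the logits into class probabilities, apply a softmax; here is a minimal pure-Python sketch of that step (the logits below are made-up example values, not real model output):

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the 10 Yahoo Answers classes (illustration only)
logits = [0.1, 2.3, -0.4, 1.1, 0.0, -1.2, 0.5, 0.2, -0.3, 0.4]
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

In the PyTorch snippet above, the equivalent is `outputs.logits.softmax(dim=1)`.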

## 📊 Classes (Labels)

0. Society & Culture
1. Science & Mathematics
2. Health
3. Education & Reference
4. Computers & Internet
5. Sports
6. Business & Finance
7. Entertainment & Music
8. Family & Relationships
9. Politics & Government
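
The model outputs an integer index; a small lookup table mirroring the list above turns it into a readable topic name (if the model's `config.json` populates `id2label`, treat `model.config.id2label` as the source of truth instead):

```python
# Label lookup mirroring the class list above.
ID2LABEL = {
    0: "Society & Culture",
    1: "Science & Mathematics",
    2: "Health",
    3: "Education & Reference",
    4: "Computers & Internet",
    5: "Sports",
    6: "Business & Finance",
    7: "Entertainment & Music",
    8: "Family & Relationships",
    9: "Politics & Government",
}

def label_name(predicted_class: int) -> str:
    """Translate an argmax index into a readable topic name."""
    return ID2LABEL[predicted_class]
```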

## 📦 Training Details

* Optimizer: AdamW
* Learning rate: 2e-5
* Batch size: 16 (train), 32 (eval)
* Epochs: 3
* Weight decay: 0.01
* Framework: PyTorch + 🤗 Transformers
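
A quick sanity check on what these numbers imply for training length, assuming plain epochs over the 50,000-sample subset with drop-last batching:

```python
# Derived from the figures above: 50,000 training samples, batch size 16, 3 epochs.
train_samples = 50_000
batch_size = 16
epochs = 3

steps_per_epoch = train_samples // batch_size   # 3,125 optimizer steps per epoch
total_steps = steps_per_epoch * epochs          # 9,375 steps overall
```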

## 📁 Repository Structure

* `config.json` – Model config
* `pytorch_model.bin` – Trained model weights
* `tokenizer.json`, `vocab.txt` – Tokenizer files

## ✍️ Author

* Hugging Face Hub: [Koushim](https://huggingface.co/Koushim)
* Model trained using the `transformers.Trainer` API

## 📄 License

Apache 2.0