---
language: en
datasets:
- yahoo_answers_topics
tags:
- text-classification
- topic-classification
- yahoo-answers
- distilbert
- transformers
- pytorch
license: apache-2.0
model-index:
- name: DistilBERT Yahoo Answers Classifier
  results:
  - task:
      name: Topic Classification
      type: text-classification
    dataset:
      name: Yahoo Answers Topics
      type: yahoo_answers_topics
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.71
---

# DistilBERT Fine-Tuned on Yahoo Answers Topics

This is a fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for **topic classification** on the [Yahoo Answers Topics dataset](https://huggingface.co/datasets/yahoo_answers_topics). It classifies questions into one of 10 predefined categories such as "Science & Mathematics", "Health", and "Business & Finance".

## 🧠 Model Details

- **Base model**: `distilbert-base-uncased`
- **Task**: Multi-class text classification (10 classes)
- **Dataset**: Yahoo Answers Topics
- **Training samples**: 50,000 (subset)
- **Evaluation samples**: 5,000 (subset)
- **Metric**: Accuracy

## 🧪 How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Koushim/distilbert-yahoo-answers")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/distilbert-yahoo-answers")

text = "How do I improve my math skills for competitive exams?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class (see the label list below)
predicted_class = outputs.logits.argmax(dim=1).item()
print("Predicted class:", predicted_class)
```
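
For quick experiments, the stock `pipeline` helper can load the same checkpoint. Note that the returned label strings depend on the `id2label` mapping stored in the model config; if that mapping was not customized, they appear as `LABEL_0` … `LABEL_9` (see the class list below).

```python
from transformers import pipeline

# Load the checkpoint through the high-level text-classification pipeline
classifier = pipeline("text-classification", model="Koushim/distilbert-yahoo-answers")
print(classifier("How do I improve my math skills for competitive exams?"))
```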

## 📊 Classes (Labels)

0. Society & Culture
1. Science & Mathematics
2. Health
3. Education & Reference
4. Computers & Internet
5. Sports
6. Business & Finance
7. Entertainment & Music
8. Family & Relationships
9. Politics & Government
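
If the checkpoint's config does not carry a human-readable `id2label` mapping, a small dictionary built from the list above can translate the integer index returned by the "How to Use" snippet. This is a minimal sketch; prefer `model.config.id2label` when it is populated with real topic names.

```python
# Mapping from class index to topic name, copied from the list above.
ID2LABEL = {
    0: "Society & Culture",
    1: "Science & Mathematics",
    2: "Health",
    3: "Education & Reference",
    4: "Computers & Internet",
    5: "Sports",
    6: "Business & Finance",
    7: "Entertainment & Music",
    8: "Family & Relationships",
    9: "Politics & Government",
}

example_index = 3  # e.g. the integer produced by the "How to Use" snippet
print("Predicted topic:", ID2LABEL[example_index])  # -> Education & Reference
```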

## 📦 Training Details

* Optimizer: AdamW
* Learning rate: 2e-5
* Batch size: 16 (train), 32 (eval)
* Epochs: 3
* Weight decay: 0.01
* Framework: PyTorch + 🤗 Transformers (see the sketch below)
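
The hyperparameters above translate directly into a `Trainer` run. The following is a minimal reconstruction, not the exact script behind the released checkpoint: the seeded subsampling, the use of question title plus body as input text, and the column names from the public `yahoo_answers_topics` schema are assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

# Subset sizes follow this card; the exact sampling is an assumption.
dataset = load_dataset("yahoo_answers_topics")
train_ds = dataset["train"].shuffle(seed=42).select(range(50_000))
eval_ds = dataset["test"].shuffle(seed=42).select(range(5_000))

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess(batch):
    # Column names follow the yahoo_answers_topics schema; adjust if they differ.
    texts = [
        f"{title} {content}"
        for title, content in zip(batch["question_title"], batch["question_content"])
    ]
    enc = tokenizer(texts, truncation=True)
    enc["labels"] = batch["topic"]
    return enc

train_ds = train_ds.map(preprocess, batched=True, remove_columns=train_ds.column_names)
eval_ds = eval_ds.map(preprocess, batched=True, remove_columns=eval_ds.column_names)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=10
)

# Hyperparameters as listed above; AdamW is the Trainer default optimizer.
args = TrainingArguments(
    output_dir="distilbert-yahoo-answers",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```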

## 📁 Repository Structure

* `config.json` – Model config
* `pytorch_model.bin` – Trained model weights
* `tokenizer.json`, `vocab.txt` – Tokenizer files

## ✍️ Author

* Hugging Face Hub: [Koushim](https://huggingface.co/Koushim)
* Model trained using the `transformers.Trainer` API

## 📄 License

Apache 2.0