metadata
language: en
datasets: yahoo_answers_topics
tags:
  - text-classification
  - topic-classification
  - yahoo-answers
  - distilbert
  - transformers
  - pytorch
license: apache-2.0
model-index:
  - name: DistilBERT Yahoo Answers Classifier
    results:
      - task:
          name: Topic Classification
          type: text-classification
        dataset:
          name: Yahoo Answers Topics
          type: yahoo_answers_topics
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.71

DistilBERT Fine-Tuned on Yahoo Answers Topics

This is a DistilBERT model fine-tuned for topic classification on the Yahoo Answers Topics dataset. It assigns each question to one of 10 predefined categories, such as "Science & Mathematics", "Health", and "Business & Finance".

🧠 Model Details

  • Base model: distilbert-base-uncased
  • Task: Multi-class Text Classification (10 classes)
  • Dataset: Yahoo Answers Topics
  • Training samples: 50,000 (subset)
  • Evaluation samples: 5,000 (subset)
  • Metric: Accuracy (0.71, as reported in the model card metadata)
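
Accuracy here means the fraction of evaluation questions whose highest-scoring class matches the reference topic. The sketch below shows that computation in plain Python. The function name `accuracy_from_logits` and the toy inputs are illustrative assumptions, not taken from the original training script.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose arg-max class index equals the reference label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    correct = 0
    for row, label in zip(logits, labels):
        # Index of the largest score in this row (the predicted class)
        predicted = max(range(len(row)), key=lambda i: row[i])
        correct += int(predicted == label)
    return correct / len(labels)


# Toy example with 3 classes and 4 samples (made-up numbers)
toy_logits = [
    [2.0, 0.1, -1.0],  # predicts class 0
    [0.2, 1.5, 0.3],   # predicts class 1
    [0.0, 0.4, 0.9],   # predicts class 2
    [1.1, 0.9, 0.2],   # predicts class 0
]
toy_labels = [0, 1, 1, 0]
print(accuracy_from_logits(toy_logits, toy_labels))  # 3 of 4 correct: 0.75
```

A function of this shape could be adapted into a `compute_metrics` callback for `transformers.Trainer`, which passes predictions and labels as arrays rather than lists.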

🧪 How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Koushim/distilbert-yahoo-answers")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/distilbert-yahoo-answers")

text = "How do I improve my math skills for competitive exams?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(**inputs)

# Zero-based index (0-9) of the highest-scoring class
predicted_class = outputs.logits.argmax(dim=1).item()
print("Predicted class:", predicted_class)

📊 Classes (Labels)

  0. Society & Culture
  1. Science & Mathematics
  2. Health
  3. Education & Reference
  4. Computers & Internet
  5. Sports
  6. Business & Finance
  7. Entertainment & Music
  8. Family & Relationships
  9. Politics & Government
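
The usage snippet prints only an integer class index. The following sketch turns a vector of logits into readable topic names with probabilities. It assumes that the model's class indices are zero-based and follow the order in which the categories are listed above. The helper names `softmax` and `top_k` and the example logits are hypothetical.

```python
import math

# Assumed mapping: position in this list == class index returned by the model
LABELS = [
    "Society & Culture",
    "Science & Mathematics",
    "Health",
    "Education & Reference",
    "Computers & Internet",
    "Sports",
    "Business & Finance",
    "Entertainment & Music",
    "Family & Relationships",
    "Politics & Government",
]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(LABELS[i], probs[i]) for i in ranked[:k]]


# Made-up logits; with the real model, use outputs.logits[0].tolist()
example_logits = [0.1, 3.2, 0.3, 2.5, 0.0, -1.0, 0.2, -0.5, 0.1, -0.3]
for label, prob in top_k(example_logits):
    print(f"{label}: {prob:.3f}")
```

If the published config defines `id2label`, reading names from `model.config.id2label` would be the more robust choice. Whether it does is not stated here, so the hard-coded list is a fallback.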

📦 Training Details

  • Optimizer: AdamW
  • Learning rate: 2e-5
  • Batch size: 16 (train), 32 (eval)
  • Epochs: 3
  • Weight decay: 0.01
  • Framework: PyTorch + 🤗 Transformers
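
To make these settings concrete, the sketch below collects them in a dictionary whose keys mirror keyword arguments of `transformers.TrainingArguments`, then works out how many optimizer steps they imply for the 50,000-sample training subset. The `output_dir` value is a hypothetical placeholder, and the step counts assume a single device, no gradient accumulation, and a retained final partial batch. None of this is confirmed by the original training script.

```python
import math

# Hyperparameters from the list above. AdamW is not listed as a key because
# it is the Trainer's default optimizer.
training_kwargs = {
    "output_dir": "./distilbert-yahoo-answers",  # assumed path, not from the source
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}

TRAIN_SAMPLES = 50_000  # training subset size from Model Details
EVAL_SAMPLES = 5_000    # evaluation subset size from Model Details

# Assumes one device, no gradient accumulation, last partial batch kept
steps_per_epoch = math.ceil(TRAIN_SAMPLES / training_kwargs["per_device_train_batch_size"])
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
eval_batches = math.ceil(EVAL_SAMPLES / training_kwargs["per_device_eval_batch_size"])

print(steps_per_epoch, total_steps, eval_batches)  # 3125 9375 157
```

With `transformers` installed, the dictionary could be unpacked as `TrainingArguments(**training_kwargs)`.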

📁 Repository Structure

  • config.json – Model config
  • pytorch_model.bin – Trained model weights
  • tokenizer.json, vocab.txt – Tokenizer files

✍️ Author

  • Hugging Face Hub: Koushim
  • Model trained using the transformers.Trainer API

📄 License

Apache 2.0