metadata
tags:
- image-captioning
- deep-learning
- pytorch
- encoder-decoder
- vision
Image Captioning Model
This is a deep learning image captioning model built on a CNN encoder + LSTM decoder architecture. A Convolutional Neural Network (CNN) extracts visual features from the input image, and an LSTM decoder generates the caption from those features.
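As a rough illustration of the CNN encoder + LSTM decoder layout described above, here is a minimal PyTorch sketch. The class names `EncoderCNN` and `DecoderLSTM`, the tiny convolutional stack, and every layer size are assumptions chosen for illustration; the released model's actual backbone and hyperparameters are not stated in this card.

```python
import torch
import torch.nn as nn


class EncoderCNN(nn.Module):
    """Hypothetical stand-in encoder: maps an image to one feature vector."""

    def __init__(self, embed_size: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool -> (B, 32, 1, 1)
        )
        self.fc = nn.Linear(32, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        x = self.features(images).flatten(1)  # (B, 32)
        return self.fc(x)  # (B, embed_size)


class DecoderLSTM(nn.Module):
    """Hypothetical decoder: predicts next-word logits at every time step."""

    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Feed the image feature as the first input step, followed by the
        # embedded caption tokens (the last token is dropped: teacher forcing).
        embeddings = self.embed(captions[:, :-1])  # (B, T-1, E)
        inputs = torch.cat([features.unsqueeze(1), embeddings], dim=1)  # (B, T, E)
        hidden, _ = self.lstm(inputs)  # (B, T, H)
        return self.fc(hidden)  # (B, T, vocab_size)


# Shape check on random data (sizes are arbitrary assumptions).
encoder = EncoderCNN(embed_size=32)
decoder = DecoderLSTM(embed_size=32, hidden_size=64, vocab_size=50)
images = torch.randn(2, 3, 64, 64)        # batch of 2 RGB images
captions = torch.randint(0, 50, (2, 7))   # batch of 2 token-id sequences
logits = decoder(encoder(images), captions)
print(logits.shape)  # (batch, caption_length, vocab_size)
```

Feeding the image feature as the first LSTM input is one common way to condition the decoder on the image; other designs initialize the LSTM hidden state from the feature instead.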
Model Details
- Model Type: Image Captioning
- Architecture: CNN Encoder + LSTM Decoder
- Framework: PyTorch
- Input: Image (.jpg, .png, etc.)
- Output: Generated caption (text)
- Vocabulary: Pre-trained vocabulary file
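The Output and Vocabulary entries imply a final step that maps predicted token indices back to words. The following is a minimal sketch of that step, assuming a hypothetical index-to-word dictionary and `<start>`, `<end>`, `<pad>`, and `<unk>` special tokens; the actual format of the vocabulary file is not specified in this card.

```python
def ids_to_caption(token_ids, idx_to_word, start="<start>", end="<end>", pad="<pad>"):
    """Convert a sequence of token ids into a caption string.

    Stops at the first end token and skips start/pad tokens.
    Ids missing from the vocabulary are rendered as "<unk>".
    """
    words = []
    for idx in token_ids:
        word = idx_to_word.get(idx, "<unk>")
        if word == end:
            break
        if word in (start, pad):
            continue
        words.append(word)
    return " ".join(words)


# Toy vocabulary for illustration only (not the model's real vocabulary).
idx_to_word = {0: "<pad>", 1: "<start>", 2: "<end>", 3: "a", 4: "dog", 5: "runs"}
caption = ids_to_caption([1, 3, 4, 5, 2, 0, 0], idx_to_word)
print(caption)  # a dog runs
```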
How to Use
1. Install Dependencies
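pip install torch torchvision transformers huggingface_hub pickle5
Note: pickle5 is a backport for Python 3.7 and earlier. On Python 3.8 and later, the standard-library pickle module already supports protocol 5, so pickle5 can be omitted from the command.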