metadata

tags:
  - image-captioning
  - deep-learning
  - pytorch
  - encoder-decoder
  - vision

🖼️ Image Captioning Model

This is an image captioning model built on a CNN Encoder + LSTM Decoder architecture. A Convolutional Neural Network (CNN) extracts visual features from the input image, and an LSTM decoder generates the caption from those features one token at a time.
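To make the encoder-decoder split concrete, here is a minimal PyTorch sketch. It is not this model's actual code: the class names `EncoderCNN` and `DecoderLSTM`, the tiny convolution stack, the layer sizes, and the greedy decoding loop are all assumptions chosen for illustration, and the weights are random, so the token IDs it produces are meaningless.

```python
import torch
import torch.nn as nn


class EncoderCNN(nn.Module):
    """Toy CNN (assumed layout) that maps an image to one feature vector."""

    def __init__(self, embed_size: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 32, 1, 1)
        )
        self.fc = nn.Linear(32, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        x = self.features(images).flatten(1)  # (batch, 32)
        return self.fc(x)  # (batch, embed_size)


class DecoderLSTM(nn.Module):
    """LSTM that turns image features into a sequence of token IDs."""

    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    @torch.no_grad()
    def greedy_decode(self, features: torch.Tensor, max_len: int = 20) -> torch.Tensor:
        # The image features act as the first input step of the LSTM.
        inputs = features.unsqueeze(1)  # (batch, 1, embed_size)
        states = None
        token_ids = []
        for _ in range(max_len):
            hidden, states = self.lstm(inputs, states)
            logits = self.fc(hidden.squeeze(1))  # (batch, vocab_size)
            predicted = logits.argmax(dim=1)  # most likely next token
            token_ids.append(predicted)
            # Feed the predicted token back in as the next input.
            inputs = self.embed(predicted).unsqueeze(1)
        return torch.stack(token_ids, dim=1)  # (batch, max_len)


# Illustrative run on random "images" with untrained weights.
encoder = EncoderCNN(embed_size=64).eval()
decoder = DecoderLSTM(embed_size=64, hidden_size=128, vocab_size=100).eval()
images = torch.randn(2, 3, 224, 224)
caption_ids = decoder.greedy_decode(encoder(images), max_len=5)
```

A real encoder would more likely reuse a pretrained torchvision backbone, and decoding would normally stop at an end-of-sentence token rather than always running for `max_len` steps.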

📌 Model Details

Model Type: Image Captioning
Architecture: CNN Encoder + LSTM Decoder
Framework: PyTorch
Input: Image (.jpg, .png, etc.)
Output: Generated caption (text)
Vocabulary: Pre-built vocabulary file
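The Output and Vocabulary entries suggest that the decoder's token IDs are mapped back to words through the vocabulary. The sketch below shows one way that step might look. The helper `ids_to_caption`, the `idx2word` dictionary, and the special tokens `<start>`, `<end>`, `<pad>`, and `<unk>` are hypothetical and may not match the format of this model's vocabulary file.

```python
def ids_to_caption(token_ids, idx2word, end_token="<end>", skip=("<start>", "<pad>")):
    """Convert a sequence of token IDs into a caption string.

    Stops at the first end token and drops start/padding tokens.
    IDs missing from the vocabulary become "<unk>".
    """
    words = []
    for idx in token_ids:
        word = idx2word.get(idx, "<unk>")
        if word == end_token:
            break
        if word in skip:
            continue
        words.append(word)
    return " ".join(words)


# Hypothetical six-entry vocabulary, for illustration only.
idx2word = {0: "<pad>", 1: "<start>", 2: "<end>", 3: "a", 4: "dog", 5: "runs"}
caption = ids_to_caption([1, 3, 4, 5, 2, 0], idx2word)
print(caption)
```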

🚀 How to Use

1️⃣ Install Dependencies

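pip install torch torchvision transformers huggingface_hub

Note: pickle is part of Python's standard library and needs no separate install. The pickle5 backport is only relevant on Python 3.7 and earlier.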
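After installing, a quick check like the following can confirm that the packages are importable. This is only a convenience sketch: the helper name `missing_packages` is made up here, and the check looks for the presence of each package without verifying versions.

```python
import importlib.util

# Third-party packages from the install step. pickle is not listed
# because it ships with Python's standard library.
REQUIRED = ["torch", "torchvision", "transformers", "huggingface_hub"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```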