---
license: mit
pipeline_tag: image-classification
library_name: transformers
tags:
- PyTorch
- Mamba
- SSM
---
# VMamba: Visual State Space Model
VMamba is a bidirectional state-space model fine-tuned on the ImageNet dataset. It was introduced in the paper
[VMamba: Visual State Space Model](https://arxiv.org/pdf/2401.10166) and was first released in [this repo](https://github.com/MzeroMiko/VMamba/tree/main).

Disclaimer: This is not the official implementation; please refer to the [official repo](https://github.com/MzeroMiko/VMamba/tree/main).
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch
from PIL import Image
import torchvision.transforms as T
from transformers import AutoConfig, AutoModelForImageClassification

# trust_remote_code=True lets transformers load the custom model code from the Hub repo
config = AutoConfig.from_pretrained('saurabhati/VMamba_ImageNet_82.6', trust_remote_code=True)
vmamba_model = AutoModelForImageClassification.from_pretrained('saurabhati/VMamba_ImageNet_82.6', trust_remote_code=True)

# Resize, center-crop, and normalize with the usual ImageNet statistics
preprocess = T.Compose([
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(
        mean=[0.4850, 0.4560, 0.4060],
        std=[0.2290, 0.2240, 0.2250]
    )])

# Convert to RGB so grayscale or CMYK files still yield three channels
input_image = Image.open('/data/sls/scratch/sbhati/data/Imagenet/train/n02009912/n02009912_16160.JPEG').convert('RGB')
input_image = preprocess(input_image)

# Add a batch dimension and run inference without tracking gradients
with torch.no_grad():
    logits = vmamba_model(input_image.unsqueeze(0)).logits

predicted_label = vmamba_model.config.id2label[logits.argmax().item()]
predicted_label
# 'crane'
```
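The snippet above reports only the single most likely class. If you also want the runner-up classes with their probabilities, a small helper like the one below may be useful. It is a minimal sketch, not part of the model repo: `top_k_labels` is a hypothetical name, and the toy labels and logits are illustrative stand-ins so the example runs without downloading the model.

```python
import math

def top_k_labels(scores, id2label, k=5):
    """Return the k most likely (label, probability) pairs from raw logits.

    Hypothetical helper for illustration; `scores` is a flat list of floats.
    """
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices sorted by descending probability, truncated to k.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in order]

# Toy stand-in for the model output; these labels and values are made up.
toy_id2label = {0: "crane", 1: "heron", 2: "flamingo"}
toy_logits = [2.0, 1.0, 0.1]
ranked = top_k_labels(toy_logits, toy_id2label, k=2)
for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```

With the real model, the equivalent call would presumably be `top_k_labels(logits[0].tolist(), vmamba_model.config.id2label)`, assuming `logits` has shape `(1, num_classes)` as in the example above.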
## Citation
```bibtex
@article{liu2024vmamba,
title={VMamba: Visual State Space Model},
author={Liu, Yue and Tian, Yunjie and Zhao, Yuzhong and Yu, Hongtian and Xie, Lingxi and Wang, Yaowei and Ye, Qixiang and Liu, Yunfan},
journal={arXiv preprint arXiv:2401.10166},
year={2024}
}
```