---
license: apache-2.0
language:
- en
---

<br>
<br>

# SOLO Model Card

## Model details

**Model type:**
SOLO is a 7B large vision-language model with a single Transformer architecture for unified vision-language modeling.
SOLO accepts both raw image patches (in pixels) and text as input, without using a separate pre-trained vision encoder.
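
The following is a minimal, illustrative sketch of this unified design, not SOLO's actual implementation: raw pixel patches are linearly projected into the same embedding space as text tokens, so a single Transformer can attend over both. All names and sizes here are assumptions for illustration.

```python
# Illustrative sketch only; not SOLO's actual code.
import torch
import torch.nn as nn

hidden_size, patch = 4096, 32            # assumed sizes, for illustration only
patch_embed = nn.Linear(3 * patch * patch, hidden_size)

image = torch.rand(3, 224, 224)          # C x H x W, raw pixel values
patches = (
    image.unfold(1, patch, patch)        # carve the height into patch rows
         .unfold(2, patch, patch)        # carve the width into patch columns
         .permute(1, 2, 0, 3, 4)         # (nH, nW, C, patch, patch)
         .reshape(-1, 3 * patch * patch) # flatten each patch into one vector
)
vision_tokens = patch_embed(patches)     # (num_patches, hidden_size)
# vision_tokens are concatenated with text token embeddings and processed by
# the single Transformer; no separate pre-trained vision encoder is involved.
```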

**Model date:**
SOLO-7B was trained in June 2024.

**Paper or resources for more information:**
[Paper](https://arxiv.org/abs/2407.06438) & [GitHub](https://github.com/Yangyi-Chen/SOLO)

**Where to send questions or comments about the model:**
https://github.com/Yangyi-Chen/SOLO/issues

**Inference with Hugging Face**
Please see this [notebook](https://github.com/Yangyi-Chen/SOLO/blob/main/scripts/notebook/demo.ipynb) for an example of performing inference with the model.
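
As a quick orientation, the snippet below is a minimal sketch of loading the model through `transformers`. The repo id `YangyiYY/SOLO-7B` and the `trust_remote_code=True` flag are assumptions here; the notebook linked above is the authoritative example, including how image-patch inputs are prepared.

```python
# Minimal sketch, assuming the checkpoint is published as "YangyiYY/SOLO-7B"
# and exposes a causal-LM interface; defer to the demo notebook for the
# authoritative usage, including image preprocessing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "YangyiYY/SOLO-7B"  # assumed repo id; adjust to the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumed dtype; use float16/float32 as needed
    device_map="auto",
    trust_remote_code=True,      # assumed: the repo may ship custom modeling code
)

prompt = "Question: What does a unified vision-language model do? Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```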