czczup committed on
Commit 09b3804 · verified · 1 Parent(s): 6bb25f3

Update README.md

Files changed (1)
  1. README.md +1 -23
README.md CHANGED
@@ -10,7 +10,7 @@ datasets:
  pipeline_tag: image-feature-extraction
  ---
 
- InternViT-300M-448px
+ # InternViT-300M-448px
 
  [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)
 
@@ -26,24 +26,6 @@ This update primarily focuses on enhancing the efficiency of the vision foundati
  - **Pretrain Dataset:** LAION-en, LAION-zh, COYO, GRIT, COCO, TextCaps, Objects365, OpenImages, All-Seeing, Wukong-OCR, LaionCOCO-OCR, and other OCR-related datasets.
  To enhance the OCR capability of the model, we have incorporated additional OCR data alongside the general caption datasets. Specifically, we utilized PaddleOCR to perform Chinese OCR on images from Wukong and English OCR on images from LAION-COCO.
 
- ## Released Models
- ### Vision Foundation model
- | Model | Date | Download | Note |
- | ----------------------- | ---------- | ---------------------------------------------------------------------- | -------------------------------- |
- | InternViT-6B-448px-V1-5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | support dynamic resolution, super strong OCR (🔥new) |
- | InternViT-6B-448px-V1-2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2) | 448 resolution |
- | InternViT-6B-448px-V1-0 | 2024.01.30 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0) | 448 resolution |
- | InternViT-6B-224px | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px) | vision foundation model |
- | InternVL-14B-224px | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px) | vision-language foundation model |
-
- ### Multimodal Large Language Model (MLLM)
- | Model | Date | Download | Note |
- | ----------------------- | ---------- | --------------------------------------------------------------------------- | ---------------------------------- |
- | InternVL-Chat-V1-5 | 2024.04.18 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5) | support 4K image; super strong OCR; Approaching the performance of GPT-4V and Gemini Pro on various benchmarks like MMMU, DocVQA, ChartQA, MathVista, etc. (🔥new)|
- | InternVL-Chat-V1-2-Plus | 2024.02.21 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus) | more SFT data and stronger |
- | InternVL-Chat-V1-2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2) | scaling up LLM to 34B |
- | InternVL-Chat-V1-1 | 2024.01.24 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1) | support Chinese and stronger OCR |
-
  ## Model Usage (Image Embeddings)
 
  ```python
@@ -86,7 +68,3 @@ If you find this project useful in your research, please consider citing:
  }
 
  ```
-
- ## Acknowledgement
-
- InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!