<img src="https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct/raw/main/images/gme_logo.png" alt="GME Logo" style="width: 100%; max-width: 450px;">
</p>

<p align="center"><b>GME: General Multimodal Embedding</b></p>

## gme-Qwen2-VL-7B

We are excited to present the `GME-Qwen2VL` series of unified **multimodal embedding models**, which are based on the advanced [Qwen2-VL](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d) multimodal large language models (MLLMs).

The `GME` models support three types of input: **text**, **image**, and **image-text pair**, all of which are mapped into the same universal vector space and deliver strong retrieval performance.

**Key Enhancements of GME Models**:

- **Unified Multimodal Representation**: GME models can process both single-modal and combined-modal inputs, producing a unified vector representation. This enables versatile retrieval scenarios (Any2Any Search), supporting tasks such as text retrieval, image retrieval by text, and image-to-image search (see the sketch after this list).
- **High Performance**: Achieves state-of-the-art (SOTA) results on our universal multimodal retrieval benchmark (**UMRB**) and demonstrates strong scores on the Massive Text Embedding Benchmark (**MTEB**).
- **Dynamic Image Resolution**: Benefiting from `Qwen2-VL` and our training data, GME models support dynamic-resolution image input.
- **Strong Visual Retrieval Performance**: Enhanced by the Qwen2-VL model series, our models excel in visual document retrieval tasks that require a nuanced understanding of document screenshots. This capability is particularly beneficial for complex document understanding scenarios, such as multimodal retrieval-augmented generation (RAG) applications focused on academic papers.
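
The unified embedding space is what makes Any2Any Search work: once every candidate is a vector, a single similarity function ranks them regardless of modality. The sketch below illustrates only that scoring step; the `fake_embed` helper is purely illustrative, with random unit vectors standing in for real GME embeddings (see the Usage section for producing actual embeddings).

```python
# Minimal sketch of Any2Any search over a mixed-modality candidate pool.
# Random unit vectors stand in for real GME embeddings; in practice each
# candidate (text, image, or image-text pair) is embedded by the same model.
import torch
import torch.nn.functional as F

DIM = 3584  # embedding dimension of gme-Qwen2-VL-7B (see Model List below)

def fake_embed(n: int) -> torch.Tensor:
    """Placeholder for model output: n L2-normalized embedding vectors."""
    return F.normalize(torch.randn(n, DIM), dim=-1)

query = fake_embed(1)      # e.g. a text query
pool = torch.cat([
    fake_embed(100),       # text candidates
    fake_embed(100),       # image candidates
    fake_embed(100),       # image-text pair candidates
])

# All modalities share one space, so a single dot product ranks every
# candidate against the query (cosine similarity, since vectors are unit norm).
scores = (query @ pool.T).squeeze(0)
top = scores.topk(5)
print(top.indices.tolist(), top.values.tolist())
```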
**Developed by**: Tongyi Lab, Alibaba Group
## Model List
| Models | Model Size | Max Seq. Length | Dimension | MTEB-en | UMRB |
|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| [`gme-Qwen2-VL-2B`](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct) | 2.21B | 32768 | 1536 | - | 64.45 |
| [`gme-Qwen2-VL-7B`](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct) | 8.29B | 32768 | 3584 | - | 67.44 |

## Usage

**Use with custom code**
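
The block below is a minimal sketch rather than a verbatim copy of the repository's example: it assumes the `gme_inference.py` wrapper shipped with the model repository, and the `GmeQwen2VL` class with its `get_text_embeddings`, `get_image_embeddings`, and `get_fused_embeddings` methods is our assumption to verify against that file.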
```python
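# NOTE: GmeQwen2VL and its get_*_embeddings methods are assumed from the
# model repo's gme_inference.py wrapper; verify the names against that file.
from gme_inference import GmeQwen2VL

model = GmeQwen2VL("Alibaba-NLP/gme-Qwen2-VL-7B-Instruct")

texts = [
    "What kind of car is this?",
    "The Tesla Cybertruck is a battery electric pickup truck.",
]
images = ["https://example.com/cybertruck.jpg"]  # hypothetical image URL

# Single-modal inputs: text-only and image-only embeddings.
e_text = model.get_text_embeddings(texts=texts)
e_image = model.get_image_embeddings(images=images)
print(e_text @ e_image.T)  # cross-modal similarity scores

# Combined-modal input: one fused embedding per image-text pair.
e_pair = model.get_fused_embeddings(texts=texts[:1], images=images)
```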

We validated the performance on our universal multimodal retrieval benchmark (**UMRB**):

| One-Peace | 4B | 43.54 | 31.27 | 61.38 | 42.9 | 65.59 | 42.72 | 28.29 | 6.73 | 23.41 | 42.03 |
| DSE | 4.2B | 48.94 | 27.92 | 40.75 | 78.21 | 52.54 | 49.62 | 35.44 | 8.36 | 40.18 | 50.63 |
| E5-V | 8.4B | 52.41 | 27.36 | 46.56 | 41.22 | 47.95 | 54.13 | 32.9 | 23.17 | 7.23 | 42.48 |
| **[GME-Qwen2-VL-2B](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct)** | 2.2B | 55.93 | 29.86 | 57.36 | 87.84 | 61.93 | 76.47 | 64.58 | 37.02 | 66.47 | 64.45 |
| **[GME-Qwen2-VL-7B](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct)** | 8.3B | **58.19** | 31.89 | **61.35** | **89.92** | **65.83** | **80.94** | **66.18** | **42.56** | **73.62** | **67.44** |

The [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) English tab shows the text embedding performance of our models.
**More detailed experimental results can be found in the [paper](https://arxiv.org/pdf/2407.19669)**.
## Limitations

We will extend to multi-image input, image-text interleaved data as well as multilingual support.

## Redistribution and Use
We encourage and value diverse applications of GME models and continuous enhancements to the models themselves.
- If you distribute or make GME models (or any derivative works) available, or if you create a product or service (including another AI model) that incorporates them, you must prominently display `Built with GME` on your website, user interface, blog post, About page, or product documentation.
- If you utilize GME models or their outputs to develop, train, fine-tune, or improve an AI model that is distributed or made available, you must prefix the name of any such AI model with `GME`.
## Cloud API Services

In addition to the open-source [GME](https://huggingface.co/collections/Alibaba-NLP/gme-models-67667e092da3491f630964d6) series, GME models are also available as commercial API services on Alibaba Cloud.
- [MultiModal Embedding Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-embedding/): The `multimodal-embedding-v1` model service is available.
Note that the models behind the commercial APIs are not entirely identical to the open-source models.
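
If you use the hosted service, a call through the DashScope Python SDK might look like the sketch below. `dashscope.MultiModalEmbedding.call` is the SDK's multimodal embedding entry point, but the exact `input` schema for `multimodal-embedding-v1` shown here is our assumption; verify it against the documentation linked above.

```python
# Hedged sketch of the commercial API via the DashScope Python SDK.
# The input schema below is an assumption; consult the Model Studio docs
# linked above for the authoritative request format.
import dashscope

resp = dashscope.MultiModalEmbedding.call(
    model="multimodal-embedding-v1",
    input=[
        {"text": "What kind of car is this?"},
        {"image": "https://example.com/cybertruck.jpg"},  # hypothetical URL
    ],
)
print(resp)  # on success, embeddings are returned in resp.output
```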
## Hiring
We have open positions for Research Interns and Full-Time Researchers to join our team at Tongyi Lab.
We are seeking passionate individuals with expertise in representation learning, LLM-driven information retrieval, Retrieval-Augmented Generation (RAG), and agent-based systems.
Our team is based in the vibrant cities of Beijing and Hangzhou, offering a collaborative and dynamic work environment where you can contribute to cutting-edge advances in artificial intelligence and machine learning.
If you are driven by curiosity and eager to make a meaningful impact through your work, we would love to hear from you. Please submit your resume along with a brief introduction to <a href="mailto:[email protected]">[email protected]</a>.
## Citation
If you find our paper or models helpful, please consider citing: