<img src="https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct/raw/main/images/gme_logo.png" alt="GME Logo" style="width: 100%; max-width: 450px;">
</p>

<p align="center"><b>GME: General Multimodal Embedding</b></p>

## gme-Qwen2-VL-7B

We are excited to present the `GME-Qwen2VL` series of unified **multimodal embedding models**, which are based on the advanced [Qwen2-VL](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d) multimodal large language models (MLLMs).

The `GME` models support three types of input: **text**, **image**, and **image-text pair**, all of which are mapped into the same universal vector space and deliver strong retrieval performance.

**Key Enhancements of GME Models**:

- **Unified Multimodal Representation**: GME models can process both single-modal and combined-modal inputs, producing a unified vector representation. This enables versatile retrieval scenarios (Any2Any Search), supporting tasks such as text retrieval, image retrieval by text, and image-to-image search (see the sketch after this list).
- **High Performance**: Achieves state-of-the-art (SOTA) results on our universal multimodal retrieval benchmark (**UMRB**) and demonstrates strong scores on the Massive Text Embedding Benchmark (**MTEB**).
- **Dynamic Image Resolution**: Benefiting from `Qwen2-VL` and our training data, GME models support dynamic-resolution image input.
- **Strong Visual Retrieval Performance**: Enhanced by the Qwen2-VL model series, our models excel in visual document retrieval tasks that require a nuanced understanding of document screenshots. This capability is particularly beneficial for complex document understanding scenarios, such as multimodal retrieval-augmented generation (RAG) applications focused on academic papers.
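
The unified embedding space is what makes Any2Any Search work: once every candidate is a vector, a single similarity function ranks them regardless of modality. The sketch below illustrates only that scoring step; the `fake_embed` helper is purely illustrative, with random unit vectors standing in for real GME embeddings (see the Usage section for producing actual embeddings).

```python
# Minimal sketch of Any2Any search over a mixed-modality candidate pool.
# Random unit vectors stand in for real GME embeddings; in practice each
# candidate (text, image, or image-text pair) is embedded by the same model.
import torch
import torch.nn.functional as F

DIM = 3584  # embedding dimension of gme-Qwen2-VL-7B (see Model List below)

def fake_embed(n: int) -> torch.Tensor:
    """Placeholder for model output: n L2-normalized embedding vectors."""
    return F.normalize(torch.randn(n, DIM), dim=-1)

query = fake_embed(1)      # e.g. a text query
pool = torch.cat([
    fake_embed(100),       # text candidates
    fake_embed(100),       # image candidates
    fake_embed(100),       # image-text pair candidates
])

# All modalities share one space, so a single dot product ranks every
# candidate against the query (cosine similarity, since vectors are unit norm).
scores = (query @ pool.T).squeeze(0)
top = scores.topk(5)
print(top.indices.tolist(), top.values.tolist())
```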
**Developed by**: Tongyi Lab, Alibaba Group
## Model List
| Models | Model Size | Max Seq. Length | Dimension | MTEB-en | UMRB |
|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| [`gme-Qwen2-VL-2B`](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct) | 2.21B | 32768 | 1536 | - | 64.45 |
| [`gme-Qwen2-VL-7B`](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct) | 8.29B | 32768 | 3584 | - | 67.44 |

## Usage

**Use with custom code**
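
The block below is a minimal sketch rather than a verbatim copy of the repository's example: it assumes the `gme_inference.py` wrapper shipped with the model repository, and the `GmeQwen2VL` class with its `get_text_embeddings`, `get_image_embeddings`, and `get_fused_embeddings` methods is our assumption to verify against that file.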
```python
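# NOTE: GmeQwen2VL and its get_*_embeddings methods are assumed from the
# model repo's gme_inference.py wrapper; verify the names against that file.
from gme_inference import GmeQwen2VL

model = GmeQwen2VL("Alibaba-NLP/gme-Qwen2-VL-7B-Instruct")

texts = [
    "What kind of car is this?",
    "The Tesla Cybertruck is a battery electric pickup truck.",
]
images = ["https://example.com/cybertruck.jpg"]  # hypothetical image URL

# Single-modal inputs: text-only and image-only embeddings.
e_text = model.get_text_embeddings(texts=texts)
e_image = model.get_image_embeddings(images=images)
print(e_text @ e_image.T)  # cross-modal similarity scores

# Combined-modal input: one fused embedding per image-text pair.
e_pair = model.get_fused_embeddings(texts=texts[:1], images=images)
```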

We validated the performance on our universal multimodal retrieval benchmark (**UMRB**):

| One-Peace | 4B | 43.54 | 31.27 | 61.38 | 42.9 | 65.59 | 42.72 | 28.29 | 6.73 | 23.41 | 42.03 |
| DSE | 4.2B | 48.94 | 27.92 | 40.75 | 78.21 | 52.54 | 49.62 | 35.44 | 8.36 | 40.18 | 50.63 |
| E5-V | 8.4B | 52.41 | 27.36 | 46.56 | 41.22 | 47.95 | 54.13 | 32.9 | 23.17 | 7.23 | 42.48 |
| **[GME-Qwen2-VL-2B](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct)** | 2.2B | 55.93 | 29.86 | 57.36 | 87.84 | 61.93 | 76.47 | 64.58 | 37.02 | 66.47 | 64.45 |
| **[GME-Qwen2-VL-7B](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct)** | 8.3B | **58.19** | 31.89 | **61.35** | **89.92** | **65.83** | **80.94** | **66.18** | **42.56** | **73.62** | **67.44** |

The [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) English tab shows the text embedding performance of our models.
**More detailed experimental results can be found in the [paper](https://arxiv.org/pdf/2407.19669)**.
## Limitations

We will extend to multi-image input, image-text interleaved data as well as multilingual support.

## Redistribution and Use
We encourage and value diverse applications of GME models and continuous enhancements to the models themselves.
- If you distribute or make GME models (or any derivative works) available, or if you create a product or service (including another AI model) that incorporates them, you must prominently display `Built with GME` on your website, user interface, blog post, About page, or product documentation.
- If you utilize GME models or their outputs to develop, train, fine-tune, or improve an AI model that is distributed or made available, you must prefix the name of any such AI model with `GME`.
## Cloud API Services

In addition to the open-source [GME](https://huggingface.co/collections/Alibaba-NLP/gme-models-67667e092da3491f630964d6) series, GME models are also available as commercial API services on Alibaba Cloud.
- [MultiModal Embedding Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-embedding/): The `multimodal-embedding-v1` model service is available.
Note that the models behind the commercial APIs are not entirely identical to the open-source models.
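
If you use the hosted service, a call through the DashScope Python SDK might look like the sketch below. `dashscope.MultiModalEmbedding.call` is the SDK's multimodal embedding entry point, but the exact `input` schema for `multimodal-embedding-v1` shown here is our assumption; verify it against the documentation linked above.

```python
# Hedged sketch of the commercial API via the DashScope Python SDK.
# The input schema below is an assumption; consult the Model Studio docs
# linked above for the authoritative request format.
import dashscope

resp = dashscope.MultiModalEmbedding.call(
    model="multimodal-embedding-v1",
    input=[
        {"text": "What kind of car is this?"},
        {"image": "https://example.com/cybertruck.jpg"},  # hypothetical URL
    ],
)
print(resp)  # on success, embeddings are returned in resp.output
```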
## Hiring
We have open positions for Research Interns and Full-Time Researchers to join our team at Tongyi Lab.
We are seeking passionate individuals with expertise in representation learning, LLM-driven information retrieval, Retrieval-Augmented Generation (RAG), and agent-based systems.
Our team is based in the vibrant cities of Beijing and Hangzhou, offering a collaborative and dynamic work environment where you can contribute to cutting-edge advances in artificial intelligence and machine learning.
If you are driven by curiosity and eager to make a meaningful impact through your work, we would love to hear from you. Please submit your resume along with a brief introduction to <a href="mailto:[email protected]">[email protected]</a>.
## Citation
If you find our paper or models helpful, please consider citing: