Commit 9eed5b3 (verified) · Parent(s): 7fc49f5 · thenlper committed

Update README.md

Files changed (1):
  1. README.md +30 -23
README.md CHANGED
@@ -6937,19 +6937,23 @@ model-index:
  <img src="https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct/raw/main/images/gme_logo.png" alt="GME Logo" style="width: 100%; max-width: 450px;">
  </p>

- <p align="center"><b>GME: General Multimodal Embeddings</b></p>

- ## gme-Qwen2-VL-7B-Instruct

- We are excited to present `GME-Qwen2VL` models, our first generation **multimodal embedding models** for text and images,
- which are based on advanced [Qwen2-VL](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d) multimodal large language models (MLLMs).

- The `GME-Qwen2VL` models support three input forms: **text**, **image**, and **image-text pair**, all of which can produce universal vector representations and have powerful retrieval performance.

- - **High Performance**: Achieves state-of-the-art (SOTA) results in our universal multimodal retrieval benchmark (**UMRB**) and strong **MTEB** evaluation scores.
  - **Dynamic Image Resolution**: Benefiting from `Qwen2-VL` and our training data, GME models support dynamic resolution image input.
- Our models are able to perform leadingly in the **visual document retrieval** task which requires fine-grained understanding of document screenshots.
- You can control to balance performance and efficiency.

  **Developed by**: Tongyi Lab, Alibaba Group

@@ -6959,11 +6963,11 @@ You can control to balance performance and efficiency.
  ## Model List
  | Models | Model Size | Max Seq. Length | Dimension | MTEB-en| UMRB |
  |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: |
- |[`gme-Qwen2VL-2B`](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct) | 2.21B | 32768 | 1536 | - | 64.45 |
- |[`gme-Qwen2VL-7B`](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct) | 8.29B | 32768 | 3584 | - | 67.02 |

  ## Usage
-
  **Use with custom code**

  ```python
@@ -7027,13 +7031,12 @@ We validated the performance on our universal multimodal retrieval benchmark (**
  | One-Peace | 4B | 43.54 | 31.27 | 61.38 | 42.9 | 65.59 | 42.72 | 28.29 | 6.73 | 23.41 | 42.03 |
  | DSE | 4.2B | 48.94 | 27.92 | 40.75 | 78.21 | 52.54 | 49.62 | 35.44 | 8.36 | 40.18 | 50.63 |
  | E5-V | 8.4B | 52.41 | 27.36 | 46.56 | 41.22 | 47.95 | 54.13 | 32.9 | 23.17 | 7.23 | 42.48 |
- | **GME-Qwen2VL-2B** | 2.2B | 55.93 | 29.86 | 57.36 | 87.84 | **61.93** | 76.47 | 64.58 | 37.02 | 66.47 | 64.45 |
- | **GME-Qwen2VL-7B** | 8.3B | **58.19** | 31.89 | **61.35** | **89.92** | 60.83 | **80.94** | **66.18** | **42.56** | **73.62** | **67.02** |

  The [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) English tab shows the text embedding performance of our model.

- **More detailed experimental results can be found in the [paper](https://arxiv.org/pdf/2412.xxxxx)**.
-

  ## Limitations

@@ -7046,23 +7049,27 @@ We will extend to multi-image input, image-text interleaved data as well as mult

  ## Redistribution and Use

- We welcome and appreciate various applications of GME models and further improvements to the GME models themselves.
- Following Llama license,
- 1. if you distribute or make available the GME models (or any derivative works thereof),
- or a product or service (including another AI model) that contains any of them,
- you shall prominently display “Built with GME” on a related website, user interface, blogpost, about page, or product documentation;
- 2. if you use the GME models or any outputs or results of them to create, train, fine tune, or otherwise improve an AI model,
- which is distributed or made available, you shall also include “GME” at the beginning of any such AI model name.

  ## Cloud API Services

- In addition to the open-source [GME](https://huggingface.co/collections/Alibaba-NLP/gme-models) series models, GME series models are also available as commercial API services on Alibaba Cloud.

  - [MultiModal Embedding Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-embedding/): The `multimodal-embedding-v1` model service is available.

  Note that the models behind the commercial APIs are not entirely identical to the open-source models.

  ## Citation
  If you find our paper or models helpful, please consider citing:
 
  <img src="https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct/raw/main/images/gme_logo.png" alt="GME Logo" style="width: 100%; max-width: 450px;">
  </p>

+ <p align="center"><b>GME: General Multimodal Embedding</b></p>

+ ## gme-Qwen2-VL-7B

+ We are excited to present the `GME-Qwen2VL` series of unified **multimodal embedding models**,
+ which are based on the advanced [Qwen2-VL](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d) multimodal large language models (MLLMs).

+ The `GME` models support three types of input: **text**, **image**, and **image-text pair**, all of which can produce universal vector representations and have powerful retrieval performance (a brief usage sketch follows the feature list below).

+ **Key Enhancements of GME Models**:
+
+ - **Unified Multimodal Representation**: GME models can process both single-modal and combined-modal inputs, resulting in a unified vector representation. This enables versatile retrieval scenarios (Any2Any Search), supporting tasks such as text retrieval, image retrieval from text, and image-to-image searches.
+ - **High Performance**: Achieves state-of-the-art (SOTA) results on our universal multimodal retrieval benchmark (**UMRB**) and demonstrates strong scores on the Massive Text Embedding Benchmark (**MTEB**).
  - **Dynamic Image Resolution**: Benefiting from `Qwen2-VL` and our training data, GME models support dynamic resolution image input.
+ - **Strong Visual Retrieval Performance**: Enhanced by the Qwen2-VL model series, our models excel in visual document retrieval tasks that require a nuanced understanding of document screenshots.
+ This capability is particularly beneficial for complex document understanding scenarios,
+ such as multimodal retrieval-augmented generation (RAG) applications focused on academic papers.
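To make the three input types concrete, here is a minimal sketch of producing and comparing the embeddings. It assumes the `GmeQwen2VL` wrapper from this repository's `gme_inference.py` and its `get_text_embeddings` / `get_image_embeddings` / `get_fused_embeddings` methods; verify the exact interface against the repository before relying on it.

```python
# A minimal sketch (not from the original README): embeddings for the three
# supported input types. The GmeQwen2VL wrapper, its constructor, and the
# get_*_embeddings method names are assumed from the gme_inference.py helper
# shipped in this repository.
from gme_inference import GmeQwen2VL

gme = GmeQwen2VL("Alibaba-NLP/gme-Qwen2-VL-7B-Instruct")

texts = [
    "What does the chart on page 3 show?",
    "A line chart of quarterly revenue growth.",
]
images = ["./docs/page_3.png"]  # placeholder path; local files or URLs

# 1) Text-only embeddings
e_text = gme.get_text_embeddings(texts=texts)

# 2) Image-only embeddings (dynamic resolution is handled by the Qwen2-VL processor)
e_image = gme.get_image_embeddings(images=images)

# 3) Fused image-text embeddings (one text paired with one image)
e_fused = gme.get_fused_embeddings(texts=texts[:1], images=images)

# All three live in the same vector space, so Any2Any retrieval is a similarity
# between vectors; if the wrapper returns L2-normalized tensors, the dot
# products below are cosine similarities.
print(e_text.shape)                    # expected (2, 3584) for the 7B model
print((e_text[:1] * e_image).sum(-1))  # text query vs. image document
print((e_fused * e_image).sum(-1))     # fused query vs. image document
```

The same calls apply to the 2B checkpoint; per the Model List below, only the embedding dimension differs (1536 vs. 3584). The Usage section further down contains the authors' full custom-code example.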

  **Developed by**: Tongyi Lab, Alibaba Group

  ## Model List
  | Models | Model Size | Max Seq. Length | Dimension | MTEB-en| UMRB |
  |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: |
+ |[`gme-Qwen2-VL-2B`](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct) | 2.21B | 32768 | 1536 | - | 64.45 |
+ |[`gme-Qwen2-VL-7B`](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct) | 8.29B | 32768 | 3584 | - | 67.44 |

  ## Usage

  **Use with custom code**

  ```python
 
  | One-Peace | 4B | 43.54 | 31.27 | 61.38 | 42.9 | 65.59 | 42.72 | 28.29 | 6.73 | 23.41 | 42.03 |
  | DSE | 4.2B | 48.94 | 27.92 | 40.75 | 78.21 | 52.54 | 49.62 | 35.44 | 8.36 | 40.18 | 50.63 |
  | E5-V | 8.4B | 52.41 | 27.36 | 46.56 | 41.22 | 47.95 | 54.13 | 32.9 | 23.17 | 7.23 | 42.48 |
+ | **[GME-Qwen2-VL-2B](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct)** | 2.2B | 55.93 | 29.86 | 57.36 | 87.84 | 61.93 | 76.47 | 64.58 | 37.02 | 66.47 | 64.45 |
+ | **[GME-Qwen2-VL-7B](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct)** | 8.3B | **58.19** | 31.89 | **61.35** | **89.92** | **65.83** | **80.94** | **66.18** | **42.56** | **73.62** | **67.44** |

  The [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) English tab shows the text embedding performance of our model.

+ **More detailed experimental results can be found in the [paper](https://arxiv.org/pdf/2407.19669)**.

  ## Limitations

 

  ## Redistribution and Use

+ We encourage and value diverse applications of GME models and continuous enhancements to the models themselves.

+ - If you distribute or make GME models (or any derivative works) available, or if you create a product or service (including another AI model) that incorporates them, you must prominently display `Built with GME` on your website, user interface, blog post, About page, or product documentation.
+
+ - If you utilize GME models or their outputs to develop, train, fine-tune, or improve an AI model that is distributed or made available, you must prefix the name of any such AI model with `GME`.

  ## Cloud API Services

+ In addition to the open-source [GME](https://huggingface.co/collections/Alibaba-NLP/gme-models-67667e092da3491f630964d6) series, the GME models are also available as commercial API services on Alibaba Cloud.

  - [MultiModal Embedding Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-embedding/): The `multimodal-embedding-v1` model service is available.

  Note that the models behind the commercial APIs are not entirely identical to the open-source models.
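As a rough illustration of what calling a hosted embedding service involves, here is a hypothetical HTTP sketch. Apart from the model name, everything in it is a placeholder (endpoint, payload fields, environment variable, response keys); the authoritative request format, parameters, and authentication flow for `multimodal-embedding-v1` are in the Alibaba Cloud Model Studio documentation linked above.

```python
# Hypothetical sketch only: the general shape of calling a hosted multimodal
# embedding service over HTTP. The endpoint URL, payload fields, and response
# keys below are placeholders, NOT the actual multimodal-embedding-v1 contract;
# consult the Model Studio documentation linked above for the real API.
import os
import requests

api_key = os.environ["DASHSCOPE_API_KEY"]                           # assumed env var
endpoint = "https://example.invalid/api/v1/multimodal-embeddings"   # placeholder URL

payload = {
    "model": "multimodal-embedding-v1",
    "input": [
        {"text": "a scanned page describing transformer attention"},
        {"image": "https://example.invalid/page.png"},              # placeholder URL
    ],
}

resp = requests.post(
    endpoint,
    headers={"Authorization": f"Bearer {api_key}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
embeddings = resp.json()["output"]["embeddings"]                    # placeholder key
print(len(embeddings))
```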

+ ## Hiring
+
+ We have open positions for Research Interns and Full-Time Researchers to join our team at Tongyi Lab.
+ We are seeking passionate individuals with expertise in representation learning, LLM-driven information retrieval, Retrieval-Augmented Generation (RAG), and agent-based systems.
+ Our team is located in the vibrant cities of Beijing and Hangzhou, offering a collaborative and dynamic work environment where you can contribute to cutting-edge advancements in artificial intelligence and machine learning.
+ If you are driven by curiosity and eager to make a meaningful impact through your work, we would love to hear from you. Please submit your resume along with a brief introduction to <a href="mailto:[email protected]">[email protected]</a>.
+

  ## Citation
  If you find our paper or models helpful, please consider citing: