YuxinJiang/unsup-promcse-bert-base-uncased

Transformers, PyTorch, bert


YuxinJiang committed on Jan 16, 2023

Commit c1fe62b

1 Parent(s): 1a404a7

Update README.md

Files changed (1)

README.md +73 -65

README.md CHANGED

@@ -1,30 +1,30 @@
----
-license: apache-2.0
----
 # PromCSE: Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sick)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sick?p=deep-continuous-prompt-for-contrastive-1)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts12)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts12?p=deep-continuous-prompt-for-contrastive-1)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts13)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts13?p=deep-continuous-prompt-for-contrastive-1)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts14)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts14?p=deep-continuous-prompt-for-contrastive-1)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts16)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts16?p=deep-continuous-prompt-for-contrastive-1)
-[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts15)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts15?p=deep-continuous-prompt-for-contrastive-1)
-arXiv link: https://arxiv.org/abs/2203.06875v2
-To be published in [**EMNLP 2022**](https://2022.emnlp.org/)
-Our code is based on [SimCSE](https://github.com/princeton-nlp/SimCSE) and [P-tuning v2](https://github.com/THUDM/P-tuning-v2/). We sincerely thank the authors for their excellent work.
-We release our best model checkpoint, which achieves **Top 1** results on four STS tasks:
 <!-- <img src="https://github.com/YJiangcm/DCPCSE/blob/master/figure/leaderboard.png" width="700" height="380"> -->
 |          Model          | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg. |
 |:-----------------------:|:-----:|:----------:|:---------:|:-----:|:-----:|:-----:|:-----:|:-----:|
-|  sup-PromCSE-RoBERTa-large ([huggingface](https://huggingface.co/YuxinJiang/unsup-promcse-bert-base-uncased))  |  79.14 |88.64| 83.73| 87.33 |84.57| 87.84| 82.07| 84.76|
-|  unsup-PromCSE-BERT-base ([huggingface](https://huggingface.co/YuxinJiang/sup-promcse-roberta-large))  |  73.03 |85.18| 76.70| 84.19 |79.69| 80.62| 70.00| 78.49|
 If you have any questions, feel free to raise an issue.
@@ -40,7 +40,65 @@ Run the following script to install the remaining dependencies,
 pip install -r requirements.txt
 ```
-## Training
 **Data**
@@ -114,57 +172,6 @@ All our experiments are conducted on Nvidia 3090 GPUs.
 | Valid steps | 125 | 125 | 125 | 125 |
-## Evaluation
-Our evaluation code for sentence embeddings is based on a modified version of [SentEval](https://github.com/facebookresearch/SentEval). It evaluates sentence embeddings on semantic textual similarity (STS) tasks and downstream transfer tasks. For STS tasks, our evaluation uses the "all" setting and reports Spearman's correlation. The STS tasks include seven standard STS tasks (STS12-16, STS-B, SICK-R) and one domain-shifted STS task (CxC).
-Before evaluation, please download the evaluation datasets by running
-```bash
-cd SentEval/data/downstream/
-bash download_dataset.sh
-```
-To evaluate the domain-shift robustness of sentence embeddings, download [CxC](https://drive.google.com/drive/folders/1ZnRlVlc4kFsKbaWj9cFbb8bQU0fxzz1c?usp=sharing) and put the data into *SentEval/data/downstream/CocoCXC*.
-Then return to the root directory and evaluate trained models with our evaluation code. For example,
-```bash
-python evaluation.py \
-    --model_name_or_path YuxinJiang/sup-promcse-roberta-large \
-    --pooler_type cls \
-    --task_set sts \
-    --mode test \
-    --pre_seq_len 10
-```
-which is expected to output the results in a tabular format:
-```
------- test ------
-+-------+-------+-------+-------+-------+--------------+-----------------+-------+
-| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness |  Avg. |
-+-------+-------+-------+-------+-------+--------------+-----------------+-------+
-| 79.14 | 88.64 | 83.73 | 87.33 | 84.57 |    87.84     |      82.07      | 84.76 |
-+-------+-------+-------+-------+-------+--------------+-----------------+-------+
-```
-The evaluation script takes the following arguments:
-* `--model_name_or_path`: The name or path of a `transformers`-based pre-trained checkpoint.
-* `--pooler_type`: Pooling method. We currently support:
-    * `cls` (default): Use the representation of the `[CLS]` token, followed by the linear + activation layer of the standard BERT implementation. Use this option for **supervised PromCSE**.
-    * `cls_before_pooler`: Use the representation of the `[CLS]` token without the extra linear + activation layer. Use this option for **unsupervised PromCSE**.
-    * `avg`: Average the token embeddings of the last layer. Use this option for SBERT/SRoBERTa checkpoints ([paper](https://arxiv.org/abs/1908.10084)).
-    * `avg_top2`: Average the token embeddings of the last two layers.
-    * `avg_first_last`: Average the token embeddings of the first and last layers. This works best for vanilla BERT or RoBERTa.
-* `--mode`: Evaluation mode.
-    * `test` (default): The default test mode. Use this option to faithfully reproduce our results.
-    * `dev`: Report development-set results. For STS tasks, only `STS-B` and `SICK-R` have development sets, so only their numbers are reported. Transfer tasks run in fast mode, so this is much faster than `test` mode (though the numbers are slightly lower).
-    * `fasttest`: Same as `test`, but transfer tasks run in fast mode, so the running time is much shorter and the reported transfer-task numbers may be lower.
-* `--task_set`: Which set of tasks to evaluate on (if set, it overrides `--tasks`).
-    * `sts` (default): Evaluate on the STS tasks `STS 12~16`, `STS-B` and `SICK-R`. This is the most commonly used set of tasks for evaluating the quality of sentence embeddings.
-    * `cococxc`: Evaluate on the domain-shifted CxC task.
-    * `transfer`: Evaluate on transfer tasks.
-    * `full`: Evaluate on both STS and transfer tasks.
-    * `na`: Set tasks manually with `--tasks`.
-* `--tasks`: The dataset(s) to evaluate on. Overridden if `--task_set` is not `na`. See the code for the full list of tasks.
-* `--pre_seq_len`: The length of the deep continuous prompt.
 ## Usage
 We provide *tool.py* to easily compute the cosine similarities between two groups of sentences, as well as to build an index for a group of sentences and search among them. You can try it by running
 ```bash
@@ -238,6 +245,7 @@ Retrieval results for query: A woman is making a photo.
     An animal is biting a persons finger.  (cosine similarity: 0.6126)
 ```
 ## Citation
 Please cite our paper by:
@@ -251,4 +259,4 @@ Please cite our paper by:
       archivePrefix={arXiv},
       primaryClass={cs.CL}
 }
-```

 # PromCSE: Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ubtve9fqljSTbFH4dYZkXOxitrUl6Az3?usp=sharing)
+arXiv link: https://arxiv.org/abs/2203.06875v2
+To be published in [**EMNLP 2022**](https://2022.emnlp.org/)
+Our code is based on [SimCSE](https://github.com/princeton-nlp/SimCSE) and [P-tuning v2](https://github.com/THUDM/P-tuning-v2/). We sincerely thank the authors for their excellent work.
+We have released our supervised and unsupervised models on Hugging Face; they achieve **Top 1** results on four standard STS tasks:
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sick)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sick?p=deep-continuous-prompt-for-contrastive-1)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts12)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts12?p=deep-continuous-prompt-for-contrastive-1)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts13)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts13?p=deep-continuous-prompt-for-contrastive-1)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts14)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts14?p=deep-continuous-prompt-for-contrastive-1)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts16)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts16?p=deep-continuous-prompt-for-contrastive-1)
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts15)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts15?p=deep-continuous-prompt-for-contrastive-1)
 <!-- <img src="https://github.com/YJiangcm/DCPCSE/blob/master/figure/leaderboard.png" width="700" height="380"> -->
 |          Model          | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg. |
 |:-----------------------:|:-----:|:----------:|:---------:|:-----:|:-----:|:-----:|:-----:|:-----:|
+|  sup-PromCSE-RoBERTa-large ([huggingface](https://huggingface.co/YuxinJiang/sup-promcse-roberta-large))  |  79.14 |88.64| 83.73| 87.33 |84.57| 87.84| 82.07| 84.76|
+|  unsup-PromCSE-BERT-base ([huggingface](https://huggingface.co/YuxinJiang/unsup-promcse-bert-base-uncased))  |  73.03 |85.18| 76.70| 84.19 |79.69| 80.62| 70.00| 78.49|
 If you have any questions, feel free to raise an issue.
 pip install -r requirements.txt
 ```
+## Train PromCSE
+In the following section, we describe how to train a PromCSE model by using our code.
+### Evaluation
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ubtve9fqljSTbFH4dYZkXOxitrUl6Az3?usp=sharing)
+Our evaluation code for sentence embeddings is based on a modified version of [SentEval](https://github.com/facebookresearch/SentEval). It evaluates sentence embeddings on semantic textual similarity (STS) tasks and downstream transfer tasks. For STS tasks, our evaluation uses the "all" setting and reports Spearman's correlation. The STS tasks include seven standard STS tasks (STS12-16, STS-B, SICK-R) and one domain-shifted STS task (CxC).
+Before evaluation, please download the evaluation datasets by running
+```bash
+cd SentEval/data/downstream/
+bash download_dataset.sh
+```
+To evaluate the domain-shift robustness of sentence embeddings, download [CxC](https://drive.google.com/drive/folders/1ZnRlVlc4kFsKbaWj9cFbb8bQU0fxzz1c?usp=sharing) and put the data into *SentEval/data/downstream/CocoCXC*.
+Then return to the root directory and evaluate trained models with our evaluation code. For example,
+```bash
+python evaluation.py \
+    --model_name_or_path YuxinJiang/sup-promcse-roberta-large \
+    --pooler_type cls \
+    --task_set sts \
+    --mode test \
+    --pre_seq_len 10
+```
+which is expected to output the results in a tabular format:
+```
+------ test ------
++-------+-------+-------+-------+-------+--------------+-----------------+-------+
+| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness |  Avg. |
++-------+-------+-------+-------+-------+--------------+-----------------+-------+
+| 79.14 | 88.64 | 83.73 | 87.33 | 84.57 |    87.84     |      82.07      | 84.76 |
++-------+-------+-------+-------+-------+--------------+-----------------+-------+
+```
+The evaluation script takes the following arguments:
+* `--model_name_or_path`: The name or path of a `transformers`-based pre-trained checkpoint.
+* `--pooler_type`: Pooling method. We currently support:
+    * `cls` (default): Use the representation of the `[CLS]` token, followed by the linear + activation layer of the standard BERT implementation. Use this option for **supervised PromCSE**.
+    * `cls_before_pooler`: Use the representation of the `[CLS]` token without the extra linear + activation layer. Use this option for **unsupervised PromCSE**.
+    * `avg`: Average the token embeddings of the last layer. Use this option for SBERT/SRoBERTa checkpoints ([paper](https://arxiv.org/abs/1908.10084)).
+    * `avg_top2`: Average the token embeddings of the last two layers.
+    * `avg_first_last`: Average the token embeddings of the first and last layers. This works best for vanilla BERT or RoBERTa.
+* `--mode`: Evaluation mode.
+    * `test` (default): The default test mode. Use this option to faithfully reproduce our results.
+    * `dev`: Report development-set results. For STS tasks, only `STS-B` and `SICK-R` have development sets, so only their numbers are reported. Transfer tasks run in fast mode, so this is much faster than `test` mode (though the numbers are slightly lower).
+    * `fasttest`: Same as `test`, but transfer tasks run in fast mode, so the running time is much shorter and the reported transfer-task numbers may be lower.
+* `--task_set`: Which set of tasks to evaluate on (if set, it overrides `--tasks`).
+    * `sts` (default): Evaluate on the STS tasks `STS 12~16`, `STS-B` and `SICK-R`. This is the most commonly used set of tasks for evaluating the quality of sentence embeddings.
+    * `cococxc`: Evaluate on the domain-shifted CxC task.
+    * `transfer`: Evaluate on transfer tasks.
+    * `full`: Evaluate on both STS and transfer tasks.
+    * `na`: Set tasks manually with `--tasks`.
+* `--tasks`: The dataset(s) to evaluate on. Overridden if `--task_set` is not `na`. See the code for the full list of tasks.
+* `--pre_seq_len`: The length of the deep continuous prompt.
+### Training
 **Data**
 | Valid steps | 125 | 125 | 125 | 125 |
 ## Usage
 We provide *tool.py* to easily compute the cosine similarities between two groups of sentences, as well as to build an index for a group of sentences and search among them. You can try it by running
 ```bash
     An animal is biting a persons finger.  (cosine similarity: 0.6126)
 ```
 ## Citation
 Please cite our paper by:
       archivePrefix={arXiv},
       primaryClass={cs.CL}
 }
+```
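The `--pooler_type` options in the README differ only in how token-level hidden states are reduced to a single sentence vector. Below is a minimal pure-Python sketch of those reductions. It is not the PromCSE or SimCSE implementation, and it rests on several assumptions:

- The helper names (`masked_mean`, `average_layers`, `pool`) and the dense weights passed for `cls` are hypothetical.
- `layers` holds per-layer token vectors, with `layers[0]` as the first Transformer layer (the embedding output is excluded).
- Padding positions are marked with 0 in `mask`.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]
Layer = List[Vector]  # one hidden vector per token position


def masked_mean(tokens: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, m in zip(tokens, mask) if m]
    dim = len(tokens[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def average_layers(a: Layer, b: Layer) -> Layer:
    """Element-wise mean of two layers of identical shape."""
    return [[(x + y) / 2.0 for x, y in zip(va, vb)] for va, vb in zip(a, b)]


def pool(layers: Sequence[Layer], mask: Sequence[int], pooler_type: str,
         dense_w: Optional[List[Vector]] = None,
         dense_b: Optional[Vector] = None) -> Vector:
    """Reduce per-layer token states to one sentence vector (illustrative only)."""
    last = layers[-1]
    if pooler_type == "cls_before_pooler":
        return list(last[0])  # raw [CLS] vector at position 0
    if pooler_type == "cls":
        # BERT-style pooler: a dense layer followed by tanh on the [CLS] vector.
        # dense_w / dense_b are hypothetical stand-ins for the checkpoint's weights.
        if dense_w is None or dense_b is None:
            raise ValueError("'cls' pooling needs the pooler's dense weights")
        cls_vec = last[0]
        return [math.tanh(sum(w * x for w, x in zip(row, cls_vec)) + b)
                for row, b in zip(dense_w, dense_b)]
    if pooler_type == "avg":
        return masked_mean(last, mask)
    if pooler_type == "avg_top2":
        return masked_mean(average_layers(layers[-2], last), mask)
    if pooler_type == "avg_first_last":
        return masked_mean(average_layers(layers[0], last), mask)
    raise ValueError(f"unknown pooler_type: {pooler_type}")


# Toy input: 2 layers x 3 tokens x 2 dims; the third token is padding.
layers = [
    [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]],
    [[2.0, 2.0], [4.0, 0.0], [100.0, 100.0]],
]
mask = [1, 1, 0]
for name in ("cls_before_pooler", "avg", "avg_top2", "avg_first_last"):
    print(name, pool(layers, mask, name))
```

With this toy input, `avg` gives `[3.0, 1.0]` and both layer-averaging options give `[2.5, 1.0]`. The two options coincide here because, with only two layers, the "top two" and the "first and last" layers are the same pair. The padded third token never contributes, which is why the mask matters for the averaging poolers.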
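The README describes *tool.py* as computing cosine similarities between two groups of sentences and searching an index of sentences. Its actual interface is not shown here. The sketch below therefore only illustrates the underlying computation on embeddings that have already been computed. It uses hypothetical helpers (`cosine`, `similarity_matrix`, `search`) and toy 2-D vectors in place of model outputs. Search is brute force (every indexed sentence is scored); the real tool may build its index differently.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(group_a: Sequence[Vector],
                      group_b: Sequence[Vector]) -> List[List[float]]:
    """Row i, column j holds the cosine similarity of group_a[i] and group_b[j]."""
    return [[cosine(a, b) for b in group_b] for a in group_a]


def search(query: Vector, index: Sequence[Vector], sentences: Sequence[str],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force retrieval: score every indexed sentence, return the top_k."""
    scored = [(s, cosine(query, vec)) for s, vec in zip(sentences, index)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-D "embeddings" standing in for real model outputs (hypothetical data).
sentences = ["sentence A", "sentence B", "sentence C"]
index = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for sent, score in search([1.0, 0.1], index, sentences, top_k=2):
    print(f"{sent}  (cosine similarity: {score:.4f})")
```

Brute-force scoring is linear in the index size, which is fine for small sentence collections. For large collections, an approximate nearest-neighbour index is the usual replacement.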
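For STS tasks the README reports Spearman's correlation under the "all" setting. This sketch assumes the setting works as in SimCSE's evaluation: predicted and gold scores from every subset of a dataset are pooled, and a single correlation is computed. Under that assumption, the metric reduces to the code below. In practice a library routine such as `scipy.stats.spearmanr` would be used. This stdlib-only version, with hypothetical helper names and toy data, just makes the tie handling explicit.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # sorted positions i..j -> ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(pred), average_ranks(gold))


def sts_all_setting(subsets: Sequence[Sequence[Tuple[float, float]]]) -> float:
    """Pool (predicted, gold) pairs from every subset, then compute one correlation."""
    pred = [p for sub in subsets for p, _ in sub]
    gold = [g for sub in subsets for _, g in sub]
    return spearman(pred, gold)


# Hypothetical data: two subsets of (predicted cosine, gold similarity) pairs.
subsets = [
    [(0.9, 5.0), (0.1, 1.0)],
    [(0.5, 3.0), (0.4, 3.0)],
]
print(sts_all_setting(subsets))
```

Because Spearman's correlation only looks at ranks, any monotonic rescaling of the cosine scores leaves the result unchanged. Pooling the subsets first can give a different number than averaging per-subset correlations.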