Commit c1fe62b by YuxinJiang · 1 parent: 1a404a7

Update README.md

  1. README.md +73 -65
README.md CHANGED
@@ -1,30 +1,30 @@
- ---
- license: apache-2.0
- ---
  # PromCSE: Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning
 
  [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sick)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sick?p=deep-continuous-prompt-for-contrastive-1)
  [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts12)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts12?p=deep-continuous-prompt-for-contrastive-1)
 
  [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts13)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts13?p=deep-continuous-prompt-for-contrastive-1)
  [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts14)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts14?p=deep-continuous-prompt-for-contrastive-1)
 
  [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts16)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts16?p=deep-continuous-prompt-for-contrastive-1)
- [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts15)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts15?p=deep-continuous-prompt-for-contrastive-1)
 
- arXiv link: https://arxiv.org/abs/2203.06875v2
- To be published in [**EMNLP 2022**](https://2022.emnlp.org/)
-
- Our code is modified based on [SimCSE](https://github.com/princeton-nlp/SimCSE) and [P-tuning v2](https://github.com/THUDM/P-tuning-v2/). Here we would like to sincerely thank them for their excellent work.
-
- We release our best model checkpoint which achieves **Top 1** results on four STS tasks:
 
  <!-- <img src="https://github.com/YJiangcm/DCPCSE/blob/master/figure/leaderboard.png" width="700" height="380"> -->
 
  | Model | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg. |
  |:-----------------------:|:-----:|:----------:|:---------:|:-----:|:-----:|:-----:|:-----:|:-----:|
- | sup-PromCSE-RoBERTa-large ([huggingface](https://huggingface.co/YuxinJiang/unsup-promcse-bert-base-uncased)) | 79.14 |88.64| 83.73| 87.33 |84.57| 87.84| 82.07| 84.76|
- | unsup-PromCSE-BERT-base ([huggingface](https://huggingface.co/YuxinJiang/sup-promcse-roberta-large)) | 73.03 |85.18| 76.70| 84.19 |79.69| 80.62| 70.00| 78.49|
 
  If you have any questions, feel free to raise an issue.
 
@@ -40,7 +40,65 @@ Run the following script to install the remaining dependencies,
  pip install -r requirements.txt
  ```
 
- ## Training
 
  **Data**
 
@@ -114,57 +172,6 @@ All our experiments are conducted on Nvidia 3090 GPUs.
  | Valid steps | 125 | 125 | 125 | 125 |
 
 
- ## Evaluation
- Our evaluation code for sentence embeddings is based on a modified version of [SentEval](https://github.com/facebookresearch/SentEval). It evaluates sentence embeddings on semantic textual similarity (STS) tasks and downstream transfer tasks. For STS tasks, our evaluation uses the "all" setting and reports Spearman's correlation. The STS tasks include seven standard STS tasks (STS12-16, STSB, SICK-R) and one domain-shifted STS task (CxC).
-
- Before evaluation, please download the evaluation datasets by running
- ```bash
- cd SentEval/data/downstream/
- bash download_dataset.sh
- ```
- To evaluate the domain-shift robustness of sentence embeddings, download [CxC](https://drive.google.com/drive/folders/1ZnRlVlc4kFsKbaWj9cFbb8bQU0fxzz1c?usp=sharing) and put the data into *SentEval/data/downstream/CocoCXC*.
-
- Then return to the root directory and evaluate the trained models using our evaluation code. For example,
- ```bash
- python evaluation.py \
-     --model_name_or_path YuxinJiang/sup-promcse-roberta-large \
-     --pooler_type cls \
-     --task_set sts \
-     --mode test \
-     --pre_seq_len 10
- ```
- which is expected to output the results in a tabular format:
- ```
- ------ test ------
- +-------+-------+-------+-------+-------+--------------+-----------------+-------+
- | STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg. |
- +-------+-------+-------+-------+-------+--------------+-----------------+-------+
- | 79.14 | 88.64 | 83.73 | 87.33 | 84.57 | 87.84 | 82.07 | 84.76 |
- +-------+-------+-------+-------+-------+--------------+-----------------+-------+
- ```
-
- Arguments for the evaluation script are as follows:
-
- * `--model_name_or_path`: The name or path of a `transformers`-based pre-trained checkpoint.
- * `--pooler_type`: Pooling method. Currently supported:
-     * `cls` (default): Use the representation of the `[CLS]` token. A linear+activation layer is applied after the representation (it's in the standard BERT implementation). If you use **supervised PromCSE**, you should use this option.
-     * `cls_before_pooler`: Use the representation of the `[CLS]` token without the extra linear+activation. If you use **unsupervised PromCSE**, you should use this option.
-     * `avg`: Average embeddings of the last layer. If you use checkpoints of SBERT/SRoBERTa ([paper](https://arxiv.org/abs/1908.10084)), you should use this option.
-     * `avg_top2`: Average embeddings of the last two layers.
-     * `avg_first_last`: Average embeddings of the first and last layers. If you use vanilla BERT or RoBERTa, this works best.
- * `--mode`: Evaluation mode
-     * `test` (default): The default test mode. To faithfully reproduce our results, you should use this option.
-     * `dev`: Report the development set results. Note that in STS tasks, only `STS-B` and `SICK-R` have development sets, so we only report their numbers. It also takes a fast mode for transfer tasks, so the running time is much shorter than the `test` mode (though numbers are slightly lower).
-     * `fasttest`: Same as `test`, but uses a fast mode so the running time is much shorter; the reported numbers may be lower (only for transfer tasks).
- * `--task_set`: Which set of tasks to evaluate on (if set, it will override `--tasks`)
-     * `sts` (default): Evaluate on STS tasks, including `STS 12~16`, `STS-B` and `SICK-R`. This is the most commonly-used set of tasks to evaluate the quality of sentence embeddings.
-     * `cococxc`: Evaluate on the domain-shifted CxC task.
-     * `transfer`: Evaluate on transfer tasks.
-     * `full`: Evaluate on both STS and transfer tasks.
-     * `na`: Manually set tasks by `--tasks`.
- * `--tasks`: Specify which dataset(s) to evaluate on. Will be overridden if `--task_set` is not `na`. See the code for a full list of tasks.
- * `--pre_seq_len`: The length of the deep continuous prompt.
-
  ## Usage
  We provide *tool.py* to easily compute the cosine similarities between two groups of sentences, as well as to build an index for a group of sentences and search among them. You can try it by running
  ```bash
@@ -238,6 +245,7 @@ Retrieval results for query: A woman is making a photo.
  An animal is biting a persons finger. (cosine similarity: 0.6126)
  ```
 
  ## Citation
 
  Please cite our paper by:
@@ -251,4 +259,4 @@ Please cite our paper by:
  archivePrefix={arXiv},
  primaryClass={cs.CL}
  }
- ```
@@ -1,30 +1,30 @@
  # PromCSE: Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ubtve9fqljSTbFH4dYZkXOxitrUl6Az3?usp=sharing)
+ arXiv link: https://arxiv.org/abs/2203.06875v2
+ To be published in [**EMNLP 2022**](https://2022.emnlp.org/)
+
+ Our code is modified based on [SimCSE](https://github.com/princeton-nlp/SimCSE) and [P-tuning v2](https://github.com/THUDM/P-tuning-v2/). Here we would like to sincerely thank them for their excellent work.
+
+ We have released our supervised and unsupervised models on Hugging Face, which achieve **Top 1** results on four standard STS tasks:
 
  [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sick)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sick?p=deep-continuous-prompt-for-contrastive-1)
+
  [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts12)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts12?p=deep-continuous-prompt-for-contrastive-1)
 
  [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts13)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts13?p=deep-continuous-prompt-for-contrastive-1)
+
  [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts14)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts14?p=deep-continuous-prompt-for-contrastive-1)
 
  [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts16)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts16?p=deep-continuous-prompt-for-contrastive-1)
 
+ [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts15)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts15?p=deep-continuous-prompt-for-contrastive-1)
 
  <!-- <img src="https://github.com/YJiangcm/DCPCSE/blob/master/figure/leaderboard.png" width="700" height="380"> -->
 
  | Model | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg. |
  |:-----------------------:|:-----:|:----------:|:---------:|:-----:|:-----:|:-----:|:-----:|:-----:|
+ | sup-PromCSE-RoBERTa-large ([huggingface](https://huggingface.co/YuxinJiang/sup-promcse-roberta-large)) | 79.14 |88.64| 83.73| 87.33 |84.57| 87.84| 82.07| 84.76|
+ | unsup-PromCSE-BERT-base ([huggingface](https://huggingface.co/YuxinJiang/unsup-promcse-bert-base-uncased)) | 73.03 |85.18| 76.70| 84.19 |79.69| 80.62| 70.00| 78.49|
 
  If you have any questions, feel free to raise an issue.
 
 
@@ -40,7 +40,65 @@ Run the following script to install the remaining dependencies,
  pip install -r requirements.txt
  ```
 
+ ## Train PromCSE
+
+ In the following section, we describe how to train a PromCSE model by using our code.
+
+
+ ### Evaluation
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ubtve9fqljSTbFH4dYZkXOxitrUl6Az3?usp=sharing)
+
+ Our evaluation code for sentence embeddings is based on a modified version of [SentEval](https://github.com/facebookresearch/SentEval). It evaluates sentence embeddings on semantic textual similarity (STS) tasks and downstream transfer tasks. For STS tasks, our evaluation uses the "all" setting and reports Spearman's correlation. The STS tasks include seven standard STS tasks (STS12-16, STSB, SICK-R) and one domain-shifted STS task (CxC).
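
The STS scoring described above can be illustrated with a small standard-library sketch. It is not the code in *evaluation.py* or SentEval: the helper names (`cosine`, `ranks`, `spearman`, `sts_all_score`) are hypothetical, and it assumes that the "all" setting means pooling the sentence pairs of every subset before computing a single Spearman correlation, as in SimCSE's evaluation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sts_all_score(subsets):
    """Assumed 'all' setting: concatenate the pairs of all subsets,
    then compute one Spearman correlation over the pooled pairs."""
    predictions, gold_scores = [], []
    for pairs in subsets:
        for emb_a, emb_b, gold in pairs:
            predictions.append(cosine(emb_a, emb_b))
            gold_scores.append(gold)
    return spearman(predictions, gold_scores)


# Toy embeddings and gold scores, made up purely for illustration.
toy_subsets = [
    [([1.0, 0.0], [1.0, 0.0], 5.0), ([1.0, 0.0], [0.0, 1.0], 0.0)],
    [([1.0, 1.0], [1.0, 0.0], 3.0)],
]
print(round(sts_all_score(toy_subsets), 4))
```

Because only ranks matter, the score is unchanged by any monotonic rescaling of the predicted similarities.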
+
+ Before evaluation, please download the evaluation datasets by running
+ ```bash
+ cd SentEval/data/downstream/
+ bash download_dataset.sh
+ ```
+ To evaluate the domain-shift robustness of sentence embeddings, download [CxC](https://drive.google.com/drive/folders/1ZnRlVlc4kFsKbaWj9cFbb8bQU0fxzz1c?usp=sharing) and put the data into *SentEval/data/downstream/CocoCXC*.
+
+ Then return to the root directory and evaluate the trained models using our evaluation code. For example,
+ ```bash
+ python evaluation.py \
+     --model_name_or_path YuxinJiang/sup-promcse-roberta-large \
+     --pooler_type cls \
+     --task_set sts \
+     --mode test \
+     --pre_seq_len 10
+ ```
+ which is expected to output the results in a tabular format:
+ ```
+ ------ test ------
+ +-------+-------+-------+-------+-------+--------------+-----------------+-------+
+ | STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg. |
+ +-------+-------+-------+-------+-------+--------------+-----------------+-------+
+ | 79.14 | 88.64 | 83.73 | 87.33 | 84.57 | 87.84 | 82.07 | 84.76 |
+ +-------+-------+-------+-------+-------+--------------+-----------------+-------+
+ ```
+ Arguments for the evaluation script are as follows:
+
+ * `--model_name_or_path`: The name or path of a `transformers`-based pre-trained checkpoint.
+ * `--pooler_type`: Pooling method. Currently supported:
+     * `cls` (default): Use the representation of the `[CLS]` token. A linear+activation layer is applied after the representation (it's in the standard BERT implementation). If you use **supervised PromCSE**, you should use this option.
+     * `cls_before_pooler`: Use the representation of the `[CLS]` token without the extra linear+activation. If you use **unsupervised PromCSE**, you should use this option.
+     * `avg`: Average embeddings of the last layer. If you use checkpoints of SBERT/SRoBERTa ([paper](https://arxiv.org/abs/1908.10084)), you should use this option.
+     * `avg_top2`: Average embeddings of the last two layers.
+     * `avg_first_last`: Average embeddings of the first and last layers. If you use vanilla BERT or RoBERTa, this works best.
+ * `--mode`: Evaluation mode
+     * `test` (default): The default test mode. To faithfully reproduce our results, you should use this option.
+     * `dev`: Report the development set results. Note that in STS tasks, only `STS-B` and `SICK-R` have development sets, so we only report their numbers. It also takes a fast mode for transfer tasks, so the running time is much shorter than the `test` mode (though numbers are slightly lower).
+     * `fasttest`: Same as `test`, but uses a fast mode so the running time is much shorter; the reported numbers may be lower (only for transfer tasks).
+ * `--task_set`: Which set of tasks to evaluate on (if set, it will override `--tasks`)
+     * `sts` (default): Evaluate on STS tasks, including `STS 12~16`, `STS-B` and `SICK-R`. This is the most commonly-used set of tasks to evaluate the quality of sentence embeddings.
+     * `cococxc`: Evaluate on the domain-shifted CxC task.
+     * `transfer`: Evaluate on transfer tasks.
+     * `full`: Evaluate on both STS and transfer tasks.
+     * `na`: Manually set tasks by `--tasks`.
+ * `--tasks`: Specify which dataset(s) to evaluate on. Will be overridden if `--task_set` is not `na`. See the code for a full list of tasks.
+ * `--pre_seq_len`: The length of the deep continuous prompt.
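
To make the `--pooler_type` options above concrete, here is a minimal standard-library sketch of the five pooling strategies on plain nested lists. It is an illustration, not the repository's implementation: the function names and the toy inputs are hypothetical, tanh is assumed as the pooler activation (as in the standard BERT pooler), and treating `hidden_states[0]` as the "first layer" is also an assumption, since the list does not say whether that means the embedding output or the first Transformer layer.

```python
import math


def masked_mean(layer, mask):
    """Average the token vectors of one layer where mask == 1 (padding is skipped)."""
    kept = [vec for vec, m in zip(layer, mask) if m]
    return [sum(column) / len(kept) for column in zip(*kept)]


def mean_of_layers(layer_a, layer_b):
    """Element-wise mean of two layers with identical shape (tokens x dim)."""
    return [[(x + y) / 2 for x, y in zip(vec_a, vec_b)]
            for vec_a, vec_b in zip(layer_a, layer_b)]


def pool(hidden_states, mask, pooler_type, pooler_weight=None, pooler_bias=None):
    """hidden_states: list of layers, each a list of token vectors.
    Token 0 is taken to be [CLS]; mask marks real (non-padding) tokens."""
    last = hidden_states[-1]
    if pooler_type == "cls_before_pooler":
        return list(last[0])
    if pooler_type == "cls":
        # Extra linear + activation on [CLS]; tanh is an assumption here.
        return [math.tanh(sum(w * h for w, h in zip(row, last[0])) + b)
                for row, b in zip(pooler_weight, pooler_bias)]
    if pooler_type == "avg":
        return masked_mean(last, mask)
    if pooler_type == "avg_top2":
        return masked_mean(mean_of_layers(hidden_states[-2], last), mask)
    if pooler_type == "avg_first_last":
        # Assumption: index 0 of the supplied list is the "first layer".
        return masked_mean(mean_of_layers(hidden_states[0], last), mask)
    raise ValueError(f"unknown pooler_type: {pooler_type}")


# Toy input: 2 layers, 3 tokens (the last one is padding), hidden size 2.
toy_hidden = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],
    [[0.0, 2.0], [2.0, 0.0], [9.0, 9.0]],
]
toy_mask = [1, 1, 0]
print(pool(toy_hidden, toy_mask, "cls_before_pooler"))  # [0.0, 2.0]
print(pool(toy_hidden, toy_mask, "avg"))                # [1.0, 1.0]
print(pool(toy_hidden, toy_mask, "avg_first_last"))     # [1.5, 2.0]
```

With only two layers in the toy input, `avg_top2` and `avg_first_last` coincide; on a real model with more layers they would generally differ.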
+
+
+ ### Training
 
  **Data**
 
 
@@ -114,57 +172,6 @@ All our experiments are conducted on Nvidia 3090 GPUs.
  | Valid steps | 125 | 125 | 125 | 125 |
 
 
  ## Usage
  We provide *tool.py* to easily compute the cosine similarities between two groups of sentences, as well as to build an index for a group of sentences and search among them. You can try it by running
  ```bash
 
@@ -238,6 +245,7 @@ Retrieval results for query: A woman is making a photo.
  An animal is biting a persons finger. (cosine similarity: 0.6126)
  ```
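
Retrieval output like the above can be approximated by a brute-force cosine-similarity search. The sketch below is only an illustration under stated assumptions: `build_index`, `search`, `top_k` and `threshold` are hypothetical names rather than the interface of *tool.py*, and the two-dimensional embeddings are made up instead of coming from a PromCSE model.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def build_index(sentences, embeddings):
    """Store each sentence next to its unit-length embedding."""
    return [(s, normalize(e)) for s, e in zip(sentences, embeddings)]


def search(index, query_embedding, top_k=3, threshold=0.6):
    """Return up to top_k (sentence, cosine similarity) pairs scoring at
    least threshold, best first. Both parameters are hypothetical."""
    query = normalize(query_embedding)
    # For unit vectors, the dot product equals the cosine similarity.
    scored = [(s, sum(a * b for a, b in zip(query, e))) for s, e in index]
    hits = [(s, score) for s, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_k]


# Made-up 2-d embeddings, for illustration only.
toy_index = build_index(
    ["sentence a", "sentence b", "sentence c"],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
for sentence, score in search(toy_index, [1.0, 0.0]):
    print(f"{sentence} (cosine similarity: {score:.4f})")
```

A linear scan like this is fine for small sentence sets; larger collections usually call for an approximate nearest-neighbour index.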
 
+
  ## Citation
 
  Please cite our paper by:
 
@@ -251,4 +259,4 @@ Please cite our paper by:
  archivePrefix={arXiv},
  primaryClass={cs.CL}
  }
+ ```