---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- cross-encoder
- generated_from_trainer
- dataset_size:578402
- loss:BinaryCrossEntropyLoss
base_model: answerdotai/ModernBERT-base
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: ModernBERT-base trained on GooAQ
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: gooaq dev
      type: gooaq-dev
    metrics:
    - type: map
      value: 0.7308
      name: Map
    - type: mrr@10
      value: 0.7292
      name: Mrr@10
    - type: ndcg@10
      value: 0.7713
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.4579
      name: Map
    - type: mrr@10
      value: 0.4479
      name: Mrr@10
    - type: ndcg@10
      value: 0.5275
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.3414
      name: Map
    - type: mrr@10
      value: 0.534
      name: Mrr@10
    - type: ndcg@10
      value: 0.3821
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.3932
      name: Map
    - type: mrr@10
      value: 0.3918
      name: Mrr@10
    - type: ndcg@10
      value: 0.463
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.3975
      name: Map
    - type: mrr@10
      value: 0.4579
      name: Mrr@10
    - type: ndcg@10
      value: 0.4575
      name: Ndcg@10
---

# ModernBERT-base trained on GooAQ

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

See [training_gooaq_bce.py](https://github.com/UKPLab/sentence-transformers/blob/feat/cross_encoder_trainer/examples/cross_encoder/training/rerankers/training_gooaq_bce.py) for the training script. This script is also described in the [Cross Encoder > Training Overview](https://sbert.net/docs/cross_encoder/training_overview.html) documentation and the [Training and Finetuning Reranker Models with Sentence Transformers v4](https://huggingface.co/blog/train-reranker) blogpost.



## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) <!-- at revision 8949b909ec900327062f0ebf497f51aef5e6f0c8 -->
- **Maximum Sequence Length:** 8192 tokens
- **Number of Output Labels:** 1 label
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce")
# Get scores for pairs of texts
pairs = [
    ['why are rye chips so good?', "It makes them taste that much better! The rye chips are tasty because they stand out--they're the saltiest thing in the bag. It's not because rye bread is inherently awesome. ... You could just buy a bag of rye chips."],
    ['why are rye chips so good?', 'There are no substantial technical, nutritional or performance issues associated with rye that would limit its use for pets. Rye is a fairly common ingredient in human foods and beverages. The most prevalent occurrence is in crackers and breads.'],
    ['why are rye chips so good?', 'Bread made wholly from rye flour is made in Germany and called pumpernickel. Rye is unique among grains for having a high level of fibre in its endosperm – not just in its bran. As such, the glycemic index (GI) of rye products is generally lower than products made from wheat and most other grains.'],
    ['why are rye chips so good?', 'KFC Chips – The salt mix on the seasoned chips and the actual chips do not contain any animal products. Our supplier/s of chips and seasoning have confirmed they are suitable for vegans.'],
    ['why are rye chips so good?', 'A study in the American Journal of Clinical Nutrition found that eating rye leads to better blood-sugar control compared to wheat. Rye bread is packed with magnesium, which helps control blood pressure and optimize heart health. Its high levels of soluble fibre can also reduce cholesterol.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'why are rye chips so good?',
    [
        "It makes them taste that much better! The rye chips are tasty because they stand out--they're the saltiest thing in the bag. It's not because rye bread is inherently awesome. ... You could just buy a bag of rye chips.",
        'There are no substantial technical, nutritional or performance issues associated with rye that would limit its use for pets. Rye is a fairly common ingredient in human foods and beverages. The most prevalent occurrence is in crackers and breads.',
        'Bread made wholly from rye flour is made in Germany and called pumpernickel. Rye is unique among grains for having a high level of fibre in its endosperm – not just in its bran. As such, the glycemic index (GI) of rye products is generally lower than products made from wheat and most other grains.',
        'KFC Chips – The salt mix on the seasoned chips and the actual chips do not contain any animal products. Our supplier/s of chips and seasoning have confirmed they are suitable for vegans.',
        'A study in the American Journal of Clinical Nutrition found that eating rye leads to better blood-sugar control compared to wheat. Rye bread is packed with magnesium, which helps control blood pressure and optimize heart health. Its high levels of soluble fibre can also reduce cholesterol.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
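
The list returned by `rank` identifies each document by `corpus_id`, its index in the list that was passed in. The following is a minimal sketch of mapping such a ranking back to the texts. The scores and short document strings are hand-written placeholders rather than real model outputs, and `attach_documents` is a hypothetical helper, not part of the library.

```python
def attach_documents(ranks, documents):
    """Pair each ranked entry with the text it refers to, best score first."""
    ordered = sorted(ranks, key=lambda entry: entry["score"], reverse=True)
    return [(documents[entry["corpus_id"]], entry["score"]) for entry in ordered]


documents = ["rye chips answer", "rye for pets answer", "pumpernickel answer"]
# Placeholder scores for illustration only; real values come from model.rank(...)
ranks = [
    {"corpus_id": 1, "score": 0.12},
    {"corpus_id": 0, "score": 0.91},
    {"corpus_id": 2, "score": 0.05},
]

for text, score in attach_documents(ranks, documents):
    print(f"{score:.2f}  {text}")
```

Sorting inside the helper keeps it correct even if the input list is not already ordered by score.
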
## Evaluation

### Metrics

#### Cross Encoder Reranking

* Dataset: `gooaq-dev`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": false
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.7308 (+0.1997)     |
| mrr@10      | 0.7292 (+0.2052)     |
| **ndcg@10** | **0.7713 (+0.1801)** |

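
For readers unfamiliar with the three metrics, the sketch below computes them for a single query with binary relevance labels, using the textbook definitions. It is an illustration only: the function names are made up here, the evaluator's own implementation may differ in details, and the reported values are averages over all evaluation queries.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance, k=10):
    """NDCG with binary gains and a log2 position discount."""
    dcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevance[:k], start=1))
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision(relevance):
    """Mean of precision@rank over the ranks that hold a relevant item."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# One query whose single relevant document was ranked second.
ranking = [0, 1, 0, 0, 0]
print(mrr_at_k(ranking), ndcg_at_k(ranking), average_precision(ranking))
```

For this toy ranking, MRR@10 and average precision are both 0.5, while NDCG@10 is about 0.63 because its position discount is logarithmic rather than reciprocal.
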

#### Cross Encoder Reranking

* Dataset: `gooaq-dev`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.7908 (+0.2597)     |
| mrr@10      | 0.7890 (+0.2650)     |
| **ndcg@10** | **0.8351 (+0.2439)** |

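
The two `gooaq-dev` evaluations differ only in `always_rerank_positives`. The sketch below shows one reading of that flag, assuming the behaviour described in the evaluator's documentation: when it is true, every known positive is added to the candidates the reranker scores; when it is false, only positives that the first-stage retriever already returned are included. `build_candidates` is a hypothetical helper written for this illustration, not library code.

```python
def build_candidates(retrieved, positives, always_rerank_positives):
    """Return the documents the reranker gets to score for one query."""
    candidates = list(retrieved)
    if always_rerank_positives:
        # Append any positive the first-stage retriever missed.
        candidates += [doc for doc in positives if doc not in candidates]
    return candidates


retrieved = ["doc_a", "doc_b"]   # first-stage results (placeholder ids)
positives = ["doc_c"]            # known relevant document, missed by retrieval

print(build_candidates(retrieved, positives, always_rerank_positives=True))
print(build_candidates(retrieved, positives, always_rerank_positives=False))
```

Under this reading, the `false` setting is the stricter of the two, which would be consistent with its lower scores on the same dataset.
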

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.4579 (-0.0317)     | 0.3414 (+0.0804)     | 0.3932 (-0.0264)     |
| mrr@10      | 0.4479 (-0.0296)     | 0.5340 (+0.0342)     | 0.3918 (-0.0349)     |
| **ndcg@10** | **0.5275 (-0.0130)** | **0.3821 (+0.0571)** | **0.4630 (-0.0377)** |


#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.3975 (+0.0074)     |
| mrr@10      | 0.4579 (-0.0101)     |
| **ndcg@10** | **0.4575 (+0.0022)** |

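
The `NanoBEIR_R100_mean` row is consistent with an unweighted mean of the three per-dataset results reported above. The short check below copies those values by hand and averages them; it is a sanity check on the published numbers, not the evaluator's code.

```python
# Per-dataset scores copied from the NanoMSMARCO / NanoNFCorpus / NanoNQ table.
per_dataset = {
    "NanoMSMARCO_R100": {"map": 0.4579, "mrr@10": 0.4479, "ndcg@10": 0.5275},
    "NanoNFCorpus_R100": {"map": 0.3414, "mrr@10": 0.5340, "ndcg@10": 0.3821},
    "NanoNQ_R100": {"map": 0.3932, "mrr@10": 0.3918, "ndcg@10": 0.4630},
}

# Unweighted mean per metric, rounded to the four decimals used in the card.
mean_scores = {
    metric: round(sum(scores[metric] for scores in per_dataset.values()) / len(per_dataset), 4)
    for metric in ("map", "mrr@10", "ndcg@10")
}
print(mean_scores)
```

The rounded means come out at 0.3975, 0.4579 and 0.4575, matching the table.
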
## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 578,402 training samples
* Columns: <code>question</code>, <code>answer</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | question                                                                                       | answer                                                                                          | label                                           |
  |:--------|:-----------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|:------------------------------------------------|
  | type    | string                                                                                         | string                                                                                          | int                                             |
  | details | <ul><li>min: 19 characters</li><li>mean: 45.14 characters</li><li>max: 85 characters</li></ul> | <ul><li>min: 65 characters</li><li>mean: 254.8 characters</li><li>max: 379 characters</li></ul> | <ul><li>0: ~82.90%</li><li>1: ~17.10%</li></ul> |
* Samples:
  | question | answer | label |
  |:---------|:-------|:------|
  | <code>why are rye chips so good?</code> | <code>It makes them taste that much better! The rye chips are tasty because they stand out--they're the saltiest thing in the bag. It's not because rye bread is inherently awesome. ... You could just buy a bag of rye chips.</code> | <code>1</code> |
  | <code>why are rye chips so good?</code> | <code>There are no substantial technical, nutritional or performance issues associated with rye that would limit its use for pets. Rye is a fairly common ingredient in human foods and beverages. The most prevalent occurrence is in crackers and breads.</code> | <code>0</code> |
  | <code>why are rye chips so good?</code> | <code>Bread made wholly from rye flour is made in Germany and called pumpernickel. Rye is unique among grains for having a high level of fibre in its endosperm – not just in its bran. As such, the glycemic index (GI) of rye products is generally lower than products made from wheat and most other grains.</code> | <code>0</code> |
* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
  ```json
  {
      "activation_fct": "torch.nn.modules.linear.Identity",
      "pos_weight": 5
  }
  ```
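
With `Identity` as the activation, the loss operates on the model's raw logits. The sketch below writes out a per-pair weighted binary cross-entropy in plain Python, assuming the loss follows the formulation of PyTorch's `BCEWithLogitsLoss` with a `pos_weight`. The function is an illustration written for this card, not the library's implementation.

```python
import math


def weighted_bce_with_logits(logit, label, pos_weight=5.0):
    """Binary cross-entropy on a raw logit, with positives up-weighted."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    return -(pos_weight * label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


# At logit 0 the model is undecided (probability 0.5) for both pairs,
# but the error on the positive pair costs pos_weight times as much.
loss_positive = weighted_bce_with_logits(0.0, 1)
loss_negative = weighted_bce_with_logits(0.0, 0)
print(loss_positive / loss_negative)

# Label balance from the statistics table: about 82.90% zeros and 17.10% ones.
print(round(82.90 / 17.10, 2))
```

The label statistics give roughly 4.85 negatives per positive, so a `pos_weight` of 5 approximately balances the two classes. The card itself does not state the rationale for the value.
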

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `dataloader_num_workers`: 4
- `load_best_model_at_end`: True
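
These settings, together with the dataset size, determine the length of the run. The arithmetic below is a rough sketch that assumes a single device and no gradient accumulation, so that one optimizer step consumes one batch of 64 pairs; the exact warmup length used by the trainer may differ by a step.

```python
import math

train_samples = 578_402  # size of the training dataset
batch_size = 64          # per_device_train_batch_size
warmup_ratio = 0.1

# Assumption: one device and no gradient accumulation, so one step = one batch.
steps_per_epoch = math.ceil(train_samples / batch_size)
warmup_steps = math.ceil(steps_per_epoch * warmup_ratio)

print(steps_per_epoch)                   # 9038
print(warmup_steps)                      # 904
print(round(9000 / steps_per_epoch, 4))  # 0.9958, the epoch logged at step 9000
```

The last line agrees with the training logs, where step 9000 is reported at epoch 0.9958.
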

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>


### Training Logs
| Epoch      | Step     | Training Loss | gooaq-dev_ndcg@10    | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10  | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:--------:|:-------------:|:--------------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
| -1         | -1       | -             | 0.1288 (-0.4624)     | 0.0149 (-0.5255)         | 0.2278 (-0.0972)          | 0.0229 (-0.4777)     | 0.0885 (-0.3668)           |
| 0.0001     | 1        | 1.0435        | -                    | -                        | -                         | -                    | -                          |
| 0.0221     | 200      | 1.1924        | -                    | -                        | -                         | -                    | -                          |
| 0.0443     | 400      | 1.1531        | -                    | -                        | -                         | -                    | -                          |
| 0.0664     | 600      | 0.9371        | -                    | -                        | -                         | -                    | -                          |
| 0.0885     | 800      | 0.6993        | -                    | -                        | -                         | -                    | -                          |
| 0.1106     | 1000     | 0.669         | 0.7042 (+0.1130)     | 0.4353 (-0.1051)         | 0.3289 (+0.0039)          | 0.4250 (-0.0757)     | 0.3964 (-0.0590)           |
| 0.1328     | 1200     | 0.6257        | -                    | -                        | -                         | -                    | -                          |
| 0.1549     | 1400     | 0.6283        | -                    | -                        | -                         | -                    | -                          |
| 0.1770     | 1600     | 0.6014        | -                    | -                        | -                         | -                    | -                          |
| 0.1992     | 1800     | 0.5888        | -                    | -                        | -                         | -                    | -                          |
| 0.2213     | 2000     | 0.5493        | 0.7425 (+0.1513)     | 0.4947 (-0.0457)         | 0.3568 (+0.0318)          | 0.4634 (-0.0373)     | 0.4383 (-0.0171)           |
| 0.2434     | 2200     | 0.5479        | -                    | -                        | -                         | -                    | -                          |
| 0.2655     | 2400     | 0.5329        | -                    | -                        | -                         | -                    | -                          |
| 0.2877     | 2600     | 0.5208        | -                    | -                        | -                         | -                    | -                          |
| 0.3098     | 2800     | 0.5259        | -                    | -                        | -                         | -                    | -                          |
| 0.3319     | 3000     | 0.5221        | 0.7479 (+0.1567)     | 0.5146 (-0.0258)         | 0.3710 (+0.0460)          | 0.4846 (-0.0160)     | 0.4568 (+0.0014)           |
| 0.3541     | 3200     | 0.4977        | -                    | -                        | -                         | -                    | -                          |
| 0.3762     | 3400     | 0.4965        | -                    | -                        | -                         | -                    | -                          |
| 0.3983     | 3600     | 0.4985        | -                    | -                        | -                         | -                    | -                          |
| 0.4204     | 3800     | 0.4907        | -                    | -                        | -                         | -                    | -                          |
| 0.4426     | 4000     | 0.5058        | 0.7624 (+0.1712)     | 0.5166 (-0.0238)         | 0.3665 (+0.0415)          | 0.4868 (-0.0138)     | 0.4567 (+0.0013)           |
| 0.4647     | 4200     | 0.4885        | -                    | -                        | -                         | -                    | -                          |
| 0.4868     | 4400     | 0.495         | -                    | -                        | -                         | -                    | -                          |
| 0.5090     | 4600     | 0.4839        | -                    | -                        | -                         | -                    | -                          |
| 0.5311     | 4800     | 0.4983        | -                    | -                        | -                         | -                    | -                          |
| 0.5532     | 5000     | 0.4778        | 0.7603 (+0.1691)     | 0.5110 (-0.0294)         | 0.3540 (+0.0290)          | 0.4809 (-0.0197)     | 0.4487 (-0.0067)           |
| 0.5753     | 5200     | 0.4726        | -                    | -                        | -                         | -                    | -                          |
| 0.5975     | 5400     | 0.477         | -                    | -                        | -                         | -                    | -                          |
| 0.6196     | 5600     | 0.4613        | -                    | -                        | -                         | -                    | -                          |
| 0.6417     | 5800     | 0.4492        | -                    | -                        | -                         | -                    | -                          |
| 0.6639     | 6000     | 0.4506        | 0.7643 (+0.1731)     | 0.5275 (-0.0129)         | 0.3639 (+0.0389)          | 0.4913 (-0.0094)     | 0.4609 (+0.0055)           |
| 0.6860     | 6200     | 0.4618        | -                    | -                        | -                         | -                    | -                          |
| 0.7081     | 6400     | 0.463         | -                    | -                        | -                         | -                    | -                          |
| 0.7303     | 6600     | 0.4585        | -                    | -                        | -                         | -                    | -                          |
| 0.7524     | 6800     | 0.4612        | -                    | -                        | -                         | -                    | -                          |
| 0.7745     | 7000     | 0.4621        | 0.7649 (+0.1736)     | 0.5105 (-0.0299)         | 0.3688 (+0.0437)          | 0.4552 (-0.0454)     | 0.4448 (-0.0105)           |
| 0.7966     | 7200     | 0.4536        | -                    | -                        | -                         | -                    | -                          |
| 0.8188     | 7400     | 0.4515        | -                    | -                        | -                         | -                    | -                          |
| 0.8409     | 7600     | 0.4396        | -                    | -                        | -                         | -                    | -                          |
| 0.8630     | 7800     | 0.4542        | -                    | -                        | -                         | -                    | -                          |
| 0.8852     | 8000     | 0.4332        | 0.7669 (+0.1757)     | 0.5247 (-0.0157)         | 0.3794 (+0.0544)          | 0.4370 (-0.0637)     | 0.4470 (-0.0083)           |
| 0.9073     | 8200     | 0.447         | -                    | -                        | -                         | -                    | -                          |
| 0.9294     | 8400     | 0.4335        | -                    | -                        | -                         | -                    | -                          |
| 0.9515     | 8600     | 0.4179        | -                    | -                        | -                         | -                    | -                          |
| 0.9737     | 8800     | 0.4459        | -                    | -                        | -                         | -                    | -                          |
| **0.9958** | **9000** | **0.4196**    | **0.7713 (+0.1801)** | **0.5275 (-0.0130)**     | **0.3821 (+0.0571)**      | **0.4630 (-0.0377)** | **0.4575 (+0.0022)**       |
| -1         | -1       | -             | 0.7713 (+0.1801)     | 0.5275 (-0.0130)         | 0.3821 (+0.0571)          | 0.4630 (-0.0377)     | 0.4575 (+0.0022)           |

* The bold row denotes the saved checkpoint.

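
The card does not spell out what the values in parentheses mean. They appear to be the change relative to the candidate ranking before reranking. Under that assumption, the baseline score implied by each column can be recovered by subtracting the delta from the reranked score, as in the sketch below (values copied by hand from the saved-checkpoint row).

```python
# (reranked NDCG@10, reported delta) for the saved checkpoint, from the logs table.
final_row = {
    "gooaq-dev": (0.7713, +0.1801),
    "NanoMSMARCO_R100": (0.5275, -0.0130),
    "NanoNFCorpus_R100": (0.3821, +0.0571),
    "NanoNQ_R100": (0.4630, -0.0377),
}

# Assumption: delta = reranked score - score of the ranking before reranking.
implied_baseline = {
    name: round(score - delta, 4) for name, (score, delta) in final_row.items()
}
print(implied_baseline)
```

Applying the same subtraction to the first row (the base model before training) gives matching baselines to within rounding, for example 0.1288 + 0.4624 = 0.5912 for `gooaq-dev`, which supports this reading.
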

### Framework Versions
- Python: 3.11.10
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0
- PyTorch: 2.5.1+cu124
- Accelerate: 1.5.2
- Datasets: 2.21.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```