---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:1000000
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: EuroBERT/EuroBERT-210m
widget:
- source_sentence: امرأة شقراء تطل على مشهد (سياتل سبيس نيدل)
  sentences:
  - رجل يستمتع بمناظر جسر البوابة الذهبية
  - فتاة بالخارج تلعب في الثلج
  - شخص ما يأخذ في نظرة إبرة الفضاء.
- source_sentence: سوق الشرق الأوسط
  sentences:
  - مسرح أمريكي
  - متجر في الشرق الأوسط
  - البالغون صغار
- source_sentence: رجلين يتنافسان في ملابس فنون الدفاع عن النفس
  sentences:
  - هناك العديد من الناس الحاضرين.
  - الكلب الأبيض على الشاطئ
  - هناك شخص واحد فقط موجود.
- source_sentence: مجموعة من الناس تمشي بجانب شاحنة.
  sentences:
  - الناس يقفون
  - بعض الناس بالخارج
  - بعض الرجال يقودون على الطريق
- source_sentence: لاعبة كرة ناعمة ترمي الكرة إلى زميلتها في الفريق
  sentences:
  - شخصان يلعبان كرة البيسبول
  - الرجل ينظف
  - لاعبين لكرة البيسبول يجلسان على مقعد
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on EuroBERT/EuroBERT-210m
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 768
      type: sts-dev-768
    metrics:
    - type: pearson_cosine
      value: 0.8111988062913815
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8100586279907306
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 512
      type: sts-dev-512
    metrics:
    - type: pearson_cosine
      value: 0.8092891955563192
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8087644228771842
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 256
      type: sts-dev-256
    metrics:
    - type: pearson_cosine
      value: 0.8076510620939634
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8080588277305082
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 128
      type: sts-dev-128
    metrics:
    - type: pearson_cosine
      value: 0.8028710019029521
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8054855987917489
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 64
      type: sts-dev-64
    metrics:
    - type: pearson_cosine
      value: 0.7923252906438638
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.7975941111911333
      name: Spearman Cosine
---

# SentenceTransformer based on EuroBERT/EuroBERT-210m

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m)
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: EuroBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'لاعبة كرة ناعمة ترمي الكرة إلى زميلتها في الفريق',
    'شخصان يلعبان كرة البيسبول',
    'لاعبين لكرة البيسبول يجلسان على مقعد',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Semantic Similarity

* Datasets: `sts-dev-768`, `sts-dev-512`, `sts-dev-256`, `sts-dev-128` and `sts-dev-64`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | sts-dev-768 | sts-dev-512 | sts-dev-256 | sts-dev-128 | sts-dev-64 |
|:--------------------|:------------|:------------|:------------|:------------|:-----------|
| pearson_cosine      | 0.8112      | 0.8093      | 0.8077      | 0.8029      | 0.7923     |
| **spearman_cosine** | **0.8101**  | **0.8088**  | **0.8081**  | **0.8055**  | **0.7976** |

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 1,000,000 training samples
* Columns: `anchor`, `positive`, and `negative` (all string-valued)
* Samples:

| anchor | positive | negative |
|:-------|:---------|:---------|
| هناك رجل في الشارع | رجل يحمل مالاً يقف أمام فرقة موسيقية ومتجر | رجلين و صبي صغير في سترة أرجوانية يمسكون منشورات ترويجية |
| الكلب يلعب بالجلب. | هناك كلب سمراء في منتصف الحقل يجلب كرة تنس | هناك كلب على العشب يهز نفسه حتى يجف. |
| شخصان يسيران. | شخصان يضحكان | رجل وامرأة يركبان دراجة مزدوجة معاً |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
```

### Evaluation Dataset

#### Unnamed Dataset

* Size: 6,609 evaluation samples
* Columns: `anchor`, `positive`, and `negative` (all string-valued)
* Samples:

| anchor | positive | negative |
|:-------|:---------|:---------|
| هذه الجوقة الكنيسة تغني للجماهير وهم يغنون الأغاني السعيدة من الكتاب في الكنيسة. | الكنيسة مليئة بالغناء | جوقة تغني في مباراة بيسبول |
| امرأة ترتدي حجاب أخضر، وقميص أزرق وابتسامة كبيرة | المرأة سعيدة جداً | لقد تم إطلاق النار على المرأة |
| رجل عجوز يحمل طردًا يتصور أمام إعلان. | رجل يتصور أمام إعلان. | رجل يمشي بجانب إعلان |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
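The `matryoshka_dims` used above train nested prefixes of the embedding (768, 512, 256, 128, and 64 dimensions), which is what the `sts-dev-*` evaluations below measure: the full embedding is sliced to its first *k* dimensions and re-normalized before cosine similarity is computed. A minimal sketch of that truncation step, using NumPy and synthetic vectors in place of real model outputs (the function names here are illustrative, not part of the library API):

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions of each embedding and L2-normalize the rows."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of two row-normalized matrices."""
    return a @ b.T


# Synthetic stand-ins for model.encode(...) output: a batch of 3 768-dim vectors
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768))

for dim in (768, 512, 256, 128, 64):  # the matryoshka_dims used in training
    reduced = truncate_and_normalize(embeddings, dim)
    sims = cosine_similarity(reduced, reduced)
    print(dim, sims.shape)  # the similarity matrix stays 3x3 at every dimension
```

In practice, Sentence Transformers exposes the same idea directly: loading the model with the `truncate_dim` argument (e.g. `SentenceTransformer(model_id, truncate_dim=256)`) makes `encode` return the truncated embeddings.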
### Training Logs

| Epoch  | Step | Training Loss | Validation Loss | sts-dev-768_spearman_cosine | sts-dev-512_spearman_cosine | sts-dev-256_spearman_cosine | sts-dev-128_spearman_cosine | sts-dev-64_spearman_cosine |
|:------:|:----:|:-------------:|:---------------:|:---------------------------:|:---------------------------:|:---------------------------:|:---------------------------:|:--------------------------:|
| 0.0256 | 200  | 8.8816        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.0512 | 400  | 5.1404        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.0640 | 500  | -             | 6.5304          | 0.7855                      | 0.7818                      | 0.7766                      | 0.7712                      | 0.7635                     |
| 0.0768 | 600  | 4.7789        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.1024 | 800  | 4.6845        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.1280 | 1000 | 4.6628        | 6.3298          | 0.7900                      | 0.7888                      | 0.7855                      | 0.7829                      | 0.7757                     |
| 0.1536 | 1200 | 4.2947        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.1792 | 1400 | 4.0669        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.1920 | 1500 | -             | 6.1192          | 0.7762                      | 0.7717                      | 0.7688                      | 0.7651                      | 0.7546                     |
| 0.2048 | 1600 | 3.7798        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.2304 | 1800 | 3.6295        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.2560 | 2000 | 3.4326        | 5.5251          | 0.7968                      | 0.7941                      | 0.7926                      | 0.7905                      | 0.7822                     |
| 0.2816 | 2200 | 3.5024        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.3072 | 2400 | 3.2039        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.3200 | 2500 | -             | 5.4173          | 0.7985                      | 0.7957                      | 0.7946                      | 0.7904                      | 0.7806                     |
| 0.3328 | 2600 | 3.1517        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.3584 | 2800 | 3.0409        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.3840 | 3000 | 2.9611        | 5.0394          | 0.7923                      | 0.7894                      | 0.7871                      | 0.7848                      | 0.7789                     |
| 0.4096 | 3200 | 2.8913        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.4352 | 3400 | 2.6737        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.4480 | 3500 | -             | 4.8450          | 0.8124                      | 0.8111                      | 0.8075                      | 0.8076                      | 0.7968                     |
| 0.4608 | 3600 | 2.6488        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.4864 | 3800 | 2.6208        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.5120 | 4000 | 2.4823        | 4.5711          | 0.8111                      | 0.8102                      | 0.8082                      | 0.8075                      | 0.8015                     |
| 0.5376 | 4200 | 2.5081        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.5632 | 4400 | 2.3827        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.5760 | 4500 | -             | 4.5276          | 0.8237                      | 0.8227                      | 0.8205                      | 0.8200                      | 0.8117                     |
| 0.5888 | 4600 | 2.2867        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.6144 | 4800 | 2.2608        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.6400 | 5000 | 2.6285        | 2.6928          | 0.8124                      | 0.8113                      | 0.8099                      | 0.8087                      | 0.8023                     |
| 0.6656 | 5200 | 3.2569        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.6912 | 5400 | 2.7108        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.7040 | 5500 | -             | 3.4081          | 0.8112                      | 0.8100                      | 0.8080                      | 0.8060                      | 0.7994                     |
| 0.7168 | 5600 | 2.2756        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.7424 | 5800 | 1.9964        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.7680 | 6000 | 1.8278        | 3.6261          | 0.8116                      | 0.8101                      | 0.8088                      | 0.8071                      | 0.8013                     |
| 0.7935 | 6200 | 1.7105        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.8191 | 6400 | 1.5719        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.8319 | 6500 | -             | 3.7826          | 0.8097                      | 0.8085                      | 0.8072                      | 0.8040                      | 0.7966                     |
| 0.8447 | 6600 | 1.4569        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.8703 | 6800 | 1.3572        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.8959 | 7000 | 1.2607        | 3.7323          | 0.8114                      | 0.8102                      | 0.8093                      | 0.8070                      | 0.8005                     |
| 0.9215 | 7200 | 1.1676        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.9471 | 7400 | 1.1663        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.9599 | 7500 | -             | 3.8307          | 0.8101                      | 0.8088                      | 0.8081                      | 0.8055                      | 0.7976                     |
| 0.9727 | 7600 | 1.1079        | -               | -                           | -                           | -                           | -                           | -                          |
| 0.9983 | 7800 | 1.0827        | -               | -                           | -                           | -                           | -                           | -                          |

### Framework Versions

- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.48.0
- PyTorch: 2.5.1+cu124
- Accelerate: 1.2.1
- Datasets: 2.21.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```