jina-reranker-v2-base-multilingual test
This is a Cross Encoder model finetuned from jinaai/jina-reranker-v2-base-multilingual using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
Model Details
Model Description
- Model Type: Cross Encoder
- Base model: jinaai/jina-reranker-v2-base-multilingual
- Maximum Sequence Length: 1024 tokens
- Number of Output Labels: 1 label
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Cross Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Cross Encoders on Hugging Face
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub
model = CrossEncoder("SMARTICT/jina-reranker-v2-base-multilingual-wiki-tr-rag-prefix")
# Get scores for pairs of texts
pairs = [
    ['query: Kumbara tasarruf bilincinin aşılanmasında nasıl bir araçtır?', 'passage: Kumbara, özellikle çocuklara küçük yaşta para biriktirmenin ve tasarrufun önemini anlamalarını sağlamak için eğlenceli ve görsel bir araç sunar. İçine attıkları her kuruşu görerek birikimlerinin artışını gözlemlemeleri, onlarda tasarruf alışkanlığı kazanmalarına yardımcı olur.'],
    ['query: Kumbara tasarruf bilincinin aşılanmasında nasıl bir araçtır?', 'passage: Uzay araçlarında yakıt tasarrufu sağlamak için reaksiyon kontrol sistemlerine alternatif olarak ark jetleri, iyon iticileri veya Hall etkili iticiler gibi yüksek özgül itki motorları kullanılabilir. Ayrıca, ISS dahil bazı uzay araçları, dönme oranlarını kontrol etmek için dönen momentum çarklarından yararlanır.'],
    ['query: Kumbara tasarruf bilincinin aşılanmasında nasıl bir araçtır?', 'passage: Kubar, genellikle pipo, bong veya vaporizör kullanılarak içilir. Ayrıca sigara gibi sarılarak da tüketilebilir. Ancak kubar tek başına yanmadığı için, bu şekilde içildiğinde genellikle normal esrar veya tütün ile karıştırılır. Dekarboksile edilmiş kubar ise oral yolla da kullanılabilir.'],
    ['query: Kumbara tasarruf bilincinin aşılanmasında nasıl bir araçtır?', 'passage: Taşıma kuvveti, bir cismin havada yukarı doğru kaldırılmasına neden olan kuvvettir. Direnç kuvveti ise cismin hareketini yavaşlatan, ona karşı koyan kuvvettir. Hava taşımacılığında her iki kuvvet de önemlidir. Uçaklar uçabilmek için yeterli taşıma kuvveti üretmelidir. Ancak aynı zamanda direnci minimize etmek için tasarlanırlar çünkü direnç yakıt tüketimini artırır. Kara taşıtlarında ise düşük hızlarda direnç kuvveti ön plandadır. Ancak yüksek hızlarda, örneğin Formula 1 araçlarında, taşıma kuvveti de önemli hale gelir çünkü aracın yol tutuşunu sağlar.'],
    ['query: Kumbara tasarruf bilincinin aşılanmasında nasıl bir araçtır?', 'passage: Evet, yazıda da belirtildiği gibi kuvvet makineleri yakıt kullanan ısı makineleri ve doğal enerji kaynaklarını kullanan makinelere ayrılır. Örneğin, araçlarda kullanılan motorlar ısı makineleridir çünkü benzin veya dizel yakıtı kullanarak mekanik enerji üretirler. Rüzgar türbinleri ise rüzgarın kinetik enerjisini elektrik enerjisine dönüştüren doğal enerji kaynaklı kuvvet makineleridir.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)
# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'query: Kumbara tasarruf bilincinin aşılanmasında nasıl bir araçtır?',
    [
        'passage: Kumbara, özellikle çocuklara küçük yaşta para biriktirmenin ve tasarrufun önemini anlamalarını sağlamak için eğlenceli ve görsel bir araç sunar. İçine attıkları her kuruşu görerek birikimlerinin artışını gözlemlemeleri, onlarda tasarruf alışkanlığı kazanmalarına yardımcı olur.',
        'passage: Uzay araçlarında yakıt tasarrufu sağlamak için reaksiyon kontrol sistemlerine alternatif olarak ark jetleri, iyon iticileri veya Hall etkili iticiler gibi yüksek özgül itki motorları kullanılabilir. Ayrıca, ISS dahil bazı uzay araçları, dönme oranlarını kontrol etmek için dönen momentum çarklarından yararlanır.',
        'passage: Kubar, genellikle pipo, bong veya vaporizör kullanılarak içilir. Ayrıca sigara gibi sarılarak da tüketilebilir. Ancak kubar tek başına yanmadığı için, bu şekilde içildiğinde genellikle normal esrar veya tütün ile karıştırılır. Dekarboksile edilmiş kubar ise oral yolla da kullanılabilir.',
        'passage: Taşıma kuvveti, bir cismin havada yukarı doğru kaldırılmasına neden olan kuvvettir. Direnç kuvveti ise cismin hareketini yavaşlatan, ona karşı koyan kuvvettir. Hava taşımacılığında her iki kuvvet de önemlidir. Uçaklar uçabilmek için yeterli taşıma kuvveti üretmelidir. Ancak aynı zamanda direnci minimize etmek için tasarlanırlar çünkü direnç yakıt tüketimini artırır. Kara taşıtlarında ise düşük hızlarda direnç kuvveti ön plandadır. Ancak yüksek hızlarda, örneğin Formula 1 araçlarında, taşıma kuvveti de önemli hale gelir çünkü aracın yol tutuşunu sağlar.',
        'passage: Evet, yazıda da belirtildiği gibi kuvvet makineleri yakıt kullanan ısı makineleri ve doğal enerji kaynaklarını kullanan makinelere ayrılır. Örneğin, araçlarda kullanılan motorlar ısı makineleridir çünkü benzin veya dizel yakıtı kullanarak mekanik enerji üretirler. Rüzgar türbinleri ise rüzgarın kinetik enerjisini elektrik enerjisine dönüştüren doğal enerji kaynaklı kuvvet makineleridir.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
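The examples above prefix every query with "query: " and every passage with "passage: ", which matches the format of the training samples. The sketch below shows one way to apply those prefixes consistently and to turn a list of scores into the ranked structure shown in the comment above. Both `build_pairs` and `sort_by_score` are hypothetical helpers written for illustration, not part of the Sentence Transformers API, and the scores are made-up placeholders rather than real model outputs.

```python
from typing import List, Sequence, Tuple

QUERY_PREFIX = "query: "      # prefix used in the usage examples
PASSAGE_PREFIX = "passage: "  # prefix used in the usage examples


def build_pairs(query: str, passages: Sequence[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: add the prefixes unless they are already present."""
    q = query if query.startswith(QUERY_PREFIX) else QUERY_PREFIX + query
    return [
        (q, p if p.startswith(PASSAGE_PREFIX) else PASSAGE_PREFIX + p)
        for p in passages
    ]


def sort_by_score(scores: Sequence[float]) -> List[dict]:
    """Hypothetical helper: order passage indices by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


pairs = build_pairs(
    "Kumbara nedir?",
    ["Kumbara para biriktirmeye yarar.", "passage: Uzay araclari yakit kullanir."],
)

# In real use these would come from model.predict(pairs); the values are invented.
example_scores = [0.07, 0.91]
ranking = sort_by_score(example_scores)
print(ranking)
```

Whether the model degrades without the prefixes is not stated in this card, so treating them as required is an assumption based on the training data format.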
Evaluation
Metrics
Cross Encoder Reranking
- Dataset: gooaq-dev
- Evaluated with CrossEncoderRerankingEvaluator with these parameters: { "at_k": 10, "always_rerank_positives": false }
Metric | Value |
---|---|
map | 0.9094 (-0.0382) |
mrr@10 | 0.9248 (-0.0228) |
ndcg@10 | 0.9386 (-0.0118) |
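For readers unfamiliar with the three metrics, the following is a minimal sketch of their textbook definitions for a single query with binary relevance labels. It is not the evaluator's implementation, which may differ in details such as how positives missing from the candidate list are handled, and the function names are chosen here for illustration. The reported values are averages of such per-query scores.

```python
import math
from typing import Sequence


def mrr_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def dcg_at_k(relevance: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with a log2 position discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance[:k], start=1)
    )


def ndcg_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Relevance labels in the order a reranker returned four passages (invented data).
ranked_relevance = [0, 1, 0, 1]
print(mrr_at_k(ranked_relevance))
print(ndcg_at_k(ranked_relevance))
print(average_precision(ranked_relevance))
```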
Cross Encoder Reranking
- Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
- Evaluated with CrossEncoderRerankingEvaluator with these parameters: { "at_k": 10, "always_rerank_positives": true }
Metric | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100 |
---|---|---|---|
map | 0.5847 (+0.0951) | 0.4027 (+0.1417) | 0.6937 (+0.2741) |
mrr@10 | 0.5880 (+0.1105) | 0.6892 (+0.1894) | 0.7346 (+0.3079) |
ndcg@10 | 0.6644 (+0.1240) | 0.4778 (+0.1527) | 0.7569 (+0.2562) |
Cross Encoder Nano BEIR
- Dataset: NanoBEIR_R100_mean
- Evaluated with CrossEncoderNanoBEIREvaluator with these parameters: { "dataset_names": [ "msmarco", "nfcorpus", "nq" ], "rerank_k": 100, "at_k": 10, "always_rerank_positives": true }
Metric | Value |
---|---|
map | 0.5604 (+0.1703) |
mrr@10 | 0.6706 (+0.2026) |
ndcg@10 | 0.6330 (+0.1776) |
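The NanoBEIR_R100_mean values appear to be the unweighted mean of the three per-dataset results reported above. The short check below reproduces them from the per-dataset table; it assumes a plain arithmetic mean, which is consistent with the published numbers to four decimal places.

```python
# Per-dataset results copied from the Cross Encoder Reranking table.
per_dataset = {
    "NanoMSMARCO_R100": {"map": 0.5847, "mrr@10": 0.5880, "ndcg@10": 0.6644},
    "NanoNFCorpus_R100": {"map": 0.4027, "mrr@10": 0.6892, "ndcg@10": 0.4778},
    "NanoNQ_R100": {"map": 0.6937, "mrr@10": 0.7346, "ndcg@10": 0.7569},
}

# Unweighted mean over the three datasets, rounded like the reported values.
mean_metrics = {
    metric: round(
        sum(scores[metric] for scores in per_dataset.values()) / len(per_dataset), 4
    )
    for metric in ("map", "mrr@10", "ndcg@10")
}
print(mean_metrics)
```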
Training Details
Training Dataset
Unnamed Dataset
- Size: 26,004 training samples
- Columns: question, answer, and label
- Approximate statistics based on the first 1000 samples:
Column | Type | Details |
---|---|---|
question | string | min: 27 characters; mean: 78.97 characters; max: 182 characters |
answer | string | min: 44 characters; mean: 273.24 characters; max: 836 characters |
label | int | 0: ~81.00%; 1: ~19.00% |
- Samples:
question | answer | label |
---|---|---|
query: Kumbara tasarruf bilincinin aşılanmasında nasıl bir araçtır? | passage: Kumbara, özellikle çocuklara küçük yaşta para biriktirmenin ve tasarrufun önemini anlamalarını sağlamak için eğlenceli ve görsel bir araç sunar. İçine attıkları her kuruşu görerek birikimlerinin artışını gözlemlemeleri, onlarda tasarruf alışkanlığı kazanmalarına yardımcı olur. | 1 |
query: Kumbara tasarruf bilincinin aşılanmasında nasıl bir araçtır? | passage: Uzay araçlarında yakıt tasarrufu sağlamak için reaksiyon kontrol sistemlerine alternatif olarak ark jetleri, iyon iticileri veya Hall etkili iticiler gibi yüksek özgül itki motorları kullanılabilir. Ayrıca, ISS dahil bazı uzay araçları, dönme oranlarını kontrol etmek için dönen momentum çarklarından yararlanır. | 0 |
query: Kumbara tasarruf bilincinin aşılanmasında nasıl bir araçtır? | passage: Kubar, genellikle pipo, bong veya vaporizör kullanılarak içilir. Ayrıca sigara gibi sarılarak da tüketilebilir. Ancak kubar tek başına yanmadığı için, bu şekilde içildiğinde genellikle normal esrar veya tütün ile karıştırılır. Dekarboksile edilmiş kubar ise oral yolla da kullanılabilir. | 0 |
- Loss: BinaryCrossEntropyLoss with these parameters: { "activation_fn": "torch.nn.modules.linear.Identity", "pos_weight": 5 }
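To illustrate what a pos_weight of 5 does, here is a minimal pure-Python sketch of binary cross-entropy on raw logits with a positive-class weight. It assumes the loss follows the standard weighted formula, loss = -[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))], and is not the library's code. The function name is hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logit: float, label: int, pos_weight: float = 5.0) -> float:
    """Assumed form of the loss for one (query, passage) pair.

    Uses log(sigmoid(x)) = -softplus(-x) and log(1 - sigmoid(x)) = -softplus(x).
    """
    return pos_weight * label * softplus(-logit) + (1 - label) * softplus(logit)


# At an undecided logit of 0, a positive pair costs five times a negative pair.
loss_positive = weighted_bce_with_logits(0.0, 1)
loss_negative = weighted_bce_with_logits(0.0, 0)
print(loss_positive, loss_negative)

# Negative-to-positive ratio from the label statistics above (~81% vs ~19%).
print(0.81 / 0.19)
```

The weight is of the same order as the roughly 81:19 label imbalance in the sampled statistics, though the card does not say how the value was chosen.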
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- learning_rate: 2e-05
- num_train_epochs: 2
- warmup_ratio: 0.1
- bf16: True
- dataloader_num_workers: 4
- load_best_model_at_end: True
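As a sanity check on these settings, the arithmetic below derives the number of optimizer steps from the dataset size and batch size. It assumes a single training device, gradient_accumulation_steps of 1 and no dropped last batch (the latter two match the full hyperparameter list). Rounding the warmup length up is also an assumption. The result agrees with the Training Logs, where step 1000 falls at epoch 0.6150.

```python
import math

num_samples = 26_004   # training set size from this card
batch_size = 16        # per_device_train_batch_size
num_epochs = 2         # num_train_epochs
warmup_ratio = 0.1     # warmup_ratio
num_devices = 1        # assumption: single-device training
grad_accum_steps = 1   # gradient_accumulation_steps from the full list

# The last partial batch still counts as a step when it is not dropped.
steps_per_epoch = math.ceil(num_samples / (batch_size * num_devices * grad_accum_steps))
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: rounded up

# Epoch fraction reached at step 1000, for comparison with the Training Logs.
epoch_at_step_1000 = round(1000 / steps_per_epoch, 4)

print(steps_per_epoch, total_steps, warmup_steps, epoch_at_step_1000)
```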
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 4
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | gooaq-dev_ndcg@10 | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
---|---|---|---|---|---|---|---|
-1 | -1 | - | 0.9555 (+0.0050) | 0.6801 (+0.1397) | 0.4668 (+0.1417) | 0.7932 (+0.2925) | 0.6467 (+0.1913) |
0.0006 | 1 | 0.2737 | - | - | - | - | - |
0.6150 | 1000 | 0.0997 | - | - | - | - | - |
1.2300 | 2000 | 0.019 | - | - | - | - | - |
1.8450 | 3000 | 0.0202 | - | - | - | - | - |
-1 | -1 | - | 0.9386 (-0.0118) | 0.6644 (+0.1240) | 0.4778 (+0.1527) | 0.7569 (+0.2562) | 0.6330 (+0.1776) |
Framework Versions
- Python: 3.11.12
- Sentence Transformers: 4.0.2
- Transformers: 4.51.1
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}