cross-encoder/stsb-roberta-base · Add exported onnx model 'model_qint8_avx512

Sentence Transformers - Cross-Encoders org 11 days ago

Hello!

This pull request was automatically generated by the export_dynamic_quantized_onnx_model function in the Sentence Transformers library.

Config

QuantizationConfig(
    is_static=False,
    format=<QuantFormat.QOperator: 0>,
    mode=<QuantizationMode.IntegerOps: 0>,
    activations_dtype=<QuantType.QUInt8: 1>,
    activations_symmetric=False,
    weights_dtype=<QuantType.QInt8: 0>,
    weights_symmetric=True,
    per_channel=True,
    reduce_range=False,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    operators_to_quantize=['Conv', 'MatMul', 'Attention', 'LSTM', 'Gather', 'Transpose', 'EmbedLayerNormalization'],
    qdq_add_pair_to_weight=False,
    qdq_dedicated_pair=False,
    qdq_op_type_per_channel_support_to_axis={'MatMul': 1}
)
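
To make a few of the settings above more concrete, here is a minimal sketch in plain Python of what `weights_symmetric=True` with `per_channel=True` and `activations_symmetric=False` with `is_static=False` roughly mean numerically. It illustrates the general quantization scheme only and is not ONNX Runtime's actual implementation. The helper names and the toy numbers are hypothetical and do not come from this model.

```python
def quantize_weights_symmetric_per_channel(weight_rows):
    """Symmetric int8: zero point fixed at 0, one scale per output channel (row)."""
    quantized, scales = [], []
    for row in weight_rows:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0
        quantized.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return quantized, scales


def quantize_activations_asymmetric(values):
    """Asymmetric uint8: scale and zero point derived from the range seen at run time."""
    lo = min(min(values), 0.0)  # keep 0.0 inside the range so it maps to an exact integer
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    quantized = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point=0):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


# Toy inputs (hypothetical): two weight rows with very different magnitudes.
weights = [[0.5, -1.27, 0.25], [0.02, 0.01, -0.0254]]
q_w, w_scales = quantize_weights_symmetric_per_channel(weights)

# Toy activations (hypothetical): the range is only known once the input arrives,
# which is why dynamic quantization computes scale and zero point on the fly.
acts = [-0.3, 0.0, 1.2, 2.25]
q_a, a_scale, a_zp = quantize_activations_asymmetric(acts)

print(q_w, w_scales)
print(q_a, a_scale, a_zp)
print(dequantize(q_a, a_scale, a_zp))
```

Because each row gets its own scale, the second row still uses the full int8 range despite its small values; a single per-tensor scale would round most of it to near zero.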

Tip:

Consider testing this pull request before merging by loading the model from this PR with the revision argument:

from sentence_transformers import CrossEncoder

# TODO: Fill in the PR number
pr_number = 2
model = CrossEncoder(
    "cross-encoder/stsb-roberta-base",
    revision=f"refs/pr/{pr_number}",
    backend="onnx",
    model_kwargs={"file_name": "model_qint8_avx512_vnni.onnx"},
)

# Verify that everything works as expected
query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
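
One possible way to judge whether the quantized scores "work as expected" is to compare them against scores from the unquantized model for the same pairs. The sketch below is only an illustration: `ranking` and `compare_scores` are hypothetical helpers, not part of Sentence Transformers, and any threshold for an acceptable difference is left to the reader.

```python
def ranking(scores):
    """Indices of the passages ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def compare_scores(reference, candidate):
    """Summarise how far `candidate` scores drift from `reference` scores."""
    if len(reference) != len(candidate):
        raise ValueError("score lists must have the same length")
    # float() also accepts NumPy scalars, which is what predict() typically yields.
    reference = [float(s) for s in reference]
    candidate = [float(s) for s in candidate]
    max_abs_diff = max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)
    return {
        "max_abs_diff": max_abs_diff,
        "same_ranking": ranking(reference) == ranking(candidate),
    }
```

Assuming the unquantized model is loaded separately, for example with `CrossEncoder("cross-encoder/stsb-roberta-base")`, and its predictions for the same pairs are stored in a variable such as `reference_scores`, calling `compare_scores(reference_scores, scores)` would report the largest score difference and whether the passage ordering is unchanged.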

Add exported onnx model 'model_qint8_avx512_vnni.onnx' 4135c375

tomaarsen changed pull request status to merged 11 days ago