Deployment on Sagemaker endpoints without network connectivity

#48
by CoolFish88 - opened

Hello,

I am trying to deploy the jina-embeddings-v2-small-en model on a Sagemaker endpoint that lacks network connectivity under strict VPC configs (not reproduced in the example below):

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

tei_image_uri = get_huggingface_llm_image_uri(
    "huggingface-tei",
    version="1.4.0",
)

emb_model = HuggingFaceModel(
    role=role,
    image_uri=tei_image_uri,
    env={
        "HF_TASK": "feature-extraction",
        "HF_MODEL_ID": "jinaai/jina-embeddings-v2-small-en",
    },
    vpc_config={},
)

emb_predictor = emb_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="jina-embeddings",
)

Because the endpoint has no network connectivity, the necessary model artifacts cannot be downloaded at startup.
To work around this, I tried fetching the model files locally with:

from sentence_transformers import SentenceTransformer

model_id = "jinaai/jina-embeddings-v2-small-en"
local_model_path = "./jina_embeddings"
model = SentenceTransformer(model_id, trust_remote_code=True)
model.save_pretrained(local_model_path)

Looking into config.json, modeling_bert and configuration_bert are referenced, but neither file is present among the saved model artifacts.
Presumably these files need to be downloaded as well; without network access, the deployment (when supplying an archive containing the model artifacts) fails with the error:

"Starting FlashBert model on Cuda(CudaDevice(DeviceId(1)))"
Caused by: Could not start backend: FlashBert only supports absolute position embeddings
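One way to also capture the remote-code modules (configuration_bert.py, modeling_bert.py) that trust_remote_code pulls at load time — a sketch, assuming huggingface_hub is installed and this runs on a machine with network access, and where fetch_full_snapshot is my own helper name — is to snapshot the full repository instead of saving through SentenceTransformer:

```python
from huggingface_hub import snapshot_download


def fetch_full_snapshot(repo_id: str, local_dir: str) -> str:
    # Download every file in the model repo, including the remote-code
    # Python modules that SentenceTransformer.save_pretrained may omit.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Run with network access, then copy the directory into the VPC:
# fetch_full_snapshot("jinaai/jina-embeddings-v2-small-en", "./jina_embeddings")
```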

A successful deployment without VPC configs (where model artifacts are downloaded) instead logs:
"Starting FlashJinaBert model on Cuda(CudaDevice(DeviceId(1)))" — FlashJinaBert, as opposed to FlashBert.
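Once a complete snapshot (including the remote-code files) is on disk, it could be packaged as a model.tar.gz with the files at the archive root, uploaded to S3, and passed to HuggingFaceModel via model_data instead of HF_MODEL_ID, so the container never needs to reach the Hub. A minimal packaging sketch, assuming the snapshot lives in ./jina_embeddings and where package_model is my own helper name:

```python
import os
import tarfile


def package_model(src_dir: str, archive_path: str = "model.tar.gz") -> str:
    # SageMaker expects the model files at the root of the archive,
    # so add each entry with its bare name rather than the full path.
    # Note: archive_path should live outside src_dir to avoid adding
    # the in-progress archive to itself.
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name)
    return archive_path
```

The resulting archive would then be uploaded to S3 and referenced as model_data="s3://<bucket>/<prefix>/model.tar.gz" in the HuggingFaceModel constructor, with the HF_MODEL_ID env entry dropped.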

What would your recommendation be to address this issue?

Thank you!
