[request for feedback] Faster downloads with Xet
Llama 4 Maverick and Scout are the first major models on Hugging Face uploaded with Xet, which makes uploads and downloads significantly faster (more info here: https://huggingface.co/docs/huggingface_hub/en/guides/download#faster-downloads)
Let us know if you have any feedback!
pip install -U huggingface_hub hf_xet
is all you need to get blazingly fast download speeds! ⚡
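For reference, here is a minimal sketch of what a download looks like once both packages are installed: the huggingface_hub API calls are unchanged, and hf_xet is picked up automatically. The repo ID below is the Scout checkpoint discussed in this thread (it is gated on the Hub, so you need approved access and a token configured).

```python
# Minimal sketch: no code changes are needed to benefit from Xet, just have hf_xet installed.
from huggingface_hub import snapshot_download

# Downloads the whole repo into the local HF cache and returns the local path.
local_dir = snapshot_download("meta-llama/Llama-4-Scout-17B-16E-Instruct")
print("Downloaded to:", local_dir)
```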
Hi @bullerwins: Love to see you trying Xet! Can you confirm that your Python environment is using hf-xet==1.0.0? Also, on a beefy machine, you can tweak the number of concurrent downloads to speed things up: set the environment variable XET_NUM_CONCURRENT_RANGE_GETS=[a number bigger than the default of 16]. Also, confirm that your HF_HOME or HF_HUB_CACHE and HF_XET_CACHE directories are on SSD/NVMe (not a distributed filesystem).
Note: in hf-xet 1.1 we will rename this env variable to HF_XET_NUM_CONCURRENT_RANGE_GETS (to bring consistency across all the env variables used to tune hf-xet).
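As an illustration of the advice above, here is one way one might apply those settings from Python before a download. The values and paths are placeholders; exporting the variables in your shell before launching Python works the same way, and the variables need to be set before the download starts.

```python
import os

# hf-xet 1.0.x variable name, per the note above; 1.1 renames it to HF_XET_NUM_CONCURRENT_RANGE_GETS.
os.environ["XET_NUM_CONCURRENT_RANGE_GETS"] = "32"        # default is 16
# Placeholder paths: keep the caches on a local SSD/NVMe, not a distributed filesystem.
os.environ["HF_HOME"] = "/mnt/nvme/huggingface"
os.environ["HF_XET_CACHE"] = "/mnt/nvme/huggingface/xet"

from huggingface_hub import snapshot_download

snapshot_download("meta-llama/Llama-4-Scout-17B-16E-Instruct")
```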
Hi, I was hitting this with xet:
OSError: meta-llama/Llama-4-Scout-17B-16E-Instruct does not appear to have a file named model-00044-of-00050.safetensors. Checkout 'https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct/tree/main' for available files.
With a fresh env:
pip install git+https://github.com/huggingface/[email protected]#egg=transformers accelerate trl deepspeed
pip install -U huggingface_hub hf_xet
hf-xet is 1.0.0, correct:
It seems to be going much faster with XET_NUM_CONCURRENT_RANGE_GETS=50. It fluctuates a lot, from 300-700 MB/s; hf_transfer still seems to stay stable at slightly faster speeds.
Edit: I don't have the HF_HOME, HF_HUB_CACHE, or HF_XET_CACHE variables set, but it seems to be using this directory as the default, which is indeed on a local NVMe:
@bullerwins: Great, glad things are performing better. From our preliminary benchmarking we've seen hf-xet outperform hf-transfer, but we haven't had a chance to document the environment variables to set to get the best performance out of hf-xet. We've got lots of knobs to tune, but we intentionally wanted to make sure hf-xet performed well in constrained environments (containers running Colab/Jupyter notebooks, laptops, weak/flaky networks), so we kept the default config modest. With big machines (fast network, fast disk, lots of cores) hf-xet can do really well for downloading & uploading files. Stay tuned for details on how to squeeze the best performance out of hf-xet later this week! Again, thanks for using hf-xet today, and please keep the feedback coming!
Hi,
It seems that McAfee Web Gateway considers https://cas-bridge.xethub.hf.co/ a high-risk URL and prevents any downloads from it.
It would be great if there were a fallback to the normal HF download servers.
unsloth/Llama-4-Scout-17B-16E-Instruct downloads fine as it is not using xet.
@Epliz: you'll have to ask your admin to allowlist cas-bridge.xethub.hf.co on your gateway.
To be comprehensive, here are all the URLs you may have to allowlist as well, depending on your access patterns:
cdn-lfs-us-1.hf.co
cdn-lfs-eu-1.hf.co
cdn-lfs.hf.co
cas-bridge.xethub.hf.co (new)
cas-server.xethub.hf.co (new)
transfer.xethub.hf.co (new)
You make a great point; it would be wonderful if you didn't have to take this step. Thanks for the feedback!
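If you want to check quickly whether a gateway or proxy is blocking any of these hosts, a small, unofficial probe like the one below can help. An HTTP error status still means the host is reachable; a connection error usually means it is being blocked.

```python
import urllib.error
import urllib.request

hosts = [
    "cdn-lfs-us-1.hf.co",
    "cdn-lfs-eu-1.hf.co",
    "cdn-lfs.hf.co",
    "cas-bridge.xethub.hf.co",
    "cas-server.xethub.hf.co",
    "transfer.xethub.hf.co",
]

for host in hosts:
    try:
        with urllib.request.urlopen(f"https://{host}/", timeout=10) as resp:
            print(f"{host}: reachable (HTTP {resp.status})")
    except urllib.error.HTTPError as err:
        # A 403/404/etc. response still means the host itself is reachable.
        print(f"{host}: reachable (HTTP {err.code})")
    except urllib.error.URLError as err:
        # DNS failures, blocked connections, TLS interception, etc. end up here.
        print(f"{host}: NOT reachable ({err.reason})")
```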
@rajatarya I am using a distributed filesystem. What is your suggestion for downloading the Llama 4 weights? I have tried using hf_xet, and downloading the individual files after I get errors like OSError: meta-llama/Llama-4-Scout-17B-16E-Instruct does not appear to have files named ('model-00003-of-00050.safetensors', 'model-00007-of-00050.safetensors', 'model-00029-of-00050.safetensors', 'model-00048-of-00050.safetensors'), but it continues to complain about the same files being missing. Any advice would be super valuable!
@mrbean-character: Hi, thanks for the question, and for using Xet!
We just released hf-xet==1.0.2, which renames our config params to allow for better performance tuning of hf-xet. See the release notes here, or the comment block below.
Specifically, for distributed filesystems, you want to make sure the Xet cache is set to a local disk (a fast SSD or NVMe). Set the Xet cache location with the HF_XET_CACHE parameter. Also, if you plan on uploading files, make sure TMPDIR is also set to a local disk location. If the distributed filesystem is on a slow network, you may want to consider disabling the Xet cache altogether. See the notes below for more information. Also, to remove a couple of layers from the problem, consider downloading the files directly using huggingface_hub, with huggingface_hub.snapshot_download() (docs here).
# This is the number of concurrent terms (ranges of bytes from within a xorb) downloaded from S3 per file.
# Increasing this will help with the speed of downloading a file if there is network bandwidth available.
HF_XET_NUM_CONCURRENT_RANGE_GETS=16

# hf-xet is designed for SSD/NVMe disks (it uses parallel writes). If you are using an HDD, setting this
# will change disk writes to be sequential instead of parallel.
# To set: HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY=true
# (Note that hf-xet will have at most HF_XET_MAX_CONCURRENT_DOWNLOADS * HF_XET_NUM_CONCURRENT_RANGE_GETS
# parallel GETs from S3.)

# Default cache size. Increasing this will give more space for caching terms/chunks fetched from S3.
# A larger cache can better take advantage of deduplication across repos & files.
HF_XET_CHUNK_CACHE_SIZE_BYTES=10737418240  # 10 GiB

# Setting this changes where the chunk cache is located (ideally a local SSD/NVMe, not a shared/distributed filesystem).
HF_XET_CACHE=~/.cache/huggingface/xet

# Setting this will also change where the chunk cache is located (`$HF_HOME/xet`). Lower precedence than `HF_XET_CACHE`.
HF_HOME=~/.cache/huggingface

# If your network bandwidth is >> disk speed (e.g. a 10 Gbps link vs. a SATA SSD or worse),
# disabling the xet cache will increase your performance. To disable the xet cache, set HF_XET_CHUNK_CACHE_SIZE_BYTES=0.
(also, cc: @bullerwins @rawsh )
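Putting the distributed-filesystem advice above together, here is a sketch of a download under those assumptions. The local and shared paths are placeholders, and the environment variables must be set before the download starts.

```python
import os

# Placeholder local-disk paths: keep the Xet cache (and TMPDIR, if you also upload) on a local SSD/NVMe.
os.environ["HF_XET_CACHE"] = "/local/nvme/hf-xet-cache"
os.environ["TMPDIR"] = "/local/nvme/tmp"
# If your network is much faster than the local disk, you can disable the chunk cache instead:
# os.environ["HF_XET_CHUNK_CACHE_SIZE_BYTES"] = "0"

from huggingface_hub import snapshot_download

snapshot_download(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    local_dir="/shared/models/llama-4-scout",  # the final destination can live on the shared filesystem
)
```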
It seems to be way faster and stable at higher speeds, using HF_XET_NUM_CONCURRENT_RANGE_GETS=50 HF_XET_MAX_CONCURRENT_DOWNLOADS=8.
It seems that HF_XET_MAX_CONCURRENT_DOWNLOADS is being ignored, though; is that an hf_transfer thing? I believe without the hf_transfer Rust implementation it indeed downloads 8 in parallel.
@bullerwins: Glad you are seeing more stable download speeds. This is likely due to the CDN being warmed up now.
From your screenshot you are still using hf_transfer. You will want to unset the HF_HUB_ENABLE_HF_TRANSFER environment variable to stop using hf_transfer when downloading.
When using hf_transfer, hf-xet is not used at all. hf_transfer uses the LFS protocol and does concurrent range reads of 10MB for each file. The LFS protocol reads the file directly from S3 (using presigned URLs). hf_transfer uses system threads and downloads the file in 10MB chunks, with each system thread writing its chunk to disk in parallel.
hf-xet uses the XET protocol to reassemble the file. Xet Storage doesn't store the entire file in S3. Instead, it chunks the file into 64KB chunks (using content-defined chunking), collects the unique 64KB chunks into 64MB data blocks (we call them xorbs), and stores those in S3. So downloading a file using the XET protocol means downloading 64MB data blocks from S3 (fronted by a CDN) and then reassembling the file in parallel.
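To make the xorb idea concrete, here is a toy sketch of content-defined chunking and deduplication. It is not hf-xet's actual chunker or hashing scheme, just an illustration of why unique ~64KB chunks packed into ~64MB blocks deduplicate well across files and repos.

```python
import hashlib

CHUNK_TARGET = 64 * 1024         # ~64 KB average chunk size (toy value)
XORB_LIMIT = 64 * 1024 * 1024    # 64 MB per data block ("xorb")

def content_defined_chunks(data: bytes):
    """Yield chunks whose boundaries depend on the content, not on fixed offsets."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling = (rolling * 31 + byte) & 0xFFFFFFFF
        # Declare a boundary when the rolling value hits a fixed pattern; with a roughly
        # uniform hash this happens on average about once every CHUNK_TARGET bytes.
        if i - start >= 1024 and rolling % CHUNK_TARGET == 0:
            yield data[start : i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def pack_into_xorbs(files: list[bytes]) -> list[bytes]:
    """Deduplicate chunks across files and group the unique chunks into xorb-sized blocks."""
    seen = set()
    xorbs, current = [], bytearray()
    for blob in files:
        for chunk in content_defined_chunks(blob):
            digest = hashlib.sha256(chunk).digest()
            if digest in seen:
                continue  # duplicate chunk: stored once, referenced from every file that uses it
            seen.add(digest)
            if len(current) + len(chunk) > XORB_LIMIT:
                xorbs.append(bytes(current))
                current = bytearray()
            current.extend(chunk)
    if current:
        xorbs.append(bytes(current))
    return xorbs
```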
If you are using hf_transfer then you aren't using hf-xet, and the HF_XET_ environment variables do not apply or alter the configuration of hf_transfer. Both hf_transfer and hf-xet are implemented in Rust, but the implementations are different because the protocols are different. And hf_transfer is designed to have a limited UX and focus entirely on performance.
@bullerwins One more thing I realize is making this confusing: the console output says "Xet Storage is enabled for this repo. Downloading file from Xet Storage..". That is confusing, but accurate. When a file is uploaded using the XET protocol, it is stored in Xet Storage. All of Llama 4 was stored using the XET protocol in the Xet Storage backend.
However, to support backwards compatibility, the Xet Storage backend also speaks the LFS protocol. This allows existing clients (like hf_transfer) to download files stored with the XET protocol. So, from your screenshot, the files you are downloading are stored with the XET protocol but are being downloaded using the LFS protocol (with hf_transfer). Backwards compatibility with Xet Storage is more than just supporting the LFS protocol: the Xet Storage backend also uses CDNs and caches full files to speed up delivery of LFS files.
We have documentation in the Hub docs explaining how the Hub's storage backends are built; check it out here.
From your screenshot you are still using hf_transfer. You will want to unset the HF_HUB_ENABLE_HF_TRANSFER environment variable to stop using hf_transfer when downloading.
@rajatarya it looks like @bullerwins is indeed using hf_xet for the download in the above log. The (hf_transfer=True) log line explains why the cached, incomplete file download is being deleted and the entire file is being downloaded again. It's a bit confusing, but it looks like the file was partially downloaded through hf_transfer before this run.
The Downloading file from Xet Storage.. part of the subsequent log is only shown when we download the file using the xet protocol.
So excited that you are getting better download speeds. You have the honor of being one of the earliest adopters, so as @rajatarya mentioned, the earlier, slower downloads were likely due to this data not being available from our CDN.
It seems that HF_XET_MAX_CONCURRENT_DOWNLOADS is being ignored, is that an hf_transfer thing? I believe without the hf_transfer Rust implementation it indeed downloads 8 in parallel
@bullerwins: It's not an hf_transfer thing, but you are very correct: this environment variable is essentially ignored. We get the number of concurrent files from the --max-workers option on huggingface-cli download. If you want to increase the number of files that are downloaded in parallel, please use that option on the CLI and ignore HF_XET_MAX_CONCURRENT_DOWNLOADS.
HF_XET_NUM_CONCURRENT_RANGE_GETS gives you control over how many blocks we will download for each file in parallel. If you have a lot of bandwidth, it would be worth tweaking both HF_XET_NUM_CONCURRENT_RANGE_GETS and --max-workers to see how to maximize your download speed of these models.
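Here is a sketch of that tuning from Python, assuming the environment variable is read when the download starts; max_workers is the snapshot_download equivalent of the CLI's --max-workers, and the values are illustrative.

```python
import os

# More parallel range GETs per file; the default is 16. Tune this against your bandwidth.
os.environ["HF_XET_NUM_CONCURRENT_RANGE_GETS"] = "32"

from huggingface_hub import snapshot_download

snapshot_download(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    max_workers=16,  # number of files downloaded in parallel (CLI: --max-workers 16)
)
```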
The download speeds seem about the same to me, but that might be because I'm using distributed storage, so it might already have been capped at 200 Mbps before. But I'm very happy with Xet because I don't get read errors anymore and don't have to restart downloads. It works better than HTTPS in that regard.