[request for feedback] Faster downloads with Xet
Llama 4 Maverick and Scout are the first major models on Hugging Face uploaded with Xet, which makes uploads and downloads significantly faster (more info here: https://huggingface.co/docs/huggingface_hub/en/guides/download#faster-downloads)
Let us know if you have any feedback!
pip install -U huggingface_hub hf_xet
is all you need to get blazingly fast download speeds! ⚡
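For reference, here is a minimal sketch of what a download looks like once both packages are installed: the huggingface_hub API calls are unchanged, and hf_xet is picked up automatically. The repo ID below is the Scout checkpoint discussed in this thread (it is gated on the Hub, so you need approved access and a token configured).

```python
# Minimal sketch: no code changes are needed to benefit from Xet, just have hf_xet installed.
from huggingface_hub import snapshot_download

# Downloads the whole repo into the local HF cache and returns the local path.
local_dir = snapshot_download("meta-llama/Llama-4-Scout-17B-16E-Instruct")
print("Downloaded to:", local_dir)
```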
Hi @bullerwins: Love to see you trying Xet! Can you confirm that your Python environment is using hf-xet==1.0.0? Also, on a beefy machine, you can tweak the number of concurrent downloads to speed things up: set the environment variable XET_NUM_CONCURRENT_RANGE_GETS=[a number bigger than the default of 16]. Also, confirm that your HF_HOME or HF_HUB_CACHE and HF_XET_CACHE directories are on SSD/NVMe (not a distributed filesystem).
Note: in hf-xet 1.1 we will rename this env variable to HF_XET_NUM_CONCURRENT_RANGE_GETS (to bring consistency across all the env variables used to tune hf-xet).
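As an illustration of the advice above, here is one way one might apply those settings from Python before a download. The values and paths are placeholders; exporting the variables in your shell before launching Python works the same way, and the variables need to be set before the download starts.

```python
import os

# hf-xet 1.0.x variable name, per the note above; 1.1 renames it to HF_XET_NUM_CONCURRENT_RANGE_GETS.
os.environ["XET_NUM_CONCURRENT_RANGE_GETS"] = "32"        # default is 16
# Placeholder paths: keep the caches on a local SSD/NVMe, not a distributed filesystem.
os.environ["HF_HOME"] = "/mnt/nvme/huggingface"
os.environ["HF_XET_CACHE"] = "/mnt/nvme/huggingface/xet"

from huggingface_hub import snapshot_download

snapshot_download("meta-llama/Llama-4-Scout-17B-16E-Instruct")
```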
Hi, I was hitting this with xet:
OSError: meta-llama/Llama-4-Scout-17B-16E-Instruct does not appear to have a file named model-00044-of-00050.safetensors. Checkout 'https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct/tree/main' for available files.
With a fresh env:
pip install git+https://github.com/huggingface/[email protected]#egg=transformers accelerate trl deepspeed
pip install -U huggingface_hub hf_xet
hf-xet is 1.0.0, correct:
It seems to be going much faster with XET_NUM_CONCURRENT_RANGE_GETS=50. It fluctuates a lot, from 300-700 MB/s; hf_transfer still seems to stay stable at slightly faster speeds.
Edit: I don't have the HF_HOME, HF_HUB_CACHE, or HF_XET_CACHE variables set, but it seems to be using this directory as the default, which is indeed on a local NVMe:
@bullerwins: Great, glad things are performing better. From our preliminary benchmarking we've seen hf-xet outperform hf-transfer, but we haven't had a chance to document the environment variables to set to get the best performance out of hf-xet. We've got lots of knobs to tune, but we intentionally wanted to make sure hf-xet performed well in constrained environments (containers running Colab/Jupyter notebooks, laptops, weak/flaky networks), so we kept the default config modest. With big machines (fast network, fast disk, lots of cores) hf-xet can do really well for downloading & uploading files. Stay tuned for details on how to squeeze the best performance out of hf-xet later this week! Again, thanks for using hf-xet today, and please keep the feedback coming!
Hi,
It seems that McAfee Web Gateway considers https://cas-bridge.xethub.hf.co/ a high-risk URL and prevents any downloads from it.
It would be great if there were a fallback to the normal HF download servers.
unsloth/Llama-4-Scout-17B-16E-Instruct downloads fine as it is not using xet.
@Epliz: you'll have to ask your admin to allowlist cas-bridge.xethub.hf.co on your gateway.
To be comprehensive, here are all the URLs you may have to allowlist as well, depending on your access patterns:
cdn-lfs-us-1.hf.co
cdn-lfs-eu-1.hf.co
cdn-lfs.hf.co
cas-bridge.xethub.hf.co (new)
cas-server.xethub.hf.co (new)
transfer.xethub.hf.co (new)
You make a great point; it would be wonderful if you didn't have to take this step. Thanks for the feedback!
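If you want to check quickly whether a gateway or proxy is blocking any of these hosts, a small, unofficial probe like the one below can help. An HTTP error status still means the host is reachable; a connection error usually means it is being blocked.

```python
import urllib.error
import urllib.request

hosts = [
    "cdn-lfs-us-1.hf.co",
    "cdn-lfs-eu-1.hf.co",
    "cdn-lfs.hf.co",
    "cas-bridge.xethub.hf.co",
    "cas-server.xethub.hf.co",
    "transfer.xethub.hf.co",
]

for host in hosts:
    try:
        with urllib.request.urlopen(f"https://{host}/", timeout=10) as resp:
            print(f"{host}: reachable (HTTP {resp.status})")
    except urllib.error.HTTPError as err:
        # A 403/404/etc. response still means the host itself is reachable.
        print(f"{host}: reachable (HTTP {err.code})")
    except urllib.error.URLError as err:
        # DNS failures, blocked connections, TLS interception, etc. end up here.
        print(f"{host}: NOT reachable ({err.reason})")
```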
@rajatarya I am using a distributed filesystem. What is your suggestion for downloading the Llama 4 weights? I have tried using hf_xet, and downloading the individual files after I get errors like OSError: meta-llama/Llama-4-Scout-17B-16E-Instruct does not appear to have files named ('model-00003-of-00050.safetensors', 'model-00007-of-00050.safetensors', 'model-00029-of-00050.safetensors', 'model-00048-of-00050.safetensors'), but it continues to complain about the same files being missing. Any advice would be super valuable!
@mrbean-character: Hi, thanks for the question, and for using Xet!
We just released hf-xet==1.0.2, which renames our config params to allow for better performance tuning of hf-xet. See the release notes here, or the comment block below.
Specifically, for distributed filesystems, you want to make sure the Xet cache is set to a local disk (a fast SSD or NVMe). Set the Xet cache location with the HF_XET_CACHE parameter. Also, if you plan on uploading files, make sure TMPDIR is also set to a local disk location. If the distributed filesystem is on a slow network, you may want to consider disabling the Xet cache altogether. See the notes below for more information. Also, to remove a couple of layers from the problem, consider downloading the files directly using huggingface_hub, with huggingface_hub.snapshot_download() (docs here).
# This is the number of concurrent terms (ranges of bytes from within a xorb) downloaded from S3 per file.
# Increasing this will help with the speed of downloading a file if there is network bandwidth available.
HF_XET_NUM_CONCURRENT_RANGE_GETS=16

# hf-xet is designed for SSD/NVMe disks (it uses parallel writes). If you are using an HDD, setting this
# will change disk writes to be sequential instead of parallel.
# To set: HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY=true
# (Note that hf-xet will have at most HF_XET_MAX_CONCURRENT_DOWNLOADS * HF_XET_NUM_CONCURRENT_RANGE_GETS
# parallel GETs from S3.)

# Default cache size. Increasing this will give more space for caching terms/chunks fetched from S3.
# A larger cache can better take advantage of deduplication across repos & files.
HF_XET_CHUNK_CACHE_SIZE_BYTES=10737418240  # 10 GiB

# Setting this changes where the chunk cache is located (ideally a local SSD/NVMe, not a shared/distributed filesystem).
HF_XET_CACHE=~/.cache/huggingface/xet

# Setting this will also change where the chunk cache is located (`$HF_HOME/xet`). Lower precedence than `HF_XET_CACHE`.
HF_HOME=~/.cache/huggingface

# If your network bandwidth is >> disk speed (e.g. a 10 Gbps link vs. a SATA SSD or worse),
# disabling the xet cache will increase your performance. To disable the xet cache, set HF_XET_CHUNK_CACHE_SIZE_BYTES=0.
(also, cc: @bullerwins @rawsh )
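Putting the distributed-filesystem advice above together, here is a sketch of a download under those assumptions. The local and shared paths are placeholders, and the environment variables must be set before the download starts.

```python
import os

# Placeholder local-disk paths: keep the Xet cache (and TMPDIR, if you also upload) on a local SSD/NVMe.
os.environ["HF_XET_CACHE"] = "/local/nvme/hf-xet-cache"
os.environ["TMPDIR"] = "/local/nvme/tmp"
# If your network is much faster than the local disk, you can disable the chunk cache instead:
# os.environ["HF_XET_CHUNK_CACHE_SIZE_BYTES"] = "0"

from huggingface_hub import snapshot_download

snapshot_download(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    local_dir="/shared/models/llama-4-scout",  # the final destination can live on the shared filesystem
)
```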
It seems to be way faster and stable at higher speeds, using HF_XET_NUM_CONCURRENT_RANGE_GETS=50 HF_XET_MAX_CONCURRENT_DOWNLOADS=8.
It seems that HF_XET_MAX_CONCURRENT_DOWNLOADS is being ignored, though; is that an hf_transfer thing? I believe without the hf_transfer Rust implementation it indeed downloads 8 in parallel.
@bullerwins: Glad you are seeing more stable download speeds. This is likely due to the CDN being warmed up now.
From your screenshot you are still using hf_transfer. You will want to unset the HF_HUB_ENABLE_HF_TRANSFER environment variable to stop using hf_transfer when downloading.
When using hf_transfer, hf-xet is not used at all. hf_transfer uses the LFS protocol and does concurrent range reads of 10MB for each file. The LFS protocol reads the file directly from S3 (using presigned URLs). hf_transfer uses system threads and downloads the file in 10MB chunks, with each system thread writing its chunk to disk in parallel.
hf-xet uses the XET protocol to reassemble the file. Xet Storage doesn't store the entire file in S3. Instead, it chunks the file into 64KB chunks (using content-defined chunking), collects the unique 64KB chunks into 64MB data blocks (we call them xorbs), and stores those in S3. So downloading a file using the XET protocol means downloading 64MB data blocks from S3 (fronted by a CDN) and then reassembling the file in parallel.
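To make the xorb idea concrete, here is a toy sketch of content-defined chunking and deduplication. It is not hf-xet's actual chunker or hashing scheme, just an illustration of why unique ~64KB chunks packed into ~64MB blocks deduplicate well across files and repos.

```python
import hashlib

CHUNK_TARGET = 64 * 1024         # ~64 KB average chunk size (toy value)
XORB_LIMIT = 64 * 1024 * 1024    # 64 MB per data block ("xorb")

def content_defined_chunks(data: bytes):
    """Yield chunks whose boundaries depend on the content, not on fixed offsets."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling = (rolling * 31 + byte) & 0xFFFFFFFF
        # Declare a boundary when the rolling value hits a fixed pattern; with a roughly
        # uniform hash this happens on average about once every CHUNK_TARGET bytes.
        if i - start >= 1024 and rolling % CHUNK_TARGET == 0:
            yield data[start : i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def pack_into_xorbs(files: list[bytes]) -> list[bytes]:
    """Deduplicate chunks across files and group the unique chunks into xorb-sized blocks."""
    seen = set()
    xorbs, current = [], bytearray()
    for blob in files:
        for chunk in content_defined_chunks(blob):
            digest = hashlib.sha256(chunk).digest()
            if digest in seen:
                continue  # duplicate chunk: stored once, referenced from every file that uses it
            seen.add(digest)
            if len(current) + len(chunk) > XORB_LIMIT:
                xorbs.append(bytes(current))
                current = bytearray()
            current.extend(chunk)
    if current:
        xorbs.append(bytes(current))
    return xorbs
```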
If you are using hf_transfer then you aren't using hf-xet, and the HF_XET_ environment variables do not apply or alter the configuration of hf_transfer. Both hf_transfer and hf-xet are implemented in Rust, but the implementations are different because the protocols are different. And hf_transfer is designed to have a limited UX and focus entirely on performance.
@bullerwins One more thing I realize is making this confusing: the console output says "Xet Storage is enabled for this repo. Downloading file from Xet Storage..". That is confusing, but accurate. When a file is uploaded using the XET protocol, it is stored in Xet Storage. All of Llama 4 was stored using the XET protocol in the Xet Storage backend.
However, to support backwards compatibility, the Xet Storage backend also speaks the LFS protocol. This allows existing clients (like hf_transfer) to download files stored with the XET protocol. So, from your screenshot, the files you are downloading are stored with the XET protocol but are being downloaded using the LFS protocol (with hf_transfer). Backwards compatibility with Xet Storage is more than just supporting the LFS protocol: the Xet Storage backend also uses CDNs and caches full files to speed up delivery of LFS files.
We have documentation in the Hub docs explaining how the Hub's storage backends are built; check it out here.
From your screenshot you are still using hf_transfer. You will want to unset the HF_HUB_ENABLE_HF_TRANSFER environment variable to stop using hf_transfer when downloading.
@rajatarya it looks like @bullerwins is indeed using hf_xet for the download in the above log. The (hf_transfer=True) log line explains why the cached, incomplete file download is being deleted and the entire file is being downloaded again. It's a bit confusing, but it looks like the file was partially downloaded through hf_transfer before this run.
The Downloading file from Xet Storage.. part of the subsequent log is only shown when we download the file using the xet protocol.
So excited that you are getting better download speeds. You have the honor of being one of the earliest adopters, so as @rajatarya mentioned, the earlier, slower downloads were likely due to this data not being available from our CDN.
It seems that HF_XET_MAX_CONCURRENT_DOWNLOADS is being ignored, is that an hf_transfer thing? I believe without the hf_transfer Rust implementation it indeed downloads 8 in parallel
@bullerwins: It's not an hf_transfer thing, but you are very correct: this environment variable is essentially ignored. We get the number of concurrent files from the --max-workers option on huggingface-cli download. If you want to increase the number of files that are downloaded in parallel, please use that option on the CLI and ignore HF_XET_MAX_CONCURRENT_DOWNLOADS.
HF_XET_NUM_CONCURRENT_RANGE_GETS gives you control over how many blocks we will download for each file in parallel. If you have a lot of bandwidth, it would be worth tweaking both HF_XET_NUM_CONCURRENT_RANGE_GETS and --max-workers to see how to maximize your download speed of these models.
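Here is a sketch of that tuning from Python, assuming the environment variable is read when the download starts; max_workers is the snapshot_download equivalent of the CLI's --max-workers, and the values are illustrative.

```python
import os

# More parallel range GETs per file; the default is 16. Tune this against your bandwidth.
os.environ["HF_XET_NUM_CONCURRENT_RANGE_GETS"] = "32"

from huggingface_hub import snapshot_download

snapshot_download(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    max_workers=16,  # number of files downloaded in parallel (CLI: --max-workers 16)
)
```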
The download speeds seem about the same to me, but that might be because I'm using distributed storage, so it might already have been capped at 200 Mbps before. But I'm very happy with Xet because I don't get read errors anymore and don't have to restart downloads. It works better than HTTPS in that regard.