Within Seconds?

#8
by Daemontatox - opened

I'm sorry, but your use of the phrase "within seconds" is vague and misleading.
Does it generate images that are amazing and very high quality? Yes.
Does it take "seconds" to generate an image? Yes, if you mean 120+ seconds.
Even the official Space demo on Hugging Face says it's "quantized and faster", and it still takes far too long.

HiDream.ai org

@Daemontatox The current Space demo uses a low-end GPU, and the 120+ seconds includes queuing time. Excluding queuing, the estimated processing time on this hardware is around 30 seconds. For comparison, on H-series GPUs the Fast model runs in 3.4 seconds and the Dev model in 5.8 seconds at 1024x1024 resolution when using torch.compile. Further engineering efforts are expected to yield additional speed improvements.
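For anyone who wants to reproduce those numbers, a minimal sketch along these lines should work. It assumes the diffusers HiDreamImagePipeline and the Llama-3.1 text encoder from the model card; the step count and guidance values are the ones suggested for the Fast variant, and this is not the official benchmark script:

```python
import torch
from transformers import PreTrainedTokenizerFast, LlamaForCausalLM
from diffusers import HiDreamImagePipeline

# HiDream-I1 uses Llama-3.1-8B-Instruct as one of its text encoders.
tokenizer_4 = PreTrainedTokenizerFast.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct"
)
text_encoder_4 = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    output_hidden_states=True,
    output_attentions=True,
    torch_dtype=torch.bfloat16,
)

pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Fast",  # or -Dev / -Full
    tokenizer_4=tokenizer_4,
    text_encoder_4=text_encoder_4,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Compile the denoising transformer once; the first call pays the
# compilation cost, subsequent 1024x1024 calls run the optimized kernels.
pipe.transformer = torch.compile(pipe.transformer)

image = pipe(
    "a photo of an astronaut riding a horse",
    height=1024,
    width=1024,
    num_inference_steps=16,  # Fast variant; Dev/Full use more steps
    guidance_scale=0.0,      # Fast is distilled and runs without CFG
).images[0]
image.save("out.png")
```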

BTW, I've increased the concurrency from 2 to 4, which should reduce queuing time.


Could you give some insight into my issue? https://huggingface.co/HiDream-ai/HiDream-I1-Full/discussions/7

200+ seconds for which model? I'm on an RTX A6000 (24GB VRAM, Ampere generation, not the recent Ada). I've been running the FP8 versions of the Full, Dev, and Fast models. Full generations take 12 minutes (15.5 s/it @ 50 steps), Dev takes half that, and Fast takes 2 minutes (not counting queuing time). I've seen users claim the Full version generates in under a minute on a 4090. Is such a difference normal? I did a clean install of the latest portable ComfyUI yesterday and even turned my display resolution down to 720p @ 30 fps to save every last KB of VRAM. My Flux times are consistent with what I see elsewhere, so is there a problem with my system, or is this normal and I should downgrade to the NF4 version?
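(For comparison, I understand the NF4 route in diffusers would look roughly like the sketch below. I'm on ComfyUI, so this is only a reference, and the class and config names are my assumption about the diffusers/bitsandbytes integration, not something I've verified.)

```python
# Hypothetical sketch: loading the HiDream transformer in 4-bit NF4 via
# bitsandbytes to cut VRAM use, at some cost in quality.
import torch
from diffusers import BitsAndBytesConfig, HiDreamImageTransformer2DModel

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
# The quantized transformer would then be passed into the pipeline:
# HiDreamImagePipeline.from_pretrained(..., transformer=transformer)
```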

The ComfyUI Full example from their page at https://comfyanonymous.github.io/ComfyUI_examples/hidream/ runs in 115.09 s on my RTX 3090, from a cold start.

I'm using Google T5 FP8 and Llama 3.1 FP8 (scaled) plus the provided clip_l and clip_g as text encoders, with the HiDream Full FP8 model.

This is all on Windows using portable ComfyUI, so on Linux there would be roughly another 10% performance gain.

There's definitely something wrong with your setup, because your A6000 should be comparable to my 3090.
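If it helps, a quick sanity check like the one below can rule out a CPU fallback or an almost-full card forcing offload; this is just a generic snippet I'd run, nothing HiDream-specific:

```python
# Confirm PyTorch actually sees the GPU and how much VRAM is free.
# A CPU fallback or near-full VRAM (forcing model offload) would
# explain order-of-magnitude slowdowns like 12-minute generations.
import torch

print(torch.__version__, "CUDA", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    free, total = torch.cuda.mem_get_info()
    print(f"VRAM free/total: {free / 2**30:.1f} / {total / 2**30:.1f} GiB")
```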

Thanks for the info! Much appreciated. I'm starting again from scratch.
