ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text
(April 2025) Official implementation of ColorizeDiffusion.
ColorizeDiffusion is a Stable-Diffusion-based sketch colorization framework that achieves high-quality colorization with arbitrary sketch-reference input pairs.
The foundational paper for this repository: ColorizeDiffusion (arXiv e-print).
- Version 1 - Base training, 512px. Released; checkpoint names start with `mult`.
- Version 1.5 - Solves spatial entanglement, 512px. Released; checkpoint names start with `switch`.
- Version 2 - Enhances background and style transfer, 768px. Released; checkpoint names start with `v2`.
- Version XL - Enhances embedding guidance for character colorization and geometry disentanglement, 1024px. Available soon.
Getting Started
```
conda env create -f environment.yaml
conda activate hf
```
User Interface
We implement a fully featured web UI. To run it:

```
python -u app.py
```
The default server address is http://localhost:7860.
Important inference options
| Option | Description |
|---|---|
| BG enhance | Low-level feature injection for v2 models. |
| FG enhance | Not effective for the currently open-sourced models. |
| Reference strength | Decrease it to increase semantic fidelity to the sketch input (see the sketch after this table). |
| Foreground strength | Similar to reference strength, but applied only to the foreground region. Requires FG or BG enhance to be activated. |
| Preprocessor | Sketch preprocessing. *Extract* is suggested if the sketch input is a complicated pencil drawing. |
| Line extractor | The line extractor used when the preprocessor is set to *Extract*. |
| Sketch guidance scale | Classifier-free guidance scale for the sketch input; a value of 1 is suggested. |
| Attention injection | Noised low-level feature injection; roughly doubles inference time. |
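As a loose illustration of what a reference-strength control typically does, here is a minimal sketch under the assumption that the reference is injected as an embedding; `scale_reference`, `ref_emb`, and `null_emb` are hypothetical names, not this repository's API:

```python
import torch

def scale_reference(ref_emb: torch.Tensor,
                    null_emb: torch.Tensor,
                    strength: float) -> torch.Tensor:
    """Interpolate between a 'no-reference' (null) embedding and the full
    reference embedding. strength = 1.0 keeps the reference as-is; lower
    values weaken the reference signal so the sketch dominates."""
    return null_emb + strength * (ref_emb - null_emb)

# e.g. cond = scale_reference(ref_emb, null_emb, strength=0.6)
```

Lowering the strength toward 0 pushes the conditioning toward the null embedding, which is why semantic fidelity to the sketch increases as the reference influence fades.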
768-level Cross-content colorization results (from v2)
1536-level Character colorization results (from XL)
Manipulation
The colorization results can be manipulated using text prompts, see ColorizeDiffusion (e-print).
Manipulation is deactivated by default. To activate it, run:

```
python -u app.py -manipulate
```
For local manipulations, a visualization is provided to show the correlation between each prompt and tokens in the reference image.
The manipulation result and correlation visualization for the following settings:

- Target prompt: the girl's blonde hair
- Anchor prompt: the girl's brown hair
- Control prompt: the girl's brown hair,
- Target scale: 8
- Enhanced: false
- Thresholds: 0.5, 0.55, 0.65, 0.95
As you can see, the manipulation unavoidably changes some unrelated regions, as it is applied to the reference embeddings.
Manipulation options
| Option | Description |
|---|---|
| Group index | The index of the selected manipulation sequence's parameter group. |
| Target prompt | The prompt specifying the desired visual attribute of the image after manipulation. |
| Anchor prompt | The prompt specifying the anchored visual attribute of the image before manipulation. |
| Control prompt | Used for local manipulation (crossattn-based models). The prompt specifying the target regions. |
| Enhance | Whether this manipulation should be enhanced. (Enhanced manipulation is more likely to influence unrelated attributes.) |
| Target scale | The scale used to progressively control the manipulation. |
| Thresholds | Used for local manipulation (crossattn-based models). Four hyperparameters that reduce the influence on irrelevant visual attributes, where 0.0 < threshold 0 < threshold 1 < threshold 2 < threshold 3 < 1.0 (see the sketch after this table). |
| < Threshold 0 | Selects regions most related to the control prompt. Indicated by deep blue. |
| Threshold 0 - Threshold 1 | Selects regions related to the control prompt. Indicated by blue. |
| Threshold 1 - Threshold 2 | Selects neighbouring but unrelated regions. Indicated by green. |
| Threshold 2 - Threshold 3 | Selects unrelated regions. Indicated by orange. |
| > Threshold 3 | Selects the most unrelated regions. Indicated by brown. |
| Add | Click to save the current manipulation in the sequence. |
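A minimal sketch of the manipulation idea behind these options, assuming token-level reference embeddings and CLIP-style prompt embeddings: the (target - anchor) text direction is added to the reference tokens, scaled by the target scale and gated per region by the four thresholds. All names and the per-bucket weights here are illustrative assumptions, not the repository's implementation:

```python
import torch
import torch.nn.functional as F

def region_weights(control_dist: torch.Tensor,
                   thresholds=(0.5, 0.55, 0.65, 0.95)) -> torch.Tensor:
    """Map each reference token's distance to the control prompt onto a
    manipulation weight; smaller distance = more related to the prompt.
    The decay values (1.0 / 0.75 / 0.25 / 0.0) are illustrative only."""
    t0, t1, t2, _ = thresholds
    w = torch.zeros_like(control_dist)
    w[control_dist < t0] = 1.0                            # deep blue: full shift
    w[(control_dist >= t0) & (control_dist < t1)] = 0.75  # blue: related
    w[(control_dist >= t1) & (control_dist < t2)] = 0.25  # green: neighbouring
    # orange (t2..t3) and brown (> t3) regions keep weight 0.0
    return w

def manipulate(ref_tokens: torch.Tensor,    # [N, D] reference token embeddings
               target_emb: torch.Tensor,    # [D] target-prompt embedding
               anchor_emb: torch.Tensor,    # [D] anchor-prompt embedding
               control_dist: torch.Tensor,  # [N] distance to the control prompt
               scale: float) -> torch.Tensor:
    """Shift reference tokens along the (target - anchor) text direction,
    weighted per token so unrelated regions are mostly preserved."""
    direction = F.normalize(target_emb - anchor_emb, dim=-1)
    return ref_tokens + scale * region_weights(control_dist).unsqueeze(-1) * direction
```

Because the shift is applied directly to the reference embeddings, tokens that fall into the low-weight buckets are left untouched, which is how the thresholds limit spill-over into unrelated regions.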
Training
Our implementation is based on Accelerate and DeepSpeed.
Before starting training, collect your data and organize the training dataset as follows:
```
[dataset_path]
├── image_list.json          # Optional, for image indexing
├── color/                   # Color images
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
├── sketch/                  # Sketch images
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
└── mask/                    # Mask images (required for fg-bg training)
    ├── 0001.zip
    │   ├── 10001.png
    │   ├── 100001.jpg
    │   └── ...
    ├── 0002.zip
    └── ...
```
For details of the dataset organization, check `data/dataloader.py`; a rough illustration follows.
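The exact indexing and decoding logic lives in `data/dataloader.py`. Below is only a hedged sketch of how paired color/sketch samples could be read from the zip shards above; the class and helper names are hypothetical, not the repository's API:

```python
import io
import zipfile
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class PairedZipDataset(Dataset):
    """Reads (color, sketch) image pairs stored in parallel zip shards,
    e.g. color/0001.zip and sketch/0001.zip containing the same names."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.index = []  # list of (shard_name, member_name)
        for shard in sorted((self.root / "color").glob("*.zip")):
            with zipfile.ZipFile(shard) as zf:
                self.index += [(shard.name, n) for n in zf.namelist()
                               if n.lower().endswith((".png", ".jpg"))]

    def _load(self, split: str, shard: str, name: str) -> Image.Image:
        with zipfile.ZipFile(self.root / split / shard) as zf:
            return Image.open(io.BytesIO(zf.read(name))).convert("RGB")

    def __len__(self) -> int:
        return len(self.index)

    def __getitem__(self, i: int) -> dict:
        shard, name = self.index[i]
        return {"color": self._load("color", shard, name),
                "sketch": self._load("sketch", shard, name)}
```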
Training command example:

```
accelerate launch --config_file [accelerate_config_file] \
    train.py \
    --name base \
    --dataroot [dataset_path] \
    --batch_size 64 \
    --num_threads 8 \
    -cfg configs/train/sd2.1/mult.yaml \
    -pt [pretrained_model_path]
```
Refer to `options.py` for training/inference/validation arguments. Note that `batch_size` here is the micro batch size per GPU: if you run the command on 8 GPUs, the total batch size is 512.
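For clarity, the arithmetic behind that note (a hypothetical helper, not part of `options.py`):

```python
def effective_batch_size(micro_batch: int, num_gpus: int,
                         grad_accum_steps: int = 1) -> int:
    """Total samples per optimizer step across all processes."""
    return micro_batch * num_gpus * grad_accum_steps

assert effective_batch_size(64, 8) == 512  # matches the note above
```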
Code reference
- Stable Diffusion v2
- Stable Diffusion XL
- SD-webui-ControlNet
- Stable-Diffusion-webui
- K-diffusion
- Deepspeed
- sketchKeras-PyTorch
Citation
```bibtex
@article{2024arXiv240101456Y,
  author  = {{Yan}, Dingkun and {Yuan}, Liang and {Wu}, Erwin and {Nishioka}, Yuma and {Fujishiro}, Issei and {Saito}, Suguru},
  title   = {{ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text}},
  journal = {arXiv e-prints},
  year    = {2024},
  doi     = {10.48550/arXiv.2401.01456},
}

@inproceedings{Yan_2025_WACV,
  author    = {Yan, Dingkun and Yuan, Liang and Wu, Erwin and Nishioka, Yuma and Fujishiro, Issei and Saito, Suguru},
  title     = {{ColorizeDiffusion: Improving Reference-Based Sketch Colorization with Latent Diffusion Model}},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2025},
  pages     = {5092-5102},
}

@article{2025arXiv250219937Y,
  author  = {{Yan}, Dingkun and {Wang}, Xinrui and {Li}, Zhuoru and {Saito}, Suguru and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Guo}, Jiaxian},
  title   = {{Image Referenced Sketch Colorization Based on Animation Creation Workflow}},
  journal = {arXiv e-prints},
  year    = {2025},
  doi     = {10.48550/arXiv.2502.19937},
}

@article{yan2025colorizediffusionv2enhancingreferencebased,
  author  = {Yan, Dingkun and Wang, Xinrui and Iwasawa, Yusuke and Matsuo, Yutaka and Saito, Suguru and Guo, Jiaxian},
  title   = {{ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities}},
  journal = {arXiv e-prints},
  year    = {2025},
  doi     = {10.48550/arXiv.2504.06895},
}
```