---
license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- controlnet
- jax-diffusers-event
inference: true
datasets:
- mfidabel/sam-coyo-2k
- mfidabel/sam-coyo-2.5k
- mfidabel/sam-coyo-3k
language:
- en
library_name: diffusers
---
# ControlNet - mfidabel/controlnet-segment-anything
These are ControlNet weights trained on [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) with a new type of conditioning: segmentation maps produced with the Segment Anything Model (SAM). Some example prompt and negative prompt pairs used to generate sample images are listed below.
**prompt**: contemporary living room of a house
**negative prompt**: low quality

**prompt**: new york buildings, Vincent Van Gogh starry night
**negative prompt**: low quality, monochrome

**prompt**: contemporary living room, high quality, 4k, realistic
**negative prompt**: low quality, monochrome, low res

## Limitations and Bias
- The model can't render text
- Landscapes with fewer segments tend to render better
- Some segmentation maps tend to render in monochrome (a negative prompt can work around this)
- Some generated images can be oversaturated
- Shorter prompts usually work better, as long as they make sense with the input segmentation map
- The model is biased toward producing painting-like images rather than realistic ones, as the training dataset contains many paintings
## Training
**Training Data**: This model was trained on a segmented dataset based on the [COYO-700M Dataset](https://huggingface.co/datasets/kakaobrain/coyo-700m).
The [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint was used as the base model for the ControlNet.
The model was trained in three stages, in the following order:
- 25k steps with the [SAM-COYO-2k](https://huggingface.co/datasets/mfidabel/sam-coyo-2k) dataset
- 28k steps with the [SAM-COYO-2.5k](https://huggingface.co/datasets/mfidabel/sam-coyo-2.5k) dataset
- 38k steps with the [SAM-COYO-3k](https://huggingface.co/datasets/mfidabel/sam-coyo-3k) dataset
- **Hardware**: Google Cloud TPUv4-8 VM
- **Optimizer**: AdamW
- **Train Batch Size**: 2 x 4 = 8
- **Learning rate**: 1e-5 (constant)
- **Gradient Accumulation Steps**: 1
- **Resolution**: 512
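The hyperparameters above correspond to a launch command along the lines of the sketch below, which assumes the Flax ControlNet training script from the Diffusers examples (`examples/controlnet/train_controlnet_flax.py`); the exact flags used for this run are assumptions, and the batch size of 8 is interpreted as 2 per device across 4 TPU devices.

```shell
# Hedged sketch of the training launch (one stage shown; the dataset name
# would be swapped for each of the three stages listed above).
python train_controlnet_flax.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --dataset_name="mfidabel/sam-coyo-2k" \
  --resolution=512 \
  --learning_rate=1e-5 \
  --train_batch_size=2 \
  --gradient_accumulation_steps=1 \
  --output_dir="controlnet-segment-anything"
```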