Detailed usage of ControlNet in ComfyUI (SD1.5 / SDXL)
In this article, we will look at ControlNet in ComfyUI. Configuring ControlNet in ComfyUI takes a little work, but the basic nodes are included in the standard installation.
Basic usage of ControlNet with ComfyUI
First, let's look at how to use the standard ControlNet nodes. The nodes you need are found under Add Node > conditioning > controlnet, plus "Load ControlNet Model" under Add Node > loaders.
Download ControlNet Model
To use ControlNet, you need a model for each mode. The V1.1 models released by the ControlNet developer are available here.
You do not need to download all the models at once. First download control_v11p_sd15_scribble.pth and control_v11p_sd15_scribble.yaml, which we will use later, and place them in the ComfyUI directory ComfyUI/models/controlnet.
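If you prefer to script the download, here is a minimal sketch using the huggingface_hub package, assuming the V1.1 files are hosted in the lllyasviel/ControlNet-v1-1 repository on Hugging Face (adjust the local path to your installation):

```python
# Minimal sketch: fetch the Scribble model and its .yaml into ComfyUI's controlnet folder.
# Assumes the files are hosted in the lllyasviel/ControlNet-v1-1 repository.
from huggingface_hub import hf_hub_download

for filename in ("control_v11p_sd15_scribble.pth", "control_v11p_sd15_scribble.yaml"):
    hf_hub_download(
        repo_id="lllyasviel/ControlNet-v1-1",
        filename=filename,
        local_dir="ComfyUI/models/controlnet",  # adjust to your ComfyUI path
    )
```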
In addition to the models introduced here, there are other models for SD1.5/SDXL. If you are interested, please refer to the previous article for details.
ControlNet Graph Basics
A ControlNet graph basically sits between the positive/negative prompts and the sampler, modifying the conditioning.
There are two nodes for applying ControlNet, "Apply ControlNet" and "Apply ControlNet (OLD)", but "Apply ControlNet (OLD)" is deprecated, so use "Apply ControlNet".
Also, you will need “Load ControlNet Model” to load the model.
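To make the wiring concrete, here is a minimal sketch of that conditioning flow in ComfyUI's API (JSON) format, written as a Python dict. The class_type names are my reading of the stock nodes ("Apply ControlNet" corresponding to ControlNetApplyAdvanced); node IDs, prompts, and file names are placeholders:

```python
# "Apply ControlNet" sits between the two CLIP Text Encode nodes and the KSampler,
# rewriting both conditionings. A linked input is [source_node_id, output_index].
basic_controlnet_graph = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},          # MODEL=0, CLIP=1, VAE=2
    "2": {"class_type": "CLIPTextEncode",                         # positive prompt
          "inputs": {"clip": ["1", 1], "text": "a girl standing in a grass field"}},
    "3": {"class_type": "CLIPTextEncode",                         # negative prompt
          "inputs": {"clip": ["1", 1], "text": "worst quality, low quality"}},
    "4": {"class_type": "ControlNetLoader",                       # "Load ControlNet Model"
          "inputs": {"control_net_name": "control_v11p_sd15_scribble.pth"}},
    "5": {"class_type": "LoadImage",
          "inputs": {"image": "scribble.png"}},
    "6": {"class_type": "ControlNetApplyAdvanced",                # "Apply ControlNet"
          "inputs": {"positive": ["2", 0], "negative": ["3", 0],
                     "control_net": ["4", 0], "image": ["5", 0],
                     "strength": 1.0, "start_percent": 0.0, "end_percent": 1.0}},
    "7": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "8": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0],
                     "positive": ["6", 0], "negative": ["6", 1],   # modified conditioning
                     "latent_image": ["7", 0],
                     "seed": 0, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
}
```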
About Preprocessors
Preprocessors are not included in the standard ComfyUI installation because the data they rely on would be huge; they must be installed separately as a custom node.
To install, search for ComfyUI's ControlNet Auxiliary Preprocessors in the Custom Nodes Manager of the ComfyUI Manager and install it.
Please refer to the following page for a detailed explanation of custom nodes.
There are no presets like the Control Types in the A1111 WebUI, so you must combine the preprocessors and models manually. Please refer to the previous article "About Control Type" for what each preprocessor does.
We do not explain every node's usage here; please refer to the usage examples shown later in this article.
Explanation of official workflow
Let’s download the official workflow example and generate an illustration from a rough sketch.
Download the workflow images from the "Scribble ControlNet" section of the linked page and drag and drop them into ComfyUI, or load them with the Load button. Also download the input images, as we will use them later.
When loading is complete, load the model of your choice from ckpt_name in the “Load Checkpoint” node.
Then load the ControlNet model in the "Load ControlNet Model" node. In this case, we will use Scribble, so select control_v11p_sd15_scribble.pth.
Load the sample image you have just downloaded into the “Load Image” section.
This completes the setup. Click on “Queue Prompt” to generate it.
Improve the official Scribble ControlNet Examples
From here, we will try to improve the official Scribble ControlNet Examples into a more practical ControlNet workflow. Below is a list of what we would like to incorporate.
- Generate with an SDXL model.
- Adjust the input image so that the generated image is horizontal.
- Apply Clip skip.
- Add LoRA support and apply "Zoot Detailer XL".
- Apply the negative embedding "negativeXL_D" to improve quality.
- Add a second pass, because single-pass quality is not good enough.
Let us explain step by step.
Adjustment of input image:
For this customization, we place the 512×512 input image in the middle of a 1216×832 canvas so that the generated image is SDXL-sized.
Add "ImageCompositeMasked" after "Load Image" and connect the loaded image to source. Then place "EmptyImage" and connect its IMAGE out to destination (see the API-format sketch after the node list below).
Externalize the width/height values of "Empty Latent Image" and "EmptyImage" by right-clicking each node and selecting "Convert width/height to input" so they can share the same size. Place two "Primitive" nodes and connect them to the respective width/height inputs. Also connect the IMAGE out of "ImageCompositeMasked" to "Preview Image" to confirm the image that will be sent to the ControlNet.
- ImageCompositeMasked: image > ImageCompositeMasked
- EmptyImage: image > EmptyImage
- Primitive: utils > Primitive
- Preview Image: image > Preview Image
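Here is a minimal sketch of this composite sub-graph in the same API format (class and input names assumed from the stock nodes; IDs and the file name are placeholders):

```python
# Place the 512x512 sketch onto a blank 1216x832 canvas and preview the result.
composite_graph = {
    "1": {"class_type": "LoadImage",
          "inputs": {"image": "scribble_input.png"}},        # 512x512 input sketch
    "2": {"class_type": "EmptyImage",                         # blank 1216x832 canvas
          "inputs": {"width": 1216, "height": 832, "batch_size": 1, "color": 0}},
    "3": {"class_type": "ImageCompositeMasked",
          "inputs": {"destination": ["2", 0],                 # canvas
                     "source": ["1", 0],                      # loaded sketch
                     "x": (1216 - 512) // 2,                  # horizontal centering
                     "y": (832 - 512) // 2,                   # vertical centering
                     "resize_source": False}},
    "4": {"class_type": "PreviewImage",                       # check what goes to ControlNet
          "inputs": {"images": ["3", 0]}},
}
```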
Clip skip / LoRA adaptation:
Connect "CLIP Set Last Layer" to the CLIP out of "Load Checkpoint", and connect "Load LoRA" to the CLIP out of "CLIP Set Last Layer". The CLIP out of "Load LoRA" is then connected to the two "CLIP Text Encode (Prompt)" nodes. The MODEL out of "Load Checkpoint" goes into "Load LoRA" and from there to "KSampler" (see the sketch after the node list below).
- CLIP Set Last Layer: conditioning > CLIP Set Last Layer
- Load LoRA: loaders > Load LoRA
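A minimal sketch of the resulting CLIP/MODEL chain in API format (class names assumed from the stock nodes; the LoRA file name is hypothetical):

```python
# Checkpoint -> CLIP Set Last Layer -> Load LoRA -> prompt encoders / KSampler.
clip_lora_chain = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "AnythingXL_xl.safetensors"}},    # MODEL=0, CLIP=1
    "2": {"class_type": "CLIPSetLastLayer",                         # "CLIP Set Last Layer"
          "inputs": {"clip": ["1", 1], "stop_at_clip_layer": -2}},  # Clip skip: 2
    "3": {"class_type": "LoraLoader",                               # "Load LoRA"
          "inputs": {"model": ["1", 0], "clip": ["2", 0],
                     "lora_name": "zoot_detailer_xl.safetensors",   # hypothetical file name
                     "strength_model": 1.0, "strength_clip": 1.0}},
    "4": {"class_type": "CLIPTextEncode",                           # positive prompt
          "inputs": {"clip": ["3", 1], "text": "masterpiece, best quality"}},
    # the negative CLIPTextEncode uses ["3", 1] the same way, and the KSampler's
    # MODEL input comes from ["3", 0]
}
```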
Sharing Seed:
Externalize the seed value by right-clicking the KSampler node and selecting "Convert seed to input". Drag out from the seed input ● to bring up a list of selectable nodes, and select "Primitive".
- Primitive: utils > Primitive
Apply ControlNet Update:
The "Apply ControlNet (OLD)" used in the sample has been deprecated and should be replaced with the new "Apply ControlNet".
After placing "Apply ControlNet", connect the respective "CLIP Text Encode (Prompt)" nodes to positive/negative and "Load ControlNet Model" to control_net. Finally, connect the IMAGE out of "ImageCompositeMasked" to image. vae does not need to be connected.
- Apply ControlNet: conditioning > controlnet > Apply ControlNet
Implementation of a second pass:
Magnify the image for the second pass by connecting "Upscale Image By" after "VAE Decode".
To convert the resized image back into a latent, run it through "VAE Encode".
Select the first KSampler and copy it with Ctrl + C. Pasting with Ctrl + Shift + V preserves the input connections, so you can move the copy to the desired position while it stays connected.
Once it is in place, connect the latent produced by the "VAE Encode" described above to the latent_image input of the second KSampler.
Copy the positive "CLIP Text Encode (Prompt)" the same way (pasting with Ctrl + Shift + V) and connect it to the second KSampler's positive input. Rewrite its prompt for upscaling as follows (see the sketch after the node list below).
very detailed, intricate details, ultra detailed, masterpiece, best quality
- Upscale Image By: image > upscaling > Upscale Image By
- VAE Encode: latent > VAE Encode
- KSampler: sampling > KSampler
- CLIP Text Encode (Prompt): conditioning > CLIP Text Encode (Prompt)
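A minimal sketch of the second pass in API format. The upstream references ("ksampler_1", "vae", "model", "pos_hires", "neg") stand in for nodes defined elsewhere in the workflow, and "Upscale Image By" is assumed to map to the ImageScaleBy class:

```python
# Decode the 1st-pass latent, upscale it in pixel space, re-encode it, and
# run a second KSampler with denoise < 1.0 so the 1st-pass image is preserved.
second_pass = {
    "10": {"class_type": "VAEDecode",
           "inputs": {"samples": ["ksampler_1", 0], "vae": ["vae", 0]}},
    "11": {"class_type": "ImageScaleBy",                  # "Upscale Image By"
           "inputs": {"image": ["10", 0],
                      "upscale_method": "lanczos", "scale_by": 1.5}},
    "12": {"class_type": "VAEEncode",                     # back to latent space
           "inputs": {"pixels": ["11", 0], "vae": ["vae", 0]}},
    "13": {"class_type": "KSampler",                      # 2nd pass
           "inputs": {"model": ["model", 0],
                      "positive": ["pos_hires", 0], "negative": ["neg", 0],
                      "latent_image": ["12", 0],
                      "seed": 0, "steps": 20, "cfg": 3.0,
                      "sampler_name": "dpmpp_2m", "scheduler": "karras",
                      "denoise": 0.6}},
}
```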
This completes the improved version of Scribble ControlNet Examples. The completed workflow is available free of charge on Patreon for your reference.
Using the improved Scribble ControlNet Examples
Checkpoint Model Selection:
Set the model by ckpt_name in "Load Checkpoint". In this case, we will use the SDXL model AnythingXL_xl. You can use any model you like, but you will need to adjust some parameters for the model.
Clip skip settings:
We want the equivalent of Clip skip: 2 in the A1111 WebUI, so set stop_at_clip_layer in "CLIP Set Last Layer" to -2.
Loading LoRA:
Select the LoRA model by lora_name in "Load LoRA". We wanted to bring out the details, so we loaded Zoot Detailer XL. Use the default parameters.
Loading VAE:
Select the VAE model sdxl.vae.safetensors under vae_name in "Load VAE".
We use “madebyollin/sdxl-vae-fp16-fix” which is lighter than the official VAE. Please download “sdxl.vae.safetensors” from the link below.
Input image settings:
Load the input image into “Load Image”. Load a sample image from the official workflow.
Set the width of the "Primitive" connected to the "Empty Latent Image" and the "EmptyImage" to 1216 and the height to 832.
Position the input image in "ImageCompositeMasked" to suit the SDXL-sized canvas by setting x to 354 and y to 160, so the input image sits in the center.
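As a sanity check, the centering offsets are simply (canvas − source) / 2 on each axis:

```python
# Offsets for centering a 512x512 image on a 1216x832 canvas.
canvas_w, canvas_h, src = 1216, 832, 512
x = (canvas_w - src) // 2   # 352 (354 as used above is effectively centered too)
y = (canvas_h - src) // 2   # 160, matching the value above
print(x, y)                 # -> 352 160
```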
Change prompts:
Add prompts according to the model you are using. Overwrite the positive and negative prompts with the following.
Positive prompt:
masterpiece, best quality, white dress, (solo) girl (flat chest:0.9), (fennec ears:1.1) (fox ears:1.1), (blonde hair:1.0), messy hair, sky clouds, standing in a grass field, (chibi), blue eyes
Negative prompt:
nude, (hands), text, error, cropped, (worst quality:1.2), (low quality:1.2), normal quality, (jpeg artifacts:1.3), signature, watermark, username, blurry, artist name, monochrome, sketch, censorship, censor, (copyright:1.2), extra legs, (forehead mark) (depth of field) (emotionless) (penis)
We have added masterpiece, best quality as quality modifier tags for "Anything XL" used in this workflow. Also, since the test generated a nude image, we added white dress to the positive prompt and nude to the negative prompt.
Negative Embedding Adaptation:
We want to apply the negative embedding "negativeXL_D", so we add the following at the beginning of the negative prompt.
embedding:negativeXL_D,
If you do not have negativeXL_D, download it from the link below and place it in your \ComfyUI\models\embeddings directory.
ControlNet Settings:
For the ControlNet model, we use the Scribble model for SDXL published by xinsir, controlnet-scribble-sdxl-1.0. (*The "kohya_controlllite_xl_scribble_anime" model introduced on the official page could not be used due to an error.) Download diffusion_pytorch_model.safetensors from the link below, rename it to something easy to identify such as xinsir-controlnet-scribble-sdxl-1.0.safetensors, and put it in the \ComfyUI\models\controlnet directory.
Because this model has a strong influence, strength in "Apply ControlNet" is set to 0.50. start_percent and end_percent are left at their defaults.
1st Pass setting:
Set the steps to the standard 20.
The default cfg produced high contrast, so lower it a little to 5.0.
The sampler is up to you; here we select dpmpp_2m and use karras for the scheduler.
Upscale settings:
Set the upscale in "Upscale Image By". Since this is SDXL, the image does not need to be very large, so set upscale_method to lanczos and scale_by to 1.50.
2nd Pass setting:
I wanted to increase the number of steps to improve the finish, but raising it did not make much difference, so I left the steps at 20.
Set cfg to 3.0 so the prompt is emphasized less.
The sampler can be chosen according to preference, but in this case we used the same dpmpp_2m / karras combination as in the 1st pass.
We want more detail to be drawn in, so set the denoising strength to 0.60.
Generate:
Now that the configuration is complete, let's generate with the "Queue Prompt" button.
Generation results
Example of ControlNet Usage
From here on, we will introduce a workflow similar to A1111 WebUI. Install the custom node “ComfyUI’s ControlNet Auxiliary Preprocessors” as it is required to convert the input image to an image suitable for ControlNet. We also use “Image Chooser” to make the image sent to the 2nd pass optional.
Pose Reference
For the pose reference, the person in the input image is analyzed with the DensePose Estimator and the image is generated with an SDXL model. This example also demonstrates applying multiple ControlNets. First, download the workflow from Patreon.
Workflow Description
Basic Info
- Load Checkpoint: Load the checkpoint model. In our example we will use animagineXLV31_v31.
- CLIP Set Last Layer: To match Clip skip: 2 in the A1111 WebUI, set stop_at_clip_layer to -2.
- Load LoRA: Load the LoRA. This time, use add-detail-xl and set strength_model to 2.00 to add more detail.
- Primitive-Seeds: The seed used by the KSamplers is externalized and shared via this node.
- CLIP Text Encode (Prompt)-Positive: Positive prompts describing the skater and the background.
- CLIP Text Encode (Prompt)-Negative: Keep it simple and don't include too many negative prompts.
- Empty Latent Image: Since an SDXL model is used for this generation, the size is set to the appropriate 1344x768.
Load Image
First, let's load the input image "man performing skateboard trick during daytime", borrowed from Unsplash.
After loading is complete, we want to mask the skater, so select Open in MaskEditor from the node's right-click menu and paint over the skater.
Mask
This graph resizes the mask to the generation size and prepares it for compositing.
- Convert Mask to Image: To edit the mask, it must be changed to Image once, so let’s convert it at this node.
- Enchance And Resize Hint Images: Convert to Generation Size. The size is adapted to the size extracted by “Generation Resolution From Latent” in the Image Composition Group.
- Invert Image: We want to extract the skater, so let’s invert the black and white of the mask image.
- ImageBlur: We want to smooth the borders of the mask, so we apply blur.
- Convert Image to Mask: Lastly, the image is converted to a mask.
Image Composition
The input image is vertical, but the size we want to generate is horizontal. Let’s use this graph to resize it, and then use a mask to attach the skater to it.
- Enchance And Resize Hint Images: Convert to Generation Size. Apply the size extracted in “Generation Resolution From Latent”.
- Generation Resolution From Latent: Extracts the size of the latent image.
- Join Image with Alpha: Cuts the skater out of the resized input image using the mask.
- Preview Image: It is set up to check the extracted images.
Preprocess
The input image is converted into hint images suitable for ControlNet.
- DensePose Estimator: Generates a DensePose map from the input image.
- AnyLine Lineart: Generates line art from the input image. In this case, lineart_realstic is used.
- Preview Image: Set up to check the output of each preprocessor.
ControlNet
When multiple ControlNets are applied, they are chained together as shown in this graph (see the sketch after the list below).
- Load ControlNet Model-Densepose: To use DensePose with SDXL, use jschoormans-controlnet-densepose-sdxl.safetensors. Download diffusion_pytorch_model.safetensors from the link below and rename it to something more descriptive like jschoormans-controlnet-densepose-sdxl.safetensors.
- Apply ControlNet-Densepose: Applies DensePose. Set strength to 0.70 to weaken the weights a little, and start_percent to 0.050 so the background gets colored.
- Load ControlNet Model-Lineart: Use xinsir-controlnet-scribble-sdxl-1-0.safetensors to reference the skater's line art. Download diffusion_pytorch_model.safetensors (V2 is also fine) from the link below and rename it to something more descriptive like xinsir-controlnet-scribble-sdxl-1-0.safetensors.
- Apply ControlNet-Lineart: To give the skater a more artistic look, the weight is lowered a little to 0.90. To color the background, start_percent is set to 0.100, and to reflect the prompts across the entire image, end_percent is set to 0.900.
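A minimal sketch of this two-ControlNet chain in API format: the positive/negative outputs of the first "Apply ControlNet" feed the second one. Class names are assumed from the stock nodes; "pos", "neg", "densepose_img", and "lineart_img" are placeholders for nodes defined elsewhere:

```python
# Chaining two ControlNets: conditioning flows DensePose -> Lineart -> KSampler.
controlnet_chain = {
    "20": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "jschoormans-controlnet-densepose-sdxl.safetensors"}},
    "21": {"class_type": "ControlNetApplyAdvanced",       # "Apply ControlNet" (DensePose)
           "inputs": {"positive": ["pos", 0], "negative": ["neg", 0],
                      "control_net": ["20", 0], "image": ["densepose_img", 0],
                      "strength": 0.70, "start_percent": 0.050, "end_percent": 1.0}},
    "22": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "xinsir-controlnet-scribble-sdxl-1-0.safetensors"}},
    "23": {"class_type": "ControlNetApplyAdvanced",       # "Apply ControlNet" (Lineart)
           "inputs": {"positive": ["21", 0], "negative": ["21", 1],   # chained conditioning
                      "control_net": ["22", 0], "image": ["lineart_img", 0],
                      "strength": 0.90, "start_percent": 0.100, "end_percent": 0.900}},
    # the KSampler then takes positive from ["23", 0] and negative from ["23", 1]
}
```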
1st Pass
Mostly default, but cfg is 5.0, the sampler is dpmpp_2m, and the scheduler is karras.
Preview Chooser
This is introduced to confirm the image generated by 1st Pass. If you are satisfied with the result, you can proceed by clicking the “Progress selected image as restart” button.
Upscale
This time, the image is upscaled using "ImageScaleToTotalPixels". The size is set to 3.00 megapixels.
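For reference, here is roughly what 3.00 megapixels works out to for the 1344x768 (7:4) image, assuming the node scales both sides by the square root of the pixel ratio so the aspect ratio is preserved:

```python
import math

# 1344x768 is about 1.03 megapixels; scaling to 3.00 megapixels keeps the 7:4 ratio.
w, h = 1344, 768
scale = math.sqrt(3_000_000 / (w * h))
print(round(w * scale), round(h * scale))   # -> roughly 2291 x 1309
```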
2nd Pass
We keep it almost the same as the 1st pass, with denoise set to 0.40 to control how much new detail is drawn in.
This is the end of the explanation of the pose references.
Generation Result
Refinement of the generated image
A Tile ControlNet is used with the SDXL model to perform the refinement. This workflow uses the custom node "Tiled Diffusion & VAE for ComfyUI", so let's install it. First, download the workflow from Patreon.
Basic Info
Basic workflow information is grouped here.
- Merge Checkpoints: Merge the checkpoint models. Details are explained in the next section.
- CLIP Set Last Layer: To match Clip skip: 2 in the A1111 WebUI, set stop_at_clip_layer to -2.
- Load LoRA: Load the LoRA. This time, use add-detail-xl and set strength_model to 3.00 to add more detail.
- Empty Latent Image: Since an SDXL model is used for this generation, the size is set to the appropriate 1344x768.
- Primitive-Seed: The seed used by the KSamplers is externalized and shared via this node.
- CLIP Text Encode (Prompt)-Positive: Nothing special, just a prompt for a young girl standing in a medieval castle market.
- CLIP Text Encode (Prompt)-Negative: We used the negative embedding "negativeXL_D". We also added heart pupils because the pupils were being generated in pink.
Merge Checkpoints
This is the graph to use when you want to merge checkpoint models. We use "ModelMergeSimple" for the merge; for CLIP, use whichever of the base/sub models you prefer.
- Load Checkpoint-base: The base checkpoint model. fiamixXL_v40.safetensors will be used.
- Load Checkpoint-sub: The sub checkpoint model. We wanted a slightly more anime look, so we used AnythingXL_xl.safetensors. In this case, we use this model's CLIP.
- ModelMergeSimple: Merges the checkpoint models. ratio is set to 0.85; at 1.00, model1 (the base) is used 100%, so this mixes in a little of the sub model (see the sketch after this list).
- Save Checkpoint: If you want to save the merged checkpoint model, use this node.
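Conceptually, ratio is just a linear blend of the two models' weights; here is a small sketch of the idea (the real node patches the UNet weights internally rather than materializing tensors like this):

```python
import torch

def merge_simple(w_base: torch.Tensor, w_sub: torch.Tensor, ratio: float) -> torch.Tensor:
    # ratio = 1.00 keeps 100% of model1 (base); lower values blend in model2 (sub).
    return ratio * w_base + (1.0 - ratio) * w_sub

base = torch.tensor([1.0, 2.0, 3.0])
sub = torch.tensor([3.0, 2.0, 1.0])
print(merge_simple(base, sub, 0.85))   # 85% base + 15% sub -> tensor([1.3000, 2.0000, 2.7000])
```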
1st Pass
Mostly default, but cfg is set to 4.0, the sampler to euler, and the scheduler to beta.
Preview Chooser
This is introduced to confirm the image generated by 1st Pass. If you are satisfied with the result, you can proceed by clicking the “Progress selected image as restart” button.
Upscale
- Load Upscale Model: Load the upscaler model. In this case, 4x-foolhardy_Remacri.pth.
- Upscale Image (using Model): Upscales using the upscaler model.
- ImageScaleToTotalPixels: Scales the image to the specified total number of pixels, preserving the aspect ratio.
Preprocess
Use "TTPlanet Tile GuidedFilter" to convert the image for the Tile ControlNet. resolution is set to 1024; those with less VRAM can use the default 512 or 768.
ControlNet
- Load ControlNet Model: Use the Tile model “ttplanetSDXLControlnet_v20Fp16.safetensors” for SDXL.
- Apply ControlNet: You can adjust the finish with end_percent, although it is not very influential. In this case, we use a middle value of 0.500. If this value is too high, less detail is drawn in and the illustration becomes high-contrast, so balance it against the denoise value in the 2nd pass.
Tiled Diffusion
- Tiled Diffusion: Set the tile size for Tiled Diffusion. In this case, 1024 is used; those with less VRAM can use the default 768 or 512.
- Tiled VAE Encode: This is also set to 1024. Lower this value as well if you have less VRAM.
2nd Pass
This is also mostly default, but cfg is set to 7.0, the sampler to euler, and the scheduler to beta. denoise is set to 0.40.
Tiled VAE Decode
Decode using Tiled VAE Decode. If you have less VRAM, lower the tile_size.
This is the end of the explanation of the refinement of the generated image.
Generation Result
If you have a high-spec machine, you can try 6.00 or 8.00 megapixels in "ImageScaleToTotalPixels" in the Upscale group.
If you would like to see the 8.00-megapixel result (3824 x 2184 pixels), you can download it from Google Drive below.
Conclusion
In this article, we have shown you how to use ControlNet with ComfyUI; compared to A1111 WebUI, ControlNet may seem a bit more complicated because it does not have templates like control types. However, if you know how to use it, you can control the generation process in detail.
In addition, although we did not cover it in this article, generation can be even faster with the T2I-Adapter introduced in the official examples. Furthermore, if you want to use ControlNet for AI video, we recommend the custom node "ComfyUI-Advanced-ControlNet".