DCAI

Detailed usage of ComfyUI Control Net SD1.5 / SDXL

📅 Oct 07, 2024

In this article, we discuss how to use ControlNet in ComfyUI. Setting up ControlNet in ComfyUI can be complicated, but the basic nodes are included in the standard installation.

Basic usage of ControlNet with ComfyUI

First, let’s look at how to use the standard ControlNet nodes. The nodes we need are under Add Node > conditioning > controlnet, plus “Load ControlNet Model” under Add Node > loaders.

Download ControlNet Model

To use ControlNet, you need a separate model for each control type. Here are the V1.1 models released by the ControlNet developer.

You do not need to download all the models at once. For now, download control_v11p_sd15_scribble.pth and control_v11p_sd15_scribble.yaml, which we will use later, and place them in the ComfyUI/models/controlnet directory.
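If you want to confirm the files landed in the right place before building the graph, the short Python sketch below lists any that are missing. The ComfyUI root path ("ComfyUI") is an assumption; point it at your own installation.

```python
from pathlib import Path

# The two files named above; add others here as you download more models.
REQUIRED = ["control_v11p_sd15_scribble.pth", "control_v11p_sd15_scribble.yaml"]

def missing_controlnet_files(comfy_root, names=REQUIRED):
    """Return the file names not yet present in <comfy_root>/models/controlnet."""
    folder = Path(comfy_root) / "models" / "controlnet"
    return [name for name in names if not (folder / name).is_file()]

if __name__ == "__main__":
    # "ComfyUI" is an assumed install location relative to the current directory.
    missing = missing_controlnet_files("ComfyUI")
    print(missing if missing else "All ControlNet files are in place.")
```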

In addition to the model introduced here, there are other models for SD1.5/SDXL. Please refer to the previous article for more details if you are interested.

ControlNet Graph Basics

In a ControlNet graph, the ControlNet nodes sit between the positive/negative prompts and the sampler, and modify the conditioning that reaches the sampler.

Two nodes can apply ControlNet: “Apply ControlNet” and “Apply ControlNet (OLD)”. “Apply ControlNet (OLD)” is deprecated, so use “Apply ControlNet”.

Also, you will need “Load ControlNet Model” to load the model.

About Preprocessors

Preprocessors are not part of the standard ComfyUI installation, because bundling them would make it very large, so they must be installed separately as a custom node.

To install them, search for ComfyUI's ControlNet Auxiliary Preprocessors in the Custom Nodes Manager of ComfyUI Manager and install it.

Please refer to the following page for a detailed explanation of custom nodes.

ComfyUI has no preset like Control Type in the A1111 WebUI, so you must combine preprocessors and models manually. For what each preprocessor does, see “About Control Type” in the previous article.

We do not explain every node here; see the usage examples later in this article.

Explanation of official workflow

Let’s download the official workflow example and generate an illustration from a rough sketch.

Download the workflow images from the “Scribble ControlNet” section of the linked page and drag and drop them into ComfyUI or load them from the Load button. Also, download the input images as we will use them later.

When loading is complete, select the model of your choice in ckpt_name of the “Load Checkpoint” node.

Next, select the ControlNet model in the “Load ControlNet Model” node. Since we are using Scribble here, select control_v11p_sd15_scribble.pth.

Load the sample image you just downloaded into the “Load Image” node.

This completes the setup. Click “Queue Prompt” to generate the image.

Improve the official Scribble ControlNet Examples

From here, we will try to improve the official Scribble ControlNet Examples into a more practical ControlNet workflow. Below is a list of what we would like to incorporate.

  • Generate with an SDXL model.
  • Adjust the input image so that the generated image is landscape.
  • Apply Clip skip.
  • Add LoRA and apply “Zoot Detailer XL”.
  • Apply the negative embedding “negativeXL_D” to improve quality.
  • Add a second pass, because the quality of a single pass is not good enough.

Let us explain step by step.

Adjustment of input image:

For this customization, we place the 512×512 input image in the middle of a 1216×832 canvas so that the image sent to ControlNet, and therefore the generated image, has an SDXL resolution.

Add “ImageCompositeMasked” after “Load Image” and connect the IMAGE out of “Load Image” to its source input. Then place “EmptyImage” and connect its IMAGE out to destination.

Externalize the width/height values of “Empty Latent Image” and “EmptyImage” by right-clicking each node and selecting “Convert width/height to input”, so that both nodes share the same size. Place two “Primitive” nodes: connect one to both width inputs and the other to both height inputs. Also, connect the IMAGE out of “ImageCompositeMasked” to “Preview Image” to check the image that will be sent to ControlNet.

Node Location
  • ImageCompositeMasked: image > ImageCompositeMasked
  • EmptyImage: image > EmptyImage
  • Primitive: utils > Primitive
  • Preview Image: image > Preview Image
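If you drive ComfyUI through its API-format workflow JSON instead of the canvas, the same wiring looks roughly like the sketch below. In that format each node is a class_type plus its inputs, and a link is written as [source_node_id, output_index]. As far as we know, Primitive nodes exist only in the front end, so their values are written straight into the inputs. The node IDs and the image file name are hypothetical.

```python
import json

# SDXL-sized canvas; in the UI these two values come from the shared Primitives.
CANVAS_W, CANVAS_H = 1216, 832

def composite_nodes(image_name, x, y):
    """API-format fragment: paste the input image onto an empty SDXL-sized canvas."""
    return {
        "10": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "11": {"class_type": "EmptyImage",
               "inputs": {"width": CANVAS_W, "height": CANVAS_H,
                          "batch_size": 1, "color": 0}},
        "12": {"class_type": "ImageCompositeMasked",
               "inputs": {"destination": ["11", 0],  # the empty canvas
                          "source": ["10", 0],       # the 512x512 scribble
                          "x": x, "y": y, "resize_source": False}},
        # Same size as the canvas, so the latent matches the ControlNet image.
        "13": {"class_type": "EmptyLatentImage",
               "inputs": {"width": CANVAS_W, "height": CANVAS_H, "batch_size": 1}},
        # Lets you check the image that will be sent to ControlNet.
        "14": {"class_type": "PreviewImage", "inputs": {"images": ["12", 0]}},
    }

# "scribble_input.png" is a placeholder name for the downloaded sample image.
print(json.dumps(composite_nodes("scribble_input.png", 354, 160), indent=2))
```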

Clip skip / LoRA adaptation:

Connect “CLIP Set Last Layer” to the CLIP out of “Load Checkpoint” and also connect “Load LoRA” to the CLIP out of “CLIP Set Last Layer”. The CLIP out of “Load LoRA” should be connected to the two “CLIP Text Encode (Prompt)”. And the MODEL out of “Load Checkpoint” should be connected to “Load LoRA” and from there to “KSampler”.

Node Location
  • CLIP Set Last Layer: conditioning > CLIP Set Last Layer
  • Load LoRA: loaders > Load LoRA
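In API format, the Clip skip and LoRA chain described above can be sketched as follows. The output indices follow ComfyUI's built-in nodes (Load Checkpoint: 0 = MODEL, 1 = CLIP, 2 = VAE; Load LoRA: 0 = MODEL, 1 = CLIP). The node IDs, file names, and prompt texts are placeholders.

```python
def clip_skip_lora_nodes(ckpt_name, lora_name, strength=1.0):
    """API-format fragment: checkpoint -> CLIP Set Last Layer -> Load LoRA -> text encoders."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        # CLIP from the checkpoint goes through Clip skip first.
        "2": {"class_type": "CLIPSetLastLayer",
              "inputs": {"clip": ["1", 1], "stop_at_clip_layer": -2}},
        # MODEL comes straight from the checkpoint, CLIP from CLIP Set Last Layer.
        "3": {"class_type": "LoraLoader",
              "inputs": {"model": ["1", 0], "clip": ["2", 0], "lora_name": lora_name,
                         "strength_model": strength, "strength_clip": strength}},
        # Both prompts use the LoRA-patched CLIP; ["3", 0] would feed the KSampler model.
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"clip": ["3", 1], "text": "positive prompt here"}},
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"clip": ["3", 1], "text": "negative prompt here"}},
    }

# Placeholder file names; use whatever is in your models folders.
nodes = clip_skip_lora_nodes("my_sdxl_model.safetensors", "my_lora.safetensors")
```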

Sharing Seed:

Externalize the seed value by right-clicking the KSampler node and selecting “Convert seed to input”. Drag from the seed input ● to bring up a list of nodes that can be connected, and select “Primitive”.

Node Location
  • Primitive: utils > Primitive
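When working with an API-format workflow there is no Primitive node, so sharing a seed simply means writing the same value into every KSampler. A minimal sketch, assuming the workflow is already loaded as a Python dict; the 64-bit upper bound matches the range of KSampler's seed widget.

```python
import random

KSAMPLER_SEED_MAX = 0xFFFFFFFFFFFFFFFF  # largest value the KSampler seed accepts

def share_seed(workflow, seed=None):
    """Write one seed into every KSampler node of an API-format workflow dict."""
    if seed is None:
        seed = random.randint(0, KSAMPLER_SEED_MAX)
    if not 0 <= seed <= KSAMPLER_SEED_MAX:
        raise ValueError("seed out of range")
    for node in workflow.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["seed"] = seed
    return seed
```

Because both passes receive the same seed, reproducing a result only requires recording that one number.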

Apply ControlNet Update:

The “Apply ControlNet (OLD)” node used in the sample is deprecated and should be replaced with the new “Apply ControlNet”.

After placing “Apply ControlNet”, connect the respective “CLIP Text Encode (Prompt)” nodes to positive/negative and “Load ControlNet Model” to control_net. Finally, connect the IMAGE out of “ImageCompositeMasked” to image. The vae input is optional and does not need to be connected.

Node Location
  • Apply ControlNet: conditioning > controlnet > Apply ControlNet
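For reference, here is the same connection in API format. As far as we know, the node displayed as “Apply ControlNet” has the internal class name ControlNetApplyAdvanced, and its two outputs are the modified positive (0) and negative (1) conditioning that go to the KSampler. The node IDs and model file name are placeholders.

```python
def apply_controlnet_nodes(positive_ref, negative_ref, image_ref, model_file, strength=1.0):
    """API-format fragment: load a ControlNet model and apply it to both prompts."""
    return {
        "20": {"class_type": "ControlNetLoader",
               "inputs": {"control_net_name": model_file}},
        "21": {"class_type": "ControlNetApplyAdvanced",  # shown as "Apply ControlNet"
               "inputs": {"positive": positive_ref, "negative": negative_ref,
                          "control_net": ["20", 0], "image": image_ref,
                          "strength": strength,
                          "start_percent": 0.0, "end_percent": 1.0}},
        # The optional "vae" input is left out, as in the text.
        # KSampler then takes positive = ["21", 0] and negative = ["21", 1].
    }

# "4"/"5" = text encoders, "12" = ImageCompositeMasked (hypothetical IDs).
nodes = apply_controlnet_nodes(["4", 0], ["5", 0], ["12", 0],
                               "control_v11p_sd15_scribble.pth")
```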

Implementation of a second pass:

Upscale the image for the second pass by connecting “Upscale Image By” after “VAE Decode”.

To convert the upscaled image back into a latent image, use “VAE Encode”.

Select the first KSampler and copy it with Ctrl + C. Pasting with Ctrl + Shift + V keeps its input connections, so move the pasted node to the desired position while its inputs are still connected.

Once it is in position, connect the latent produced by “VAE Encode” to the latent_image input of the second KSampler.

Copy the positive “CLIP Text Encode (Prompt)”, paste it with Ctrl + Shift + V, and connect it to the positive input of the second KSampler. Replace its text with the following upscaling prompt.

very detailed, intricate details, ultra detailed, masterpiece, best quality
Node Location
  • Upscale Image By: image > upscaling > Upscale Image By
  • VAE Encode: latent > VAE Encode
  • KSampler: sampling > KSampler
  • CLIP Text Encode (Prompt): conditioning > CLIP Text Encode (Prompt)
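The second pass can be sketched in API format as follows. The upscale settings and the default steps, cfg, sampler, and denoise values are the ones used later in this article; the node IDs, the incoming references, and the final SaveImage node are assumptions for illustration.

```python
UPSCALE_PROMPT = ("very detailed, intricate details, ultra detailed, "
                  "masterpiece, best quality")

def second_pass_nodes(model_ref, clip_ref, vae_ref, decoded_ref, negative_ref, seed,
                      steps=20, cfg=3.0, denoise=0.60):
    """API-format fragment: upscale the 1st-pass image, re-encode it, and sample again."""
    return {
        "30": {"class_type": "ImageScaleBy",  # "Upscale Image By"
               "inputs": {"image": decoded_ref, "upscale_method": "lanczos",
                          "scale_by": 1.5}},
        # Back to latent space so the second KSampler can work on it.
        "31": {"class_type": "VAEEncode", "inputs": {"pixels": ["30", 0], "vae": vae_ref}},
        "32": {"class_type": "CLIPTextEncode",
               "inputs": {"clip": clip_ref, "text": UPSCALE_PROMPT}},
        "33": {"class_type": "KSampler",
               "inputs": {"model": model_ref, "seed": seed, "steps": steps, "cfg": cfg,
                          "sampler_name": "dpmpp_2m", "scheduler": "karras",
                          "positive": ["32", 0], "negative": negative_ref,
                          "latent_image": ["31", 0],
                          "denoise": denoise}},  # < 1.0 keeps the 1st-pass composition
        "34": {"class_type": "VAEDecode", "inputs": {"samples": ["33", 0], "vae": vae_ref}},
        "35": {"class_type": "SaveImage",
               "inputs": {"images": ["34", 0], "filename_prefix": "ComfyUI"}},
    }
```

The key design point is denoise: at 1.0 the second KSampler would ignore the upscaled latent entirely, while a partial value only repaints detail on top of it.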

This completes the improved version of Scribble ControlNet Examples. The completed workflow is available free of charge on Patreon for your reference.

Using the improved Scribble ControlNet Examples

Checkpoint Model Selection:

Set the model by ckpt_name in “Load Checkpoint”. In this case, we will use the SDXL model AnythingXL_xl. You can use any model you like, but you will need to adjust some parameters for the model.

Clip skip settings:

To match Clip skip: 2 in A1111 WebUI, set stop_at_clip_layer in “CLIP Set Last Layer” to -2.

Loading LoRA:

Select the LoRA model by lora_name in “Load LoRA”. We wanted to bring out the details, so we loaded Zoot Detailer XL. Use the default parameters.

Loading VAE:

Select the VAE model sdxl.vae.safetensors under vae_name in “Load VAE”.

We use “madebyollin/sdxl-vae-fp16-fix” which is lighter than the official VAE. Please download “sdxl.vae.safetensors” from the link below.

Input image settings:

Load the input image into “Load Image”. Here we use the sample image from the official workflow.

Set the “Primitive” connected to the width of “Empty Latent Image” and “EmptyImage” to 1216, and the one connected to the height to 832.

In “ImageCompositeMasked”, set x to 354 and y to 160 to center the input image on the SDXL-sized canvas.
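The centering offsets are simple arithmetic: half of the leftover space on each axis. The sketch below computes them; exact centering works out to x = 352, so the 354 used here places the image 2 pixels right of center, which makes no visible difference.

```python
def center_offset(canvas_w, canvas_h, img_w, img_h):
    """Top-left (x, y) that centers an img_w x img_h image on the canvas."""
    if img_w > canvas_w or img_h > canvas_h:
        raise ValueError("image is larger than the canvas")
    return (canvas_w - img_w) // 2, (canvas_h - img_h) // 2

print(center_offset(1216, 832, 512, 512))  # (352, 160)
```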

Change prompts:

Adjust the prompts to suit the model you use. Replace the positive and negative prompts with the following.

Positive prompts
masterpiece, best quality, white dress, (solo) girl (flat chest:0.9), (fennec ears:1.1) (fox ears:1.1), (blonde hair:1.0), messy hair, sky clouds, standing in a grass field, (chibi), blue eyes
Negative prompts
nude, (hands), text, error, cropped, (worst quality:1.2), (low quality:1.2), normal quality, (jpeg artifacts:1.3), signature, watermark, username, blurry, artist name, monochrome, sketch, censorship, censor, (copyright:1.2), extra legs, (forehead mark) (depth of field) (emotionless) (penis)

We have added masterpiece, best quality as quality modifier tags for “Anything XL” used in this workflow. Also, since the test generated a nude image, we added white dress to the positive prompt and nude to the negative prompt.

Negative Embedding Adaptation:

We want to adapt the negative embedding “negativeXL – D”, so we add the following prompt at the beginning of the negative prompt.

embedding:negativeXL_D, 

If you do not have negativeXL_D, download it from the link below and place it in \ComfyUI\models\embeddings.

ControlNet Settings:

For the ControlNet model, we use controlnet-scribble-sdxl-1.0, the Scribble model for SDXL published by xinsir. (*The “kohya_controlllite_xl_scribble_anime” model introduced on the official site could not be used due to an error.) Download diffusion_pytorch_model.safetensors from the link below, rename it to something descriptive such as xinsir-controlnet-scribble-sdxl-1.0.safetensors, and put it in the \ComfyUI\models\controlnet directory.

Set strength in “Apply ControlNet” to 0.50, because this model has a strong influence. Leave start_percent and end_percent at their defaults.
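Since every Hugging Face ControlNet download arrives with the same generic name, a small helper for the rename-and-move step can prevent overwriting one model with another. A minimal sketch, assuming you pass the downloaded file path and your ComfyUI root directory yourself.

```python
import shutil
from pathlib import Path

def install_controlnet(downloaded, comfy_root, new_name):
    """Move a downloaded model into <comfy_root>/models/controlnet under a descriptive name."""
    target_dir = Path(comfy_root) / "models" / "controlnet"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / new_name
    if target.exists():
        # Refuse to overwrite an existing model with the same name.
        raise FileExistsError(target)
    shutil.move(str(downloaded), str(target))
    return target

# Example call (paths are placeholders):
# install_controlnet("Downloads/diffusion_pytorch_model.safetensors", "ComfyUI",
#                    "xinsir-controlnet-scribble-sdxl-1.0.safetensors")
```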

1st Pass setting:

Set the steps to the standard 20.

The default cfg produced too much contrast, so lower it a little to 5.0.

For sampler_name we chose dpmpp_2m (any sampler you like will do), with karras as the scheduler.

Upscale settings:

Set the scale in “Upscale Image By”. Since SDXL already generates at a high resolution, the image does not need to be enlarged much, so set upscale_method to lanczos and scale_by to 1.50.
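To see what scale_by = 1.50 means for the second pass, the arithmetic below computes the upscaled size. It assumes the node rounds each side to the nearest pixel; with these inputs the results are whole numbers anyway. Note that the pixel count grows by 1.5² = 2.25×, which is why a modest factor is enough for SDXL.

```python
def scale_by_size(width, height, scale_by):
    """Output size of an image scaled by `scale_by` (assumes round-to-nearest)."""
    return round(width * scale_by), round(height * scale_by)

w, h = scale_by_size(1216, 832, 1.5)
print(w, h)                  # 1824 1248
print(w * h / (1216 * 832))  # 2.25
```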

2nd Pass setting:

I wanted to increase the number of steps to improve the finish, but raising the number of steps did not make much difference, so I left the number of steps at 20.

Set cfg to 3.0 to weaken the influence of the prompt.

The sampler can be chosen according to preference, but in this case, we used the same combination of dpmpp_2m and karras as in the 1st Pass.

To add more detail, set denoise (the equivalent of Denoising strength in A1111 WebUI) to 0.60.

Generate:

Now that the configuration is complete, click the “Queue Prompt” button to generate.

Generation results
Seed:678787856375407

Example of ControlNet Usage

From here on, we introduce workflows similar to ControlNet usage in A1111 WebUI. Install the custom node “ComfyUI’s ControlNet Auxiliary Preprocessors”, which is required to convert the input image into a form suitable for ControlNet. We also use “Image Chooser” so that you can decide whether to send the 1st-pass image on to the 2nd pass.

Pose Reference

In this pose-reference workflow, DensePose Estimator analyzes the person in the input image, and an SDXL model generates the result. It also serves as an example of applying multiple ControlNets. First, download the workflow from Patreon.

Workflow Description

Basic Info

  • Load Checkpoint: Load the Checkpoint model. In our example we will use animagineXLV31_v31.
  • CLIP Set Last Layer: Clip skip: 2 in A1111 WebUI, so set stop_at_clip_layer to -2.
  • Load LoRA: Load LoRA. This time, use add-detail-xl to add detail, and set strength_model to 2.00 to add more detail.
  • Primitive-Seeds: Seeds used for KSampler are externalized and shared with this node.
  • CLIP Text Encode (Prompt)-Positive: The positive prompt describes the skater and the background.
  • CLIP Text Encode (Prompt)-Negative: Keep it simple and don’t include too many negative prompts.
  • Empty Latent Image: Since the SDXL model is used for this generation, the appropriate size is set to 1344x768.

Load Image

First, load the image “man performing skateboard trick during daytime”, borrowed from Unsplash.

Input Image
Use the Medium size

Once loading is complete, mask the skater: select Open in MaskEditor from the node’s right-click menu and paint over the skater.

MaskEditor
MaskEditor mask example

Mask

This group resizes the mask to the generation size.

  • Convert Mask to Image: To edit the mask, it must be changed to Image once, so let’s convert it at this node.
  • Enchance And Resize Hint Images: Convert to Generation Size. The size is adapted to the size extracted by “Generation Resolution From Latent” in the Image Composition Group.
  • Invert Image: We want to extract the skater, so let’s invert the black and white of the mask image.
  • ImageBlur: We want to smooth the borders of the mask, so we apply blur.
  • Convert Image to Mask: Lastly, the image is converted to a mask.
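To make the Invert Image and ImageBlur steps concrete, here is a pure-Python sketch on a tiny mask stored as rows of values from 0.0 (black) to 1.0 (white). It uses a simple box blur as a stand-in; ComfyUI's ImageBlur node applies a Gaussian blur, so the softness of the edge will differ. The 5×5 mask is made up for illustration.

```python
def invert(mask):
    """Swap black and white, like the Invert Image node."""
    return [[1.0 - v for v in row] for row in mask]

def box_blur(mask, radius=1):
    """Replace each value with the mean of its (2*radius+1)^2 neighbourhood,
    using only in-bounds pixels at the edges. This softens the mask border."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

# Hypothetical 5x5 mask: 1.0 where the skater was painted, 0.0 elsewhere.
painted = [[1.0 if 1 <= x <= 3 and 1 <= y <= 3 else 0.0 for x in range(5)]
           for y in range(5)]
soft = box_blur(invert(painted))  # invert first, then blur, as in the group above
```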

Image Composition

The input image is portrait, but we want to generate a landscape image. This group resizes the input image and then uses the mask to cut out the skater.

  • Enchance And Resize Hint Images: Convert to Generation Size. Apply the size extracted in “Generation Resolution From Latent”.
  • Generation Resolution From Latent: Extracts the size of the latent image.
  • Join Image with Alpha: Cuts out the skater from the resized input image using the mask.
  • Preview Image: It is set up to check the extracted images.

Preprocess

The image is converted to make it adaptable to the ControlNet.

  • DensePose Estimator: Generates a DensePose from the input image.
  • AnyLine Lineart: Generate a Lineart from the input image. In this case, lineart_realstic is used.
  • Preview Image: It is set up to check each preprocessor.

ControlNet

To apply multiple ControlNets, chain the nodes as shown in this graph.

  • Load ControlNet Model-Densepose: To use DensePose with SDXL, use jschoormans-controlnet-densepose-sdxl.safetensors. Download diffusion_pytorch_model.safetensors from the link below and rename it to something more descriptive like jschoormans-controlnet-densepose-sdxl.safetensors.
  • Apply ControlNet-Densepose: Apply DensePose. Set strength to 0.70 to weaken the weights a little. Set start_percent to 0.050 to color the background.
  • Load ControlNet Model-Lineart: Use xinsir-controlnet-scribble-sdxl-1-0.safetensors to reference skater line art. Download diffusion_pytorch_model.safetensors (V2 is also fine) from the link below and rename it to something more descriptive like xinsir-controlnet-scribble-sdxl-1-0.safetensors.
  • Apply ControlNet-Lineart: To make the skater look artistic, the weight is lowered a little to 0.90. To color the background, start_percent is set to 0.100. And to reflect the prompts on the entire image, end_percent is set to 0.900.
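Chaining works because each “Apply ControlNet” takes the positive/negative conditioning produced by the previous one. The API-format sketch below builds such a chain from a list of units using the strengths and start/end values given above; the densepose end_percent of 1.0 is an assumption (the default), and the node IDs and preprocessor outputs ("50", "51") are hypothetical.

```python
def chain_controlnets(pos_ref, neg_ref, units, first_id=40):
    """Chain Apply ControlNet nodes; each consumes the previous node's conditioning."""
    nodes = {}
    positive, negative = pos_ref, neg_ref
    next_id = first_id
    for unit in units:
        loader_id, apply_id = str(next_id), str(next_id + 1)
        next_id += 2
        nodes[loader_id] = {"class_type": "ControlNetLoader",
                            "inputs": {"control_net_name": unit["model"]}}
        nodes[apply_id] = {"class_type": "ControlNetApplyAdvanced",
                           "inputs": {"positive": positive, "negative": negative,
                                      "control_net": [loader_id, 0],
                                      "image": unit["image"],
                                      "strength": unit["strength"],
                                      "start_percent": unit["start"],
                                      "end_percent": unit["end"]}}
        # The next unit (or the KSampler) reads this node's outputs.
        positive, negative = [apply_id, 0], [apply_id, 1]
    return nodes, positive, negative

units = [
    {"model": "jschoormans-controlnet-densepose-sdxl.safetensors", "image": ["50", 0],
     "strength": 0.70, "start": 0.050, "end": 1.0},
    {"model": "xinsir-controlnet-scribble-sdxl-1-0.safetensors", "image": ["51", 0],
     "strength": 0.90, "start": 0.100, "end": 0.900},
]
nodes, pos, neg = chain_controlnets(["6", 0], ["7", 0], units)  # pos/neg go to KSampler
```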

1st Pass

Mostly default, but cfg is 5.0, sampler is dpmpp_2m, and scheduler is karras.

Preview Chooser

This node lets you check the image generated by the 1st Pass. If you are satisfied with the result, click the “Progress selected image as restart” button to continue.

Upscale

This time, the image is upscaled using “ImageScaleToTotalPixels”. The size is set to 3.00 megapixels.

2nd Pass

The settings are almost the same as the 1st Pass, but denoise is set to 0.40 to control how much new detail is added.

This is the end of the explanation of the pose references.

Generation Result
Generated result of pose reference
Seed:242305135254817

Refinement of the generated image

This workflow refines the generated image with a Tile ControlNet on an SDXL model. It uses the custom node “Tiled Diffusion & VAE for ComfyUI”, so install it first. Then download the workflow from Patreon.

Workflow Description

Basic Info

Basic workflow information is grouped here.

  • Merge Checkpoints: Merges checkpoint models. Details are explained in the next section.
  • CLIP Set Last Layer: Clip skip: 2 in A1111 WebUI, so set stop_at_clip_layer to -2.
  • Load LoRA: Load LoRA. This time, use add-detail-xl to add detail, and set strength_model to 3.00 to add more detail.
  • Empty Latent Image: Since the SDXL model is used for this generation, the appropriate size is set to 1344x768.
  • Primitive-Seed: Seeds used for KSampler are externalized and shared with this node.
  • CLIP Text Encode (Prompt)-Positive: Nothing special, but a prompt for a young girl standing in a medieval castle market.
  • CLIP Text Encode (Prompt)-Negative: We used the negative embedding “negativeXL_D”. We also added heart pupils, because the pupils were being generated pink.

Merge Checkpoints

This group merges checkpoint models. We use “ModelMergeSimple” for the merge; for CLIP, use the one from whichever checkpoint (base or sub) you prefer.

  • Load Checkpoint-base: The base checkpoint model. We use fiamixXL_v40.safetensors.
  • Load Checkpoint-sub: The sub checkpoint model. We wanted a slightly more anime-like look, so we used AnythingXL_xl.safetensors. We use this model’s CLIP.
  • ModelMergeSimple: Merges the checkpoint models. ratio is set to 0.85: at 1.00 the result is 100% model1 (base), so 0.85 mixes in a little of the sub model.
  • Save Checkpoint: Use this node if you want to save the merged checkpoint model.
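What ratio does can be shown with a per-weight weighted average. The sketch below models each checkpoint as a dict of named scalar weights, a simplification of real tensors; our reading of ModelMergeSimple is that it computes ratio × model1 + (1 − ratio) × model2 for the diffusion-model weights only, which is also why CLIP has to be taken from one of the two checkpoints.

```python
def merge_simple(model1, model2, ratio):
    """Per-weight weighted average: ratio * model1 + (1 - ratio) * model2."""
    if model1.keys() != model2.keys():
        raise ValueError("models must have the same weight names")
    return {k: ratio * model1[k] + (1.0 - ratio) * model2[k] for k in model1}

base = {"w": 1.0}  # stands in for fiamixXL_v40 (model1)
sub = {"w": 0.0}   # stands in for AnythingXL_xl (model2)
print(merge_simple(base, sub, 0.85))  # {'w': 0.85}
```

With ratio = 1.00 the result equals the base exactly; lowering it gradually blends in the sub model.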

1stPass

Mostly default, but cfg is set to 4.0, sampler to euler, and scheduler to beta.

Preview Chooser

This node lets you check the image generated by the 1st Pass. If you are satisfied with the result, click the “Progress selected image as restart” button to continue.

Upscale

Preprocess

Use “TTPlanet Tile GuidedFilter” to convert the image for ControlNet. resolution is set to 1024; if you have less VRAM, use the default of 512, or 768.

ControlNet

Tiled Diffusion

  • Tiled Diffusion: Sets the tile size for Tiled Diffusion. We use 1024 here; if you have less VRAM, use the default of 768, or 512.
  • Tiled VAE Encode: tile_size is also set to 1024 here. Lower this value as well if you have less VRAM.
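Smaller tiles need less VRAM per step but mean more tiles to process. The sketch below is generic tiling arithmetic, not the exact algorithm of Tiled Diffusion, and the overlap values (128 and 64 pixels) are assumptions chosen for illustration. It counts the tiles needed to cover an image of about 2346×1341 pixels (roughly 3 megapixels).

```python
import math

def tiles_per_axis(size, tile, overlap):
    """Tiles needed to cover `size` px with `tile` px tiles overlapping by `overlap` px."""
    if size <= tile:
        return 1
    stride = tile - overlap  # how far each new tile advances
    return math.ceil((size - overlap) / stride)

def tile_grid(width, height, tile, overlap):
    return tiles_per_axis(width, tile, overlap), tiles_per_axis(height, tile, overlap)

print(tile_grid(2346, 1341, 1024, 128))  # (3, 2) -> 6 tiles
print(tile_grid(2346, 1341, 512, 64))    # (6, 3) -> 18 tiles
```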

2nd Pass

This is also mostly default, but cfg is set to 7.0, sampler to euler, and scheduler to beta. denoise is set to 0.40.

Tiled VAE Decode

Decode using Tiled VAE Decode. If you have less VRAM, lower the tile_size.

This is the end of the explanation of the refinement of the generated image.

Generation Result
Generated results of the refinement of the generated image
Seed:313039866590761

If you have a high-spec machine, you can try megapixels of 6.00 or 8.00 in “ImageScaleToTotalPixels” in the Upscale group.

Generated image refinement generation result 8 megapixels
This image has been reduced to 2K size

If you would like to see the 8.00-megapixel (3824 x 2184 pixels) image, you can download it from Google Drive below.
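The 3824 x 2184 size can be reproduced with a little arithmetic. This sketch assumes ImageScaleToTotalPixels treats one megapixel as 1024 × 1024 pixels and rounds each side to the nearest pixel (our reading of the ComfyUI source; later versions may differ), and that each side is then trimmed down to a multiple of 8 for the VAE.

```python
import math

def total_pixels_size(width, height, megapixels):
    """Size after scaling to `megapixels`, assuming 1 MP = 1024 * 1024 pixels."""
    total = int(megapixels * 1024 * 1024)
    scale = math.sqrt(total / (width * height))  # same factor on both axes
    return round(width * scale), round(height * scale)

def floor_to_multiple(value, step=8):
    """Trim a side down to a multiple of `step` pixels."""
    return value // step * step

w, h = total_pixels_size(1344, 768, 8.0)
print(w, h)                                        # 3831 2189
print(floor_to_multiple(w), floor_to_multiple(h))  # 3824 2184
```

With 3.00 megapixels, as in the pose-reference workflow, the same function gives 2346 × 1341.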

Conclusion

In this article, we have shown how to use ControlNet with ComfyUI. Compared to A1111 WebUI, ControlNet in ComfyUI may seem more complicated because there are no presets like Control Type. Once you know how to use it, however, you can control the generation process in detail.

In addition, although we did not cover it in this article, you can generate even faster with T2I-Adapter, which is introduced in the official examples. If you want to use ControlNet for AI videos, we recommend the custom node “ComfyUI-Advanced-ControlNet”.
