
Detailed usage of ComfyUI Stable Diffusion 3.5 including how to use LoRA

⏱️18min read
📅 Dec 03, 2024

Stability AI announced “Stable Diffusion 3” in the spring of 2024, and the quality of the images shown in the demo drew a lot of attention. However, the publicly released model did not live up to the demo, and a poorly generated image of a woman lying on the grass was a hot topic on the Internet for a while. “Stable Diffusion 3.5” was released to win back that popularity. The current favorite among free, locally run models is Black Forest Labs’s “Flux.1”; is this not a model to compete with it? In this article, we will show you how to use the three “Stable Diffusion 3.5” models: Medium, Large, and Large Turbo.


About the 3 models in Stable Diffusion 3.5

Stable Diffusion 3.5 is the latest model family, with support for fine-tuning, LoRA, and other extensions, and it is available for both commercial and non-commercial use if you register under the 🔗Stability AI Community License. Turbo and Medium models are also provided so that users with less powerful machines can run it without stress.

SD3.5 has three models, and the following is a brief summary of the features of each model.

  • Stable Diffusion 3.5 Large:The most powerful base model in the Stable Diffusion family, with 8 billion parameters; that is fewer than Flux.1’s 12 billion, but improved efficiency lets it produce high-quality 1-megapixel images. The 🔗Multimodal Diffusion Transformer (MMDiT) text-to-image architecture improves image quality, typography, and resource efficiency.
  • Stable Diffusion 3.5 Large Turbo:A distilled version of Stable Diffusion 3.5 Large that can generate high-quality images in just 4 steps. In addition to Large’s Multimodal Diffusion Transformer (MMDiT) architecture, it uses 🔗Adversarial Diffusion Distillation (ADD), which greatly improves generation speed.
  • Stable Diffusion 3.5 Medium:A base model with 2.5 billion parameters built on an improved Multimodal Diffusion Transformer (MMDiT-X) text-to-image architecture. This quality-focused, customizable model can produce images from 0.25 to 2 megapixels.

Text Encoders

Stable Diffusion 3.5, like SD3, uses three encoders.

  • CLIP: OpenCLIP ViT-G/14 and OpenAI CLIP ViT-L/14:The same two models as the SDXL text encoders, each with a context length of 77 tokens.
  • T5: Google T5-XXL:This encoder is also used in Flux.1 and has a context length of 77/256 tokens.

Download Models

The models are available for download from Stability AI’s HuggingFace. You must also accept the Stability AI Community License. If you have trouble accepting the license, Civitai hosts 🔗lightweight FP32, FP16 and FP8 versions. The VAE is built in, so there is no need to download it separately.

  • Stable Diffusion 3.5 Large (16.5GB)
  • Stable Diffusion 3.5 Large Turbo (16.5GB)
  • Stable Diffusion 3.5 Medium (5.11GB)
  • Text Encoders

From the Text Encoders link above, use clip_g.safetensors, clip_l.safetensors, and either t5xxl_fp16.safetensors or t5xxl_fp8_e4m3fn.safetensors.
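If you prefer scripting the downloads, here is a minimal sketch using the huggingface_hub Python library. The repository and file layout are assumptions based on the official Stability AI repositories at the time of writing; you must accept the license on the model page and log in with huggingface-cli first.

from huggingface_hub import hf_hub_download

# Checkpoint (the VAE is baked in, so no separate VAE download is needed)
hf_hub_download(
    repo_id="stabilityai/stable-diffusion-3.5-large",
    filename="sd3.5_large.safetensors",
    local_dir="ComfyUI/models/checkpoints",
)

# Text encoders (swap in t5xxl_fp8_e4m3fn.safetensors if RAM is tight)
for name in ("clip_g.safetensors", "clip_l.safetensors", "t5xxl_fp16.safetensors"):
    hf_hub_download(
        repo_id="stabilityai/stable-diffusion-3.5-large",
        filename=f"text_encoders/{name}",
        local_dir="ComfyUI/models/clip",
    )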

Using Stable Diffusion 3.5 with ComfyUI

From here, we will explain how to use Stable Diffusion 3.5 with ComfyUI.

Example workflows are available in the model repositories introduced earlier, so let’s start from those files. This time, we will use SD3.5L_example_workflow.json, the example workflow for Stable Diffusion 3.5 Large. All three workflows have the same structure and differ only in their parameters, so you can follow along with whichever one you prefer.

Stable Diffusion 3.5 Official Workflow Example

Workflow for Stable Diffusion 3.5 Large

After downloading the workflow, open it in ComfyUI. If you press the “Queue” button in this state, an error will occur because the model paths differ in your environment, so reselect the models in the “Load Checkpoint” and “TripleCLIPLoader” nodes. After reconfiguring, press the “Queue” button to generate.
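If you would rather queue generations from a script than from the “Queue” button, ComfyUI also exposes an HTTP API. The sketch below assumes a default local install on port 8188 and a workflow re-exported via “Save (API Format)”; the UI-format SD3.5L_example_workflow.json cannot be submitted directly, and the file name here is hypothetical.

import json
import urllib.request

# Load an API-format export of the workflow (hypothetical file name)
with open("SD3.5L_example_workflow_api.json") as f:
    workflow = json.load(f)

# POST it to the /prompt endpoint of a running ComfyUI instance
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())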

We generated a result with each model using the default parameters. The generation time shown is the sampling time only and does not include model loading. We include one result per model for ease of comparison. The prompt is as follows.

beautiful scenery nature glass bottle landscape, purple galaxy bottle,
Generation Result
Results for Stable Diffusion 3.5 Large: Seed 818183488572308, generation time 47.33s (1.17s/it)
Results for Stable Diffusion 3.5 Large Turbo: Seed 880448269872607, generation time 2.49s (1.69s/it)
Results for Stable Diffusion 3.5 Medium: Seed 1006649603562674, generation time 17.91s (2.29s/it)

Nodes used in the official Stable Diffusion 3.5 workflow

This section describes the nodes used in the Stable Diffusion 3.5 workflow. Basic nodes such as prompts and decoders are omitted.

Load Checkpoint

Loads a checkpoint model.

TripleCLIPLoader

Loads the three text encoders. In no particular order: clip_g.safetensors, clip_l.safetensors, and, for the T5-XXL encoder, t5xxl_fp16.safetensors, or t5xxl_fp8_e4m3fn.safetensors if you have less RAM.

EmptySD3LatentImage

Creates an empty latent image for SD3. In our test, substituting the standard “Empty Latent Image” did not change the result. The differences are the default image sizes and the resolution grid: “EmptySD3LatentImage” uses resolutions divisible by 16 pixels, while “Empty Latent Image” uses resolutions divisible by 8.
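As a minimal sketch of that grid difference, snapping a requested side length down to each node’s multiple looks like this (the exact rounding inside ComfyUI may differ):

def snap(value: int, multiple: int) -> int:
    """Round value down to the nearest multiple of the latent grid."""
    return (value // multiple) * multiple

print(snap(1000, 16))  # 992  (EmptySD3LatentImage grid)
print(snap(1000, 8))   # 1000 (standard Empty Latent Image grid)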

ModelSamplingSD3

Adjusts the shift value of the timestep scheduling. The larger the shift value, the more drastic the schedule change, which can add artistry; setting it too low or too high produces noisy results. Adjust within roughly 0.5 to 10.0, depending on the model.
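For intuition, here is a sketch of the shift mapping, paraphrasing the timestep shift used in ComfyUI’s model sampling code (not the exact source); higher shift values keep the sampler in the high-noise region longer.

def time_snr_shift(shift: float, t: float) -> float:
    """Map a normalized timestep t in [0, 1] under the given shift."""
    return shift * t / (1 + (shift - 1) * t)

print(time_snr_shift(3.0, 0.5))  # 0.75: halfway through, still 75% noise
print(time_snr_shift(1.0, 0.5))  # 0.50: shift 1.0 leaves the schedule unchanged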

ConditioningZeroOut

Neutralizes the effect of the conditioning (prompt) by zeroing out its tensors.
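A paraphrased sketch of what the node does to a ComfyUI conditioning list, where each entry pairs a conditioning tensor with a dict that may include a pooled output; this mirrors the built-in node’s logic but is not the exact source.

import torch

def zero_out(conditioning):
    """Return a copy of the conditioning with all tensors zeroed."""
    result = []
    for cond, extras in conditioning:
        extras = dict(extras)
        if "pooled_output" in extras:
            extras["pooled_output"] = torch.zeros_like(extras["pooled_output"])
        result.append((torch.zeros_like(cond), extras))
    return result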

ConditioningSetTimestepRange

Specifies during which part of generation the conditioning is applied. The range is given as a fraction: a value of 0.1 means 10%, i.e. the 10% stage of generation.
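The sketch below maps these fractions to concrete sampler steps, assuming a 30-step run like the 1st pass in this article:

def timestep_range_to_steps(start: float, end: float, total_steps: int) -> range:
    """Return the sampler steps covered by a fractional [start, end) range."""
    return range(round(start * total_steps), round(end * total_steps))

print(list(timestep_range_to_steps(0.0, 0.1, 30)))  # [0, 1, 2]: the first 10%
print(list(timestep_range_to_steps(0.1, 1.0, 30)))  # steps 3 through 29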

Conditioning (Combine)

Mixes two conditionings. In this graph for SD3.5 Large, the normal negative prompt is applied from the start to 10%, and the negative prompt zeroed by ConditioningZeroOut is applied from 10% to 100%. Also note that SD3.5 Large Turbo is a distilled model, so it cannot use negative prompts.


Customize the official Stable Diffusion 3.5 Large workflow

As has become standard in DCAI’s ComfyUI articles, let’s make some practical customizations to the official workflow. Here is what we will add:

  • Adaptation of LoRA
  • Implementing upscaling from Large to Medium
About the upscaling process

Because SD3.5 Large cannot generate beyond the 1-megapixel range, we implement the 2nd pass of the upscaling with SD3.5 Medium, which goes up to 2 megapixels, following the official upscaling method. This process is similar to A1111 WebUI’s Hires. fix, in that it uses KSampler (Advanced) to run additional steps continuing from the end of the 1st pass. An alternative is to use Tiled Diffusion with ControlNet.

Download LoRA

We will be using the SD3.5 version of “Daubrez Painterly Style” published by blairesilver13, which was introduced in the previous article. Please download it from the link below.

Implementing LoRA

To implement LoRA, insert “LoraLoaderModelOnly” between “Load Checkpoint” and “ModelSamplingSD3”; a sketch of the resulting wiring follows the node location below.

Node Locations
  • LoraLoaderModelOnly:loaders > LoraLoaderModelOnly
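In ComfyUI API-format JSON terms, the rewiring looks roughly like the sketch below. The node ids are arbitrary and the values match the settings used later in this article; the class names are the built-in ComfyUI node types.

graph = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd3.5_large.safetensors"}},
    "2": {"class_type": "LoraLoaderModelOnly",
          "inputs": {"model": ["1", 0],   # MODEL out of Load Checkpoint
                     "lora_name": "sd35L-daubrez-v1_db4rz_1800.safetensors",
                     "strength_model": 0.35}},
    "3": {"class_type": "ModelSamplingSD3",
          "inputs": {"model": ["2", 0],   # now fed by the LoRA loader
                     "shift": 3.10}},
}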

Externalization of Seed/Steps

Externalize Seed and Steps so they can be shared with the “KSampler (Advanced)” introduced later.

From the right-click menu of “KSampler,” select Convert Widget to Input > Convert seed to input so that an external node can drive the seed. Externalize steps in the same way. Then place two “Primitive” nodes and connect them to seed and steps, respectively.

Node Locations
  • Primitive:utils > Primitive

Implementing an upscaling process

Implementing an upscaler

Connect the “VAE Decode” output after the 1st pass (KSampler) to “Upscale Image (using Model)”.

Place “Load Upscale Model” and connect it to the “Upscale Image (using Model)” you have just placed.

Connect “Upscale Image (using Model)” out to “Scale Image to Total Pixels”.

Connect “Scale Image to Total Pixels” out to “VAE Encode” and connect “Load Checkpoint (SD3.5 Large)” VAE out to vae input.

Node Locations
  • Upscale Image (using Model):image > upscale > Upscale Image (using Model)
  • Load Upscale Model:image > upscale > Load Upscale Model
  • Scale Image to Total Pixels:image > upscale > Scale Image to Total Pixels
  • VAE Encode:latent > VAE Encode
Implementing 2nd pass

First, place the “KSampler (Advanced)”. Once placed, externalize noise_seed and start_at_step using Convert Widget to Input from the right-click menu. Connect “Primitive (seed)” externalized in the 1st pass to noise_seed and “Primitive (steps)” to start_at_step.

Connect “VAE Encode” after upscaling to latent_image.

Connect the same “Conditioning (Combine)” as the 1st pass to the negative.

The positive will use a different prompt, but you can copy the existing “CLIP Text Encode (Prompt)” node with its inputs intact (Ctrl+C -> Ctrl+Shift+V) and connect its output to positive.

Since we want to use SD3.5 Medium for the model, let’s place “Load Checkpoint” and “ModelSamplingSD3” and connect MODEL out.

Connect the “KSampler (Advanced)” out to the “VAE Decode” and the VAE input to the VAE out of the “Load Checkpoint (SD3.5 Medium)” placed earlier.

Lastly, connect to “Preview Image” or “Save Image” to complete the process.

Node Locations
  • KSampler (Advanced):sampling > KSampler (Advanced)
  • CLIP Text Encode (Prompt):conditioning > CLIP Text Encode (Prompt)
  • Load Checkpoint:loaders > Load Checkpoint
  • ModelSamplingSD3:advanced > model > ModelSamplingSD3
  • VAE Decode:latent > VAE Decode
  • Preview Image:image > Preview Image
  • Save Image:image > Save Image

How to use the custom Stable Diffusion 3.5 Large workflow

Once the graph is completed, enter the parameters.

Load basic model and input basic information

Load Checkpoint (Large)

Use the high-quality Large model for the 1st pass.

  • ckpt_name:sd3.5_large.safetensors
TripleCLIPLoader

The order of loading is not fixed.

  • clip_name1:clip_g.safetensors
  • clip_name2:clip_l.safetensors
  • clip_name3:t5xxl_fp16.safetensors
EmptySD3LatentImage
  • width:1344
  • height:768
Primitive (Seed)

If you want the generated results to be the same, use the following values.

  • value:481259788557381
  • control_after_generate:fixed
Primitive (Steps)
  • value:30
  • control_after_generate:fixed
LoraLoaderModelOnly

We didn’t want to make the LoRA impact too strong, so we set it to 0.35.

  • lora_name:sd35L-daubrez-v1_db4rz_1800.safetensors
  • strength_model:0.35
ModelSamplingSD3

The shift value adjusts how much the generation is brushed up.

  • shift:3.10
CLIP Text Encode (Prompt) (Positive)

As with Flux.1, the prompt mixes natural language and Danbooru-style tags.

A masterful highly intricate detailed cinematic digital painting.
A portrait of a girl, who is looking at viewer and (wearing techwear jacket:1.1). Logo of "DCAI" on jacket. 
In the background a huge robots are standing.
The streets are very dirty. Abandoned cars on the road.
It is a summer day.
Colorful buildings stand in ruins, and grass and trees grow everywhere.

A flat color style, Japanese anime style, tetradic colors in details, very detailed, masterpiece, intricate details, focus on girl, upper body, (anime face:1.2), clear eyes, close-up shot, correct perspective, volumetric light, light diffusion
CLIP Text Encode (Prompt) (Negative)
headphone, lower body, worst quality, ugly, deformed, bad anatomy, (realistic face:1.1)

Sampling (1st pass) settings

KSampler (1st Pass)
  • cfg:3.00
  • sampler_name:dpmpp_2m
  • scheduler:sgm_uniform
  • denoise:1.00

Upscaler Settings

Load Upscale Model
  • model_name:4x-UltraSharp.pth
Scale Image to Total Pixels

SD3.5 Medium is limited to 2 megapixels, so set this to 2.00; see the sketch after this list for the resulting resolution.

  • upscale_method:lanczos
  • megapixels:2.00
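For reference, a quick sketch of the resolution this produces from the 1344x768 1st-pass image (the node’s internal rounding may differ slightly):

import math

w, h = 1344, 768                      # 1st-pass resolution, about 1.03 MP
target_px = 2.00 * 1024 * 1024        # 2 megapixels
scale = math.sqrt(target_px / (w * h))
print(round(w * scale), round(h * scale))  # roughly 1916 x 1095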

Sampling (2nd pass) settings

Load Checkpoint (Medium)

For the 2nd pass, use Medium, which is inferior in quality but supports up to 2 megapixels.

  • ckpt_name:sd3.5_medium.safetensors
ModelSamplingSD3

We set the value small because we did not want to deviate too much from the high-quality 1st-pass result.

  • shift:0.50
CLIP Text Encode (Prompt) (2nd Pass)

SD3.5 Medium does not handle too many tokens well, so we keep the prompt to a minimum.

hyper detailed, masterpiece, intricate details, UHD, 8K
KSampler (Advanced) (2nd Pass)

Sampling resumes at the 30th step, the end of the 1st pass, so the setting of 60 here actually runs 30 steps; see the sketch after this list.

  • add_noise:enable
  • steps:60
  • cfg:3.5
  • sampler_name:dpmpp_2m
  • scheduler:sgm_uniform
  • end_at_step:1000
  • return_with_leftover_noise:disable
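The step arithmetic, as a quick sketch:

steps, start_at_step, end_at_step = 60, 30, 1000
steps_run = min(end_at_step, steps) - start_at_step
print(steps_run, steps_run / steps)  # 30 steps run, i.e. half the schedule,
                                     # comparable to Hires. fix denoise 0.5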

This completes the setup. This time, we will show a way of working that does not use the “Image chooser” custom node we usually rely on. This method requires the new ComfyUI interface.

Introducing operations that do not use the image chooser

1. Mute (Ctrl+M) or bypass (Ctrl+B) “Preview Image (Medium 2Mpx)” or “Save Image” on the 2nd pass side.

2. Set “Primitive (Seed)” to randomize. We use the seed as the example here, but the same approach can be used to vary other parameters.

3. Pressing the “Queue” button will stop the process at “Preview Image (Large 1Mpx)” in the 1st pass.

4. Once you are satisfied with a 1st-pass result, open the Queue list on the left (or press Q on the keyboard), right-click the desired generated image, and select “Load Workflow” to open a new workflow.

5. In the new workflow, unmute or un-bypass “Preview Image (Medium 2Mpx)” or “Save Image” on the 2nd pass side.

6. Set “Primitive (Seed)” to fixed.

7. Lastly, press the “Queue” button to resume from the upscaling process.

If you are annoyed by the additional workflow and switching of “Primitive (Seed)” in this method, implement the “Image chooser” 😄.

However, the useful part of this method is that it makes seed management for batch generation easier. With a normal EmptySD3LatentImage batch, at least in the version at the time of writing, sharing a generation you like means explaining that the seed is 12345679 and that it is the Xth image in the batch; with this method, you can simply share the PNG file or a saved JSON file from the newly opened workflow.

The same approach works for batch processing: leave the batch_size of “EmptySD3LatentImage” at 1 and set the number of runs with the Batch count next to the “Queue” button.

Final Result

A little off topic, but here is the result of the workflow generation. As expected, the quality of the upscale is a bit lower than the original, since SD3.5 Medium is used. If you want to upscale while maintaining quality, it may be better to use Tiled Diffusion with ControlNet.

Final result of workflow
Seed: 481259788557381

This workflow is available on Patreon, but only paid supporters can view and download it.

Conclusion

This time we introduced Stable Diffusion 3.5. At the time of this article’s posting, Flux.1 is the mainstream, so a comparison is unavoidable: comparing SD3.5 Large with Flux.1 [Dev], we felt that Flux.1 [Dev] was superior, perhaps due to the difference in parameter count. However, SD3.5 Large uses less VRAM than Flux.1 [Dev], so it may be easier to use in some environments. At this stage, there are not many fine-tuned checkpoint models or LoRAs for illustration, so let’s hope they become more plentiful in the future.
