
Detailed usage of ComfyUI Stable Diffusion 3.5 including how to use LoRA

⏱️18min read
📅 Dec 03, 2024

Stability AI announced “Stable Diffusion 3” in the spring of 2024, and the quality of the images shown in the demo drew a lot of attention. However, the publicly released model did not live up to the demo, and a poorly generated image of a woman lying on the grass was a hot topic on the Internet for a while. “Stable Diffusion 3.5” was released to win back that popularity. The current favorite among free, locally run models is Black Forest Labs’s “Flux.1”; is this not a model to compete with it? In this article, we will show you how to use the three “Stable Diffusion 3.5” models: Medium, Large, and Large Turbo.


About the 3 models in Stable Diffusion 3.5

Stable Diffusion 3.5 is the latest model family, with support for fine-tuning, LoRA, and other extensions, and it is available for both commercial and non-commercial use if you register under the 🔗Stability AI Community License. Turbo and Medium models are also provided so that users with less powerful machines can run it without stress.

SD3.5 has three models, and the following is a brief summary of the features of each model.

  • Stable Diffusion 3.5 Large:The most powerful base model in the Stable Diffusion family, with 8 billion parameters; that is fewer than Flux.1’s 12 billion, but improved efficiency lets it produce high-quality 1-megapixel images. The 🔗Multimodal Diffusion Transformer (MMDiT) text-to-image architecture improves image quality, typography, and resource efficiency.
  • Stable Diffusion 3.5 Large Turbo:A distilled version of Stable Diffusion 3.5 Large that can generate high-quality images in just 4 steps. In addition to Large’s Multimodal Diffusion Transformer (MMDiT) architecture, it uses 🔗Adversarial Diffusion Distillation (ADD), which greatly improves generation speed.
  • Stable Diffusion 3.5 Medium:A base model with 2.5 billion parameters built on an improved Multimodal Diffusion Transformer (MMDiT-X) text-to-image architecture. This quality-focused, customizable model can produce images from 0.25 to 2 megapixels.

Text Encoders

Stable Diffusion 3.5, like SD3, uses three encoders.

  • CLIP: OpenCLIP ViT-G/14 and OpenAI CLIP ViT-L/14:The same two models as the SDXL text encoders, each with a context length of 77 tokens.
  • T5: Google T5-XXL:This encoder is also used in Flux.1 and has a context length of 77/256 tokens.

Download Models

The models are available for download from Stability AI’s HuggingFace. You must also accept the Stability AI Community License. If you have trouble accepting the license, Civitai hosts 🔗lightweight FP32, FP16 and FP8 versions. The VAE is built in, so there is no need to download it separately.

  • Stable Diffusion 3.5 Large (16.5GB)
  • Stable Diffusion 3.5 Large Turbo (16.5GB)
  • Stable Diffusion 3.5 Medium (5.11GB)
  • Text Encoders

From the Text Encoders link above, use clip_g.safetensors, clip_l.safetensors, and either t5xxl_fp16.safetensors or t5xxl_fp8_e4m3fn.safetensors.
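If you prefer scripting the downloads, here is a minimal sketch using the huggingface_hub Python library. The repository and file layout are assumptions based on the official Stability AI repositories at the time of writing; you must accept the license on the model page and log in with huggingface-cli first.

from huggingface_hub import hf_hub_download

# Checkpoint (the VAE is baked in, so no separate VAE download is needed)
hf_hub_download(
    repo_id="stabilityai/stable-diffusion-3.5-large",
    filename="sd3.5_large.safetensors",
    local_dir="ComfyUI/models/checkpoints",
)

# Text encoders (swap in t5xxl_fp8_e4m3fn.safetensors if RAM is tight)
for name in ("clip_g.safetensors", "clip_l.safetensors", "t5xxl_fp16.safetensors"):
    hf_hub_download(
        repo_id="stabilityai/stable-diffusion-3.5-large",
        filename=f"text_encoders/{name}",
        local_dir="ComfyUI/models/clip",
    )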

Using Stable Diffusion 3.5 with ComfyUI

From here, we will explain how to use Stable Diffusion 3.5 with ComfyUI.

Example workflows are available in the model repositories introduced earlier, so let’s start from those files. This time, we will use SD3.5L_example_workflow.json, the example workflow for Stable Diffusion 3.5 Large. All three workflows have the same structure and differ only in their parameters, so you can follow along with whichever one you prefer.

Stable Diffusion 3.5 Official Workflow Example

Workflow for Stable Diffusion 3.5 Large

After downloading the workflow, open it in ComfyUI. If you press the “Queue” button in this state, an error will occur because the model paths differ in your environment, so reselect the models in the “Load Checkpoint” and “TripleCLIPLoader” nodes. After reconfiguring, press the “Queue” button to generate.
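If you would rather queue generations from a script than from the “Queue” button, ComfyUI also exposes an HTTP API. The sketch below assumes a default local install on port 8188 and a workflow re-exported via “Save (API Format)”; the UI-format SD3.5L_example_workflow.json cannot be submitted directly, and the file name here is hypothetical.

import json
import urllib.request

# Load an API-format export of the workflow (hypothetical file name)
with open("SD3.5L_example_workflow_api.json") as f:
    workflow = json.load(f)

# POST it to the /prompt endpoint of a running ComfyUI instance
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())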

We generated a result with each model using the default parameters. The generation time shown is the sampling time only and does not include model loading. We include one result per model for ease of comparison. The prompt is as follows.

beautiful scenery nature glass bottle landscape, purple galaxy bottle,
Generation Result
Results for Stable Diffusion 3.5 Large: Seed 818183488572308, generation time 47.33s (1.17s/it)
Results for Stable Diffusion 3.5 Large Turbo: Seed 880448269872607, generation time 2.49s (1.69s/it)
Results for Stable Diffusion 3.5 Medium: Seed 1006649603562674, generation time 17.91s (2.29s/it)

Nodes used in the official Stable Diffusion 3.5 workflow

This section describes the nodes used in the Stable Diffusion 3.5 workflow. Basic nodes such as prompts and decoders are omitted.

Load Checkpoint

Loads a checkpoint model.

TripleCLIPLoader

Loads the three text encoders. In no particular order: clip_g.safetensors, clip_l.safetensors, and, for the T5-XXL encoder, t5xxl_fp16.safetensors, or t5xxl_fp8_e4m3fn.safetensors if you have less RAM.

EmptySD3LatentImage

Creates an empty latent image for SD3. In our test, substituting the standard “Empty Latent Image” did not change the result. The differences are the default image sizes and the resolution grid: “EmptySD3LatentImage” uses resolutions divisible by 16 pixels, while “Empty Latent Image” uses resolutions divisible by 8.
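As a minimal sketch of that grid difference, snapping a requested side length down to each node’s multiple looks like this (the exact rounding inside ComfyUI may differ):

def snap(value: int, multiple: int) -> int:
    """Round value down to the nearest multiple of the latent grid."""
    return (value // multiple) * multiple

print(snap(1000, 16))  # 992  (EmptySD3LatentImage grid)
print(snap(1000, 8))   # 1000 (standard Empty Latent Image grid)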

ModelSamplingSD3

Adjusts the shift value of the timestep scheduling. The larger the shift value, the more drastic the schedule change, which can add artistry; setting it too low or too high produces noisy results. Adjust within roughly 0.5 to 10.0, depending on the model.
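For intuition, here is a sketch of the shift mapping, paraphrasing the timestep shift used in ComfyUI’s model sampling code (not the exact source); higher shift values keep the sampler in the high-noise region longer.

def time_snr_shift(shift: float, t: float) -> float:
    """Map a normalized timestep t in [0, 1] under the given shift."""
    return shift * t / (1 + (shift - 1) * t)

print(time_snr_shift(3.0, 0.5))  # 0.75: halfway through, still 75% noise
print(time_snr_shift(1.0, 0.5))  # 0.50: shift 1.0 leaves the schedule unchanged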

ConditioningZeroOut

Neutralizes the effect of the conditioning (prompt) by zeroing out its tensors.
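A paraphrased sketch of what the node does to a ComfyUI conditioning list, where each entry pairs a conditioning tensor with a dict that may include a pooled output; this mirrors the built-in node’s logic but is not the exact source.

import torch

def zero_out(conditioning):
    """Return a copy of the conditioning with all tensors zeroed."""
    result = []
    for cond, extras in conditioning:
        extras = dict(extras)
        if "pooled_output" in extras:
            extras["pooled_output"] = torch.zeros_like(extras["pooled_output"])
        result.append((torch.zeros_like(cond), extras))
    return result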

ConditioningSetTimestepRange

Specifies during which part of generation the conditioning is applied. The range is given as a fraction: a value of 0.1 means 10%, i.e. the 10% stage of generation.
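The sketch below maps these fractions to concrete sampler steps, assuming a 30-step run like the 1st pass in this article:

def timestep_range_to_steps(start: float, end: float, total_steps: int) -> range:
    """Return the sampler steps covered by a fractional [start, end) range."""
    return range(round(start * total_steps), round(end * total_steps))

print(list(timestep_range_to_steps(0.0, 0.1, 30)))  # [0, 1, 2]: the first 10%
print(list(timestep_range_to_steps(0.1, 1.0, 30)))  # steps 3 through 29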

Conditioning (Combine)

Mixes two conditionings. In this graph for SD3.5 Large, the normal negative prompt is applied from the start to 10%, and the negative prompt zeroed by ConditioningZeroOut is applied from 10% to 100%. Also note that SD3.5 Large Turbo is a distilled model, so it cannot use negative prompts.


Customize the official Stable Diffusion 3.5 Large workflow

As has become standard in DCAI’s ComfyUI articles, let’s make some practical customizations to the official workflow. Here is what we will add:

  • Adaptation of LoRA
  • Implementing upscaling from Large to Medium
About the upscaling process

Because SD3.5 Large cannot generate beyond the 1-megapixel range, we implement the 2nd pass of the upscaling with SD3.5 Medium, which goes up to 2 megapixels, following the official upscaling method. This process is similar to A1111 WebUI’s Hires. fix, in that it uses KSampler (Advanced) to run additional steps continuing from the end of the 1st pass. An alternative is to use Tiled Diffusion with ControlNet.

Download LoRA

We will be using the SD3.5 version of “Daubrez Painterly Style” published by blairesilver13, which was introduced in the previous article. Please download it from the link below.

Implementing LoRA

To implement LoRA, insert “LoraLoaderModelOnly” between “Load Checkpoint” and “ModelSamplingSD3”; a sketch of the resulting wiring follows the node location below.

Node Locations
  • LoraLoaderModelOnly:loaders > LoraLoaderModelOnly
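In ComfyUI API-format JSON terms, the rewiring looks roughly like the sketch below. The node ids are arbitrary and the values match the settings used later in this article; the class names are the built-in ComfyUI node types.

graph = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd3.5_large.safetensors"}},
    "2": {"class_type": "LoraLoaderModelOnly",
          "inputs": {"model": ["1", 0],   # MODEL out of Load Checkpoint
                     "lora_name": "sd35L-daubrez-v1_db4rz_1800.safetensors",
                     "strength_model": 0.35}},
    "3": {"class_type": "ModelSamplingSD3",
          "inputs": {"model": ["2", 0],   # now fed by the LoRA loader
                     "shift": 3.10}},
}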

Externalization of Seed/Steps

Externalize Seed and Steps so they can be shared with the “KSampler (Advanced)” introduced later.

From the right-click menu of “KSampler,” select Convert Widget to Input > Convert seed to input so that an external node can drive the seed. Externalize steps in the same way. Then place two “Primitive” nodes and connect them to seed and steps, respectively.

Node Locations
  • Primitive:utils > Primitive

Implementing an upscaling process

Implementing an upscaler

Connect the “VAE Decode” output after the 1st pass (KSampler) to “Upscale Image (using Model)”.

Place “Load Upscale Model” and connect it to the “Upscale Image (using Model)” you have just placed.

Connect “Upscale Image (using Model)” out to “Scale Image to Total Pixels”.

Connect “Scale Image to Total Pixels” out to “VAE Encode” and connect “Load Checkpoint (SD3.5 Large)” VAE out to vae input.

Node Locations
  • Upscale Image (using Model):image > upscale > Upscale Image (using Model)
  • Load Upscale Model:image > upscale > Load Upscale Model
  • Scale Image to Total Pixels:image > upscale > Scale Image to Total Pixels
  • VAE Encode:latent > VAE Encode
Implementing 2nd pass

First, place the “KSampler (Advanced)”. Once placed, externalize noise_seed and start_at_step using Convert Widget to Input from the right-click menu. Connect “Primitive (seed)” externalized in the 1st pass to noise_seed and “Primitive (steps)” to start_at_step.

Connect “VAE Encode” after upscaling to latent_image.

Connect the same “Conditioning (Combine)” as the 1st pass to the negative.

The positive will use a different prompt, but you can copy the existing “CLIP Text Encode (Prompt)” node with its inputs intact (Ctrl+C -> Ctrl+Shift+V) and connect its output to positive.

Since we want to use SD3.5 Medium for the model, let’s place “Load Checkpoint” and “ModelSamplingSD3” and connect MODEL out.

Connect the “KSampler (Advanced)” out to the “VAE Decode” and the VAE input to the VAE out of the “Load Checkpoint (SD3.5 Medium)” placed earlier.

Lastly, connect to “Preview Image” or “Save Image” to complete the process.

Node Locations
  • KSampler (Advanced):sampling > KSampler (Advanced)
  • CLIP Text Encode (Prompt):conditioning > CLIP Text Encode (Prompt)
  • Load Checkpoint:loaders > Load Checkpoint
  • ModelSamplingSD3:advanced > model > ModelSamplingSD3
  • VAE Decode:latent > VAE Decode
  • Preview Image:image > Preview Image
  • Save Image:image > Save Image

How to use the custom Stable Diffusion 3.5 Large workflow

Once the graph is completed, enter the parameters.

Load basic model and input basic information

Load Checkpoint (Large)

Use the high-quality Large model for the 1st pass.

  • ckpt_name:sd3.5_large.safetensors
TripleCLIPLoader

The order of loading is not fixed.

  • clip_name1:clip_g.safetensors
  • clip_name2:clip_l.safetensors
  • clip_name3:t5xxl_fp16.safetensors
EmptySD3LatentImage
  • width:1344
  • height:768
Primitive (Seed)

If you want the generated results to be the same, use the following values.

  • value:481259788557381
  • control_after_generate:fixed
Primitive (Steps)
  • value:30
  • control_after_generate:fixed
LoraLoaderModelOnly

We didn’t want to make the LoRA impact too strong, so we set it to 0.35.

  • lora_name:sd35L-daubrez-v1_db4rz_1800.safetensors
  • strength_model:0.35
ModelSamplingSD3

The shift value adjusts how much the generation is brushed up.

  • shift:3.10
CLIP Text Encode (Prompt) (Positive)

As with Flux.1, the prompt mixes natural language and Danbooru-style tags.

A masterful highly intricate detailed cinematic digital painting.
A portrait of a girl, who is looking at viewer and (wearing techwear jacket:1.1). Logo of "DCAI" on jacket. 
In the background a huge robots are standing.
The streets are very dirty. Abandoned cars on the road.
It is a summer day.
Colorful buildings stand in ruins, and grass and trees grow everywhere.

A flat color style, Japanese anime style, tetradic colors in details, very detailed, masterpiece, intricate details, focus on girl, upper body, (anime face:1.2), clear eyes, close-up shot, correct perspective, volumetric light, light diffusion
CLIP Text Encode (Prompt) (Negative)
headphone, lower body, worst quality, ugly, deformed, bad anatomy, (realistic face:1.1)

Sampling (1st pass) settings

KSampler (1st Pass)
  • cfg:3.00
  • sampler_name:dpmpp_2m
  • scheduler:sgm_uniform
  • denoise:1.00

Upscaler Settings

Load Upscale Model
  • model_name:4x-UltraSharp.pth
Scale Image to Total Pixels

SD3.5 Medium is limited to 2 megapixels, so set this to 2.00; see the sketch after this list for the resulting resolution.

  • upscale_method:lanczos
  • megapixels:2.00
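For reference, a quick sketch of the resolution this produces from the 1344x768 1st-pass image (the node’s internal rounding may differ slightly):

import math

w, h = 1344, 768                      # 1st-pass resolution, about 1.03 MP
target_px = 2.00 * 1024 * 1024        # 2 megapixels
scale = math.sqrt(target_px / (w * h))
print(round(w * scale), round(h * scale))  # roughly 1916 x 1095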

Sampling (2nd pass) settings

Load Checkpoint (Medium)

For the 2nd pass, use Medium, which is inferior in quality but supports up to 2 megapixels.

  • ckpt_name:sd3.5_medium.safetensors
ModelSamplingSD3

We set the value small because we did not want to deviate too much from the high-quality 1st-pass result.

  • shift:0.50
CLIP Text Encode (Prompt) (2nd Pass)

SD3.5 Medium does not handle too many tokens well, so we keep the prompt to a minimum.

hyper detailed, masterpiece, intricate details, UHD, 8K
KSampler (Advanced) (2nd Pass)

Sampling resumes at the 30th step, the end of the 1st pass, so the setting of 60 here actually runs 30 steps; see the sketch after this list.

  • add_noise:enable
  • steps:60
  • cfg:3.5
  • sampler_name:dpmpp_2m
  • scheduler:sgm_uniform
  • end_at_step:1000
  • return_with_leftover_noise:disable
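The step arithmetic, as a quick sketch:

steps, start_at_step, end_at_step = 60, 30, 1000
steps_run = min(end_at_step, steps) - start_at_step
print(steps_run, steps_run / steps)  # 30 steps run, i.e. half the schedule,
                                     # comparable to Hires. fix denoise 0.5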

This completes the setup. This time, we will show a way of working that does not use the “Image chooser” custom node we usually rely on. This method requires the new ComfyUI interface.

Introducing operations that do not use the image chooser

1. Mute (Ctrl+M) or bypass (Ctrl+B) “Preview Image (Medium 2Mpx)” or “Save Image” on the 2nd pass side.

2. Set “Primitive (Seed)” to randomize. We use the seed as the example here, but the same approach can be used to vary other parameters.

3. Pressing the “Queue” button will stop the process at “Preview Image (Large 1Mpx)” in the 1st pass.

4. Once you are satisfied with a 1st-pass result, open the Queue list on the left (or press Q on the keyboard), right-click the desired generated image, and select “Load Workflow” to open a new workflow.

5. In the new workflow, unmute or un-bypass “Preview Image (Medium 2Mpx)” or “Save Image” on the 2nd pass side.

6. Set “Primitive (Seed)” to fixed.

7. Lastly, press the “Queue” button to resume from the upscaling process.

If you are annoyed by the additional workflow and switching of “Primitive (Seed)” in this method, implement the “Image chooser” 😄.

However, the useful part of this method is that it makes seed management for batch generation easier. With a normal EmptySD3LatentImage batch, at least in the version at the time of writing, sharing a generation you like means explaining that the seed is 12345679 and that it is the Xth image in the batch; with this method, you can simply share the PNG file or a saved JSON file from the newly opened workflow.

The same approach works for batch processing: leave the batch_size of “EmptySD3LatentImage” at 1 and set the number of runs with the Batch count next to the “Queue” button.

Final Result

A little off topic, but here is the result of the workflow generation. As expected, the quality of the upscale is a bit lower than the original, since SD3.5 Medium is used. If you want to upscale while maintaining quality, it may be better to use Tiled Diffusion with ControlNet.

Final result of workflow
Seed: 481259788557381

This workflow is available on Patreon, but only paid supporters can view and download it.

Conclusion

This time we introduced Stable Diffusion 3.5. At the time of this article’s posting, Flux.1 is the mainstream, so a comparison is unavoidable: comparing SD3.5 Large with Flux.1 [Dev], we felt that Flux.1 [Dev] was superior, perhaps due to the difference in parameter count. However, SD3.5 Large uses less VRAM than Flux.1 [Dev], so it may be easier to use in some environments. At this stage, there are not many fine-tuned checkpoint models or LoRAs for illustration, so let’s hope they become more plentiful in the future.
