
How to use high performance GGUF with ComfyUI Flux.1 [schnell]

⏱️13min read
📅 Nov 03, 2024

In this article, I will explain how to use GGUF with ComfyUI Flux.1 [schnell]. Using a quantized GGUF model with Flux.1 requires a dedicated model file and a dedicated custom node, and is recommended for reducing VRAM usage and improving performance.


What is GGUF?

GGUF (GPT-Generated Unified Format) is a file format released by the llama.cpp team in August 2023. It supports models that the previous GGML (GPT-Generated Model Language) format could not, and offers greater versatility and extensibility. In the field of AI illustration generation, quantized conversions of safetensors and bin files have been published in this format. If you want to create your own GGUF file, you can clone the official llama.cpp repository and use “convert-hf-to-gguf.py” to convert a model.
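As a minimal sketch, the conversion step can be scripted as follows. This assumes you have cloned the llama.cpp repository and installed its requirements; the script name and flags can differ between llama.cpp versions, so check your checkout first.

```python
# Hypothetical sketch of invoking llama.cpp's conversion script.
# Assumes llama.cpp is cloned locally and its Python requirements are installed;
# script and flag names may vary by version.
def build_convert_command(model_dir: str, outfile: str, outtype: str = "f16") -> list[str]:
    """Build the command line for convert-hf-to-gguf.py."""
    return [
        "python", "convert-hf-to-gguf.py", model_dir,
        "--outfile", outfile,
        "--outtype", outtype,  # e.g. "f16" or "q8_0"
    ]

cmd = build_convert_command("./flux1-schnell", "flux1-schnell-f16.gguf")
# Run it with: subprocess.run(cmd, check=True)
```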

To use GGUF with ComfyUI Flux.1 [schnell]

Download Models

To use GGUF with ComfyUI Flux.1 [schnell], download the model published by city96. The repository offers models from 2-bit (Q2_K) to 16-bit (F16), so download a model suitable for your environment, referring to the “Quantization” section explained later.

Quantization

The quantization types for city96/flux.1-schnell-gguf are shown in the table below.

Basically, the higher the bit count, the higher the accuracy; in return, VRAM consumption also increases.

Type | Model Size | Description
Q2_K | 4.01 GB | 2-bit quantization; super-blocks of 16 blocks, each block has 16 weights; 2.5625 bits per weight.
Q3_K_S | 5.21 GB | 3-bit quantization; super-blocks of 16 blocks, each block has 16 weights; 3.4375 bits per weight.
Q4_K_S / Q4_0 / Q4_1 | 6.78 GB / 6.77 GB / 7.51 GB | 4-bit quantization; super-blocks of 8 blocks, each block has 32 weights; 4.5 bits per weight; Q4_0 and Q4_1 round to the nearest 4-bit value.
Q5_K_S / Q5_0 / Q5_1 | 8.26 GB / 8.25 GB / 8.99 GB | 5-bit quantization; super-blocks of 8 blocks, each block has 32 weights; 5.5 bits per weight; Q5_0 and Q5_1 round to the nearest 5-bit value.
Q6_K | 9.83 GB | 6-bit quantization; super-blocks of 16 blocks, each block has 16 weights; 6.5625 bits per weight.
Q8_0 | 12.7 GB | Quantized to the nearest 8-bit value; each block has 32 weights.
F16 | 23.8 GB | 16-bit standard IEEE 754 half-precision floating point.
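As a rough sanity check on the table, file size is essentially parameter count times bits per weight. The sketch below assumes Flux.1 is an approximately 12-billion-parameter model and that Q8_0 stores about 8.5 bits per weight (8-bit values plus per-block scale factors); both are approximations, not figures from the model card.

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size estimate: parameters * bits per weight, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

FLUX_PARAMS = 12e9  # Flux.1 is an approximately 12B-parameter model

# Q8_0 is ~8.5 effective bits per weight -> about 12.75 GB,
# close to the 12.7 GB listed in the table.
print(round(estimated_size_gb(FLUX_PARAMS, 8.5), 2))

# Q6_K is 6.5625 bits per weight -> about 9.84 GB (table: 9.83 GB).
print(round(estimated_size_gb(FLUX_PARAMS, 6.5625), 2))
```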

For those who are not familiar with the table, the following quantization levels are recommended:

  • 24GB VRAM: Q8_0
  • 16GB VRAM: Q6_K
  • 12GB VRAM: Q5_K_S
  • Less than 10GB VRAM: Q4_0 or Q4_1
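The recommendations above can be encoded in a small hypothetical helper, if you want to pick a file programmatically:

```python
# Hypothetical helper encoding this article's VRAM recommendations.
def recommended_quant(vram_gb: float) -> str:
    """Map available VRAM (in GB) to a recommended quantization level."""
    if vram_gb >= 24:
        return "Q8_0"
    if vram_gb >= 16:
        return "Q6_K"
    if vram_gb >= 12:
        return "Q5_K_S"
    return "Q4_0"  # or Q4_1 for cards with less than ~10 GB of VRAM

print(recommended_quant(16))  # Q6_K
```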

Custom Node Installation

To use GGUF with ComfyUI, the custom node “ComfyUI-GGUF” is required. Use the “Custom Nodes Manager” to search for and install ComfyUI-GGUF.

If you do not know how to install a custom node, please refer to the following article for a detailed explanation.

Quantized T5 v1.1 XXL encoder installation (optional)

Further performance gains can be achieved by using the GGUF version of the T5 v1.1 XXL text encoder published by city96. Download the file with the same quantization level as the model.

ComfyUI Flux.1 [schnell] + GGUF workflow

From here, let’s actually use the workflow with the model. The workflow uses the following models and custom nodes, so download and install them beforehand. Also, update ComfyUI to the latest version, as the workflow may not work correctly on older versions.

  • flux1-schnell-Q8_0.gguf: The 8-bit quantization model of Flux.1 [schnell] introduced in this article
  • t5-v1_1-xxl-encoder-Q8_0.gguf: T5 v1.1 XXL encoder quantized to 8-bit
  • ComfyUI-GGUF: Custom node to load the Unet and CLIP in GGUF format
  • Image chooser: Custom node for checking 1st Pass results; it allows the short 1st Pass to be re-run easily until a satisfactory result is obtained
  • 🔗aki_anime.safetensors: LoRA model with an anime style
  • 🔗hinaFluxFantasyArmorMix-schnell_v1-rev1.safetensors: LoRA model that generates fantasy-style armor
  • 🔗clip_l.safetensors: Standard text encoder for Flux.1
  • 🔗ae.safetensors: Standard VAE for Flux.1
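The sketch below shows where each downloaded file typically goes in a default ComfyUI folder layout. This is an assumption based on the standard install; adjust it if you use extra_model_paths.yaml or a custom directory structure.

```python
# Sketch of where each downloaded file goes in a default ComfyUI install.
# Assumes the standard folder layout; adjust for custom setups.
from pathlib import Path

PLACEMENT = {
    "flux1-schnell-Q8_0.gguf": "models/unet",        # GGUF Unet, read by Unet Loader (GGUF)
    "t5-v1_1-xxl-encoder-Q8_0.gguf": "models/clip",  # GGUF text encoder
    "clip_l.safetensors": "models/clip",
    "ae.safetensors": "models/vae",
    "aki_anime.safetensors": "models/loras",
    "hinaFluxFantasyArmorMix-schnell_v1-rev1.safetensors": "models/loras",
}

def target_path(comfy_root: str, filename: str) -> Path:
    """Return the destination path for a model file under the ComfyUI root."""
    return Path(comfy_root) / PLACEMENT[filename] / filename

print(target_path("ComfyUI", "ae.safetensors"))
```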

The workflow is available on Patreon, but only paid supporters can view and download it. Becoming a paid supporter, even for just one month, encourages us to write more, so please join if you are interested.

Even if you cannot download the workflow, you can configure it yourself by looking at the explanation.

Basic Info

  • Unet Loader (GGUF): Loads the GGUF Unet; select flux1-schnell-Q8_0.gguf for unet_name.
  • DualCLIPLoader (GGUF): Loads the text encoder models; choose clip_l.safetensors and t5-v1_1-xxl-encoder-Q8_0.gguf for clip_name.
  • Load LoRA: Loads the LoRA; set lora_name to aki_anime.safetensors and strength_model to 0.80 so the base model is still reflected a little. Since we want to apply a second LoRA, place another Load LoRA node, set its lora_name to hinaFluxFantasyArmorMix-schnell_v1-rev1.safetensors, and set strength_model to 0.60 because of its effect on the face.
  • Empty Latent Image: In this case, we will use 1280 x 720. batch_size is left at 1.
  • ModelSamplingFlux: This sets the time step scheduling shift. max_shift should be set in the range of 0.0 to 2.0 when used with Flux.1 [schnell]; in this case, set it to 2.0. base_shift is not reflected, so use 0 or the default of 0.5. 1024 is fine for width and height. In some cases, bypassing this node may give better results.
  • CLIP Text Encode (Prompt): Since T5XXL is good at natural language, you can basically write prompts in natural language, but because CLIP L is also used, Danbooru-style tags work as well. In this case, we will use the following prompts.
    A beautiful blonde girl stands on a hillside under a blue sky.
    She looks like an angelic knight with a halo ring.
    She gazes at the viewer.
    She opens her white wings.
    Many white feathers in the sky.
    The girl's head is adorned with jewels.
    
    The theme is teal and orange.
    
    
    (The old castle is on top of a hill:0.85).
    
    horizon in view, 50mm lens portrait, correct perspective, (anime kawaii face with detailed eyes:1.3), medieval fantasy, waterfall, authentic (no credits, no signature.:1.1), (detailed fantasy white and gold armor:1.2)
    Negative prompts are not reflected, so leave them blank.
  • Load VAE: Loads the VAE; select ae.safetensors for vae_name.
  • Primitive (Seeds): The seed is externalized to share the seed value between the 1st Pass and the 2nd Pass.

1st Pass

The 1st Pass uses the standard ComfyUI sampler. You can also configure it with “SamplerCustomAdvanced” as described in past articles.

  • KSampler: seed is externalized and gets its value from Primitive (Seeds). steps is set to 2. cfg is set to 1.0, as recommended for Flux.1 [schnell]. sampler_name is set to euler, scheduler to beta, and denoise to 1.00.
  • VAE Decode: Decodes the latent image generated by the sampler into a pixel image.

Preview Chooser

This is placed to check the result generated in the 1st Pass. When an illustration you like is generated, select it and click the “Progress selected image” button to proceed.

Upscale

  • Load Upscale Model: Selects the upscaler model. In this case, we will use 4x-UltraSharp.pth.
  • Upscale Image (using Model): Applies the upscaler model to the image.
  • Scale Image to Total Pixels: Reduces the image, which the upscaler model enlarged by a factor of 4, to the desired size. In this case, specify 3.00 to shrink it to a 3-megapixel illustration; if your PC has sufficient specs, you can use 5.00 to generate an even sharper illustration.
  • VAE Encode: Encodes the scaled image into a latent image to be sent to the 2nd Pass.
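The arithmetic behind this upscale stage can be sketched as follows (a hypothetical helper using 1 MP = 1,000,000 pixels; ComfyUI’s own node may define a megapixel or round dimensions slightly differently):

```python
import math

def scale_to_megapixels(width: int, height: int, megapixels: float) -> tuple[int, int]:
    """Resize dimensions to a target total pixel count, keeping the aspect ratio."""
    factor = math.sqrt(megapixels * 1e6 / (width * height))
    return round(width * factor), round(height * factor)

# The 1st Pass renders at 1280x720; the upscaler model enlarges it 4x
# to 5120x2880 (~14.7 MP), which is then reduced to ~3 MP for the 2nd Pass.
print(scale_to_megapixels(5120, 2880, 3.0))
```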

2nd Pass Info

  • ModelSamplingFlux: In the 2nd Pass, max_shift is set to 0.15 because we do not want to change the composition significantly.
  • CLIP Text Encode (Prompt): Use the following simple prompt for the 2nd Pass.
    very detailed, masterpiece, intricate details, UHD, 8K

2nd Pass

The 2nd Pass uses mostly the same settings as the 1st Pass, except that denoise is set to 0.35 to preserve the original composition.

Preview Image

This is the final result. If you want to save the image, select “Save Image” from the right-click menu or replace this node with a “Save Image” node.

The above is an explanation of the workflow.

Final Results

Final Results
Seed:739450908043048
Open Image

Conclusion

How was it? I hope this introduction to GGUF has made the heavyweight Flux.1 more comfortable to use. In addition to Flux.1 [schnell], Flux.1 [dev], SD3.5 Large, and SD3.5 Large Turbo models are also available on city96’s Hugging Face, so those who are interested can try them.

This is the third article on Flux.1 [schnell], but the modeling community does not seem very excited about it. Perhaps the reason is that Flux.1 [schnell] is a distilled model designed to reduce model size and improve speed, so it is not very flexible: ControlNet cannot be used and fine-tuning is not possible. However, “🔗OpenFlux.1”, published by Ostris, removes the distillation from Flux.1 [schnell], so ControlNet support and the like may yet appear.
