How to use high-performance GGUF with ComfyUI Flux.1 [schnell]
In this article, I will explain how to use GGUF with ComfyUI Flux.1 [schnell]. Using a quantized GGUF model with Flux.1 requires a dedicated model file and a dedicated custom node, and it is a recommended way to improve performance.
What is GGUF?
GGUF (GPT-Generated Unified Format) is a file format released by the llama.cpp team in August 2023. It supports models that the earlier GGML (GPT-Generated Model Language) format could not, and offers greater versatility and extensibility. In the field of illustration-generation AI, safetensors and bin files converted with quantization have been published in this format. If you want to create your own GGUF file, you can clone the official llama.cpp repository and convert with “convert-hf-to-gguf.py”.
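For reference, every GGUF file begins with a small fixed header defined by the published spec: the magic bytes “GGUF”, a version number, and the tensor and metadata counts, all little-endian. Here is a minimal pure-Python sketch (the file name is just a placeholder) that inspects it:

```python
import struct

def read_gguf_header(path):
    """Read the fixed GGUF header: magic, version, tensor count, metadata count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        version, = struct.unpack("<I", f.read(4))           # uint32 version
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))  # two uint64 counts
    return version, n_tensors, n_kv

print(read_gguf_header("model.gguf"))
```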
To use GGUF with ComfyUI Flux.1 [schnell]
Download Models
To use GGUF with ComfyUI Flux.1 [schnell], download a model published by city96. The repository offers models from 2-bit (Q2_K) to 16-bit (F16), so download one suitable for your environment, referring to the “Quantization” section explained below.
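If you prefer to script the download, the huggingface_hub library can fetch individual files. The repo and file names below are my assumptions based on city96’s Hugging Face page, so verify them against the repository before running:

```python
from huggingface_hub import hf_hub_download

# Assumed repo/file names -- check city96's Hugging Face page for the exact ones.
path = hf_hub_download(
    repo_id="city96/FLUX.1-schnell-gguf",
    filename="flux1-schnell-Q8_0.gguf",
    local_dir="ComfyUI/models/unet",  # folder where ComfyUI-GGUF looks for Unets
)
print("saved to", path)
```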
Quantization
The quantization types for city96/flux.1-schnell-gguf are shown in the table below.
Basically, the higher the bit count, the higher the accuracy; in return, VRAM consumption also increases (the short calculation after the table shows how bits per weight translate into file sizes).
Type | Model Size | Description |
---|---|---|
Q2_K | 4.01 GB | 2-bit quantization; super-blocks of 16 blocks, each block has 16 weights; effectively 2.5625 bits per weight. |
Q3_K_S | 5.21 GB | 3-bit quantization; super-blocks of 16 blocks, each block has 16 weights; effectively 3.4375 bits per weight. |
Q4_K_S / Q4_0 / Q4_1 | 6.78 GB / 6.77 GB / 7.51 GB | 4-bit quantization; Q4_K_S uses super-blocks of 8 blocks, each block has 32 weights, effectively 4.5 bits per weight; Q4_0 and Q4_1 round to the nearest 4-bit value. |
Q5_K_S / Q5_0 / Q5_1 | 8.26 GB / 8.25 GB / 8.99 GB | 5-bit quantization; Q5_K_S uses super-blocks of 8 blocks, each block has 32 weights, effectively 5.5 bits per weight; Q5_0 and Q5_1 round to the nearest 5-bit value. |
Q6_K | 9.83 GB | 6-bit quantization; super-blocks of 16 blocks, each block has 16 weights; effectively 6.5625 bits per weight. |
Q8_0 | 12.7 GB | 8-bit round-to-nearest quantization; each block has 32 weights. |
F16 | 23.8 GB | Standard IEEE 754 half-precision (16-bit) floating point. |
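The listed sizes follow almost directly from the bits-per-weight figures: multiply the parameter count (Flux.1 [schnell] is roughly a 12-billion-parameter model) by bits per weight and divide by 8. The small gaps versus the table come from tensors that are kept at higher precision:

```python
params = 12e9  # Flux.1 [schnell]: roughly 12 billion parameters

# Q8_0 stores a 16-bit scale per 32-weight block, hence 8.5 effective bits/weight.
for name, bpw in [("Q2_K", 2.5625), ("Q4_K_S", 4.5),
                  ("Q6_K", 6.5625), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(f"{name}: ~{params * bpw / 8 / 1e9:.2f} GB")
```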
Recommended type by VRAM
The following recommendations are for those who are not familiar with the table; the small script after the list shows one way to automate the choice.
- 24GB VRAM: Q8_0
- 16GB VRAM: Q6_K
- 12GB VRAM: Q5_K_S
- Less than 10GB VRAM: Q4_0 or Q4_1
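As a convenience, here is a minimal sketch, assuming a CUDA GPU and an installed PyTorch, that maps detected VRAM to the recommendations above:

```python
import torch

def recommend_quant():
    """Map total VRAM to the quantization types recommended above (rough heuristic)."""
    if not torch.cuda.is_available():
        return "Q4_0"  # no CUDA GPU detected: be conservative
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if vram_gb >= 24:
        return "Q8_0"
    if vram_gb >= 16:
        return "Q6_K"
    if vram_gb >= 12:
        return "Q5_K_S"
    return "Q4_0"

print(recommend_quant())
```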
Custom Node Installation
To use GGUF with ComfyUI, the custom node “ComfyUI-GGUF” is required. Use the “Custom Nodes Manager” to search for and install ComfyUI-GGUF.
If you do not know how to install a custom node, please refer to the following article for a detailed explanation.
Quantized T5 v1.1 XXL encoder installation (optional)
Further performance gains can be achieved by also using the GGUF version of the T5 v1.1 XXL encoder published by city96. Download the same quantization level as your model.
ComfyUI Flux.1 [schnell] + GGUF workflow
From here, let’s actually run the workflow with the model. The workflow uses the following models and custom nodes, so download and install them beforehand (the small script after the list checks that each file is in the expected folder). Also, update ComfyUI to the latest version, because the workflow may not work well on an old version.
- flux1-schnell-Q8_0.gguf: the 8-bit quantized Flux.1 [schnell] model introduced in this article
- t5-v1_1-xxl-encoder-Q8_0.gguf: the T5 v1.1 XXL encoder quantized to 8 bits
- ComfyUI-GGUF: custom node that loads Unet and CLIP models in GGUF format
- Image chooser: custom node for checking 1st Pass results; it lets you cheaply re-run the short 1st Pass until you get a composition you are satisfied with
- aki_anime.safetensors: LoRA model with an anime style
- hinaFluxFantasyArmorMix-schnell_v1-rev1.safetensors: LoRA model that generates fantasy-style armor
- clip_l.safetensors: standard text encoder for Flux.1
- ae.safetensors: standard VAE for Flux.1
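Before loading the workflow, it can save time to confirm that each file landed in the folder ComfyUI expects. This sketch assumes the default ComfyUI directory layout; adjust the base path to your installation:

```python
from pathlib import Path

COMFY = Path("ComfyUI/models")  # adjust to your installation path
expected = {
    "unet": ["flux1-schnell-Q8_0.gguf"],
    "clip": ["clip_l.safetensors", "t5-v1_1-xxl-encoder-Q8_0.gguf"],
    "vae": ["ae.safetensors"],
    "loras": ["aki_anime.safetensors",
              "hinaFluxFantasyArmorMix-schnell_v1-rev1.safetensors"],
    "upscale_models": ["4x-UltraSharp.pth"],  # used later in the Upscale step
}
for folder, files in expected.items():
    for name in files:
        status = "ok     " if (COMFY / folder / name).exists() else "MISSING"
        print(f"{status} {folder}/{name}")
```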
The workflow is available on Patreon, but only paid supporters can view and download it. Becoming a paid supporter, even for just one month, encourages us to keep writing, so please consider joining.
Even if you cannot download the workflow, you can build it yourself by following the explanation below.
Basic Info
- Unet Loader (GGUF): Loads the GGUF Unet. Select `flux1-schnell-Q8_0.gguf` for unet_name.
- DualCLIPLoader (GGUF): Loads the GGUF text encoder models. Choose `clip_l.safetensors` and `t5-v1_1-xxl-encoder-Q8_0.gguf` for the clip_name inputs (a script sketch of these loader settings follows this list).
- Load LoRA: Loads a LoRA. Set lora_name to `aki_anime.safetensors` and strength_model to `0.80` so that the base model is still reflected a little. Since we also want to apply a second LoRA, place another Load LoRA node, set lora_name to `hinaFluxFantasyArmorMix-schnell_v1-rev1.safetensors`, and set strength_model to `0.60` because of its effect on the face.
- Empty Latent Image: In this case, we use `1280 x 720`. batch_size is left at `1`.
- ModelSamplingFlux: This sets the timestep scheduling shift. max_shift should be set around `0.0` to `2.0` when used with Flux.1 [schnell]; in this case, set it to `2.0`. base_shift is not reflected, so use `0` or the default `0.5`. `1024` is fine for width and height. In some cases, bypassing this node may give better results.
- CLIP Text Encode (Prompt): Since T5XXL is good at natural language, you can prompt in plain natural language; because CLIP L is also used, Danbooru-style tags work as well. In this case, we use the following prompt. Negative prompts are not reflected, so leave them blank.

A beautiful blonde girl stands on a hillside under a blue sky. She looks like an angelic knight with a halo ring. She gazes at the viewer. She opens her white wings. Many white feathers in the sky. The girl's head is adorned with jewels. The theme is teal and orange. (The old castle is on top of a hill:0.85). horizon in view, 50mm lens portrait, correct perspective, (anime kawaii face with detailed eyes:1.3), medival fantasy, water fall, authentic (no credits, no signature.:1.1), (detailed fantasy white and gold armor:1.2)

- Load VAE: Loads the VAE. Select `ae.safetensors` for vae_name.
- Primitive (Seeds): The seed is externalized so that the same seed value is shared between the 1st Pass and the 2nd Pass.
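If you want to drive these loader settings from a script instead of the UI, ComfyUI exposes an HTTP API on port 8188 that accepts a node graph as JSON. The sketch below is a deliberately incomplete fragment: the class and input names are assumptions based on how ComfyUI-GGUF registers its loaders, and ComfyUI will reject a graph with no output node, so treat it as a starting point to extend with the LoRA, sampler, and VAE nodes described above.

```python
import json
import urllib.request

# Partial graph with just the two GGUF loaders; node class/input names are
# assumptions -- check the node definitions in ComfyUI-GGUF before relying on them.
graph = {
    "1": {"class_type": "UnetLoaderGGUF",
          "inputs": {"unet_name": "flux1-schnell-Q8_0.gguf"}},
    "2": {"class_type": "DualCLIPLoaderGGUF",
          "inputs": {"clip_name1": "clip_l.safetensors",
                     "clip_name2": "t5-v1_1-xxl-encoder-Q8_0.gguf",
                     "type": "flux"}},
}
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",  # default ComfyUI address
    data=json.dumps({"prompt": graph}).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```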
1st Pass
The 1st Pass uses ComfyUI’s standard sampler. You can also build it with “SamplerCustomAdvanced” as described in past articles.
- KSampler: The seed is externalized and takes its value from Primitive (Seeds). steps is set to `2`. cfg is set to `1.0`, as recommended for Flux.1 [schnell]. sampler_name is set to `euler`, scheduler to `beta`, and denoise to `1.00`.
- VAE Decode: Decodes the latent image generated by the sampler into a pixel image.
Preview Chooser
This node is placed to check the results generated in the 1st Pass. When an illustration you like appears, select it and click the “Progress selected image” button to proceed.
Upscale
- Load Upscale Model: Selects the upscaler model. In this case, we use `4x-UltraSharp.pth`.
- Upscale Image (using Model): Applies the upscaler model to the image.
- Scale Image to Total Pixels: Shrinks the image, which the upscaler enlarged by a factor of 4, back down to the desired size (see the arithmetic sketch after this list). In this case, specify `3.00` to get a 3-megapixel illustration; if your PC has sufficient specs, you can use `5.00` to generate an even sharper illustration.
- VAE Encode: Encodes the scaled image into a latent image to be sent to the 2nd Pass.
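As a sanity check on the Scale Image to Total Pixels step, a little arithmetic shows what resolution the 3-megapixel setting actually produces from a 1280 x 720 1st Pass followed by the 4x upscaler:

```python
w, h = 1280 * 4, 720 * 4  # after the 4x upscaler: 5120 x 2880 (~14.7 MP)
target_mp = 3.00          # "megapixels" value on Scale Image to Total Pixels
scale = (target_mp * 1e6 / (w * h)) ** 0.5
print(f"{w}x{h} -> {round(w * scale)}x{round(h * scale)} (~{target_mp:.1f} MP)")
```

So the 2nd Pass works on roughly 2309 x 1299 pixels (the node rounds to whole pixels).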
2nd Pass Info
- ModelSamplingFlux: In the 2nd Pass, max_shift is set to `0.15` because we do not want to change the composition significantly.
- CLIP Text Encode (Prompt): Use the following simple prompt for the 2nd Pass: `very detailed, masterpiece, intricate details, UHD, 8K`
2nd Pass
The 2nd Pass uses mostly the same settings as the 1st Pass; only denoise is changed, to `0.35`, to keep the original composition.
Preview Image
This is the final result. To save the image, choose “Save Image” from the right-click menu, or replace this node with a “Save Image” node.
The above is an explanation of the workflow.
Final Results
Conclusion
How was it? I hope this introduction to GGUF has made the heavyweight Flux.1 more comfortable to use. In addition to Flux.1 [schnell], GGUF models for Flux.1 [dev], SD3.5 Large, and SD3.5 Large Turbo are also available on city96’s Hugging Face page, so try them if you are interested.
This is the third article on Flux.1 [schnell], but the modeling community does not seem very excited about it. Perhaps the reason is that Flux.1 [schnell] is a distilled model designed to reduce model size and improve speed, so it is not very flexible: ControlNet cannot be used and fine-tuning is not possible. However, “OpenFlux.1”, published by Ostris, removes the distillation from Flux.1 [schnell], so ControlNet support and the like may yet appear.