
How to use high performance GGUF with ComfyUI Flux.1 [schnell]

⏱️13min read
📅 Nov 03, 2024

In this article, I will explain how to use GGUF with ComfyUI Flux.1 [schnell]. Using a quantized GGUF model with Flux.1 requires a dedicated model file and a dedicated custom node, and is recommended for reducing VRAM usage and improving performance.


What is GGUF?

GGUF (GPT-Generated Unified Format) is a file format released by the llama.cpp team in August 2023. It supports models that the previous GGML (GPT-Generated Model Language) format could not, and offers greater versatility and extensibility. In the field of AI illustration generation, quantized conversions of safetensors and bin files have been published in this format. If you want to create your own GGUF file, you can clone the official llama.cpp repository and use “convert-hf-to-gguf.py” to convert a model.
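As a minimal sketch, the conversion step can be scripted as follows. This assumes you have cloned the llama.cpp repository and installed its requirements; the script name and flags can differ between llama.cpp versions, so check your checkout first.

```python
# Hypothetical sketch of invoking llama.cpp's conversion script.
# Assumes llama.cpp is cloned locally and its Python requirements are installed;
# script and flag names may vary by version.
def build_convert_command(model_dir: str, outfile: str, outtype: str = "f16") -> list[str]:
    """Build the command line for convert-hf-to-gguf.py."""
    return [
        "python", "convert-hf-to-gguf.py", model_dir,
        "--outfile", outfile,
        "--outtype", outtype,  # e.g. "f16" or "q8_0"
    ]

cmd = build_convert_command("./flux1-schnell", "flux1-schnell-f16.gguf")
# Run it with: subprocess.run(cmd, check=True)
```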

To use GGUF with ComfyUI Flux.1 [schnell]

Download Models

To use GGUF with ComfyUI Flux.1 [schnell], download the model published by city96. The repository offers models from 2-bit (Q2_K) to 16-bit (F16), so download a model suitable for your environment, referring to the “Quantization” section explained later.

Quantization

The quantization types for city96/flux.1-schnell-gguf are shown in the table below.

Basically, the higher the bit count, the higher the accuracy; in return, VRAM consumption also increases.

Type | Model Size | Description
Q2_K | 4.01 GB | 2-bit quantization; super-blocks of 16 blocks, each block has 16 weights; 2.5625 bits per weight.
Q3_K_S | 5.21 GB | 3-bit quantization; super-blocks of 16 blocks, each block has 16 weights; 3.4375 bits per weight.
Q4_K_S / Q4_0 / Q4_1 | 6.78 GB / 6.77 GB / 7.51 GB | 4-bit quantization; super-blocks of 8 blocks, each block has 32 weights; 4.5 bits per weight; Q4_0 and Q4_1 round to the nearest 4-bit value.
Q5_K_S / Q5_0 / Q5_1 | 8.26 GB / 8.25 GB / 8.99 GB | 5-bit quantization; super-blocks of 8 blocks, each block has 32 weights; 5.5 bits per weight; Q5_0 and Q5_1 round to the nearest 5-bit value.
Q6_K | 9.83 GB | 6-bit quantization; super-blocks of 16 blocks, each block has 16 weights; 6.5625 bits per weight.
Q8_0 | 12.7 GB | Quantized to the nearest 8-bit value; each block has 32 weights.
F16 | 23.8 GB | 16-bit standard IEEE 754 half-precision floating point.
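As a rough sanity check on the table, file size is essentially parameter count times bits per weight. The sketch below assumes Flux.1 is an approximately 12-billion-parameter model and that Q8_0 stores about 8.5 bits per weight (8-bit values plus per-block scale factors); both are approximations, not figures from the model card.

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size estimate: parameters * bits per weight, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

FLUX_PARAMS = 12e9  # Flux.1 is an approximately 12B-parameter model

# Q8_0 is ~8.5 effective bits per weight -> about 12.75 GB,
# close to the 12.7 GB listed in the table.
print(round(estimated_size_gb(FLUX_PARAMS, 8.5), 2))

# Q6_K is 6.5625 bits per weight -> about 9.84 GB (table: 9.83 GB).
print(round(estimated_size_gb(FLUX_PARAMS, 6.5625), 2))
```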

For those who are not familiar with the table, the following quantization levels are recommended:

  • 24GB VRAM: Q8_0
  • 16GB VRAM: Q6_K
  • 12GB VRAM: Q5_K_S
  • Less than 10GB VRAM: Q4_0 or Q4_1
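The recommendations above can be encoded in a small hypothetical helper, if you want to pick a file programmatically:

```python
# Hypothetical helper encoding this article's VRAM recommendations.
def recommended_quant(vram_gb: float) -> str:
    """Map available VRAM (in GB) to a recommended quantization level."""
    if vram_gb >= 24:
        return "Q8_0"
    if vram_gb >= 16:
        return "Q6_K"
    if vram_gb >= 12:
        return "Q5_K_S"
    return "Q4_0"  # or Q4_1 for cards with less than ~10 GB of VRAM

print(recommended_quant(16))  # Q6_K
```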

Custom Node Installation

To use GGUF with ComfyUI, the custom node “ComfyUI-GGUF” is required. Use the “Custom Nodes Manager” to search for and install ComfyUI-GGUF.

If you do not know how to install a custom node, please refer to the following article for a detailed explanation.

Quantized T5 v1.1 XXL encoder installation (optional)

Further performance gains can be achieved by using the GGUF version of the T5 v1.1 XXL text encoder published by city96. Download the file with the same quantization level as the model.

ComfyUI Flux.1 [schnell] + GGUF workflow

From here, let’s actually use the workflow with the model. The workflow uses the following models and custom nodes, so download and install them beforehand. Also, update ComfyUI to the latest version, as the workflow may not work correctly on older versions.

  • flux1-schnell-Q8_0.gguf: The 8-bit quantization model of Flux.1 [schnell] introduced in this article
  • t5-v1_1-xxl-encoder-Q8_0.gguf: T5 v1.1 XXL encoder quantized to 8-bit
  • ComfyUI-GGUF: Custom node to load the Unet and CLIP in GGUF format
  • Image chooser: Custom node for checking 1st Pass results; it allows the short 1st Pass to be re-run easily until a satisfactory result is obtained
  • 🔗aki_anime.safetensors: LoRA model with an anime style
  • 🔗hinaFluxFantasyArmorMix-schnell_v1-rev1.safetensors: LoRA model that generates fantasy-style armor
  • 🔗clip_l.safetensors: Standard text encoder for Flux.1
  • 🔗ae.safetensors: Standard VAE for Flux.1
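The sketch below shows where each downloaded file typically goes in a default ComfyUI folder layout. This is an assumption based on the standard install; adjust it if you use extra_model_paths.yaml or a custom directory structure.

```python
# Sketch of where each downloaded file goes in a default ComfyUI install.
# Assumes the standard folder layout; adjust for custom setups.
from pathlib import Path

PLACEMENT = {
    "flux1-schnell-Q8_0.gguf": "models/unet",        # GGUF Unet, read by Unet Loader (GGUF)
    "t5-v1_1-xxl-encoder-Q8_0.gguf": "models/clip",  # GGUF text encoder
    "clip_l.safetensors": "models/clip",
    "ae.safetensors": "models/vae",
    "aki_anime.safetensors": "models/loras",
    "hinaFluxFantasyArmorMix-schnell_v1-rev1.safetensors": "models/loras",
}

def target_path(comfy_root: str, filename: str) -> Path:
    """Return the destination path for a model file under the ComfyUI root."""
    return Path(comfy_root) / PLACEMENT[filename] / filename

print(target_path("ComfyUI", "ae.safetensors"))
```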

The workflow is available on Patreon, but only paid supporters can view and download it. Becoming a paid supporter, even for just one month, encourages us to write more, so please join if you are interested.

Even if you cannot download the workflow, you can configure it yourself by looking at the explanation.

Basic Info

  • Unet Loader (GGUF): Loads the GGUF Unet; select flux1-schnell-Q8_0.gguf for unet_name.
  • DualCLIPLoader (GGUF): Loads the text encoder models; choose clip_l.safetensors and t5-v1_1-xxl-encoder-Q8_0.gguf for clip_name.
  • Load LoRA: Loads the LoRA; set lora_name to aki_anime.safetensors and strength_model to 0.80 so the base model is still reflected a little. Since we want to apply a second LoRA, place another Load LoRA node, set its lora_name to hinaFluxFantasyArmorMix-schnell_v1-rev1.safetensors, and set strength_model to 0.60 because of its effect on the face.
  • Empty Latent Image: In this case, we will use 1280 x 720. batch_size is left at 1.
  • ModelSamplingFlux: This sets the time step scheduling shift. max_shift should be set in the range of 0.0 to 2.0 when used with Flux.1 [schnell]; in this case, set it to 2.0. base_shift is not reflected, so use 0 or the default of 0.5. 1024 is fine for width and height. In some cases, bypassing this node may give better results.
  • CLIP Text Encode (Prompt): Since T5XXL is good at natural language, you can basically write prompts in natural language, but because CLIP L is also used, Danbooru-style tags work as well. In this case, we will use the following prompts.
    A beautiful blonde girl stands on a hillside under a blue sky.
    She looks like an angelic knight with a halo ring.
    She gazes at the viewer.
    She opens her white wings.
    Many white feathers in the sky.
    The girl's head is adorned with jewels.
    
    The theme is teal and orange.
    
    
    (The old castle is on top of a hill:0.85).
    
    horizon in view, 50mm lens portrait, correct perspective, (anime kawaii face with detailed eyes:1.3), medieval fantasy, waterfall, authentic (no credits, no signature.:1.1), (detailed fantasy white and gold armor:1.2)
    Negative prompts are not reflected, so leave them blank.
  • Load VAE: Loads the VAE; select ae.safetensors for vae_name.
  • Primitive (Seeds): The seed is externalized to share the seed value between the 1st Pass and the 2nd Pass.

1st Pass

The 1st Pass uses the standard ComfyUI sampler. You can also configure it with “SamplerCustomAdvanced” as described in past articles.

  • KSampler: seed is externalized and gets its value from Primitive (Seeds). steps is set to 2. cfg is set to 1.0, as recommended for Flux.1 [schnell]. sampler_name is set to euler, scheduler to beta, and denoise to 1.00.
  • VAE Decode: Decodes the latent image generated by the sampler into a pixel image.

Preview Chooser

This is placed to check the result generated in the 1st Pass. When an illustration you like is generated, select it and click the “Progress selected image” button to proceed.

Upscale

  • Load Upscale Model: Selects the upscaler model. In this case, we will use 4x-UltraSharp.pth.
  • Upscale Image (using Model): Applies the upscaler model to the image.
  • Scale Image to Total Pixels: Reduces the image, which the upscaler model enlarged by a factor of 4, to the desired size. In this case, specify 3.00 to shrink it to a 3-megapixel illustration; if your PC has sufficient specs, you can use 5.00 to generate an even sharper illustration.
  • VAE Encode: Encodes the scaled image into a latent image to be sent to the 2nd Pass.
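The arithmetic behind this upscale stage can be sketched as follows (a hypothetical helper using 1 MP = 1,000,000 pixels; ComfyUI’s own node may define a megapixel or round dimensions slightly differently):

```python
import math

def scale_to_megapixels(width: int, height: int, megapixels: float) -> tuple[int, int]:
    """Resize dimensions to a target total pixel count, keeping the aspect ratio."""
    factor = math.sqrt(megapixels * 1e6 / (width * height))
    return round(width * factor), round(height * factor)

# The 1st Pass renders at 1280x720; the upscaler model enlarges it 4x
# to 5120x2880 (~14.7 MP), which is then reduced to ~3 MP for the 2nd Pass.
print(scale_to_megapixels(5120, 2880, 3.0))
```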

2nd Pass Info

  • ModelSamplingFlux: In the 2nd Pass, max_shift is set to 0.15 because we do not want to change the composition significantly.
  • CLIP Text Encode (Prompt): Use the following simple prompt for the 2nd Pass.
    very detailed, masterpiece, intricate details, UHD, 8K

2nd Pass

The 2nd Pass uses mostly the same settings as the 1st Pass, except that denoise is set to 0.35 to preserve the original composition.

Preview Image

This is the final result. If you want to save the image, select “Save Image” from the right-click menu or replace this node with a “Save Image” node.

The above is an explanation of the workflow.

Final Results

Final Results
Seed:739450908043048
Open Image

Conclusion

How was it? I hope this introduction to GGUF has made the heavyweight Flux.1 more comfortable to use. In addition to Flux.1 [schnell], Flux.1 [dev], SD3.5 Large, and SD3.5 Large Turbo models are also available on city96’s Hugging Face, so those who are interested can try them.

This is the third article on Flux.1 [schnell], but the modeling community does not seem very excited about it. Perhaps the reason is that Flux.1 [schnell] is a distilled model designed to reduce model size and improve speed, so it is not very flexible: ControlNet cannot be used and fine-tuning is not possible. However, “🔗OpenFlux.1”, published by Ostris, removes the distillation from Flux.1 [schnell], so ControlNet support and the like may yet appear.
