
How to create an original character LoRA [Dataset] Making a training image and caption

⏱️37min read
📅 Mar 28, 2025
🔄 Mar 28, 2025
Category:📂 Advanced

In this article, I will show you how to create a dataset of an original character to serve as the training source. A dataset is a set of files that pairs each source image with its caption. There are various ways to create a dataset; this time, I will create the source images with “VRoid Studio,” a free 3D character-making software published by pixiv, and process them using ComfyUI and the A1111 WebUI.


Creation of original characters for the training source

I will create the original character images for training using the free software “VRoid Studio”. Professional-grade characters can also be created with various 3D tools: “🔗Cinema4D”, the free “🔗Blender”, the sculpting software “🔗ZBrush” and its free counterpart “🔗ZBrushCoreMini”, and clothing-modeling software such as “🔗Marvelous Designer” and “🔗CLO”. While mastering these tools takes time, they enable high-quality, professional character creation.

Installation of VRoid Studio

I will not go into the details here; first, download and install “VRoid Studio”.

Go to the VRoid Studio page, then download and run the installer. No special settings are required, so installation is straightforward.

Character creation with VRoid Studio

After installation is complete, launch “VRoid Studio”.

VRoid Studio Creating Screen.

In the example, the presets are combined to create a character.

Switch to the edit page at the top of the UI. On the left you will see a row of presets, which you can freely select to create your character.

Completed character

Since I will not show you how to use VRoid Studio in detail, I will leave a link to the official documentation.

One thing to note in VRoid Studio is the outline settings on the “Look” page. If the outlines render cleanly when you shoot later, there is no problem keeping them, but if they are jagged, the jaggedness will carry over into the generated images, so set all outline widths to 0.

When you are satisfied with your character, save the model.

Creation of training source images

The completed character can be photographed in Photo Booth Mode. Click the camera icon button in the upper right corner to switch modes.

Photo Booth Mode

To train a character LoRA, you will need to photograph the character from various angles.

Photo Booth Mode Settings

Once you have switched to Photo Booth Mode, you should first set the capture settings. You can switch to each setting from the menu on the left.

Background

I shoot using the default gray background. The gray background tends to be recognized as “black background” in the caption, but if you do not plan to use the background, this is not a problem. (If “black background” appears when applying the LoRA, replace it with “gray background”.) It is also possible to train on white, black, or transparent backgrounds. If you have a specific conception of the character’s world, it is a good idea to load a background that fits it; however, if you use background images, be careful not to use the same background for all of them.

Lighting

I also use the default lighting. Shooting with shadows cast from various directions seems to produce better-trained shadow areas, but it takes considerable time and effort, and the default shadows are warm-toned (shadows on white objects come out pink), so I decided to shoot with lighting from only one direction.

Wind

Not used.

Post-processing

I want to export the captured images at the highest possible quality, so check “Anti-aliasing” and select High. If you skip this, details such as hair will be degraded, which adversely affects training.

Pic Size

The training source images will be trained at 1024×1024 or 512×512, but I want to keep them as sharp as possible, so I shoot at 2048×2048 and scale them down later in the process.
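As a sketch of this downscaling step (the folder names and the use of the Pillow library are my assumptions, not part of the article's workflow), the long edge can be scaled to 1024 while preserving aspect ratio:

```python
from pathlib import Path

def target_size(width, height, long_edge=1024):
    """Return (w, h) scaled so the longer edge equals long_edge."""
    scale = long_edge / max(width, height)
    return round(width * scale), round(height * scale)

def downscale_folder(src="shots_2048", dst="train_1024", long_edge=1024):
    """Downscale every PNG in src into dst (hypothetical folder names)."""
    from PIL import Image  # third-party: pip install Pillow
    out = Path(dst)
    out.mkdir(exist_ok=True)
    for png in sorted(Path(src).glob("*.png")):
        with Image.open(png) as im:
            im.resize(target_size(*im.size), Image.LANCZOS).save(out / png.name)
```

Downscaling with a high-quality filter such as Lanczos keeps hair and outline detail noticeably sharper than capturing at 1024 directly.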

Shooting a character

Shoot the character from various angles, keeping a few things in mind.

  • Number of shots: About 10 to 50 clear images that do not need to be enlarged are enough to train a character LoRA. In this case, I used 50.
  • Poses: Poses do not all have to be different, but vary them moderately; if you train on only one pose, the LoRA may reproduce only that pose. Note that in “VRoid Studio,” if the character wears a skirt, the legs will clip through it in poses such as crouching. Avoid poses that would look strange as training sources. It is also easier to capture the pose you are aiming for if you pause the animation while shooting.
  • Facial Expression: The same goes for facial expressions. If you train on a single expression, the LoRA may produce only that expression, so vary expressions moderately. Also, uncheck the “Blink” checkbox to avoid half-closed eyes when shooting.
  • Shooting Angle: Shoot the character from the front, side, back, and diagonal angles.
  • Zooming: Shoot full-body, upper-body, lower-body, and close-up face shots. Since we want to train the character itself, shoot a few extra images of the important facial features and costume.
  • About Hands: The hands of this 3D model are not very detailed, so avoid photographing them as much as possible so they are not trained.
  • About Costumes: There is nothing to note if you set up only one costume pattern, but if you have multiple costumes in mind, it is easier to work with a separate save folder for each costume.

Take pictures while carefully considering the precautions. To take a picture, use the blue camera button in the lower right corner of the screen.

Captured images of the character

Custom image styles & image caption making with ComfyUI

I would like to create a generic LoRA that trains only the character’s features, not the image style of the training source (the flat 3DCG look often seen in 3D VTuber models). Image-to-image (img2img) can be used to create multiple image styles for the training source and prevent style fixation.

If you train on the images as-is without changing the image style, that style will come through strongly. You could add cg, 3dcg tags to the caption so the style is trained onto those tags, and then put them in the negative prompt at generation time, but it would be difficult to get rid of the style completely.

If you want them to be trained in the style as well, skip this section and move on to “⏬4. Caption generation and editing of images using A1111 WebUI”.

Introduction of Workflow

Entire workflow

The workflow introduced here is to draw with img2img using a model of your choice (SD1.5 or SDXL) from multiple input images and create a caption at the same time.

The workflow is available on Patreon, but only paid supporters can view and download it. As a bonus, you can also download input images exported by VRoid Studio as described in the previous section.

Even if you cannot download the workflow, you can configure it yourself by following the instructions.

Prepare custom nodes for the workflow

The workflow uses the following custom nodes. Install them in advance.

If you do not know how to install a custom node, please refer to the following article.

Workflow Description

Image Batch Loader

  • 🔶 Restart & Active Frame: Control node for batch processing. When you want to start a new batch process, change Version to reset the frame.
  • 🔶 Load Image Batch: Loads images for batch processing. Specify a folder for the input images.
  • 🔶 Info Display: Check batch processing information.

Test Style

  • Load Image: Specifies the image to be used for testing the style.
  • 🔶 Any Switch: Switches between test and input images. 0 = Batch Mode / 1 = Test & Fix Mode

Load Models

  • Load Checkpoint: Load the checkpoint model. In this case, “Animagine XL 4.0” is used.
  • LoraLoaderModelOnly: Loads a LoRA. Bypassed by default.

Scale image

  • Upscale Image: Scale the input image to 1024 pixels.
  • VAE Encode: Encode from pixel data to latent data.

Prompts

  • WD14 Tagger 🐍: Generate captions from input images.
  • String Function 🐍: Add any tags to the generated captions to create positive prompts.
  • CLIP Text Encode (Prompt): Positive prompt; externalizes text and reads the prompt created by String Function 🐍.
  • CLIP Text Encode (Prompt): Negative Prompt. Enter the regular negative prompt.

Sampler

  • KSampler: steps/cfg uses the “Animagine XL 4.0” recommendations; use denoise to set how much it differs from the input image.
  • VAE Decode: Decode from latent data to pixel data.

Caption

  • String Function 🐍: Sets the instance tag (the unique name of the LoRA to be trained) and the class tag (its generic name) for the training captions. *In the example, the class 1girl is also generated by WD14 Tagger 🐍 and will be duplicated, but this is not a problem, as the captions are edited later.
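The tag-prepending done by “String Function 🐍” amounts to simple string concatenation; a hedged sketch of the idea (the dcai-girl instance name comes from this article, the function name is mine):

```python
def prepend_identity_tags(caption, instance="dcai-girl", class_tag="1girl"):
    """Prepend the instance and class tags to a WD14-generated caption."""
    return f"{instance}, {class_tag}, {caption}"
```

If the tagger already emitted 1girl, the result contains it twice, which is exactly the duplication removed during the later caption edit.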

Save image & caption

  • LayerUtility: Image Tagger Save: Save the generated image and caption (txt file) to the specified location.

Preview

  • Preview Image: Preview the input and output.

How to use the workflow

The workflow generates training source images in two different image styles from the input image.

Selection of img2img generative model

Select the img2img generation model. The example uses “Animagine XL 4.0 – V4 Opt”. The image style is determined to some extent by the checkpoint model. Experiment with various options.

If you want to use the SD 1.5 model, change the width of Upscale Image to 768 or 512.

LoRA selection (optional)

If the checkpoint model alone cannot generate different image styles, the image style LoRA can be used.

Specify input image directory

Specify the input image directory for path in “🔶 Load Image Batch”. In this example, I created a folder train_images in the ComfyUI input folder as ComfyUI\input\train_images and placed the images there.

This directory specification requires the path to be entered directly as text.

An easy way to do this is to open the folder in File Explorer in Windows and right-click on the address bar to bring up a menu. Click on “Copy Address” to copy the path, then paste it into path.

WD14 Tagger 🐍 Settings

The defaults are fine to use. In the example, the generated captions were a bit too simple, so I lowered the threshold from 0.35 to 0.30.

Create img2img prompt

Add a tag before or after the caption generated by the WD14 Tagger. In the example, the “Animagine XL 4.0” recommended quality tag is added after.

Entering Negative Prompts

Specify the negative prompt to use with img2img. In the example, in addition to the “Animagine XL 4.0” recommended negative prompt, lips is entered to avoid generating lips.

Sampler Settings

The steps/cfg/sampler/scheduler can be set to your liking. denoise is the important value in this workflow: it sets how far the output may differ from the input image. If it is too large, the character’s features will drift too far; if it is too small, the result will be almost identical to the input, so set it between 0.25 and 0.50.

Illustration Style Testing

Use the checkpoint model, LoRA, and prompts to determine the illustration style.

Mute or bypass “LayerUtility: Image Tagger Save” when testing. Change nr in “🔶 Any Switch” to 1 and load the image you want to test into “Load image”.

Batch processing (1st time)

Set the Batch Count in ComfyUI to a value equal to the number of input images. Then set the Version in “🔶 Restart & Active Frame” to the new version. Also, if you have been testing, set nr in “🔶 Any Switch” back to 0.

Once the settings are complete, click the “Queue” button to start batch generation. The generated data will be saved in the location specified by custom_path in “LayerUtility: Image Tagger Save”. The file name is specified by filename_prefix.

Set the illustration style for the second time

If you want the character LoRA to be a generic LoRA without a fixed illustration style, it is better to train on multiple illustration styles, so generate the images again with a different style than the first batch. When testing the style, remember to set the Batch Count back to 1.

As an example, I added (realistic:1.4) to text_a in “String Function 🐍” in the Prompts group, and flat color, anime in “CLIP Text Encode (Prompt)” (negative prompt).

Batch processing (2nd time)

Set Batch Count to a value equal to the number of input images. Then set the Version of “🔶 Restart & Active Frame” to the new version.

Once the settings are complete, click the “Queue” button to start batch generation; files will be saved from 00000051 following the first batch.

Fixing the generated image

After all images have been generated, check them. If an image is unclear or unintended, mute “LayerUtility: Image Tagger Save” as you did when testing the illustration style, change nr in “🔶 Any Switch” to 1, set the Batch Count back to 1, and load the image to be fixed in “Load image”.

Generate images while adjusting Sampler and prompts.

When you are satisfied with the generated image, right-click the image in Preview (Output), select Save Image, and overwrite the original file of the image you just fixed.

For even higher quality, retouch using “🔗Gimp”, “🔗CLIP STUDIO PAINT”, “🔗Affinity Photo 2”, or “🔗Adobe Photoshop”. A drawing tablet is recommended for retouching, as it will improve your work efficiency.

Completion of dataset

Once all the fixes are done, the dataset is complete. As you can see below, there is not much difference, but we now have two different styles from the input image. (If you want to make more difference, change the checkpoint model.)

Dataset Image Sample
Both images are styled with denoise:0.40 while preserving the input image.

In the example, I created two illustration styles and did not include the original images, but you can also bypass the sampler in the first batch to caption the input images as-is, and run img2img only in the second batch. Try this method if the trained LoRA fails to reproduce the character’s details.

The custom image and caption are now exported. Next, let’s edit the exported caption using A1111 WebUI.


Caption generation and editing of images using A1111 WebUI

From here, you will use A1111 WebUI to generate and edit captions for images. If you have generated captions using ComfyUI, proceed from “⏬4-2. Caption editing using the WebUI Dataset Tag Editor extension”.

Caption generation using WebUI WD 1.4 Tagger extension

For caption generation, use the A1111 WebUI extension stable-diffusion-webui-wd14-tagger.

If you do not know how to install the software, please refer to the following article.

After installation is complete, generate captions from the image folder.

Go to the Tagger page

Once the extension is installed, “Tagger” will appear in the page-switching tabs of the UI; click it to switch pages.

Specify image folder

In the upper left corner of the opened page, there is a tab for switching between Single process and Batch directory, so switch to Batch directory.

Specify the folder of images shot in VRoid Studio in the Input directory. If you want to save the generated caption files to a different folder (not commonly needed), specify the destination in the Output directory.

I explained this in “How to use the workflow,” but I will repeat it for those who skipped the custom image styles section. This folder field requires you to enter the path directly. An easy way is to open the folder in File Explorer on Windows, right-click the address bar, and click “Copy Address” to copy the path, then paste it into the Input directory.

Instance and class tags common to all images

Specify an instance tag (a unique name for the LoRA to be trained: a tag the training base model has not learned) and a class tag (a generic name for the LoRA to be trained: a tag the base model has already learned) in “Additional tags”. As an example, I set dcai-girl, 1girl. *In the example, “1girl” already appears in most of the generated caption files, so the duplicated tag will be removed in a later caption edit.

Caption tag generation

Press the “Interrogate” button to start generation. After a few moments, the result of the generation will be displayed in the left part of the UI.

The caption files have now been added to the dataset folder. You could use this as a training dataset as-is, but if you want a better LoRA, edit the generated caption tags.

Caption editing using the WebUI Dataset Tag Editor extension

Use stable-diffusion-webui-dataset-tag-editor for caption editing. Install it in the same way as before.

Once the extension is installed, let’s edit the caption tags. Only basic usage will be explained in this article.

Go to Dataset Tag Editor page

Click on the “Dataset Tag Editor” in the page switching tab of the UI to switch pages.

Specify the dataset folder

Specify the path to the dataset folder in the Dataset directory. Caption File Ext defaults to .txt; if your captions were created with .caption, change it to .caption.

Loading Datasets

After specifying the dataset folder, load it by clicking the “Load” button on the right.

Unnecessary tag removal

Open “Batch Edit Captions” from the page tabs on the right side of the UI and select “Remove” from the tabs on the page.

A list of tags for captions is displayed in the Select Tags section in the lower right corner of the UI, so select the tags you do not need. Once selected, click the “Remove selected tags” button to remove them. *How to select unnecessary tags is explained in detail later in this article, so please refer to that for removal.

Duplicate tag removal

Remove duplicates by clicking the “Remove duplicate tags” button on the Remove page. In the example, “1girl” is duplicated, so use this button to remove it. It is also a good idea to use this button at the end if you have replaced or directly edited tags.
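What the “Remove duplicate tags” button does can be approximated in a few lines of Python (a sketch of the idea, not the extension's actual code): keep the first occurrence of each comma-separated tag and drop the rest.

```python
def remove_duplicate_tags(caption):
    """Keep the first occurrence of each tag, preserving order."""
    seen = []
    for tag in (t.strip() for t in caption.split(",")):
        if tag and tag not in seen:
            seen.append(tag)
    return ", ".join(seen)
```

Because order is preserved, the instance and class tags stay at the front of the caption, where Kohya-style training expects them.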

Tag Replacements

Open “Search and Replace” on the “Batch Edit Captions” page and replace the tags using the Search and Replace for all images displayed area at the bottom.

Enter the original tag in Search Text and the new tag in Replace Text. If you want the replacement to apply across the whole caption text, set “Search and Replace in” to Entire Caption.

Finally, use the “Search and Replace” button to execute the rewrite.

Direct editing of captions

To edit the caption manually, open “Edit Caption of Selected Image” from the page tabs on the right side of the UI and click on the image you want to edit from the list of images on the left.

With the image selected, the “Copy and Overwrite” button loads its caption into Edit Caption at the bottom, where you can edit it directly. If any tags are missing from the caption, add them.

When you are done editing, overwrite the caption with the “Apply changes to selected image” button. *Note that if you press this button while Edit Caption is blank, the caption for the selected image will be erased.

You can see the caption tags for each image on this page. Check the number of tokens displayed on the right side of “Caption of Selected Image”; when setting up the Kohya ss GUI later, you must choose 75 / 150 / 225 in the “Max Token Length” field. Be sure to check the maximum token count across all training images; tags beyond 225 tokens will not be loaded.

Check Tag

To check tags, open “Filter by Tags” from the page tab on the right side of the UI and select a tag from Filter Images by Tags. On the left side of the UI, you can see the images that use the selected tag.

Saving edited captions

When you have finished editing the captions, click the “Save all changes” button to overwrite the existing files. By default, “Backup original text” is enabled, so the original file is saved with the extension .000. (If a backup file already exists, the extension is incremented to .001 and so on.) If you want to load a backup file, enter its backup extension in the Caption File Ext field, e.g., .000.


About caption tags

The caption tag is one of the very important elements of the LoRA training.

Set up a group of tags that accurately reflects the features you want to train. In particular, if you want to train the character, it is advisable to keep the background tags simple.

It is also important to specify the tags you wish to adjust when applying LoRA.

In the example, if animal ears are trained and captioned, ears will be generated by specifying the animal ears tag when the LoRA is applied. However, if the animal ears tag is not included in the caption, the ears tend to become tied to the instance tag dcai-girl; then, when the dcai-girl tag is present but you do not want ears, ears may still be generated under its influence.

Also, ear types not associated with the animal ears tag (e.g., rabbit ears) are less likely to be generated unless described in the prompt.

Edit your tags with the above in mind.

How to select unnecessary tags when editing tags

Until you are used to it, you may not know which tags are unnecessary, so I will use this example to explain how to select them.

First, let’s look at the list of tags used in the caption.

From this list, the following tags are unnecessary:

Now let’s look at why the selected tag is unnecessary.

  • :d / ;\): The character does not use these emoticons.
  • alternate costume: There is no alternate costume.
  • black bow / bow: The neck accessory is a ribbon; merge into black bowtie / bowtie.
  • brown hair / red hair: Merge into orange hair.
  • capelet: No capelet is worn.
  • corset: Merge the corset into dress.
  • erune: Probably a tag for the Erune race of Granblue Fantasy; leaving it may pull in Erune traits.
  • frilled skirt / frills: The frilled skirt will not be changed, so merge into skirt.
  • high heel boots / high heels: No high heels are worn.
  • horse ears / horse girl: Probably Uma Musume tags; leaving them may pull in Uma Musume traits.
  • virtual youtuber: Often attached to anime-style 3DCG images, so you could keep it, but I remove it to simplify the caption.
  • white background: A false detection; no white background is used.

You could remove all of these tags at once, but for tags you want to merge into another tag, such as brown hair, first replace them with the target tag (orange hair) and then use “Remove duplicate tags”. This prevents ending up with captions that have no hair color at all.

Improve the quality of captions

Once unnecessary tags have been removed, the next step is to add the missing ones.

Some tags are missing from the caption, so add the missing tags while looking at the image.

For example, there are captions that do not contain animal ears. In such cases, use “Search and Replace” to edit.

On the “Batch Edit Captions” page, open “Search and Replace”. In Search Text, type short hair (a tag that is correctly present in every caption), and in Replace Text, type short hair, animal ears.

Set “Search and Replace in” to Entire Caption, then perform the replacement with the “Search and Replace” button. Now animal ears is correctly inserted into every caption. Also, since the background was captioned black background, replace it with gray background for versatility.
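Outside the WebUI, the same tag-level search-and-replace can be scripted. A minimal sketch (the function is my illustration, not the extension's behavior), replacing one whole tag with one or more tags:

```python
def replace_tag(caption, search, replacement):
    """Replace a whole tag with one or more tags, leaving other tags intact."""
    out = []
    for tag in (t.strip() for t in caption.split(",")):
        if tag == search:
            out.extend(s.strip() for s in replacement.split(","))
        else:
            out.append(tag)
    return ", ".join(out)
```

Applying replace_tag(caption, "short hair", "short hair, animal ears") to every .txt file reproduces the edit above; matching whole tags avoids accidentally rewriting substrings of longer tags.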

The following other tags were missing, so use “Edit Caption of Selected Image” to add them directly.

thigh strap,
white thighhighs,
belt

Next, add blue dress and black skirt so that the colors of the dress and skirt can be changed later. Enter the missing dress and skirt tags manually in “Edit Caption of Selected Image”, then replace dress and skirt with their colored versions using “Search and Replace”, in the same way as in the animal ears example.

Since manually editing 100 captions is time-consuming, edit only the first 50 files, then copy txt files 1–50 to a temporary folder. Rename them 51–100, move them back to the original folder, and overwrite the existing files.
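This copy-and-renumber step can also be scripted instead of done by hand; a sketch under the assumption that files follow the dcai-train_00000001.txt naming used in this article (the folder name is hypothetical):

```python
import re
import shutil
from pathlib import Path

def shift_number(name, offset=50):
    """Shift the zero-padded counter in a filename by offset, keeping padding."""
    stem, num, ext = re.fullmatch(r"(.*_)(\d+)(\.\w+)", name).groups()
    return f"{stem}{int(num) + offset:0{len(num)}d}{ext}"

def copy_captions(folder="dataset", offset=50):
    """Copy edited caption files 1-50 over files 51-100 in the same folder."""
    src = Path(folder)
    for txt in sorted(src.glob("*_*.txt"))[:offset]:
        shutil.copy(txt, src / shift_number(txt.name, offset))
```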

Bonus: How to batch convert file names (Windows)

When creating a dataset, there may be times when you need to rename multiple files at once. The simplest method is to select all the files you want to rename in File Explorer, press F2, and rename them. This will automatically append numbers in the format name (1), name (2) and so on. However, this method only starts numbering from 1 and lacks customization options. To achieve more precise control over file names, I recommend using the powerful tool Microsoft PowerRename, which allows for advanced renaming configurations.

How to install PowerRename

PowerRename is one of the features of “Microsoft PowerToys” and can be installed from the Microsoft Store link below.

How to use PowerRename

It is simple to use. With all the files you want to change selected, right-click to bring up the menu.

Click “Rename with PowerRename” to launch PowerRename.

Enter the characters you want to change in the search field. In this example, check the box “Use regular expressions” and enter the following pattern. This matches all file names with a “.png” extension.

(.*).png

Next, enter the following in the Replace with field. *Be careful not to forget to enter the extension.

dcai-train_${increment=1,padding=8,start=51}.png

The preview on the right side of the UI will now show the change from “dcai-train_00000001.png” to “dcai-train_00000051.png”. If everything looks right, click the “Apply” button to execute the replacement. If you applied it by mistake, go back to File Explorer and press Ctrl + Z to undo the change.

A brief explanation of the replacement variable patterns:

  • ${}: A simple counter starting from 0.
  • ${increment=X}: Increments the counter by X each time.
  • ${padding=X}: Zero-pads the counter to X digits.
  • ${start=X}: Starts the counter at an initial value of X.

Combine the above pattern with ${increment=1,padding=8,start=51} and the variable will be replaced with an 8-digit number starting at 51 and moving up by 1. For more detailed settings, please read the official documentation.
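In Python terms, the ${increment=1,padding=8,start=51} pattern behaves like the following sketch (my illustration of the semantics, not PowerRename's code):

```python
def expand_counter(i, increment=1, padding=8, start=51):
    """Value of the PowerRename counter for the i-th matched file."""
    return f"{start + i * increment:0{padding}d}"
```

So the first three matched files receive the numbers 00000051, 00000052, and 00000053.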

Conclusion

In this article, I introduced the process of creating training images using VRoid Studio. Spending time and effort to carefully create a dataset is essential for high-quality LoRA training. By using the ComfyUI workflow introduced earlier, you can modify the original illustration style, making it a valuable tool for reducing the CG-like appearance of 3D renders. The created dataset is available on Patreon for paid supporters. In the next article, I will cover how to train a LoRA for the SD1.5 model using this dataset. I recommend preparing your own original dataset or downloading the one provided in advance.
