AI Image Processing Toolkit
A collection of specialized scripts for AI image processing, dataset preparation, and model training workflows.
Scripts Overview
WDV3 (Waifu Diffusion V3 Tagger)
An image tagging script using the WD V3 tagger models. Supports multiple model architectures (ViT, SwinV2, ConvNext) and can process both single images and directories recursively.
Features
- Multiple model architecture support
- Batch processing capabilities
- Adjustable confidence thresholds
- CUDA acceleration with FP16 support
- JXL image format support
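A minimal invocation sketch is shown below; the flag names are assumptions based on the feature list above, so check the script's `--help` for the real interface.

```
# Hypothetical invocation -- the flag names here are assumptions, not the verified interface
wdv3 --model vit --threshold 0.35 ~/datasets/my_set     # tag every image under a directory
wdv3 --model swinv2 sample.jxl                          # tag a single JXL image
```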
Training Functions (train_functions.zsh)
A set of ZSH functions for managing AI model training workflows:
- Script execution management
- Training variable setup
- Git repository state tracking
- Output directory management
- Automatic cleanup of empty outputs
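The helpers are meant to be sourced into an interactive shell; the sketch below only illustrates the pattern, and the function names in it are hypothetical placeholders rather than the actual API.

```
# Illustrative only -- function names below are hypothetical placeholders
source ~/toolkit/train_functions.zsh
setup_training_vars my_lora_run          # export the common training variables for a run
run_training_script train_network.py    # launch training and record the git repository state
cleanup_empty_outputs ~/output_dir       # prune output directories that contain nothing
```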
Git Wrapper (git-wrapper.zsh)
Enhanced Git functionality for dataset management:
- Automatic submodule handling
- LFS integration for JXL files
- Dataset-specific Git attributes management
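The LFS integration for JXL presumably comes down to standard Git LFS attribute rules; whether the wrapper writes exactly the rule below is an assumption, but the syntax itself is plain Git LFS usage.

```
# Standard way to route JXL files through Git LFS; the wrapper likely automates something similar
git lfs track '*.jxl'    # writes "*.jxl filter=lfs diff=lfs merge=lfs -text" to .gitattributes
```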
Check4sig (check4sig.zsh)
Dataset caption file watermark detection utility:
- Scans .caption files for watermark-related text
- Batch processing support
- Interactive editing with nvim
- Recursive directory scanning
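A typical invocation would look something like the line below; the exact argument handling is an assumption, so read the script before relying on it.

```
# Hypothetical invocation -- argument handling is an assumption
check4sig ~/datasets/my_set    # recursively scan .caption files and open matches in nvim
```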
Gallery-dl Wrapper (gallery-dl.zsh)
Directory-aware wrapper for gallery-dl:
- Automatically changes to ~/datasets directory
- Maintains consistent download locations
- Preserves original command functionality
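A directory-aware wrapper of this kind usually amounts to a small shell function like the sketch below; this illustrates the pattern and is not necessarily the actual contents of gallery-dl.zsh.

```
# Sketch of a directory-aware wrapper -- illustrative, not the verified implementation
gallery-dl() {
  # run in a subshell so the caller's working directory is left untouched
  ( cd ~/datasets && command gallery-dl "$@" )
}
```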
JoyCaption (joy)
Advanced image captioning system combining CLIP with an LLM:
- Multiple caption styles (descriptive, training prompts, art critic, etc.)
- Custom image adapters
- Tag-based caption generation
- Batch processing support
PNG to MP4 Converter (png2mp4)
Training progress visualization tool:
- Converts PNG sequences to MP4
- Customizable frame rates and durations
- Step counter overlay support
- Multiple sample handling
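Under the hood, this kind of conversion is typically a single ffmpeg pass over the sample images; a plain-ffmpeg equivalent (without the step-counter overlay) looks roughly like this.

```
# Roughly what a PNG-sequence-to-MP4 conversion does with plain ffmpeg (no step overlay)
ffmpeg -framerate 8 -pattern_type glob -i 'samples/*.png' \
       -c:v libx264 -pix_fmt yuv420p progress.mp4
```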
XY Plot Generator (xyplot)
Image comparison grid generator:
- Supports multiple image formats
- Customizable grid layouts
- Optional row/column labels
- Automatic image padding and alignment
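An invocation might look like the line below; the flag names are guesses from the feature list, so check `xyplot --help` for the real options.

```
# Hypothetical invocation -- flag names are assumptions
xyplot --cols 4 --labels "step 1000,step 2000,step 3000,step 4000" samples/*.png -o grid.png
```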
Caption Concatenator (concat_captions)
Utility for combining multiple caption files:
- Merges .caption and .tags files
- Maintains original image associations
- Batch processing support
- Error handling for missing files
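Per image, the merge is conceptually just joining the two sidecar files that share a stem; the plain-shell equivalent below is illustrative only and may differ from what the script actually writes.

```
# Illustrative plain-shell equivalent for a single image's sidecar files
cat image_0001.caption image_0001.tags > image_0001.txt
```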
Installation
- Clone the repository (optional):

  ```
  git clone https://huggingface.co/k4d3/toolkit
  ```

- Add the repository to your PATH (optional):

  ```
  export PATH="$PATH:$HOME/path/to/toolkit"
  ```

- Source the included .zshrc from your shell (optional; you will need to adapt it to your setup). Add the following line to your own ~/.zshrc, for example via nano ~/.zshrc:

  ```
  source ~/path/to/toolkit/.zshrc
  ```
Requirements
- Miniconda, with an environment set up for training (sd-scripts, timm, etc.)
- ZSH shell (optional)
- CUDA-capable GPU (recommended)
- Required Python packages:
- torch
- transformers
- pillow
- pillow-jxl
- opencv-python
- numpy
- and more, depending on which scripts you use
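The explicitly listed packages can be installed into the conda environment with pip, roughly as below; the exact PyPI name of the JXL plugin is an assumption and worth double-checking.

```
# Install the explicitly listed packages; verify the exact PyPI name of the JXL plugin
pip install torch transformers pillow pillow-jxl opencv-python numpy
```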
Usage
Each script can be used independently or as part of a workflow. Here are some common usage examples:
JoyCaption
```
joy --feed-from-tags=10 --custom_prompt="Write a very long descriptive caption for this image in a formal tone. Do not mention feelings and emotions evoked by the image." .
```

png2mp4

```
png2mp4 --repeat 16
```

inject_to_txt

```
inject_to_txt 1_honovy "honovy"
```

replace_comma_with_keep_tags_txt

```
replace_comma_with_keep_tags_txt 1 1_honovy
```
Directory Structure
```
~/
├── datasets/
├── output_dir/
├── models/
└── toolkit/
```
License
WTFPL - Do what the fuck you want with it.
Any included data and models are copyrighted by their respective owners and carry their own licenses.
Contributing
Contributions are welcome! For major changes, please open an issue first to discuss what you would like to change.
Documentation
If the documentation of a script is missing, ask a language model about it.