Actually, I used 341 captioned images, but trained for only 1600 steps at a learning rate of 0.0001. That may be too few steps for a dataset this large, but… it works, you know?
It works for poses and NSFW, so for me it's fine :)
(But technically speaking, it's a pretty wild setup xDD)
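To gauge how "few" 1600 steps is for 341 images, a rough back-of-the-envelope check is to count approximate passes over the dataset. This is a minimal sketch that assumes each step processes `batch_size` images, with no per-image repeats or gradient accumulation. The batch size was not stated, so the default of 1 below is an assumption, and `approx_epochs` is a hypothetical helper rather than part of any trainer.

```python
def approx_epochs(num_images: int, steps: int, batch_size: int = 1) -> float:
    """Approximate number of full passes over the dataset.

    Assumes every step consumes `batch_size` images and that the trainer
    applies no extra per-image repeats or gradient accumulation.
    """
    return steps * batch_size / num_images


# 341 images, 1600 steps, assumed batch size of 1
print(f"{approx_epochs(341, 1600):.2f}")  # 4.69
# Same run if the batch size had been 2
print(f"{approx_epochs(341, 1600, batch_size=2):.2f}")  # 9.38
```

So under a batch size of 1, each image is seen only about 4 to 5 times, which is why the step count can look low for a dataset of this size.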
