The vision model usually have VL in their name. Qwen3-VL should be available in LM Studio this week.
It would be also possible to finetune a model with this : https://github.com/unslothai/unsloth using their pre-made google colab. We just have to create a dataset of example of images with their "censored description".