This has been very interesting to observe, and I have used it a fair amount. My only complaint is that the Qwen 3 model seems rather restricted in its usage. It gives me very bland and disappointing responses. You ask it to call you a name, and it goes on a Superman rant about peace and justice (harmlessness, etc.). Do you have any suggestions? It is a very good feature integration; I just need a better model, it seems. What I am after is not the description itself, but second-person statements that imply things about the reader (me).
For now, the best option would be to finetune an uncensored Qwen3-VL model. There are pre-made Google Colab notebooks that make this easy, but you first need to create a dataset of images paired with the spicy description you expect for each one.
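As a rough illustration of the dataset step, here is a minimal sketch that pairs each image with a same-named `.txt` file holding its expected description and writes the pairs to a JSONL file. The folder layout, the `build_manifest` helper, and the `image`/`text` field names are all assumptions made for this example; whichever notebook you use may expect a different format, so check its instructions first.

```python
import json
from pathlib import Path

# Image extensions to include (an assumption; adjust to your files).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def build_manifest(image_dir, out_path):
    """Pair each image with a same-named .txt description and write JSONL.

    Hypothetical helper: the {"image", "text"} schema is illustrative only.
    """
    image_dir = Path(image_dir)
    records = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption_file = image.with_suffix(".txt")
        if not caption_file.exists():
            continue  # skip images that have no description yet
        caption = caption_file.read_text(encoding="utf-8").strip()
        if caption:
            records.append({"image": image.name, "text": caption})

    # One JSON object per line, so the file can be streamed or inspected easily.
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return records


if __name__ == "__main__":
    # Hypothetical paths; replace with your own folder and output file.
    if Path("dataset").is_dir():
        pairs = build_manifest("dataset", "manifest.jsonl")
        print(f"wrote {len(pairs)} image/description pairs")
```

Writing the descriptions in the second person, the way you want the model to address you, is probably the simplest way to steer the finetuned model toward that style.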