I'm working on improving this for Gemma 4 specifically, by using its thinking mode, which is currently unused. Models with a low active parameter count struggle with this task as it's currently set up. Qwen 3.5 seems to do fine. I have not extensively tested it with the dense Gemma 4 or DeepSeek, but if those have the same problem, please let me know.