Does this happen immediately or after several inputs? Does it print anything special on the KoboldCPP command prompt?
After testing the model on cloud-based 3090s/5090s, I've come to the conclusion that the upstream GPU implementation for this model is completely broken.
I didn't catch this during testing because I don't own a GPU with 24 GB of VRAM myself, so all of my testing was done with slow CPU inference only.
I will look for an alternative 24 GB model, and will add GLM-4 back in once it's properly implemented upstream.