1.0.7 already included all of those fixes, including the ones discussed here: https://github.com/ggml-org/llama.cpp/issues/12946
Generation works until a certain total number of tokens has been processed, then it starts producing broken output regardless of the size of the inputs. It also intermittently produces CUDA errors that crash the backend.