1.0.7 already included all of those fixes, including the ones discussed here: https://github.com/ggml-org/llama.cpp/issues/12946
Generation works until a certain total number of tokens has been processed, then it starts producing broken output regardless of the size of the inputs. It also intermittently produces CUDA errors that crash the backend.