Thank you for the help. I've done as you said, but for some reason today I can't seem to connect to localhost port 5001, or at least I think that's what's going on. It feels like a different issue altogether, since before I could actually connect, though it was still terribly slow.
Here's the first output. It looks a little different, but it was the closest thing I could find.
This is the second output, though it seems to fail for some reason.
Here's me failing to connect inside the game itself.
I checked and everything is allowed through the firewall, including kobold. Should I just try another time?
If the curl command fails, the problem connecting to localhost lies somewhere else on your system, not in the game itself. Perhaps another program is already running on port 5001.
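If you'd like to test the port directly, here is a minimal Python sketch that only checks whether anything accepts a TCP connection on 127.0.0.1:5001. The function name `is_port_open` and the 2-second timeout are my own choices for illustration, and the host and port are assumed from this thread; it does not tell you which program is listening.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 5001, timeout: float = 2.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port are assumptions taken from the thread, not read from the game.
    if is_port_open():
        print("Something is listening on 127.0.0.1:5001.")
    else:
        print("Nothing accepted a connection on 127.0.0.1:5001.")
```

Run it once with the game closed and once with it open. If the port is already open while the game is closed, another program is probably holding it. If it stays closed while the game is running, the local server likely never started or is being blocked.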
As for the slowness, the game is correctly offloading all layers to the GPU. I can't fully gauge the performance from the two successful API calls you posted, since they have very few input/output tokens. That said, I've uploaded a new version that makes 5000-series RTX GPUs use a special backend setting. In my testing, leaving that setting unselected resulted in much slower (though not minutes-long) processing.