If the curl command fails, the issue of connecting to localhost is related to something else on your system, not the game itself. Perhaps there's another program already running on port 5001.
As for the slowness, the game is correctly offloading all layers to the GPU. I can't fully gauge the performance from the two successful API calls you posted since they have very few input/output tokens, but I've uploaded a new version that makes 5000 series RTX GPUs use a special backend setting, which when not selected, resulted in much slower (though not minutes long) processing during my testing.







