Hi, I'm trying to run Mistral-Nemo-Instruct-2407-GGUF locally on my computer (through LM Studio). I have everything set up and working, but for some reason all of the responses are short (80-100 characters). When I run the same instances/scenarios through the online version of the game, I get much longer scenarios (300-500 characters)...but I don't want to bog down your server. Are there any tips or tricks you know of that I can use to get longer responses? Also, the responses appear to get cut off in the game window, yet the "choices" are almost always properly formatted.
I have also verified that I have plenty of memory (to the best of my knowledge). I'm running 10000 max memory and 4000 max output tokens in the Endpoint menu, and I have the Context Length set to 4096 in LM Studio. It appears to be using the default Jinja prompt template...I'm not knowledgeable enough to know whether that's related to the problem. I do have a somewhat beefy PC, so I'm not too worried about bumping up settings if that's necessary.
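In case it's useful, here is roughly what I understand the game to be sending to LM Studio's local server (a minimal sketch, assuming the default localhost:1234 OpenAI-compatible endpoint; the model id, temperature, and prompt are placeholders I made up, not what the game actually sends):

```python
# Sketch: querying LM Studio's local OpenAI-compatible server directly,
# to check whether the max_tokens cap (not the game client) is what is
# shortening the replies. Assumes LM Studio's default server address;
# the model id below is a placeholder for whatever LM Studio displays.
import json
import urllib.request


def build_payload(prompt: str, max_tokens: int = 4000) -> dict:
    """Build a chat-completion request payload for the local server."""
    return {
        "model": "mistral-nemo-instruct-2407",  # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # hard cap on reply length, in tokens
        "temperature": 0.8,        # guessed value, not the game's setting
    }


def send(payload: dict,
         url: str = "http://localhost:1234/v1/chat/completions") -> dict:
    """POST the payload to the local server and return the parsed JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_payload("Continue the scenario in detail.")
print(payload["max_tokens"])  # → 4000
# reply = send(payload)  # uncomment with LM Studio's server running
```

If a direct request like this comes back long while the in-game responses stay short, that would point at the game's own request settings rather than LM Studio.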