Post by G3mHunter in Release 1.6.5 - General

Viewing post in Release 1.6.5 - General

What's the response time like?
It usually takes 30-60 seconds for messages to generate with my specs. If it's faster than the time I'm getting, then I'll definitely start using it.

Three Eyes Software 104 days ago

It shouldn't take more than a few seconds with your specs. Which model are you using? Is your GPU AMD or Nvidia? Are you using the new version 1.6.6b with optimized memory usage?

G3mHunter 103 days ago

It's an Intel Arc A770, so I suppose that's why.

Three Eyes Software 103 days ago

I imagine Gemma-4-Sparse should be very fast on any Vulkan GPU. If you're using Qwen 3.5 with a version before 1.6.6b, it might be slow because of that.

G3mHunter 102 days ago

I can show you a video/photo of the logs if you'd like. Perhaps one of my configs is wrong? I'm not too knowledgeable in this field.

G3mHunter 98 days ago

What would you recommend I do/change to be able to use my own resources instead of OpenRouter?

Three Eyes Software 97 days ago

Which model are you running? Did you update to 1.6.6b yet?

G3mHunter 97 days ago

I am. I got it the moment it was published.

Three Eyes Software 97 days ago

After some research, it seems that the A770 is simply not well supported by llama.cpp. You could try running Gemma-4-Sparse to see if a sparse model is faster, if you aren't already doing that.

G3mHunter 96 days ago

Hmm, that's a shame. I'm pretty green in this field, but I'll see what I can do. Thanks for the support thus far.
