Hey, thanks for trying it out!

Quick disclaimer up front: Godot MCP Pro was developed and tested primarily against Claude (Sonnet/Opus), so I can't guarantee identical reliability with local models. But here's what the current (May 2026) landscape looks like for your VRAM budget:

### Top pick for 24GB VRAM + MCP tool calling

**Qwen3.6-27B (dense)** — released April 2026, Apache 2.0, 262K context, native tool-calling. SWE-bench Verified 77.2. Fits comfortably in 24GB at Q4_K_M (~17–18GB) with room for context. This is currently the best single bet for agentic tool use at your VRAM tier.
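For anyone wanting to sanity-check the fit themselves, here's the back-of-envelope math, assuming Q4_K_M averages roughly 4.85 bits per weight (a common estimate, not an exact figure, since the mix of quant types varies per tensor):

```python
# Rough VRAM estimate for a quantized dense model.
# Assumption: Q4_K_M averages ~4.85 bits/weight; KV cache and
# runtime overhead come out of whatever headroom is left.

def quantized_weight_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Approximate size of the weights alone, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

weights = quantized_weight_gb(27)   # ~16.4 GB of weights
budget = 24.0                       # GPU VRAM in GB
headroom = budget - weights         # left for KV cache + overhead

print(f"weights ~ {weights:.1f} GB, headroom ~ {headroom:.1f} GB")
```

That lands at roughly 16–17 GB for the weights, which is consistent with the ~17–18 GB figure above once runtime overhead is added, leaving several GB for context.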

### Strong alternatives

- **Qwen3-Coder-Next (80B MoE, 3B active)** — best-in-class for agentic coding (SWE-Bench-Pro at the level of models 10–20× larger), but it needs ~46GB VRAM/RAM at Q4. Won't fit 24GB GPU-only — workable only if you have unified memory (Apple Silicon) or are OK offloading to system RAM with a big speed hit.

- **Qwen2.5-Coder-32B-Instruct** — still excellent for FIM / tab-completion. Tool calling is decent, but Qwen3.6-27B is a clearer upgrade for agent workflows.

- **Mistral Small 3.x (24B)** — solid function calling, fast, fits comfortably. A reasonable middle option if Qwen feels too verbose.

### What I'd avoid (or be cautious with)

- **DeepSeek V3** — independent evaluations have it around 81.5% on function-calling benchmarks vs ~96.5% for Qwen Plus, and it's specifically weaker at multi-turn function calling. For single-shot ("create this scene") it's fine; for chained Godot workflows (create scene → add nodes → write script → reload → test), expect drift.

- **Llama 3.x at this VRAM tier** — 70B doesn't fit cleanly at 24GB even quantized, and 8B isn't reliable enough for multi-step tool chains.
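To make the multi-turn drift point concrete, here's a minimal sketch of why chained calls are harder than single-shot ones. The tool names below just mirror the example chain in the post; they are illustrative, not Godot MCP Pro's actual tool surface:

```python
# Illustrative only: tool names are made up to mirror the chain
# "create scene -> add nodes -> write script -> reload -> test".
from typing import Callable

def run_chain(steps: list[tuple[str, dict]],
              call_tool: Callable[[str, dict], dict]) -> list[dict]:
    """Run chained tool calls, stopping at the first failed step.

    Single-shot calls are easy; drift shows up when a bad argument in
    step 2 silently poisons steps 3-5, so validate each result before
    continuing instead of trusting the whole chain."""
    results = []
    for name, args in steps:
        result = call_tool(name, args)
        results.append(result)
        if not result.get("ok"):   # bail early rather than compound the error
            break
    return results

# Stub executor standing in for the real MCP client:
def fake_tool(name: str, args: dict) -> dict:
    return {"ok": name != "reload_project", "tool": name}

chain = [("create_scene", {"path": "res://Main.tscn"}),
         ("add_node", {"type": "CharacterBody2D"}),
         ("write_script", {"path": "res://player.gd"}),
         ("reload_project", {}),
         ("run_tests", {})]

print(run_chain(chain, fake_tool))  # stops after reload_project fails
```

With a weaker multi-turn model, the failure usually isn't a clean error like this stub's; it's a subtly wrong argument that only surfaces two steps later, which is why checking each result matters more locally than it does with Claude.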

### What matters more than the model name

1. **Use a runtime that enforces tool schemas.** llama.cpp grammars, vLLM with guided decoding, or Ollama with native tool support. Free-form decoding will hallucinate tool args no matter which model you pick.

2. **Multi-step Godot workflows are the hard part.** Local models drop off sharply past 3–4 chained tool calls compared to Claude, and that covers most non-trivial editor work. Expect to babysit longer chains.

3. **Keep the system prompt tight.** Local models burn context fast, and Godot MCP Pro's tool surface is broad — consider trimming the available tool list if you only need a subset.

If you do find a setup that works smoothly, I'd love to hear about it — drop a note here or on Discord.