The "two core pillars" approach is such a smart way to scope a jam prototype. We took a similar approach with our cozy adventure game: we narrowed everything down to two systems (emotion-based crafting and a farming loop) and tested only whether those felt fun before adding anything else.
The snapping system challenge resonates hard. Connecting pieces in a way that feels intuitive and still holds up physically is deceptively difficult. In our case it's inventory slots and crafting recipes. Different domain, same core problem: the system needs to "just work" so players focus on creativity instead of fighting the interface.
Your point about the camera being a crucial supporting element is really insightful. It's easy to treat the camera as an afterthought, but in a physics sandbox where the payoff is watching chaos unfold, the camera IS the storytelling tool. The difference between "car fell off" and "car barely survived an insane loop" comes down entirely to how the camera frames it.
To answer your question: yes, I think people want this. Building, physics, and watching the outcome make a loop that never gets old. The key differentiator will be how easy it is to share those chaotic moments, so built-in replay or clip capture could be huge.
What's your current approach to snapping: grid-based, socket-based, or something hybrid? I'm curious how you handle pieces that connect at angles.