Love the idea of the world reacting to the pulse of the music — it feels like a strong identity hook.
Are you planning to have the visual/audio cues shift with difficulty, or to keep the feedback consistent across levels?