I only played the content from this update (basically just Heidi). In the parts where she speaks a sentence, the audio is delayed, so you see her talk and then hear the words a second or two later.
OK, that is strange. Those were all created with native audio, and Ren'Py strictly binds the audio to the frame it is played on (which is why using mute blocks the video from playing at all if it has native audio). I've had a brief look into it, and I've accidentally already done all the things the guides say to prevent this happening (using fixed frame rates, converting to WebM format, using the Opus audio codec, specifying the audio channel in Ren'Py). The only thing that stood out as a possibility is that Bluetooth headphones/speakers can apparently sometimes cause a delay. You aren't by any chance using those, are you?