If you start a sound when the dialogue starts and don't handle the remaining events, you will get that result. To make it stop in the latter situation, you have to handle the other events as well and use them to control those sound files.
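As a rough illustration, here is a minimal sketch of a callback that reacts to more than just the start event. Everything in it is an assumption made for the example: the event names ("begin", "show", "slow_done", "end"), the `FakePlayer` class standing in for your engine's audio channel, and the file name `blip.ogg`. Swap in whatever events and audio calls your engine actually provides.

```python
class FakePlayer:
    """Hypothetical stand-in for an engine's audio channel; it only records calls."""

    def __init__(self):
        self.playing = None
        self.log = []

    def play(self, filename):
        self.playing = filename
        self.log.append(("play", filename))

    def stop(self):
        # Only record a stop if something is actually playing.
        if self.playing is not None:
            self.log.append(("stop", self.playing))
        self.playing = None


def make_dialogue_callback(player, filename):
    """Build a callback that starts the sound on 'begin' and stops it on later events."""

    def callback(event, **kwargs):
        if event == "begin":
            # The dialogue line starts: start the sound.
            player.play(filename)
        elif event in ("slow_done", "end"):
            # The text finished displaying, or the line was dismissed: stop the sound.
            player.stop()
        # Any other event (for example "show") is deliberately ignored here.

    return callback


# Simulate one line of dialogue going through its events (assumed order).
player = FakePlayer()
on_event = make_dialogue_callback(player, "blip.ogg")
for event in ("begin", "show", "slow_done", "end"):
    on_event(event)

print(player.log)
```

If the callback only handled "begin", nothing would ever call `stop()`, and the sound would keep going after the line is finished.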
Also, how do you handle the case where the user changes the text speed in the settings? Are you also changing the speed of the dialogue?