Huh, voice tags should just be replaced with their contents (other tags) before parsing...
e.g.
voice tag named "test" with the contents: "<c=red>"
then <c=red> and <v=test> would essentially be the same thing.
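
Roughly what I'd expect to be happening, as a quick Python sketch. The names here (`VOICES`, `VOICE_TAG`, `expand_voice_tags`) are made up for illustration and are not the actual implementation, and I'm assuming unknown voice names are simply left alone:

```python
import re

# Hypothetical registry of voice tags: name -> contents (assumed, for illustration only)
VOICES = {"test": "<c=red>"}

# Matches tags of the form <v=name>
VOICE_TAG = re.compile(r"<v=([^<>]+)>")


def expand_voice_tags(text, voices):
    """Replace each <v=name> with that voice tag's contents before parsing.

    Unknown names are left untouched (an assumption, not confirmed behavior).
    """
    return VOICE_TAG.sub(lambda m: voices.get(m.group(1), m.group(0)), text)


# After expansion, <v=test> and <c=red> give the parser identical input.
print(expand_voice_tags("<v=test>Hello", VOICES))
print(expand_voice_tags("<c=red>Hello", VOICES))
```

If it works like that, whatever goes wrong should also be reproducible by writing the voice tag's contents out by hand.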
Juuust to make sure, what are the contents of the voice tag you're inserting? I want to be able to reproduce the issue, too.