Ae you still seeing this problem in the latest version?
Viewing post in How long can be a text to speak?
I'd like to see it (but of course not at the cost of more pronunciation errors.) If and when it can be done seamlessly, that would be great. The trade-off of longer processing times, would be acceptable. I suspect that 48khz would fit better with many users' workflow (DAW templates, sharing with collaborators, etc.) Thanks for considering it!