Post by Mortar Tribe in How long can be a text to speak?

Viewing post in How long can be a text to speak?

Are you still seeing this problem in the latest version?

I am seeing it in anything beyond 4-5 mins. It's all garbled and pretty much makes it useless for my purposes. Any thoughts on what's causing it?

Mortar Tribe 71 days ago

What model are you using? Chatterbox tends to degrade over longer speech.

Also, are you using the Lab or Projects interface to generate speech? For text that's longer than a few sentences you should create a project so VCP can properly chunk the text.
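As a rough illustration of what chunking means here (this is not VCP's actual code; the function name `chunk_text` and the 300-character limit are assumptions made up for the example), long text can be split at sentence boundaries so each piece stays short enough for the model to handle cleanly:

```python
import re

def chunk_text(text, max_chars=300):
    """Group whole sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration only. A single sentence longer
    than max_chars is kept whole and becomes its own chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the audio joined afterwards, which is presumably why long-form output holds up better in a project than in a single Lab generation.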

skhanna94 71 days ago (+1)

Thanks for the quick reply. I'm using Qwen3, but in Lab. I'll try Projects and see if it makes a difference.

Mortar Tribe 71 days ago

No problem. Let me know if it works. I'll work on making it more obvious that you should use projects for long-form content.

skhanna94 71 days ago

That worked: 6-8 min audio is not garbled with either Qwen3 or OmniVoice (haven't tried the others yet). Now, all we need is 48k :)

Mortar Tribe 71 days ago

Awesome! My record is generating an 11-hour-long audiobook without major issues.

I did briefly experiment with a higher-quality 48 kHz output for Qwen3, but it introduced pronunciation errors, so I removed it. Might take a look again if you think it's worth it.

skhanna94 71 days ago

I'd like to see it (but of course not at the cost of more pronunciation errors). If and when it can be done seamlessly, that would be great. The trade-off of longer processing times would be acceptable. I suspect that 48 kHz would fit better with many users' workflows (DAW templates, sharing with collaborators, etc.). Thanks for considering it!

Mortar Tribe 71 days ago

Good to know, I'll add it to my list.
