Are you still seeing this problem in the latest version?

I am seeing it in anything longer than 4-5 minutes. The audio is all garbled, which pretty much makes it useless for my purposes. Any thoughts on what's causing it?

What model are you using? Chatterbox tends to degrade over longer speech.

Also, are you using the Lab or Projects interface to generate speech? For text that's longer than a few sentences you should create a project so VCP can properly chunk the text.
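To illustrate what chunking means here, below is a minimal sketch of sentence-based chunking, not VCP's actual implementation. The function name `chunk_text` and the `max_chars` limit are hypothetical and chosen only for this example. The idea is that each chunk is short enough for the model to speak cleanly, and the resulting clips are joined afterwards.

```python
import re


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration; a real tool may split differently.
    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            # Adding the sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be sent to the speech model separately, so no single generation runs long enough to degrade.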


Thanks for the quick reply. I'm using Qwen3, but in the Lab interface. I'll try Projects and see if it makes a difference.

No problem. Let me know if it works. I'll work on making it more obvious that you should use projects for long-form content.

That worked: 6-8 minute audio is not garbled with either Qwen3 or OmniVoice (I haven't tried the others yet). Now all we need is 48 kHz :)

Awesome! My record is generating an 11-hour audiobook without major issues.

I did briefly experiment with higher-quality 48 kHz output for Qwen3, but it introduced pronunciation errors, so I removed it. I might take another look if you think it's worth it.

I'd like to see it (but of course not at the cost of more pronunciation errors). If and when it can be done seamlessly, that would be great. The trade-off of longer processing times would be acceptable. I suspect that 48 kHz would fit better with many users' workflows (DAW templates, sharing with collaborators, etc.). Thanks for considering it!

Good to know. I'll add it to my list.