Feature Request:
I would like Rahituber to discern what kind of speech fragment is coming through the audio and map it to different “open mouth” positions.
For example, we could have different mouth shapes for “ff”, “nn”, “aah”, “ooh”, etc.
This could use machine learning to classify the audio fragment; there are plenty of pre-trained models out there to borrow from.
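Before reaching for a pre-trained model, the classification step could start as a simple per-frame heuristic. Below is a minimal sketch in Python (illustrative only; the function names and thresholds are assumptions, not anything from Rahituber itself) that buckets a short audio frame into "closed", a fricative-like "ff", or an open-vowel "aah" using just RMS energy and zero-crossing rate:

```python
import math

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ.
    Noisy fricatives ('ff', 'ss') cross zero far more often than voiced vowels."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / max(len(samples) - 1, 1)

def rms(samples):
    """Root-mean-square energy of the frame (0.0 for silence)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def classify_viseme(samples):
    """Crude frame classifier: 'closed', 'ff' (fricative), or 'aah' (open vowel).
    Thresholds are placeholder guesses and would need tuning per microphone."""
    if rms(samples) < 0.01:          # too quiet: mouth closed
        return "closed"
    if zero_crossing_rate(samples) > 0.25:  # noisy / hiss-like: fricative shape
        return "ff"
    return "aah"                     # voiced, low-frequency: open vowel shape
```

A real implementation would likely add more buckets ("ooh", "nn") by looking at spectral shape (e.g. the centroid or formants), or replace the whole thing with a small pre-trained phoneme/viseme classifier as suggested above; this sketch only shows where such a classifier would slot into the audio path.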
