Not yet, it's a very early proof of concept, unfinished and with some problems still to solve. I do plan on making it available, though.
Thanks! Yes, you are right, we call it "Action inference". The LLM is guided to select (make a decision) from a fixed set of actions and parameters. The decision itself is influenced by the current context, the character's personality, past interactions (and the LLM itself, of course). It's an essential tool within Voxta to automate decisions and connect with the outside world, like Spotify, light control (home automation, basically) or toys (everything that has an API).
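Roughly sketched in Python, this is the idea (the action set, prompt format and function names are all made up for illustration, not Voxta's actual implementation):

```python
# Rough illustration of "action inference" (names are invented, not Voxta's real API).
# The LLM is asked to pick exactly one action from a fixed set; the reply is
# validated against that set before anything gets executed.
import json

ACTIONS = {
    "play_music": {"params": ["artist_or_playlist"]},
    "set_lights": {"params": ["brightness"]},
    "do_nothing": {"params": []},
}

def build_prompt(context: str, personality: str) -> str:
    action_list = "\n".join(f"- {name}({', '.join(spec['params'])})"
                            for name, spec in ACTIONS.items())
    return (
        f"You are {personality}.\n"
        f"Current context: {context}\n"
        "Choose exactly one of these actions and reply as JSON "
        '{"action": ..., "params": {...}}:\n'
        f"{action_list}"
    )

def parse_decision(llm_reply: str) -> dict | None:
    """Accept the decision only if it names a known action."""
    try:
        decision = json.loads(llm_reply)
    except json.JSONDecodeError:
        return None
    if isinstance(decision, dict) and decision.get("action") in ACTIONS:
        return decision
    return None
```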
It's called Voxta, a privacy-focused, closed-source project I'm contributing to: https://voxta.ai/. You can use local or cloud-based models, and it's highly customizable.
Alright, here is a little demo of what it could look like to give an AI character knowledge of what can be seen on screen and control over what is allowed and what is not: Hotscreen AI Demo
At the bottom of the chat window you can see the classes that are currently on screen and whether they are allowed or forbidden; on the right are the actions the AI can choose from. With each character reply it can decide whether to allow a class that is currently forbidden, or vice versa. That all happens autonomously via a hotscreen bridge within the platform and a custom mod in hotscreen talking to each other. The AI platform can use different services for TTS, STT and text-gen to bring everything to life.
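To give a rough idea of that bridge/mod exchange (this is not the actual demo code; the message names and fields are made up for illustration), the two sides basically just trade small JSON messages:

```python
# Sketch of the messages the hotscreen mod and the bridge could exchange
# (invented names/fields, shown as plain dicts before JSON serialization).
import json

# hotscreen mod -> bridge: what is currently detected and its allow state
detection_update = {
    "type": "detections",
    "classes": [
        {"name": "face",  "allowed": True},
        {"name": "hands", "allowed": False},
    ],
}

# bridge -> hotscreen mod: the action the AI character decided on
toggle_command = {
    "type": "set_class_state",
    "class": "hands",
    "allowed": True,   # the AI decided to allow a currently forbidden class
}

# Both sides serialize these as JSON over the connection, e.g.:
payload = json.dumps(toggle_command)
```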
At least in my opinion it would add a lot of value to be able to track the user's gaze. With that we could set it up so that censoring increases if the user looks at forbidden categories (gaze position/direction matches or is close to the position/area of the category); there's a rough sketch of this below.
Even better would be a software tracker that works with normal webcams or a smartphone camera used as a webcam, instead of expensive hardware trackers. There are open-source libraries/projects available that offer something in this area, for example:
https://github.com/NativeSensors/EyeGestures
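Conceptually (I haven't checked the exact EyeGestures API, so this is just a generic sketch with made-up names that takes a gaze point from whatever tracker is used), the logic could be as simple as:

```python
# Sketch only: escalate censoring while the gaze point rests on a forbidden
# category's area on screen, and relax it again otherwise.
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    x: float      # bounding box in screen pixels
    y: float
    w: float
    h: float
    forbidden: bool

def gaze_hits(region: Region, gaze_x: float, gaze_y: float, margin: float = 30) -> bool:
    """True if the gaze point is inside or close to the region."""
    return (region.x - margin <= gaze_x <= region.x + region.w + margin and
            region.y - margin <= gaze_y <= region.y + region.h + margin)

def update_censor_level(level: float, regions: list[Region],
                        gaze_x: float, gaze_y: float) -> float:
    """Increase censoring while gaze is on a forbidden region, relax otherwise."""
    on_forbidden = any(r.forbidden and gaze_hits(r, gaze_x, gaze_y) for r in regions)
    step = 0.05 if on_forbidden else -0.02
    return min(1.0, max(0.0, level + step))
```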
I could also build add-ons like the ones I described in this post: https://itch.io/post/1392194, if we had an API. That would open up a lot of opportunities in different directions.
There’s an AI platform project that can be fed with custom information and make decisions based on predefined contexts. It can run entirely locally, complete with TTS, STT, and everything.
To make this work with hotscreen, we’d need an API or WebSocket connection. As a starting point, it would be useful to access the current body-part configuration and the latest detection data. Ideally, we’d also be able to change the configuration via the API.
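Just to illustrate the kind of exchange I have in mind (purely hypothetical, since there is no API yet; the endpoint, message types and field names below are all invented), a small client could look like this:

```python
# Hypothetical sketch of talking to a hotscreen WebSocket API.
import asyncio
import json
import websockets  # pip install websockets

async def main() -> None:
    async with websockets.connect("ws://localhost:8765") as ws:
        # read the current body-part configuration
        await ws.send(json.dumps({"type": "get_config"}))
        config = json.loads(await ws.recv())

        # read the latest detection data
        await ws.send(json.dumps({"type": "get_detections"}))
        detections = json.loads(await ws.recv())

        # ideally: change the configuration, e.g. disable one body part
        await ws.send(json.dumps({
            "type": "set_config",
            "body_part": "hands",
            "enabled": False,
        }))

asyncio.run(main())
```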
Here are a couple of possibilities this could unlock:
- Dynamic AI speech: Let the AI speak (via TTS) directly to the current on-screen context, without relying on pre-recorded voice lines. This would make responses fresh and unpredictable, eliminating the need to prepare new samples. The dialogue would always be context-aware, and the tone or style could be shaped by configuring the AI’s character profile (e.g., dominant, teasing, sarcastic, etc.).
- Content control via AI “mood”: If body-part activation could be controlled through the API, the AI could decide when certain content is shown, adding an element of randomness. We might even have to “request” specific content, and based on its mood or personality it could allow or deny access, again heavily influenced by the configured character traits (rough sketch below).
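As a very rough sketch of that second point (everything here is made up, and the real grant/deny decision would come from the LLM's action inference rather than a hard-coded rule):

```python
# Sketch of the "request" flow: the user asks to see a category, the AI character
# grants or denies it, and the plugin forwards the result to hotscreen.
import json

def ask_character(category: str, traits: dict[str, float], mood: float) -> bool:
    """Placeholder for the real decision, which would come from the LLM's
    action inference (choosing between an 'allow' and a 'deny' action)."""
    # Stand-in heuristic: dominant characters in a bad mood deny more often.
    return mood > traits.get("dominant", 0.5)

def handle_request(category: str, traits: dict[str, float], mood: float) -> str:
    if ask_character(category, traits, mood):
        # would be sent to hotscreen over the (hypothetical) API
        return json.dumps({"type": "set_config", "body_part": category, "enabled": True})
    return json.dumps({"type": "chat", "text": f"No, you haven't earned {category} yet."})

print(handle_request("hands", {"dominant": 0.8}, mood=0.4))
```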
I could develop a plugin to act as the interface between the AI platform and hotscreen, but I’d need a way to both read and set parameters as described above. If this sounds feasible, we could move forward with building a proof of concept.