Hello everyone,
NovelAI released a feature called "AI Modules": you provide a preprocessed, curated dataset, the AI is trained on it, and it then generates content in a style similar to the source dataset. I took the encounters.json from the .jar file and made some modifications to it. Originally, the JSON file threw a lot of errors when read (extra or missing commas, some badly escaped strings, etc.); these show up in a JSON linter and also prevent the file from being loaded by a typical JSON reader (at least the Python `json` library). Then I extracted only the 'speaker' and 'text' fields and tried to follow this guide: https://novelai.medium.com/custom-ai-modules-dbc527d66081 I also accounted for the <COCKSIZE>, <LIPSIZE>, <BUTTSIZE>, etc. tags by giving each one a hard-coded value.

Unfortunately, the AI is not capable of outputting text in the same format as the dataset, but the output has shifted to be fairly reminiscent of Majalis' writing... sometimes, after a lot of coaxing and retrying for better output. It definitely needs a lot of work, but it shows some potential. The white text below is what it generated (after many redo and undo operations).
It can definitely be improved, but unfortunately, training on a dataset requires "Steps", which are limited per month. I already used over 50% of mine on this one attempt and don't have enough for another, so maybe a community effort could result in a really, really good module? The change most likely needed is in the encounters.txt file: maybe a way to rewrite the conversations so each line says who spoke, based on the 'speaker' field, e.g. "'I want to fuck that ass of yours,' says the Brigand" rather than having the name of the speaker on its own line.
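If anyone wants to try the attribution idea, here is a rough Python sketch of how it could work. It is not the script I used for the module. It assumes the JSON has already been cleaned up enough for `json.loads` to accept it, and that dialogue entries are objects with a 'text' key and an optional 'speaker' key nested somewhere in the file. The sample data, the replacement values in `PLACEHOLDERS`, and the helper names (`fill_placeholders`, `iter_lines`, `attribute`) are all made up for illustration.

```python
import json
import re

# Hypothetical hard-coded values for the size tags; pick whatever fits your module.
PLACEHOLDERS = {
    "<LIPSIZE>": "full",
    "<BUTTSIZE>": "round",
}
TAG_PATTERN = re.compile(r"<[A-Z_]+>")


def fill_placeholders(text, values=PLACEHOLDERS):
    """Replace known tags; unknown tags are left as-is so they are easy to spot."""
    return TAG_PATTERN.sub(lambda m: values.get(m.group(0), m.group(0)), text)


def iter_lines(node):
    """Recursively yield (speaker, text) pairs from nested dicts and lists."""
    if isinstance(node, dict):
        text = node.get("text")
        if isinstance(text, str) and text.strip():
            yield node.get("speaker"), text.strip()
        for key, value in node.items():
            if key not in ("speaker", "text"):
                yield from iter_lines(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_lines(item)


def attribute(speaker, text):
    """Turn a spoken line into quoted, attributed prose; narration passes through."""
    text = fill_placeholders(text)
    if not speaker:
        return text
    if text.endswith(("!", "?", "...")):
        pass  # keep expressive punctuation inside the quotes
    elif text.endswith("."):
        text = text[:-1] + ","
    else:
        text = text.rstrip(",") + ","
    return f'"{text}" says the {speaker}.'


# Tiny made-up sample standing in for the cleaned encounters.json.
sample = json.loads('''
{"BRIGAND": {"scenes": [
  {"text": "A figure steps out from behind the trees."},
  {"speaker": "Brigand", "text": "Hand over your gold."},
  {"speaker": "Brigand", "text": "Those <LIPSIZE> lips will not talk you out of this!"}
]}}
''')

lines = [attribute(speaker, text) for speaker, text in iter_lines(sample)]
print("\n".join(lines))
```

The recursive walk is there because I am not assuming any particular nesting depth; if the real file keeps speaker names in a different key or uses IDs instead of names, `iter_lines` is the part to change.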
Download: https://mega.nz/folder/ebg03S4I#zgNYulb1BbeR_Vj5kN4tAg