I used Microsoft's Phi-3.5, a very small model trained on data curated specifically so that a small LLM can learn to produce coherent text. I think the dataset they used is inspired by the “TinyStories” dataset (textbook- or children's-story-style text generated by larger LLMs). This curated training data lets the model write relatively coherent text, while the specific tone and style come from the system prompt I give it (old English, cryptic sentences, a low-fantasy medieval setting, …). A minimal sketch of that setup follows below.
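
For illustration only, here is a hedged sketch of how a system prompt can steer the style of a small instruct model like Phi-3.5 via Hugging Face transformers. The checkpoint name, the exact prompt wording, and the generation settings are my assumptions, not necessarily the setup used here.

```python
# Sketch: steering a small model's tone with a system prompt.
# Model id and prompt text are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    # The system prompt carries the tone: old English, cryptic, low-fantasy medieval.
    {"role": "system", "content": (
        "Thou art a cryptic chronicler of a low-fantasy medieval land. "
        "Speak in old English, in short, riddling sentences."
    )},
    {"role": "user", "content": "Describe the village at dawn."},
]

# Apply the model's chat template, then generate a short reply.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=120, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The point of the sketch is that the base model only has to be coherent; everything flavor-specific lives in the system message, so the tone can be swapped without retraining.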