You might have been misled by the term "AI". There is nothing intelligent or self-aware about a large language model.
LLMs are essentially this https://en.wikipedia.org/wiki/Markov_chain with a lot of sophistication, an immensely larger model, and added dimensions. The "AI" we are talking about needs a prompt. It then generates a probable answer for that prompt, and "probable" includes pseudo-randomness.
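To make the Markov-chain analogy concrete, here is a toy bigram model in Python. The corpus and the `generate` helper are made up for illustration; a real LLM works over a vastly larger state space with learned probabilities, but the "pick a probable next token, with pseudo-randomness" loop is the same idea:

```python
import random
from collections import defaultdict

# Tiny "training data" for a toy bigram Markov chain.
corpus = "the cat sat on the mat the dog sat on the rug".split()

# Count which word can follow which word.
chain = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    chain[prev].append(nxt)

def generate(prompt_word, length, seed):
    # Pseudo-randomness: the same seed always yields the same "answer".
    rng = random.Random(seed)
    word = prompt_word
    out = [word]
    for _ in range(length):
        if word not in chain:
            break
        # Pick a probable next word (duplicates in the list act as weights).
        word = rng.choice(chain[word])
        out.append(word)
    return " ".join(out)

print(generate("the", 5, seed=0))
```

Run it twice with the same seed and you get the same sentence; change the seed and you get a different but equally "probable" one.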
Since chance is involved, your results can contain random errors, or systematic errors if the training data is faulty or biased (bias is its own category of problems). Or the answer will just not really fit the prompt, only look like it might. I have even seen factually wrong answers from AI for trivial things. For pictures, the most common error is the number of fingers, just like in the picture in the OP: the gloved hand has only four. The character has no teeth either. Or look at the numerals on the clock: from afar they look like Roman numerals, but they are wrong.
The AI we talk about is not the sum of its input (the training data), let alone more than that sum. It is a condensed version, boiled down to its essence. One could even say a very good model would be like a highly efficient lossy compression algorithm: with a good prompt and a random seed number, one could almost recreate almost all training inputs, plus a lot more outputs that are similar to the inputs in principle and reachable through random variation.
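The seed idea can be sketched like this. The `sample` function and its list of outputs are a made-up stand-in for a real model, just to show why "prompt + seed" behaves like replaying a decompression:

```python
import random

# Stand-in for the model's learned distribution of plausible outputs.
PLAUSIBLE = ["a cat on a mat", "a cat on a rug", "a dog on a mat"]

def sample(seed):
    # The seed deterministically selects one plausible output,
    # the way a fixed seed lets image generators reproduce a picture.
    return random.Random(seed).choice(PLAUSIBLE)

# Same seed -> identical output, like decompressing the same file twice.
assert sample(42) == sample(42)

# Different seeds -> variations on the same theme.
print({sample(s) for s in range(10)})
```

The point is only that nothing mystical happens between seed and output: the "creativity" is deterministic pseudo-randomness exploring variations of what was learned.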