Hm, STM has its own system to combine materials, but on the Ultra shader, at least for now, proper batching is disabled because it breaks the vertex indices used for the outline and dropshadow effects. (At some point I'll experiment with storing that index data on a UV channel instead. I think there *might* be one UV channel free, and if so, that change could possibly allow batching.) If you're not using the super smooth outlines, you could try the default Universal shader instead? It's also more lightweight than the Ultra one, especially since I wasn't able to finish the optimization pass I attempted on the Ultra shader earlier this year. (More on how to still get the outline effect with this soon...)
The material combiner system reduces draw calls if two meshes share the same basic text settings: same font, same material, etc. Effects like basic color tags do not matter, as those are applied with vertex colour, but the "texture" color tag will make a new material. From your last screenshot, your meshes already seem to share these settings, though... so I'm not entirely sure why each component is causing a new draw call. A culling script is a great idea regardless! (You can also try disabling the STM component but leaving the MeshRenderer & MeshFilter on. That could potentially give a performance boost, but I have not tested this.)
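To make that batching rule a bit more concrete, here's a rough toy model in Python. This is not STM's actual code, and every name in it (`TextMesh`, `batch_key`, `texture_tag`, the font and material strings) is made up purely for illustration. It just assumes that meshes get combined whenever font, material, and any "texture" tag match, and that vertex colour never affects the grouping.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Toy model only: these field names are hypothetical, not STM's real API.
@dataclass(frozen=True)
class TextMesh:
    font: str
    material: str
    vertex_color: str = "white"        # basic color tags end up here
    texture_tag: Optional[str] = None  # a "texture" color tag, if any


def batch_key(mesh: TextMesh) -> tuple:
    # Vertex colour is deliberately left out of the key: it lives in the
    # vertex data, so it does not force a separate material.
    return (mesh.font, mesh.material, mesh.texture_tag)


def count_draw_calls(meshes: list[TextMesh]) -> int:
    # One draw call per distinct key, assuming every mesh with the same
    # key really does get combined.
    return len({batch_key(m) for m in meshes})


meshes = [
    TextMesh("Arial", "Default", vertex_color="red"),
    TextMesh("Arial", "Default", vertex_color="blue"),
    TextMesh("Arial", "Default", texture_tag="gold"),
]
print(count_draw_calls(meshes))  # 2
```

The red and blue meshes land in the same group, while the one with a texture tag gets its own, so the model reports two draw calls for three meshes. If your scene shows one draw call per component even though everything would share a key like this, something other than the text settings is probably splitting them up.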
STMMaskableGraphic may help. All it does is let STM behave properly as a "MaskableGraphic" component, so not having that could potentially be the cause of the Z issue here. I'm not sure what would have made it not work a month ago, so it's worth another shot! That should hopefully enforce some rendering order features. Additionally!! With STMMaskableGraphic, you can use a basic shader like the default Universal one, and then use Unity's "Outline" component to render an outline that way. I also worked with LeTai to get their asset "True Shadow" working together with STM, which can produce a batched, smooth outline if configured properly. https://assetstore.unity.com/packages/tools/gui/true-shadow-ui-soft-shadow-and-g...
Another small thing to try: is anything different if you nudge the text forwards a bit? There could just be an issue with the Z buffer somewhere, so if the text and background are on the same layer, I could imagine them getting scrambled together with VR rendering, somehow. (The text is *supposed* to push itself forward a bit with effects like this, but I can see VR being an edge case I didn't account for. There's a manual "ZDepth" value in the Ultra shader that might give a better result if adjusted. I can send shader variants soon, but I think the Universal shader should be tried first.)