Hello,
Our localization expert wants us to go further with our text handling approach on our new project. He's been reading a PDF on the 'best practices for game localization' (https://cdn.ymaws.com/www.igda.org/resource/collection/2DA60D94-0F74-46B1-A9E2-F...), but more precisely, about how to handle line-breaks and word-wrapping (page 16).
We've had issues in the past dealing with word-wrapping, as in, "Where do we insert a line-break when dealing with Japanese sentences as they do not make use of spaces or word separators". In order to deal with this issue, we are currently using explicit line-breaks (\r), as well as Zero-Width Spaces (U+200B). While this isn't perfect, this seems to be our best solution so far; translators can decide when and where words and sentences should break. I believe there are rules for when words should be cut (i.e. https://www.w3.org/International/articles/typography/linebreak), and plausibly it could be automated, but we're definitely not linguistic experts.
Another language that we need to support is French, which grammatically, requires the use of hyphens when breaking words. I am aware of the 'Insert Hyphen' functionality, which works fine by default. But if we enable this option, while using our Zero-Width Spaces strategy to explicitly pick out when to trigger line breaks, no hyphen will show up.
I believe this is due to the fact that Zero-Width Spaces are just grouped up in the 'linebreakFriendlyChars[]' array, and clearly it wouldn't make sense for spaces, tabs or hyphens-based line-breaks to add additional hyphens, heck, under certain circumstances that'd probably end up creating an infinite loop. I took a quick peek at RebuildTextInfo(), and figured out where the hyphens are added under normal circumstances, but I wasn't comfortable enough (yet) to change this up on my own, and other users might benefit from this.
Here's an example of the issue. I've added in black lines to illustrate where the Zero-Width Spaces characters are located at.
Current Behavior: This is what happens when 'Insert Hyphens' is enabled and we make use of Zero-Width Spaces to determine word breaks. You can see how it cuts at just the right places based on the text box's size.
Desired Behavior: This is what we'd like to be able to achieve. Have STM add hyphens when line-breaking a word at a Zero-Width Space.
I believe our desired behavior should be toggle-able as it isn't standard across all languages. We definitely wouldn't want hyphen when dealing with Japanese for example. AFAIK, this should be safe enough to be directly embedded into the "Insert Hyphen" toggle.