Super Text Mesh

Text for Unity, Super-Powered! · By KaiClavier

Feature Request: Optional hyphen support for explicit word-splitting

A topic by Natulyre created Jul 18, 2019 Views: 368 Replies: 3

Natulyre · 5 years ago (1 edit)

Hello,

Our localization expert wants us to go further with our text handling on our new project. He's been reading a PDF on 'best practices for game localization' (https://cdn.ymaws.com/www.igda.org/resource/collection/2DA60D94-0F74-46B1-A9E2-F...), and more precisely the part about how to handle line breaks and word-wrapping (page 16).

We've had issues with word-wrapping in the past, namely: where do we insert a line break in Japanese sentences, which do not use spaces or other word separators? To deal with this, we currently use explicit line breaks (\r) as well as Zero-Width Spaces (U+200B). It isn't perfect, but it's our best solution so far: translators can decide when and where words and sentences should break. I believe there are rules for where lines may be broken (e.g. https://www.w3.org/International/articles/typography/linebreak), and plausibly this could be automated, but we're definitely not linguistics experts.

Another language we need to support is French, which requires a hyphen when a word is broken across lines. I am aware of the 'Insert Hyphen' functionality, which works fine by default. But if we enable this option while also using our Zero-Width Space strategy to explicitly pick where line breaks may occur, no hyphen shows up.

I believe this is because Zero-Width Spaces are simply grouped into the 'linebreakFriendlyChars[]' array along with the other break characters. Clearly it wouldn't make sense for line breaks at spaces, tabs or hyphens to add extra hyphens; heck, under certain circumstances that would probably end up creating an infinite loop. I took a quick peek at RebuildTextInfo() and figured out where the hyphens are added under normal circumstances, but I wasn't comfortable enough (yet) to change this on my own, and other users might benefit from the feature too.

Here's an example of the issue. I've added black lines to illustrate where the Zero-Width Space characters are located.

Current Behavior: This is what happens when 'Insert Hyphens' is enabled and we make use of Zero-Width Spaces to determine word breaks. You can see how it cuts at just the right places based on the text box's size.

Desired Behavior: This is what we'd like to be able to achieve. Have STM add hyphens when line-breaking a word at a Zero-Width Space.

I believe our desired behavior should be toggleable, as it isn't standard across all languages. We definitely wouldn't want hyphens when dealing with Japanese, for example. AFAIK, this should be safe enough to embed directly into the "Insert Hyphen" toggle.
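To make the request concrete, here is a minimal sketch of the desired behavior, written in Python rather than STM's C#. It is not how RebuildTextInfo() works: the `wrap` function, the `insert_hyphen` flag and measuring width in characters (instead of rendered glyph width) are all assumptions made for illustration.

```python
import re

ZWSP = "\u200b"  # zero-width space: an invisible, explicit break opportunity


def wrap(text, width, insert_hyphen=True):
    """Greedy wrap by character count (hypothetical helper, not STM code).

    Lines may break at a space or at a zero-width space. A break at a
    zero-width space gets a trailing hyphen when insert_hyphen is True.
    A single piece longer than width is left to overflow.
    """
    # The capturing group keeps separators: [piece, sep, piece, ..., piece]
    parts = re.split(f"( |{ZWSP})", text)
    pieces = parts[0::2]
    seps = parts[1::2] + [""]  # separator that follows each piece

    lines, current, prev_sep = [], "", ""
    for piece, sep_after in zip(pieces, seps):
        # Spaces stay visible inside a line; zero-width spaces vanish.
        joiner = " " if (prev_sep == " " and current) else ""
        candidate = current + joiner + piece
        # If the line could end right after this piece at a zero-width
        # space, keep one column free for the hyphen.
        reserve = 1 if (sep_after == ZWSP and insert_hyphen) else 0
        if current and len(candidate) + reserve > width:
            hyphen = "-" if (prev_sep == ZWSP and insert_hyphen) else ""
            lines.append(current + hyphen)
            current = piece
        else:
            current = candidate
        prev_sep = sep_after
    if current:
        lines.append(current)
    return lines
```

With a width of 14, `wrap("anti\u200bconstitution\u200bnellement", 14)` gives `["anti-", "constitution-", "nellement"]`, while passing `insert_hyphen=False` gives the same breaks without hyphens, which is what we would want for Japanese.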

KaiClavier (Developer) · 5 years ago (1 edit)

Hey!

I actually read that article when designing Fleece! I wish I'd come across it when making STM, but this will prove very useful.

These are all really great ideas, and better hyphenation options have been on the to-do list for a long time. I think a tag for a zero-width space is a good idea for a built-in feature. (Just like the <br> tag that inserts a \n... it could be the lengthy <hlb> like the article suggests...) I believe you might already be able to type "\u200B", but that is very long... I do have an example script in there somewhere for extending STM with custom tags, so replacing <custom> with "another string" is always doable! I like the article's example of using "^".
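The replacement idea can be sketched as a preprocessing step that runs before the string reaches the text mesh. This is Python, not the C# example script mentioned above, and the `REPLACEMENTS` table and `expand_markers` function are hypothetical names; the only things taken from the thread are the "^" and <hlb> markers and the zero-width space they would stand for.

```python
ZWSP = "\u200b"  # zero-width space

# Hypothetical marker table: each short marker typed by a translator
# is swapped for the character it stands for.
REPLACEMENTS = {
    "<hlb>": ZWSP,
    "^": ZWSP,
}


def expand_markers(text, replacements=REPLACEMENTS):
    """Replace every marker in text with its mapped string."""
    for marker, value in replacements.items():
        text = text.replace(marker, value)
    return text
```

A single-character marker like "^" is easy to type, but it can no longer appear literally in the text, so a tag-style marker may be the safer choice for some projects.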

I'll take a look at the source and see why those hyphens aren't inserting!

KaiClavier (Developer) · 5 years ago (3 edits)

Okay, I think I got this working properly. Send me an email with your invoice no. (my address is on my website), and I'll send you a build to test! I also added a unicode tag to the build so <u=200B> is possible.
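For readers wondering what a tag like <u=200B> amounts to, here is a rough sketch in Python. It is not the code in the STM build: it assumes the value after "u=" is a hexadecimal Unicode code point, and the `expand_unicode_tags` name and the choice to leave out-of-range tags untouched are made up for illustration.

```python
import re

# Assumed tag shape: <u=HEX>, with 1 to 6 hexadecimal digits.
_UNICODE_TAG = re.compile(r"<u=([0-9A-Fa-f]{1,6})>")


def expand_unicode_tags(text):
    """Replace each <u=HEX> tag with the character at that code point."""
    def to_char(match):
        code_point = int(match.group(1), 16)
        # Leave the tag as written if it is beyond the Unicode range.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return _UNICODE_TAG.sub(to_char, text)
```

So `expand_unicode_tags("anti<u=200B>gel")` yields "anti", a zero-width space, then "gel".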

Natulyre · 5 years ago

I've tried out the test build in a few different scenarios and it seems to be working great; the hyphens behave as expected!

The newly added unicode tag is quite enjoyable. I wasn't convinced at first, but it turns out to be explicit, yet clean enough (as in easily distinguished within a sentence), to be of use. We'll probably end up extending STM with a few custom tags/strings for our most commonly used 'special' characters, as per your suggestion.

Hyphen compatibility was our main issue right now, and that's clearly been resolved. Thanks a bunch!
