
Emoji crashes IL2CPP builds

A topic by mpartel created Aug 03, 2021 Views: 718 Replies: 10

If I add a smiley like https://www.fileformat.info/info/unicode/char/1f601/index.htm to a SuperTextMesh, I get the following

Error: UTF-16 to UTF-8 conversion failed because the input string is invalid
UnityEngine.StackTraceUtility:ExtractStackTrace ()
SuperTextMesh:FigureOutUnwrappedLimits (UnityEngine.Vector3) (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:2951)
SuperTextMesh:RebuildTextInfo () (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:3239)
SuperTextMesh:Rebuild (single,bool,bool) (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:1572)
SuperTextMesh:Rebuild (single,bool) (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:1521)
SuperTextMesh:Rebuild () (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:1511)
SuperTextMesh:set_text (string) (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:774)
EmojiTest:<Start>b__2_0 () (at Assets/EmojiTest.cs:10)
UnityEngine.EventSystems.EventSystem:Update ()

and

Error: UTF-16 to UTF-8 conversion failed because the input string is invalid
UnityEngine.StackTraceUtility:ExtractStackTrace ()
SuperTextMesh:RebuildTextInfo () (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:3289)
SuperTextMesh:Rebuild (single,bool,bool) (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:1572)
SuperTextMesh:Rebuild (single,bool) (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:1521)
SuperTextMesh:Rebuild () (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:1511)
SuperTextMesh:set_text (string) (at Assets/Clavian/SuperTextMesh/Scripts/SuperTextMesh.cs:774)
EmojiTest:<Start>b__2_0 () (at Assets/EmojiTest.cs:10)
UnityEngine.EventSystems.EventSystem:Update ()

If I do the same in an IL2CPP build, it segfaults in some C++ code that tries to convert UTF-16 to UTF-8.

This is especially problematic when the text is user-defined.

Developer

Hey,


I never added emoji support to STM, but you can use quads to render emoji. To support emoji directly, I'd have to swap them out for quads before rendering anyway, since the dynamic texture code I'm using doesn't support Unicode (that's the error you're getting, I believe). But there are just so many emoji, there are no 100% free emoji sheets I could bundle with STM as an example, and new ones keep being added, so I never added it as a feature.

Do please check out quads, though! You can get a lot more style and personality out of them, in my opinion! Plus add whatever you want, even if it's not emoji!

Oh, not supporting emojis is understandable. I think it still shouldn’t crash like that, and it’d be nice if it rendered unsupported characters (even when composed of surrogates) as “<?>” or something.

I already happily use quads for non-user-defined text, and I’m working on restricting/filtering user-defined text now. It’s a little unfortunate to have to ban all emojis from usernames and in-game chat. Even only supporting the most common emojis, and an API for querying whether a given string is fully supported, would IMO be quite useful.
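A query like the one asked for here could be as small as the sketch below. It assumes, based only on the errors in this thread, that UTF-16 surrogate code units (paired or stray) are what STM can't handle. `TextSupport` and `IsFullySupported` are hypothetical names, not part of STM.

```csharp
using System;

// Hypothetical helper, not part of SuperTextMesh.
public static class TextSupport
{
    // Returns true when the string contains no UTF-16 surrogate code units,
    // i.e. every character lies in the Basic Multilingual Plane.
    // Assumption: surrogates are what triggers the conversion error.
    public static bool IsFullySupported(string text)
    {
        if (text == null) return true;
        foreach (char c in text)
        {
            if (char.IsSurrogate(c)) return false;
        }
        return true;
    }
}
```

Emoji that live in the Basic Multilingual Plane (U+263A, for example) pass this check, so it only answers "will this string hit the surrogate problem", not "is this string free of emoji".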

Developer

Yeahh, I 100% understand where you're coming from. A friend/user tried to add emoji support a few years ago, unsuccessfully. It's a strange limitation I'd like to work around. I'll look into whether explicitly forbidding emoji is possible. Detecting emoji is a bit strange, since a single emoji can actually be composed of several Unicode characters, from what I remember from the last time I looked into this.


Just as some future notes for myself, I found some stuff here:

https://stackoverflow.com/questions/28023682/how-do-i-remove-emoji-characters-fr...

I made STM with dialogue text in mind, so there's not a lot in terms of features for text entry. It's something I've been wanting to flesh out for a while, but I figure Unity's built-in UI text is fine for text entry, since it won't need any special formatting and has everything configured already. I would like to get STM working in the same way, though... but I can't really make any time promises unfortunately, sorry.

So hmm... maybe that above code could be applied to text before it's sent to STM anyway...? I'll experiment with it soon!

Developer (2 edits)


It's a cheap solution, but I found some regex to replace all emoji with □ before the text is printed by STM. One emoji got turned into that separator there, but I guess that's better than crashing Unity. I might just have it replace emoji with nothing instead? Because I'm not allowed to include any emoji sets with STM, since they all require attribution (section 1.2.i of the Asset Store submission guidelines), I think the best way to add emoji support would be to go through whatever string you want to send to STM and replace each emoji with a matching quad. It *is* possible to programmatically generate/edit a quad, so maybe that could be used??

I’m not sure exactly what causes the crash, but I think you might want to replace all surrogate chars. They should come in high+low pairs, but there could be strays and other combinations in malicious input.
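Replacing every surrogate, paired or stray, could look roughly like this. It is a sketch run on text before it reaches STM, not STM code. `SurrogateFilter` and `ReplaceSurrogates` are made-up names, and the default placeholder is U+25A1 (□).

```csharp
using System;
using System.Text;

// Hypothetical pre-filter for user-supplied text; not part of SuperTextMesh.
public static class SurrogateFilter
{
    // Replaces each valid surrogate pair with ONE placeholder and each
    // stray (unpaired) surrogate with one placeholder as well.
    // Pass an empty placeholder to strip them instead.
    public static string ReplaceSurrogates(string text, string placeholder = "\u25A1")
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(placeholder); // well-formed pair: consume both code units
                i++;
            }
            else if (char.IsSurrogate(c))
            {
                sb.Append(placeholder); // stray high or low surrogate
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Characters such as U+200D or skin-tone-independent BMP symbols are left alone, so a sequence like 👨🏽‍🌾 would come out as two placeholders with a zero-width joiner between them.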

Could the licensing issue be solved by linking to an emoji pack and supporting whatever download they have with as little friction as possible? 🤔

Developer (1 edit)

Yeahh going through it that way might be a more fool-proof way to get all unsupported characters, or potentially find the unicode I need to replace...

I messaged my friend that tried to write emoji support a few years ago, so we'll see what happens. It sounds like there was some kind of roadblock.

And yeah, hmm... I could tell people to download this database: https://github.com/twitter/twemoji/tree/master/assets/72x72

And then have some code that automatically uses the names of these files to put together a dictionary, and then... figure it out from there. I'd want to write some script to join all of these into one image, too. But first it'll come down to converting those surrogate pairs to the proper quad...


EDIT:

I'm trying to run some code where the goal is...

take "👨🏽‍🌾" as input, and spit out "1F468-1F3FD-200D-1F33E" so that it can be matched with the library I linked. 

Just running into two issues...

1) having trouble determining where the actual emoji begins and ends... Checking if a character is a "low surrogate" always seems to return true, so I need to figure out the right way to determine what makes up one character...


2) I'm able to get the emojis in that above string, but not the zero-width joiner, "200D". Trying to figure out why...
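For reference, here is a minimal sketch of that conversion which walks the string code point by code point with `char.IsSurrogatePair` and `char.ConvertToUtf32`. `CodePoints` and `ToHexSequence` are hypothetical names. U+200D is an ordinary single `char` and not a surrogate, so a scan that only looks at surrogates may be what skips it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of SuperTextMesh.
public static class CodePoints
{
    // Converts a string to its code points as uppercase hex joined by '-'.
    public static string ToHexSequence(string s)
    {
        var parts = new List<string>();
        for (int i = 0; i < s.Length; i++)
        {
            int codePoint;
            if (char.IsSurrogatePair(s, i))
            {
                // High + low surrogate: combine into one code point, skip the low half.
                codePoint = char.ConvertToUtf32(s, i);
                i++;
            }
            else if (char.IsSurrogate(s[i]))
            {
                continue; // stray surrogate: ignored in this sketch
            }
            else
            {
                codePoint = s[i]; // plain BMP character, including U+200D
            }
            parts.Add(codePoint.ToString("X"));
        }
        return string.Join("-", parts);
    }
}
```

Called on "👨🏽‍🌾" this yields "1F468-1F3FD-200D-1F33E". It does not decide where one emoji ends and the next begins; it only lists the code points of whatever string it is given.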

Anyway that's what I've got for today! But still, no promises on this as a feature...

Developer (5 edits)

Here's today's progress: (pastebin won't let me upload this so... forgive the bad formatting. Maybe this link won't be broken by the time you see this: https://pastebin.com/kxsG8B2e)


using UnityEngine;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using System;
using System.Text.RegularExpressions;

//code and ideas stolen from here for now (sorry):
//https://stackoverflow.com/questions/44728740/how-to-convert-emoticons-to-its-utf-32-escaped-unicode
//https://stackoverflow.com/questions/55082644/c-sharp-regular-expression-to-find-a-surrogate-pair-of-a-unicode-codepoint-fro

public class STMEmoji : MonoBehaviour
{
    //U+1F004 is mahjong tile
    private string text = "wow🀄... 👨🏽‍🌾";

    void Start ()
    {
        DoEmoji();
    }

    void DoEmoji()
    {
        string x = text;

        MatchCollection emojiMatch = Regex.Matches(x, @"[#*0-9]\uFE0F\u20E3|[\u00A9\u00AE\u203C\u2049\u2122\u2139\u2194-\u2199\u21A9\u21AA\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA\u24C2\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE\u2600-\u2604\u260E\u2611\u2614\u2615\u2618]|\u261D(?:\uD83C[\uDFFB-\uDFFF])?|[\u2620\u2622\u2623\u2626\u262A\u262E\u262F\u2638-\u263A\u2640\u2642\u2648-\u2653\u265F\u2660\u2663\u2665\u2666\u2668\u267B\u267E\u267F\u2692-\u2697\u2699\u269B\u269C\u26A0\u26A1\u26AA\u26AB\u26B0\u26B1\u26BD\u26BE\u26C4\u26C5\u26C8\u26CE\u26CF\u26D1\u26D3\u26D4\u26E9\u26EA\u26F0-\u26F5\u26F7\u26F8]|\u26F9(?:\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?|\uFE0F\u200D[\u2640\u2642]\uFE0F)?|[\u26FA\u26FD\u2702\u2705\u2708\u2709]|[\u270A-\u270D](?:\uD83C[\uDFFB-\uDFFF])?|[\u270F\u2712\u2714\u2716\u271D\u2721\u2728\u2733\u2734\u2744\u2747\u274C\u274E\u2753-\u2755\u2757\u2763\u2764\u2795-\u2797\u27A1\u27B0\u27BF\u2934\u2935\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55\u3030\u303D\u3297\u3299]|\uD83C(?:[\uDC04\uDCCF\uDD70\uDD71\uDD7E\uDD7F\uDD8E\uDD91-\uDD9A]|\uDDE6\uD83C[\uDDE8-\uDDEC\uDDEE\uDDF1\uDDF2\uDDF4\uDDF6-\uDDFA\uDDFC\uDDFD\uDDFF]|\uDDE7\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEF\uDDF1-\uDDF4\uDDF6-\uDDF9\uDDFB\uDDFC\uDDFE\uDDFF]|\uDDE8\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDEE\uDDF0-\uDDF5\uDDF7\uDDFA-\uDDFF]|\uDDE9\uD83C[\uDDEA\uDDEC\uDDEF\uDDF0\uDDF2\uDDF4\uDDFF]|\uDDEA\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDED\uDDF7-\uDDFA]|\uDDEB\uD83C[\uDDEE-\uDDF0\uDDF2\uDDF4\uDDF7]|\uDDEC\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEE\uDDF1-\uDDF3\uDDF5-\uDDFA\uDDFC\uDDFE]|\uDDED\uD83C[\uDDF0\uDDF2\uDDF3\uDDF7\uDDF9\uDDFA]|\uDDEE\uD83C[\uDDE8-\uDDEA\uDDF1-\uDDF4\uDDF6-\uDDF9]|\uDDEF\uD83C[\uDDEA\uDDF2\uDDF4\uDDF5]|\uDDF0\uD83C[\uDDEA\uDDEC-\uDDEE\uDDF2\uDDF3\uDDF5\uDDF7\uDDFC\uDDFE\uDDFF]|\uDDF1\uD83C[\uDDE6-\uDDE8\uDDEE\uDDF0\uDDF7-\uDDFB\uDDFE]|\uDDF2\uD83C[\uDDE6\uDDE8-\uDDED\uDDF0-\uDDFF]|\uDDF3\uD83C[\uDDE6\uDDE8\uDDEA-\uDDEC\uDDEE\uDDF1\uDDF4\uDDF5\uDDF7\uDDFA\uDDFF]|\uDDF4\uD83C\uDDF2|\uDDF5\uD83C[\uDDE6\uDDEA-\uDDED\uDDF0-\uDDF3\uDDF7-\uDDF9\uDDFC\uDDFE]|\uDDF6\uD83C\uDDE6|\uDDF7\uD83C[\uDDEA\uDDF4\uDDF8\uDDFA\uDDFC]|\uDDF8\uD83C[\uDDE6-\uDDEA\uDDEC-\uDDF4\uDDF7-\uDDF9\uDDFB\uDDFD-\uDDFF]|\uDDF9\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDED\uDDEF-\uDDF4\uDDF7\uDDF9\uDDFB\uDDFC\uDDFF]|\uDDFA\uD83C[\uDDE6\uDDEC\uDDF2\uDDF3\uDDF8\uDDFE\uDDFF]|\uDDFB\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDEE\uDDF3\uDDFA]|\uDDFC\uD83C[\uDDEB\uDDF8]|\uDDFD\uD83C\uDDF0|\uDDFE\uD83C[\uDDEA\uDDF9]|\uDDFF\uD83C[\uDDE6\uDDF2\uDDFC]|[\uDE01\uDE02\uDE1A\uDE2F\uDE32-\uDE3A\uDE50\uDE51\uDF00-\uDF21\uDF24-\uDF84]|\uDF85(?:\uD83C[\uDFFB-\uDFFF])?|[\uDF86-\uDF93\uDF96\uDF97\uDF99-\uDF9B\uDF9E-\uDFC1]|\uDFC2(?:\uD83C[\uDFFB-\uDFFF])?|[\uDFC3\uDFC4](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDFC5\uDFC6]|\uDFC7(?:\uD83C[\uDFFB-\uDFFF])?|[\uDFC8\uDFC9]|\uDFCA(?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDFCB\uDFCC](?:\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?|\uFE0F\u200D[\u2640\u2642]\uFE0F)?|[\uDFCD-\uDFF0]|\uDFF3(?:\uFE0F\u200D\uD83C\uDF08)?|\uDFF4(?:\u200D\u2620\uFE0F|\uDB40\uDC67\uDB40\uDC62\uDB40(?:\uDC65\uDB40\uDC6E\uDB40\uDC67|\uDC73\uDB40\uDC63\uDB40\uDC74|\uDC77\uDB40\uDC6C\uDB40\uDC73)\uDB40\uDC7F)?|[\uDFF5\uDFF7-\uDFFF])|\uD83D(?:[\uDC00-\uDC14]|\uDC15(?:\u200D\uD83E\uDDBA)?|[\uDC16-\uDC40]|\uDC41(?:\uFE0F\u200D\uD83D\uDDE8\uFE0F)?|[\uDC42\uDC43](?:\uD83C[\uDFFB-\uDFFF])?|[\uDC44\uDC45]|[\uDC46-\uDC50](?:\uD83C[\uDFFB-\uDFFF])?|[\uDC51-\uDC65]|[\uDC66\uDC67](?:\uD83C[\uDFFB-\uDFFF])?|\uDC68(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\u2764\uFE0F\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D(?:\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?|[\uDC68\uDC69]\u200D\uD83D(?:\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?)|[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92])|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\uD83C(?:\uDFFB(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD]))?|\uDFFC(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:\uDD1D\u200D\uD83D\uDC68\uD83C\uDFFB|[\uDDAF-\uDDB3\uDDBC\uDDBD])))?|\uDFFD(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB\uDFFC]|[\uDDAF-\uDDB3\uDDBC\uDDBD])))?|\uDFFE(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB-\uDFFD]|[\uDDAF-\uDDB3\uDDBC\uDDBD])))?|\uDFFF(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB-\uDFFE]|[\uDDAF-\uDDB3\uDDBC\uDDBD])))?))?|\uDC69(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\u2764\uFE0F\u200D\uD83D(?:\uDC8B\u200D\uD83D)?[\uDC68\uDC69]|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D(?:\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?|\uDC69\u200D\uD83D(?:\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?)|[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92])|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\uD83C(?:\uDFFB(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFC-\uDFFF]|[\uDDAF-\uDDB3\uDDBC\uDDBD])))?|\uDFFC(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:\uDD1D\u200D\uD83D(?:\uDC68\uD83C[\uDFFB\uDFFD-\uDFFF]|\uDC69\uD83C\uDFFB)|[\uDDAF-\uDDB3\uDDBC\uDDBD])))?|\uDFFD(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:\uDD1D\u200D\uD83D(?:\uDC68\uD83C[\uDFFB\uDFFC\uDFFE\uDFFF]|\uDC69\uD83C[\uDFFB\uDFFC])|[\uDDAF-\uDDB3\uDDBC\uDDBD])))?|\uDFFE(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:\uDD1D\u200D\uD83D(?:\uDC68\uD83C[\uDFFB-\uDFFD\uDFFF]|\uDC69\uD83C[\uDFFB-\uDFFD])|[\uDDAF-\uDDB3\uDDBC\uDDBD])))?|\uDFFF(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFB-\uDFFE]|[\uDDAF-\uDDB3\uDDBC\uDDBD])))?))?|\uDC6A|[\uDC6B-\uDC6D](?:\uD83C[\uDFFB-\uDFFF])?|\uDC6E(?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|\uDC6F(?:\u200D[\u2640\u2642]\uFE0F)?|\uDC70(?:\uD83C[\uDFFB-\uDFFF])?|\uDC71(?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|\uDC72(?:\uD83C[\uDFFB-\uDFFF])?|\uDC73(?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDC74-\uDC76](?:\uD83C[\uDFFB-\uDFFF])?|\uDC77(?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|\uDC78(?:\uD83C[\uDFFB-\uDFFF])?|[\uDC79-\uDC7B]|\uDC7C(?:\uD83C[\uDFFB-\uDFFF])?|[\uDC7D-\uDC80]|[\uDC81\uDC82](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|\uDC83(?:\uD83C[\uDFFB-\uDFFF])?|\uDC84|\uDC85(?:\uD83C[\uDFFB-\uDFFF])?|[\uDC86\uDC87](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDC88-\uDCA9]|\uDCAA(?:\uD83C[\uDFFB-\uDFFF])?|[\uDCAB-\uDCFD\uDCFF-\uDD3D\uDD49-\uDD4E\uDD50-\uDD67\uDD6F\uDD70\uDD73]|\uDD74(?:\uD83C[\uDFFB-\uDFFF])?|\uDD75(?:\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?|\uFE0F\u200D[\u2640\u2642]\uFE0F)?|[\uDD76-\uDD79]|\uDD7A(?:\uD83C[\uDFFB-\uDFFF])?|[\uDD87\uDD8A-\uDD8D]|[\uDD90\uDD95\uDD96](?:\uD83C[\uDFFB-\uDFFF])?|[\uDDA4\uDDA5\uDDA8\uDDB1\uDDB2\uDDBC\uDDC2-\uDDC4\uDDD1-\uDDD3\uDDDC-\uDDDE\uDDE1\uDDE3\uDDE8\uDDEF\uDDF3\uDDFA-\uDE44]|[\uDE45-\uDE47](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDE48-\uDE4A]|\uDE4B(?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|\uDE4C(?:\uD83C[\uDFFB-\uDFFF])?|[\uDE4D\uDE4E](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|\uDE4F(?:\uD83C[\uDFFB-\uDFFF])?|[\uDE80-\uDEA2]|\uDEA3(?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDEA4-\uDEB3]|[\uDEB4-\uDEB6](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDEB7-\uDEBF]|\uDEC0(?:\uD83C[\uDFFB-\uDFFF])?|[\uDEC1-\uDEC5\uDECB]|\uDECC(?:\uD83C[\uDFFB-\uDFFF])?|[\uDECD-\uDED2\uDED5\uDEE0-\uDEE5\uDEE9\uDEEB\uDEEC\uDEF0\uDEF3-\uDEFA\uDFE0-\uDFEB])|\uD83E(?:[\uDD0D\uDD0E]|\uDD0F(?:\uD83C[\uDFFB-\uDFFF])?|[\uDD10-\uDD17]|[\uDD18-\uDD1C](?:\uD83C[\uDFFB-\uDFFF])?|\uDD1D|[\uDD1E\uDD1F](?:\uD83C[\uDFFB-\uDFFF])?|[\uDD20-\uDD25]|\uDD26(?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDD27-\uDD2F]|[\uDD30-\uDD36](?:\uD83C[\uDFFB-\uDFFF])?|\uDD37(?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDD38\uDD39](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|\uDD3A|\uDD3C(?:\u200D[\u2640\u2642]\uFE0F)?|[\uDD3D\uDD3E](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDD3F-\uDD45\uDD47-\uDD71\uDD73-\uDD76\uDD7A-\uDDA2\uDDA5-\uDDAA\uDDAE-\uDDB4]|[\uDDB5\uDDB6](?:\uD83C[\uDFFB-\uDFFF])?|\uDDB7|[\uDDB8\uDDB9](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|\uDDBA|\uDDBB(?:\uD83C[\uDFFB-\uDFFF])?|[\uDDBC-\uDDCA]|[\uDDCD-\uDDCF](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|\uDDD0|\uDDD1(?:\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1|\uD83C(?:\uDFFB(?:\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1\uD83C\uDFFB)?|\uDFFC(?:\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB\uDFFC])?|\uDFFD(?:\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFD])?|\uDFFE(?:\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFE])?|\uDFFF(?:\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])?))?|[\uDDD2-\uDDD5](?:\uD83C[\uDFFB-\uDFFF])?|\uDDD6(?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDDD7-\uDDDD](?:\u200D[\u2640\u2642]\uFE0F|\uD83C[\uDFFB-\uDFFF](?:\u200D[\u2640\u2642]\uFE0F)?)?|[\uDDDE\uDDDF](?:\u200D[\u2640\u2642]\uFE0F)?|[\uDDE0-\uDDFF\uDE70-\uDE73\uDE78-\uDE7A\uDE80-\uDE82\uDE90-\uDE95])");
        for(int i=0; i<emojiMatch.Count; i++)
        {
            x = x.Replace(emojiMatch[i].Value, "<" + EmojiToBytes(emojiMatch[i].Value) + ">");
        }

        Debug.Log(x);
    }

    private List<string> v = new List<string>();

    string EmojiToBytes(string x)
    {
        var enc = new UTF32Encoding(true, false);
        var bytes = enc.GetBytes(x);

        var o = BitConverter.ToString(bytes);

        //this part sucks:
        o = o.Replace("00-", string.Empty);
        o = o.Replace("-00", string.Empty);
        o = o.Replace("-01-", "ZZZZZ");
        o = o.Replace("01-", "XXXXX");
        o = o.Replace("-20-", "YYYYY");
        o = o.Replace("-", string.Empty);
        o = o.Replace("ZZZZZ", "-1");
        o = o.Replace("XXXXX", "1");
        o = o.Replace("YYYYY", "-20");
        return o.ToLower();
    }
}


I really really don't like this code. That regex used will change every time a new emoji is released, I think. And that string.Replace method I have near the bottom is nasty. But this will take a string with emoji in it, and replace the emoji with the name of the respective .png here: https://github.com/twitter/twemoji/tree/master/assets/72x72

So... all that's needed now is to download that directory, turn it into a string,texture dictionary (more ideally an index to texturesheet+index dictionary...) and then create/edit a quad to match on the fly. I'll try more later!


My friend also gave me some good regex for explicitly disallowing STM from trying to render emoji... so either way, the problem is solved and this won't cause a crash in a future build.

Developer (2 edits)

Okay, so I still feel this is a quick-and-dirty solution, but: 



I put all those .pngs inside of a folder named "Resources", then made sure all the images are imported as textures.

Then, I attach this script to the STM object I want to be able to render emoji: https://pastebin.com/qr9me3GA Hopefully pastebin properly uploads that soon...

Either way... the Unity inspector itself isn't equipped to render emoji, so they will be invisible in all editor textboxes, and if you try to backspace through one, it will break surrogate heads and tails apart. I don't have a way around this besides suggesting you keep your game's script in a .txt file or just use quads. But... for player input that never has to see the Unity inspector... this should work. Just make sure that if a player backspaces, you delete the entire quad tag instead of the last character in your text entry field.

I could add a method that can do this for the next update, but the process will be...


Regex match drawText "(<q=\w+?>)$"

If this match returns any result, the last thing being displayed by STM is a quad, so...

Well, the emoji are still being stored *as emoji* in STM, so I'd use that huge emoji regex to remove the last emoji in the string.
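As a simpler stand-in for the regex approach described above (not the method planned for STM), a backspace that at least never splits a surrogate pair could look like this. `SafeBackspace` and `RemoveLast` are hypothetical names.

```csharp
using System;

// Hypothetical text-entry helper; not part of SuperTextMesh.
public static class SafeBackspace
{
    // Removes the last code point from the string: two chars when the string
    // ends in a well-formed surrogate pair, otherwise one char.
    public static string RemoveLast(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        int cut = text.Length - 1;
        if (cut > 0 && char.IsLowSurrogate(text[cut]) && char.IsHighSurrogate(text[cut - 1]))
        {
            cut--; // drop the high surrogate together with the low one
        }
        return text.Substring(0, cut);
    }
}
```

This works per code point, so a joined sequence such as 👨🏽‍🌾 would still take several presses to delete. Removing a whole emoji in one press needs something that knows the full sequences, like the emoji regex.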

(2 edits)

Nice! Ok, that regex is truly horrifying, but if it works ¯\_(ツ)_/¯

I’ve not read the code in detail, but I wonder if the regex could be built dynamically from the set of available PNGs. Basically something like string.Join('|', imageNames.OrderByDescending(n => n.Length).Select(HexToRegex).ToArray())

Developer

That regex is generated from a script somewhere. I haven't found a regex script that's any less... specific than it that doesn't delete non-emoji characters, too.

But yeah building it dynamically... I think it can be done, if the names can be converted back to the right format, shooould be doable? Maybe? I didn't find any C# generators, but maybe another language's could be modified.
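A rough sketch of the dynamic approach, assuming the image names are hyphen-separated hex code points like "1f468-1f3fd-200d-1f33e" (extension already stripped). `EmojiRegexBuilder`, `NameToString` and `Build` are made-up names. Each name is converted back into the string it represents with `char.ConvertFromUtf32`, and the longest sequences go first so that a full sequence wins over its own prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of SuperTextMesh.
public static class EmojiRegexBuilder
{
    // "1f468-1f3fd-200d-1f33e" -> the string made of those code points.
    public static string NameToString(string name)
    {
        var sb = new StringBuilder();
        foreach (string hex in name.Split('-'))
        {
            sb.Append(char.ConvertFromUtf32(Convert.ToInt32(hex, 16)));
        }
        return sb.ToString();
    }

    // Builds one alternation out of all names, longest sequence first.
    // Assumes imageNames is non-empty; an empty list gives an empty pattern.
    public static Regex Build(IEnumerable<string> imageNames)
    {
        string pattern = string.Join("|",
            imageNames
                .Select(n => NameToString(n))
                .OrderByDescending(s => s.Length)
                .Select(s => Regex.Escape(s)) // keycap bases like '#' and '*' need escaping
                .ToArray());
        return new Regex(pattern);
    }
}
```

With names "1f468" and "1f468-1f3fd-200d-1f33e" both present, the farmer sequence is matched as a whole rather than as a lone U+1F468. One thing worth checking against the actual image set is whether the file names include the U+FE0F variation selector everywhere the input text does; if not, the pattern would need to treat it as optional.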