iOS - Some emojis have a length of 3? (digits)


I'm trying to implement a way of counting the number of emojis in an NSString. I have found a way that works for most emojis, but I'm struggling with some emojis that seem to be defined in a different way from the others.

For example, the hot beverage icon has the Unicode hex value U+2615 (code point 9749), while the zero digit has the Unicode hex value U+0030 U+20E3 (code point 3154147).
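To make the difference concrete, here is a small sketch in plain C (which also compiles as Objective-C) that works on UTF-16 code units, the form `NSString` exposes through `characterAtIndex:`. The helper names `count_code_points` and `is_valid_code_point` are hypothetical. It suggests that the hot beverage is a single code point while the keycap zero is two. The figure 3154147 appears to be 0x30 and 0x20E3 written side by side (0x3020E3), which lies above U+10FFFF, the largest code point Unicode defines, so it is not a code point at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest code point Unicode defines. */
#define MAX_CODE_POINT 0x10FFFFu

/* Hypothetical helper: count code points in a UTF-16 buffer.
   A high surrogate followed by a low surrogate is one code point;
   every other unit counts as one code point on its own. */
size_t count_code_points(const uint16_t *units, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (units[i] >= 0xD800 && units[i] <= 0xDBFF && i + 1 < len &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            i++; /* skip the low surrogate of the pair */
        }
        count++;
    }
    return count;
}

/* Hypothetical helper: a code point must not exceed U+10FFFF and must
   not be a surrogate value. Gluing hex digits (0x30, 0x20E3 -> 0x3020E3)
   produces a number that fails the first test. */
int is_valid_code_point(uint32_t cp) {
    return cp <= MAX_CODE_POINT && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

For example, `count_code_points` returns 1 for `{0x2615}` and 2 for `{0x0030, 0x20E3}`, and `is_valid_code_point(3154147)` returns 0.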

I am using an NSString category to determine the number of emojis:

```objc
@implementation NSString (Emojis)

// Classifies a composed character sequence by its first one or two UTF-16 units.
- (BOOL)isEmoji
{
    const unichar high = [self characterAtIndex: 0];

    // Surrogate pair (U+1D000-1F77F)
    if (0xd800 <= high && high <= 0xdbff)
    {
        const unichar low = [self characterAtIndex: 1];
        const int codepoint = ((high - 0xd800) * 0x400) + (low - 0xdc00) + 0x10000;

        return (0x1d000 <= codepoint && codepoint <= 0x1f77f);
    }
    else // Not surrogate pair (U+2100-27BF)
    {
        return (0x2100 <= high && high <= 0x27bf);
    }
}

// Counts the composed character sequences for which -isEmoji returns YES.
- (NSUInteger)numbersOfEmojis
{
    __block NSUInteger emojiCount = 0;

    [self enumerateSubstringsInRange:NSMakeRange(0, [self length])
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
                              if ([substring isEmoji])
                              {
                                  emojiCount++;
                              }
                          }];

    return emojiCount;
}

@end
```
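The surrogate arithmetic in `-isEmoji` can be checked in isolation. This sketch restates it in plain C (also valid Objective-C); `decode_surrogate_pair` and `in_category_emoji_ranges` are hypothetical names, and the two ranges are simply the ones hard-coded in the category above, not an official definition of "emoji".

```c
#include <stdint.h>

/* Same arithmetic as -isEmoji: combine a UTF-16 high surrogate
   (0xD800-0xDBFF) and low surrogate (0xDC00-0xDFFF) into one code point. */
uint32_t decode_surrogate_pair(uint16_t high, uint16_t low) {
    return ((uint32_t)(high - 0xD800) * 0x400) + (uint32_t)(low - 0xDC00) + 0x10000;
}

/* The two ranges the category treats as emoji (an assumption of that
   code, not a Unicode property). */
int in_category_emoji_ranges(uint32_t cp) {
    return (cp >= 0x1D000 && cp <= 0x1F77F) || (cp >= 0x2100 && cp <= 0x27BF);
}
```

For instance, the pair 0xD83D 0xDE00 decodes to U+1F600, which falls in the first range, while U+0030 falls in neither.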

Most emojis have a length of 2, which works fine in this algorithm because of the high and low surrogates, but the digit has a length of 3, and its `high` value (the first unit) does not match the surrogate-pair range (0xd800 <= high && high <= 0xdbff).
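One likely reason for the length of 3, assuming the string holds the fully qualified emoji form: keycap zero is then U+0030, U+FE0F VARIATION SELECTOR-16, U+20E3, i.e. three UTF-16 units. The sketch below (plain C, also valid Objective-C; all names are hypothetical) applies the two first-unit tests from `-isEmoji` to that sequence. U+0030 is neither a high surrogate nor inside U+2100-27BF, so the whole composed sequence is rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* UTF-16 units of the fully qualified keycap-zero emoji:
   DIGIT ZERO, VARIATION SELECTOR-16, COMBINING ENCLOSING KEYCAP. */
static const uint16_t KEYCAP_ZERO[] = {0x0030, 0xFE0F, 0x20E3};
#define KEYCAP_ZERO_LEN (sizeof KEYCAP_ZERO / sizeof KEYCAP_ZERO[0])

/* The two tests -isEmoji applies to the first unit of a sequence. */
int is_high_surrogate(uint16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
int in_bmp_symbol_range(uint16_t unit) { return unit >= 0x2100 && unit <= 0x27BF; }
```

Both functions return 0 for `KEYCAP_ZERO[0]`, whereas the hot beverage (0x2615) passes `in_bmp_symbol_range`.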

I can't find any documentation that describes the ranges for this type of emoji. Is there a way of handling this type of emoji?

What is called “keycap digit 0 emoji” on the page you cited is not an emoji at all (though it is used in an emoji-like manner) but two Unicode characters: the common digit 0 (U+0030) and U+20E3 COMBINING ENCLOSING KEYCAP, which is a combining mark.

A combining mark such as U+20E3 can be used after any character to produce symbols like keycap 0, 0⃣, or keycap $, $⃣ (as you can see, these won't render correctly in all contexts, due to font problems).
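If only keycap emoji need to be recognized, one option is to check for the pattern directly. This is a minimal sketch in plain C (also valid Objective-C), assuming the sequence has already been isolated, for example as one composed character sequence; `is_keycap_sequence` is a hypothetical helper. It accepts a base of 0-9, `#` or `*` (the bases Unicode uses for its emoji keycap sequences) followed by U+20E3, with or without U+FE0F in between. A combination such as `$⃣` is valid text but is deliberately rejected.

```c
#include <stddef.h>
#include <stdint.h>

#define VS16 0xFE0F             /* VARIATION SELECTOR-16 */
#define ENCLOSING_KEYCAP 0x20E3 /* COMBINING ENCLOSING KEYCAP */

/* Base characters of Unicode's emoji keycap sequences: 0-9, '#', '*'. */
static int is_keycap_base(uint16_t unit) {
    return (unit >= '0' && unit <= '9') || unit == '#' || unit == '*';
}

/* Hypothetical helper: is this UTF-16 sequence base [+ U+FE0F] + U+20E3? */
int is_keycap_sequence(const uint16_t *units, size_t len) {
    if (len == 2)
        return is_keycap_base(units[0]) && units[1] == ENCLOSING_KEYCAP;
    if (len == 3)
        return is_keycap_base(units[0]) && units[1] == VS16 &&
               units[2] == ENCLOSING_KEYCAP;
    return 0;
}
```

From Objective-C, the units of each enumerated substring could be copied into a `unichar` buffer with `-getCharacters:range:` and passed to such a check alongside the existing `-isEmoji` test.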

Such a combination is two characters, in the technical meaning of “character” as an element of a coded character set. If you want to count it as one symbol, you need to define and implement that logic yourself. Note that there is a large number of combining marks in Unicode, and an infinite number of combinations of characters and combining marks (since combining marks can be used in succession).
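A sketch of what such self-defined logic might look like: count each base character, plus every combining mark that follows it, as one symbol. It is written in plain C (also valid Objective-C); `count_symbols` is hypothetical, and the mark table covers only three blocks (Combining Diacritical Marks, Combining Diacritical Marks for Symbols, Variation Selectors), so it is an approximation under that assumption, not a complete implementation. In practice, `NSStringEnumerationByComposedCharacterSequences` already groups a base character with its combining marks, which is why the category above receives the keycap as a single substring.

```c
#include <stddef.h>
#include <stdint.h>

/* Deliberately incomplete: only three blocks of marks.
   A real implementation needs the full Unicode character data. */
static int is_combining_mark(uint16_t u) {
    return (u >= 0x0300 && u <= 0x036F)   /* Combining Diacritical Marks */
        || (u >= 0x20D0 && u <= 0x20FF)   /* ... for Symbols, incl. U+20E3 */
        || (u >= 0xFE00 && u <= 0xFE0F);  /* Variation Selectors */
}

/* Hypothetical helper: count base characters (one unit or a surrogate
   pair), each absorbing any number of combining marks after it. */
size_t count_symbols(const uint16_t *units, size_t len) {
    size_t count = 0, i = 0;
    while (i < len) {
        if (units[i] >= 0xD800 && units[i] <= 0xDBFF && i + 1 < len &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF)
            i += 2; /* base is a surrogate pair */
        else
            i += 1;
        while (i < len && is_combining_mark(units[i]))
            i++; /* marks can be stacked, so absorb all of them */
        count++;
    }
    return count;
}
```

For the units `{0x0030, 0xFE0F, 0x20E3, 0x2615, 0xD83D, 0xDE00}` (keycap zero, hot beverage, U+1F600) it returns 3.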

