iOS - Some emojis have a length of 3? (digits)
I am trying to implement a way of counting the number of emojis in an NSString. I have found a way that works for most emojis, but I am struggling with some emojis, which seem to be defined in a different way from the others.
For example, the hot beverage icon has a Unicode hex of U+2615 (codepoint 9749), but the zero digit has a Unicode hex of U+0030 U+20E3 (codepoint 3154147).
I am using an NSString category to determine the number of emojis:
    @implementation NSString (Emojis)

    - (BOOL)isEmoji
    {
        const unichar high = [self characterAtIndex:0];

        // surrogate pair (U+1D000-1F77F)
        if (0xD800 <= high && high <= 0xDBFF)
        {
            const unichar low = [self characterAtIndex:1];
            const int codepoint = ((high - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;

            return (0x1D000 <= codepoint && codepoint <= 0x1F77F);
        }
        else // not surrogate pair (U+2100-27BF)
        {
            return (0x2100 <= high && high <= 0x27BF);
        }
    }

    - (NSUInteger)numbersOfEmojis
    {
        NSUInteger __block emojiCount = 0;

        [self enumerateSubstringsInRange:NSMakeRange(0, [self length])
                                 options:NSStringEnumerationByComposedCharacterSequences
                              usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop)
        {
            if ([substring isEmoji])
            {
                emojiCount++;
            }
        }];

        return emojiCount;
    }

    @end
Most emojis have a length of 2, which works fine in the algorithm because of the high and low Unicode values, but the digit has a length of 3, and its high Unicode value does not match the range of a surrogate pair (0xD800 <= high && high <= 0xDBFF).
I can't find any documentation that describes the ranges for this type of emoji. Is there a way of handling this type of emoji?
What is called “keycap digit 0 emoji” on the page cited is not an emoji at all (though it is used in an emoji-like manner) but two Unicode characters: the common digit 0 (U+0030) and U+20E3 COMBINING ENCLOSING KEYCAP, a combining mark.
A combining mark such as U+20E3 can be used after any character to produce symbols like keycap 0, 0⃣, or keycap $, $⃣ (as you can see, these won’t work in all contexts, due to font problems).
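To make the point concrete, here is a minimal sketch of how such a sequence is built. KeycapString is a hypothetical helper name used only for illustration; it simply appends U+20E3 to whatever base string it is given, so the result always has one more UTF-16 code unit than the base.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not an existing API): appends
// U+20E3 COMBINING ENCLOSING KEYCAP to an arbitrary base string.
static NSString *KeycapString(NSString *base)
{
    return [NSString stringWithFormat:@"%@%C", base, (unichar)0x20E3];
}
```

For a single-unit base such as @"0" or @"$" the resulting string has a length of 2: the base character followed by the combining mark. Whether it renders as a keycap depends on the font.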
Such a combination is two characters, in the technical meaning of “character” as an element of a coded character set. If you want to count it as one symbol, you need to define and implement the logic yourself. Note that there is a large number of combining marks in Unicode, and there is an infinite number of combinations of characters and combining marks (since you can use several combining marks in succession).
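One possible way to define that logic, as a rough sketch only: treat any composed character sequence whose last UTF-16 code unit is U+20E3 as a single keycap symbol. The names IsKeycapSequence and CountKeycapSequences are hypothetical, and the rule is an assumption made for illustration; it deliberately ignores every other combining mark. Checking the last code unit rather than the second one also covers sequences of length 3, where another code unit sits between the base character and the mark.

```objective-c
#import <Foundation/Foundation.h>

// U+20E3 COMBINING ENCLOSING KEYCAP
static const unichar kCombiningEnclosingKeycap = 0x20E3;

// Hypothetical rule: a sequence counts as a keycap symbol if it has a base
// character and its last UTF-16 code unit is U+20E3.
static BOOL IsKeycapSequence(NSString *substring)
{
    const NSUInteger length = [substring length];
    return length >= 2 &&
           [substring characterAtIndex:length - 1] == kCombiningEnclosingKeycap;
}

// Counts the composed character sequences that satisfy the rule above.
static NSUInteger CountKeycapSequences(NSString *string)
{
    __block NSUInteger count = 0;

    [string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop)
    {
        if (IsKeycapSequence(substring))
        {
            count++;
        }
    }];

    return count;
}
```

In the category from the question, a check like IsKeycapSequence could be tried in isEmoji before the surrogate-pair test, so that keycap sequences are counted alongside the other ranges.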