"👩‍💻🎉".length = 7 ??? How to count emojis with JavaScript

cestoliv

Olivier Cartier

Posted on November 14, 2022

This post was originally published on my blog: cestoliv.com

In this article

  1. The problem
  2. A first solution with the spread operator
  3. A second algorithm with Zero Width Joiner
  4. The best solution (to use in production)
    1. Use it on Firefox/IE
    2. Use it with TypeScript

1. The problem

This week, a friend of mine ran into a JavaScript problem: he wanted to check that his user was entering only one character in a text input.
The first solution that comes to mind is to look at the length of the string, but problems occur when the string contains emojis:

"a".length // => 1
"🛁".length // => 2 ??


In fact, this is quite logical: the .length property in JavaScript returns the length of the string in UTF-16 code units, not the number of visible characters.
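To make the code-unit count concrete, here is a small sketch that lists the UTF-16 code units behind a string. The helper name codeUnits is made up for this illustration; charCodeAt is the standard String method.

```javascript
// List the UTF-16 code units of a string as hexadecimal values.
// `codeUnits` is a hypothetical helper name used only for this example.
function codeUnits(str) {
    const units = [];
    for (let i = 0; i < str.length; i++) {
        units.push(str.charCodeAt(i).toString(16));
    }
    return units;
}

console.log(codeUnits("a"));  // => ["61"]
console.log(codeUnits("🎉")); // => ["d83c", "df89"], a surrogate pair
```

A character outside the Basic Multilingual Plane, like most emojis, needs two code units, which is why .length reports 2.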

2. A first solution with the spread operator

The first solution I thought of was to split the string on each character and then get the number of elements:

"๐Ÿ›".split('') // => ["๏ฟฝ","๏ฟฝ"]
"๐Ÿ›".split('').length // => 2
Enter fullscreen mode Exit fullscreen mode

Ouch... Unfortunately, .split('') also splits on UTF-16 code units.

But there is another way to split a string into characters in JavaScript, the spread operator, which iterates over Unicode code points rather than code units:

[..."🛁"] // => ["🛁"]
[..."🛁"].length // => 1, Hooray !!
[..."🛁🎉"] // => ["🛁", "🎉"]
[..."🛁🎉"].length // => 2, Hooray !!

[..."👩‍💻"] // => ["👩", "\u{200D}", "💻"]
[..."👩‍💻"].length // => 3, Oops...

Damn, still not right... Unfortunately for us, some emojis are composed of several emojis joined by an invisible Zero Width Joiner (U+200D):

[..."👩‍💻"] // => ["👩", "\u{200D}", "💻"]

[..."👩‍💻👩‍❤️‍💋‍👩"] // => ["👩", "\u{200D}", "💻", "👩", "\u{200D}", "❤", "\u{fe0f}", "\u{200D}", "💋", "\u{200D}", "👩"]
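To see the joiner at work, the sketch below builds the same emoji from its three code points. The constant names and the toCodePoints helper are only illustrative; String.fromCodePoint and codePointAt are standard methods.

```javascript
// Build the "woman technologist" emoji from its three code points.
const WOMAN = 0x1F469;  // 👩
const ZWJ = 0x200D;     // Zero Width Joiner (invisible)
const LAPTOP = 0x1F4BB; // 💻

const technologist = String.fromCodePoint(WOMAN, ZWJ, LAPTOP);

console.log(technologist.length);      // => 5 UTF-16 code units (2 + 1 + 2)
console.log([...technologist].length); // => 3 code points

// List each code point of a string in U+XXXX notation.
// `toCodePoints` is a hypothetical helper name for this example.
function toCodePoints(str) {
    return [...str].map(ch => 'U+' + ch.codePointAt(0).toString(16).toUpperCase());
}

console.log(toCodePoints(technologist)); // => ["U+1F469", "U+200D", "U+1F4BB"]
```

On a platform that supports the sequence, the three code points are drawn as a single glyph, even though the string still contains all three.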

3. A second algorithm with Zero Width Joiner

As you can see in this example, to count the number of visible characters, you can count every character that is not a Zero Width Joiner and is not immediately followed by one.

For example:

[..."a👩‍💻🎉"]
// => ["a", "👩", "\u{200D}", "💻", "🎉"]
//    | 1 |           2           |  3  |

So we can turn it into a simple function. It also skips any character followed by a variation selector (U+FE0F) or a combining enclosing keycap (U+20E3), since those attach to the previous character:

function visibleLength(str) {
    let count = 0;
    const arr = [...str]; // split into Unicode code points

    for (let c = 0; c < arr.length; c++) {
        // Count a code point only if it is not a joiner and the next
        // code point does not attach to it.
        if (
            arr[c] !== '\u{200D}' &&
            arr[c + 1] !== '\u{200D}' &&
            arr[c + 1] !== '\u{fe0f}' &&
            arr[c + 1] !== '\u{20e3}'
        ) {
            count++;
        }
    }
    return count;
}

visibleLength('Hello World'); // => 11
visibleLength('Hello World 👋'); // => 13
visibleLength("I'm going to 🛁 !"); // => 16
visibleLength('👨‍👩‍👧‍👦'); // => 1
visibleLength('👩‍💻👩‍❤️‍💋‍👩'); // => 2

visibleLength('🇫🇷'); // => 2 AAAAAAAAAAAAAA!!!

Our function works in many cases, but not for flags. A flag is made of two letter-like emojis (regional indicator symbols) placed side by side. They are not joined by a Zero Width Joiner; platforms that support them simply render the pair as a flag.

[..."🇫🇷"] // => ["🇫", "🇷"]
[..."🇺🇸"] // => ["🇺", "🇸"]
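As an illustration of how flags are encoded, the sketch below maps each regional indicator symbol back to its Latin letter. The function name flagToCountryCode and the constants are assumptions made for this example, and the function does not validate its input.

```javascript
const REGIONAL_INDICATOR_A = 0x1F1E6; // code point of the regional indicator 🇦
const LATIN_CAPITAL_A = 0x41;         // code point of "A"

// Convert a flag emoji to its two-letter code by shifting each
// regional indicator symbol back into the A-Z range.
// Assumes the input contains only regional indicator symbols.
function flagToCountryCode(flag) {
    return [...flag]
        .map(symbol => {
            const offset = symbol.codePointAt(0) - REGIONAL_INDICATOR_A;
            return String.fromCodePoint(LATIN_CAPITAL_A + offset);
        })
        .join('');
}

console.log(flagToCountryCode("🇫🇷")); // => "FR"
console.log(flagToCountryCode("🇺🇸")); // => "US"
```

Because nothing in the string marks where one flag ends and the next begins, a counter has to know that regional indicators pair up, which the Zero Width Joiner approach does not.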

4. The best solution (to use in production)

One of the best solutions we have to handle all these cases is to use a text segmentation algorithm, capable of splitting strings into sentences, words or graphemes (visible characters).

To our delight, JavaScript integrates this algorithm natively: Intl.Segmenter

It's pretty easy to use, AND IT WORKS WITH ALL CHARACTERS!

function visibleLength(str) {
    return [...new Intl.Segmenter().segment(str)].length
}

visibleLength("I'm going to 🛁 !") // => 16
visibleLength("👩‍💻") // => 1
visibleLength("👩‍💻👩‍❤️‍💋‍👩") // => 2
visibleLength("France 🇫🇷!") // => 9
visibleLength("England 🏴󠁧󠁢󠁥󠁮󠁧󠁿!") // => 10
visibleLength("と日本語の文章") // => 7
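For a closer look at what the segmenter returns, here is a short sketch using the granularity option. The helper names graphemes and words are made up for this example; the granularity values and the segment and isWordLike fields come from the Intl.Segmenter API.

```javascript
// Return the grapheme clusters (visible characters) of a string.
function graphemes(str) {
    const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
    return [...segmenter.segment(str)].map(s => s.segment);
}

// Return only the word-like segments, skipping spaces and punctuation.
function words(str, locale = 'en') {
    const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
    return [...segmenter.segment(str)]
        .filter(s => s.isWordLike)
        .map(s => s.segment);
}

console.log(graphemes("a👩‍💻🎉"));     // => ["a", "👩‍💻", "🎉"]
console.log(words("Hello, World!")); // => ["Hello", "World"]
```

Grapheme is the default granularity, which is why the shorter visibleLength version does not need to pass any options.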

There is just one small problem: at the time of writing, Intl.Segmenter is not supported at all by Firefox (both Desktop and Mobile) or Internet Explorer.

[Screenshot: MDN browser compatibility table for Intl.Segmenter]

4.1. Use it on Firefox/IE

To make this solution compatible with Firefox, we need to use this polyfill: https://github.com/surferseo/intl-segmenter-polyfill

Because the file is very large (1.77 MB), we need to make sure that it is only loaded for clients that do not yet support Intl.Segmenter().

Because it's a bit out of the scope of this article, I'm just posting my solution:
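A minimal sketch of the conditional-loading idea, not necessarily the author's exact solution: use the native Intl.Segmenter when it exists, and only otherwise call a loader. The names getSegmenter, visibleLengthAsync and loadPolyfill are hypothetical, and the sketch assumes the loader resolves to a constructor compatible with Intl.Segmenter; check the polyfill's README for its actual loading API.

```javascript
// Resolve to a Segmenter constructor: the native one when available,
// otherwise whatever the caller-supplied loader resolves to.
// `loadPolyfill` is a hypothetical function provided by the caller, for
// example one that dynamically imports the polyfill bundle on demand.
async function getSegmenter(loadPolyfill) {
    if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
        return Intl.Segmenter; // native support, nothing to download
    }
    return loadPolyfill();
}

// Count visible characters, loading the polyfill only when needed.
async function visibleLengthAsync(str, loadPolyfill) {
    const Segmenter = await getSegmenter(loadPolyfill);
    return [...new Segmenter().segment(str)].length;
}

// Usage: the loader below is a stand-in and never runs where
// Intl.Segmenter is supported natively.
visibleLengthAsync("abc", async () => {
    throw new Error("replace with real polyfill loading code");
}).then(n => console.log(n)); // => 3 when Intl.Segmenter is available
```

Keeping the loader behind a function call means the large polyfill file is only requested by clients that actually need it.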

4.2. Use it with TypeScript

Because this implementation is relatively new, if you want to use Intl.Segmenter() with TypeScript, make sure you have at least ES2022 as target:

// File: tsconfig.json
{
  "compilerOptions": {
    "target": "ES2022",
    // [...]
  }
  // [...]
}

Thanks for reading!
