Convert Unicode strings to ASCII with ColdFusion & JUnidecode

gamesover

James Moberg

Posted on August 4, 2020


I’ve struggled for years to find the best way to convert accented and other non-ASCII Unicode characters to ASCII in ColdFusion. I’ve used regex, java.text.Normalizer, ICU4J’s Transliterator and Apache Commons Lang3’s StringUtils.stripAccents, and recently scrapped them all in favor of JUnidecode. JUnidecode is a Java port of the Text::Unidecode Perl module. The library has only one method: it takes a string and transliterates it to a valid 7-bit ASCII string (which also strips diacritical marks).

Examples:

  • Москвa becomes Moskva.
  • čeština becomes cestina.
  • Հայաստան becomes Hayastan.
  • Ελληνικά becomes Ellenika.
  • 北亰 becomes Bei Jing.
  • Häuser Bäume Höfe Gärten becomes Hauser Baume Hofe Garten.
  • daß becomes dass.
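
For comparison, here is a minimal Java sketch of one of the earlier approaches mentioned above, java.text.Normalizer. It is not JUnidecode and not the code from my demo script; the class and method names (AccentStripper, stripAccents) are made up for illustration. It decomposes each character and removes the combining marks, which handles Latin diacritics but leaves anything without a decomposition untouched.

```java
import java.text.Normalizer;

// Hypothetical helper, for illustration only (not part of JUnidecode).
class AccentStripper {

    // Decompose to NFD so "č" becomes "c" + combining caron,
    // then delete every combining mark (Unicode category M).
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Latin letters with diacritics lose their marks.
        System.out.println(stripAccents("čeština"));
        System.out.println(stripAccents("Häuser Bäume Höfe Gärten"));
        // No canonical decomposition exists for these, so they pass through unchanged.
        System.out.println(stripAccents("daß"));
        System.out.println(stripAccents("Москва"));
    }
}
```

With this approach čeština and the German umlauts come out as plain ASCII, but daß and Москва are returned as-is, so the result is not guaranteed to be 7-bit ASCII. That gap is what a transliteration library fills.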

WARNING: Please be aware that JUnidecode doesn't like emojis. You may need to sanitize them (or convert them to aliases) using cf-emoji-java before converting to 7-bit ASCII.
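
As a rough idea of what "sanitizing" could look like without an emoji library, the Java sketch below drops every code point outside the Basic Multilingual Plane before the string is transliterated. This is a crude stand-in and an assumption on my part, not what cf-emoji-java does: the names EmojiSanitizer and dropSupplementary are hypothetical, it misses emoji that live inside the BMP (such as U+2764), and it also discards legitimate supplementary characters such as rare CJK ideographs.

```java
// Hypothetical helper, for illustration only (not cf-emoji-java, not JUnidecode).
class EmojiSanitizer {

    // Keep only code points in the Basic Multilingual Plane (below U+10000).
    // Most pictographic emoji are supplementary code points, so they are dropped.
    static String dropSupplementary(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(cp -> !Character.isSupplementaryCodePoint(cp))
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // "\uD83D\uDC96" is the surrogate pair for U+1F496 (sparkling heart).
        System.out.println(dropSupplementary("Caf\u00E9 \uD83D\uDC96 open"));
    }
}
```

Converting emoji to text aliases, as suggested above, keeps more information than deleting them, so treat this only as a fallback.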

Here's a demo script I've written that has some generic test cases:
https://gist.github.com/JamoCA/6565bd4e2526b7c177a5f0cde3980d1c
