Convert Unicode strings to ASCII with ColdFusion & JUnidecode
James Moberg
Posted on August 4, 2020
I’ve struggled for years to identify the best way to convert accented and other Unicode characters to ASCII using ColdFusion. I’ve used regex, java.text.Normalizer, ICU4J’s Transliterator and Apache Commons Lang3’s StringUtils.stripAccents, and recently scrapped them all in favor of JUnidecode. JUnidecode is a Java port of the Text::Unidecode Perl module. The library has only one method: it takes a string and transliterates it to a valid 7-bit ASCII string (obviously, it also strips diacritical marks).
Examples:
- Москвa becomes Moskva.
- čeština becomes cestina.
- Հայաստան becomes Hayastan.
- Ελληνικά becomes Ellenika.
- 北亰 becomes Bei Jing.
- Häuser Bäume Höfe Gärten becomes Hauser Baume Hofe Garten.
- daß becomes dass.
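For comparison, here is a minimal sketch of one of the approaches I mentioned scrapping, java.text.Normalizer, using only the Java standard library. The class and helper names (AccentStripper, stripAccents, isAscii) are made up for illustration and are not part of JUnidecode or ColdFusion. It decomposes each character and drops the combining marks, which handles accented Latin letters but leaves ß and non-Latin scripts untouched, so the result is not guaranteed to be ASCII.

```java
import java.text.Normalizer;

// Sketch of the Normalizer-based approach: decompose, then drop combining marks.
final class AccentStripper {

    // Hypothetical helper name; not part of JUnidecode or ColdFusion.
    static String stripAccents(String input) {
        // NFD splits "č" into "c" plus a combining caron (U+030C).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by the decomposition.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // True when every UTF-16 code unit is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u010De\u0161tina",                    // čeština
            "H\u00E4user",                          // Häuser
            "da\u00DF",                             // daß
            "\u041C\u043E\u0441\u043A\u0432\u0430"  // Москва
        };
        for (String s : samples) {
            String out = stripAccents(s);
            System.out.println(out + " ascii=" + isAscii(out));
        }
    }
}
```

The first two samples come out as plain ASCII, while daß and Москва pass through unchanged. That gap is what a transliteration table like the one in JUnidecode fills.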
WARNING: Please be aware that JUnidecode doesn't like emojis. You may need to sanitize them (or convert them to aliases) using cf-emoji-java before converting to 7-bit ASCII.
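If you only need a crude pre-filter rather than alias conversion, the following standard-library sketch drops every code point outside the Basic Multilingual Plane, which is where most emoji live. This is an assumption-laden stand-in, not what cf-emoji-java does: the class and method names are hypothetical, and BMP symbols such as ☺ (U+263A), zero-width joiners and variation selectors are left in place.

```java
// Crude pre-filter: remove supplementary-plane code points (most emoji).
final class EmojiPreFilter {

    // Hypothetical helper; not part of cf-emoji-java or JUnidecode.
    static String stripSupplementary(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints()
             // Keep only code points at or below U+FFFF.
             .filter(cp -> !Character.isSupplementaryCodePoint(cp))
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\uD83D\uDE00" is the surrogate pair for U+1F600 (grinning face).
        System.out.println(stripSupplementary("ok \uD83D\uDE00!"));
    }
}
```

Run the filter before handing the string to the transliterator. Text that contains no supplementary characters passes through unchanged.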
Here's a demo script I've written that has some generic test cases:
https://gist.github.com/JamoCA/6565bd4e2526b7c177a5f0cde3980d1c