Katz Ueno
Posted on June 9, 2021
MySQL has utf8 & utf8mb4 charcter collation.
Although UTF8 is one encoding type, MySQL's (MariaDB) utf8 only allows to store upto 3-byte characters.
Therefore, some new emojis and Japanese and Chinese characters are not able to store it unless you use utf8mb4.
My CTO at concrete5 Japan, Inc and I came up with the solution to export mysqldump and search the txt in PHP.
It was difficult to search the text only by using SQL.
Export MySQL Database with MySQLDump
We used --skip-extended-insert
option so that each record has one line
mysqldump -h {$host} -u {$db_username} --password="{$db_password}" --default-character-set=utf8mb4 --single-transaction --skip-extended-insert {$db_name} {$db_tables} > {$SQL_FILENAME}
If you need forgot to change your DB collation to utf8mb4, and need to re-work on the existing data, you may want to mysqldump using --replace --no-create-info
options.
mysqldump -h {$host} -u {$db_username} --password="{$db_password}" --default-character-set=utf8mb4 --single-transaction --replace --no-create-info --skip-extended-insert {$db_name} {$db_tables} > {$SQL_FILENAME}
Search UTF8MB4 characters in PHP
Save this as textsearch.php
and run it on your local or server.
<?php
$searchRegex = '/[\x{10000}-\x{10FFFF}]/iu';
$matches = [];
$handle = fopen("SQL_FILENAME", "r");
if ($handle) {
while (!feof($handle)) {
$buffer = fgets($handle);
if (preg_match($searchRegex, $buffer)) {
$matches[] = $buffer;
}
/*
// If you want to print out which characters are really UTF8MB4, use this pref_match_all instead.
if (preg_match_all($searchRegex, $buffer, $matchesChar)) {
echo "Character(s): " . implode(',', $matchesChar[0]) . "\n";
print_r($buffer);
}
*/
}
fclose($handle);
} else {
echo "nope! cant open it\n";
}
$ php textsearch.php > {$result.sql}
It will filter you the utf8mb4 SQL.
Tested with PHP 7.4.20 with mbstring extension.
Posted on June 9, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.