I had to update the default character set to utf8mb4 on my mysql server. I was getting quite a few submissions on my tomcat hosted systems with emojies like the ? symbol being inserted. These always failed with a database error. Reading up about this it appears the utf8 support of mysql was for up to a 3 byte utf character. Full support requires 4 bytes, and this was a late addition to mysql.
I dutifully converted the database, and tables to utf8mb4, running into issues with the referential integrity I use. Database, tables and columns all converted, set the default client and mysql to utf8mb4 and problem solved. I could now add the ? into the application.
Along came the boss, who now complained about all the capital As, with a hat on top, in this the blasdale.com blog. Arrgh. Yes in the blasdale database there were some latin1 tables from a really old install of wordpress. So a conversion of this database and tables to utf8mb4 but still the capital A with a hat appeared. More googling, yes I had UTF data stored into a latin1 column, so had to run some sql to convert the data.
update wp_posts SET post_content=convert(cast(convert(post_content using latin1) as binary) using utf8mb4);
In total I ran the following SQL statements on the server:
ALTER DATABASE blasdale_blog CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci; ALTER TABLE wp_blc_filters CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_blc_links CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_blc_synch CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_commentmeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_comments CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_email_list CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_email_list_config CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_email_list_future CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_hl_twitter_replies CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_hl_twitter_tweets CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_hl_twitter_users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_links CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_ngg_album CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_ngg_gallery CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_ngg_pictures CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_options CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_postmeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_subscribe2 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_term_relationships CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_term_taxonomy CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_termmeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_terms CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_usermeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfBadLeechers CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfBlockedIPLog CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfBlocks CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfBlocksAdv CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfConfig CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfCrawlers CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfFileMods CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfHits CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfHoover CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfIssues CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfKnownFileList CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfLeechers CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfLockedOut CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfLocs CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfLogins CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfNet404s CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfNotifications CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfReverseCache CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfSNIPCache CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfScanners CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfStatus CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfThrottleLog CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE wp_wfVulnScanners CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; SELECT column_name,character_set_name FROM information_schema.`COLUMNS` WHERE table_schema = "blasdale_blog" AND table_name = "wp_hl_twitter_tweets"; create table wp_posts_bkp LIKE wp_posts; insert wp_posts_bkp select * from wp_posts; update wp_posts SET post_content=convert(cast(convert(post_content using latin1) as binary) using utf8mb4); update wp_posts SET post_title=convert(cast(convert(post_title using latin1) as binary) using utf8mb4); update wp_posts SET post_content_filtered=convert(cast(convert(post_content_filtered using latin1) as binary) using utf8mb4);
I also updated the WordPress configuration to include utf8mb4 instead of utf8:
/** Database Charset to use in creating database tables. */ define('DB_CHARSET', 'utf8mb4');
The problem was not caused by the initial conversion, but by the statements I had placed in the my,cnf configuration file which caused WordPress to assume all connections are utf8mb4.
[mysqld] local-infile=0 datadir=/var/lib/mysql socket=/var/lib/mysql/mysql.sock character-set-client-handshake = FALSE character-set-server = utf8mb4 collation-server = utf8mb4_unicode_ci [client] default-character-set = utf8mb4 [mysql] default-character-set = utf8mb4
I think it is all working now, tomcat is happy and so is my WordPress install. I do though have many tables in other databases which are only utf8. I expect they will function unless someone tries to insert the 💩 into a page or post. I expect I will get around to converting those databases, should be straightforward…..