Unicode characters turn into question marks

I'm working on an educational website that will need to incorporate a good deal of Hebrew and Aramaic text. However, whenever I enter any of this text in Unicode, it displays as a series of question marks (??????) after saving.

I've spent literally hours searching the forums and can see that other people have had similar problems, but I was unable to find a solution.

I also saw that Unicode does work on concrete5.org: using the test site feature, I was able to save Hebrew script and have it display correctly.

Does anyone have any suggestions what's wrong with my setup? Thanks.

 
defunct replied:
Can you provide a live link, please, so we can see your HTML source?

Are you declaring the UTF-8 charset in your HTML?

What charset is your database using?
nova108 replied:
I'm afraid I am unable to provide a link, as the site is currently on a development server.

I can say, however, that I am using the UTF-8 HTML charset.
mose replied (Best Answer):
Your database was not created to support UTF-8. Assuming that the name of your database is "concrete", it should have been created with this command:

mysql> create database concrete default character set utf8 default collate utf8_unicode_ci;
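If you create the database from a PHP setup script rather than the mysql prompt, the same statement can be built in code. This is a minimal sketch; create_database_sql() is a hypothetical helper, not part of concrete5 or PHP, and the charset and collation arguments are assumed to be trusted constants because they are inserted unquoted.

```php
<?php
// Hypothetical helper: builds the CREATE DATABASE statement mose gives above.
// The database name is quoted as a MySQL identifier (backticks, with any
// embedded backtick doubled). $charset and $collation are NOT escaped, so
// only pass fixed, trusted values.
function create_database_sql(string $name, string $charset = 'utf8', string $collation = 'utf8_unicode_ci'): string
{
    $quoted = '`' . str_replace('`', '``', $name) . '`';
    return "CREATE DATABASE $quoted DEFAULT CHARACTER SET $charset DEFAULT COLLATE $collation";
}

echo create_database_sql('concrete'), PHP_EOL;
```

With an existing PDO connection as a user allowed to create databases, you would then run $pdo->exec(create_database_sql('concrete')).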

In php.ini, you should set the following variables.

default_charset = "UTF-8"
mbstring.internal_encoding = UTF-8
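To confirm that PHP actually picked up the php.ini change after a restart, a small script can print the active value. This is a sketch with a hypothetical helper name; it also assumes that, on PHP 5.6 and later, default_charset is the setting that matters (mbstring.internal_encoding is deprecated there in its favour). If php.ini cannot be edited, default_charset can be changed at runtime with ini_set() before any output is sent.

```php
<?php
// Sketch: report the charset PHP is currently using. ini_get() returns the
// value loaded from php.ini, or one set earlier by ini_set().
function php_default_charset(): string
{
    return (string) ini_get('default_charset');
}

// Fallback when php.ini cannot be edited: set it at runtime.
// This must run before any output is sent to the browser.
if (strcasecmp(php_default_charset(), 'UTF-8') !== 0) {
    ini_set('default_charset', 'UTF-8');
}

echo 'default_charset = ', php_default_charset(), PHP_EOL;
```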

If you own the server or have full control over mysql, you can add the following lines to /etc/mysql/my.cnf under the appropriate sections to force all connections to all databases to be UTF8.

[client]
default-character-set=utf8

[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci; SET NAMES utf8;'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake

[mysqldump]
default-character-set=utf8

[mysql]
default-character-set=utf8
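On shared hosting you usually cannot edit my.cnf. For scripts you write yourself (such as the conversion script later in this thread), the connection can request UTF-8 on its own: PDO's MySQL driver accepts a charset option in the DSN (PHP 5.3.6 and later). This is a minimal sketch; the function names are hypothetical, and it does not change how concrete5 itself connects.

```php
<?php
// Sketch: build a PDO DSN that asks for a utf8 connection, so the client
// side of this one connection is UTF-8 regardless of the server defaults.
function utf8_pdo_dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8', $host, $dbname);
}

// Open the connection; errors are raised as PDOException instead of being
// silently returned as false.
function connect_utf8(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(utf8_pdo_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}
```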
nova108 replied:
I've edited the php.ini and my.cnf files as you described. I've also looked over a number of the character tables in my installation's database, all of which are utf8 and utf8_unicode_ci. I assume this is true for the rest of the database. Unfortunately, the characters are still turning into question marks.

BTW, it may be worth mentioning that I am working on Mac OS Server and the database was set up using MySQL Administrator downloaded from the MySQL website.
mose replied:
Be sure to restart MySQL and Apache after making the changes.

I am not familiar with the Administrator program. If you can look at the database variables (mysql> show variables;), there will be a list of character_set_* variables that should all be utf8 (except for character_set_filesystem). All of the collation_* variables should be utf8_unicode_ci.
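If the Administrator program makes this hard, the same check can be scripted. This is a minimal sketch with hypothetical function names: it reads the session variables over an existing PDO connection and lists every character_set_* or collation_* value that differs from what mose describes. The exact-string comparison is an assumption; newer MySQL versions may report names such as utf8mb3 or utf8mb4, which would show up here as mismatches.

```php
<?php
// Sketch: given SHOW VARIABLES rows as name => value, return the
// character_set_* / collation_* entries that do not match the expected
// values. character_set_filesystem is skipped, as mose notes.
function charset_mismatches(array $vars, string $charset = 'utf8', string $collation = 'utf8_unicode_ci'): array
{
    $bad = [];
    foreach ($vars as $name => $value) {
        if ($name === 'character_set_filesystem') {
            continue;
        }
        if (strpos($name, 'character_set_') === 0 && $value !== $charset) {
            $bad[$name] = $value;
        } elseif (strpos($name, 'collation_') === 0 && $value !== $collation) {
            $bad[$name] = $value;
        }
    }
    return $bad;
}

// Fetch the variables over an existing PDO connection. SHOW VARIABLES returns
// two columns (Variable_name, Value), which FETCH_KEY_PAIR turns into name => value.
function fetch_charset_vars(PDO $pdo): array
{
    $vars = [];
    foreach (["SHOW VARIABLES LIKE 'character_set_%'", "SHOW VARIABLES LIKE 'collation_%'"] as $sql) {
        $vars = array_merge($vars, $pdo->query($sql)->fetchAll(PDO::FETCH_KEY_PAIR));
    }
    return $vars;
}
```

Usage: print_r(charset_mismatches(fetch_charset_vars($pdo))); an empty array means every listed variable matches.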

If the name of the database is concrete, you can see the command that was used to create the database with

mysql> show create database concrete;

It should look something like this.

mysql> show create database concrete;
+----------+-------------------------------------------------------------------+
| Database | Create Database |
+----------+-------------------------------------------------------------------+
| concrete | CREATE DATABASE `concrete` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+-------------------------------------------------------------------+
1 row in set (0.02 sec)
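The same check can be run from PHP. This is a minimal sketch with hypothetical function names: it runs SHOW CREATE DATABASE over an existing PDO connection and pulls the DEFAULT CHARACTER SET name out of the returned text, which looks like the "Create Database" column in the output above.

```php
<?php
// Sketch: extract the default character set from the "Create Database" text
// returned by SHOW CREATE DATABASE. Returns null if there is no such clause.
function default_charset_from_create(string $createSql): ?string
{
    if (preg_match('/DEFAULT CHARACTER SET (\w+)/i', $createSql, $m)) {
        return $m[1];
    }
    return null;
}

// Run the statement for one database; column 1 of the result holds the text.
function database_default_charset(PDO $pdo, string $dbname): ?string
{
    $quoted = '`' . str_replace('`', '``', $dbname) . '`';
    $row = $pdo->query("SHOW CREATE DATABASE $quoted")->fetch(PDO::FETCH_NUM);
    return $row === false ? null : default_charset_from_create($row[1]);
}
```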

Check the configuration of the web server to confirm that it has the following line in its configuration (possibly in /etc/apache2/conf.d/charset).

AddDefaultCharset UTF-8

If it does not have that line (or the line specifies a different default character set), update it and restart the web server.
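After restarting, you can confirm what the server actually sends by inspecting the Content-Type response header, for example with PHP's get_headers(). This is a minimal sketch with hypothetical helper names; the header lookup is case-insensitive because servers may differ in how they capitalise header names.

```php
<?php
// Sketch: true when a Content-Type value declares charset=UTF-8
// (case-insensitive, optional quotes, e.g. 'text/html; charset="utf-8"').
function declares_utf8(string $contentType): bool
{
    return preg_match('/;\s*charset\s*=\s*"?utf-8"?\s*(;|$)/i', $contentType) === 1;
}

// Fetch a page's response headers and return its Content-Type value, if any.
function content_type_of(string $url): ?string
{
    $headers = get_headers($url, true);
    if ($headers === false) {
        return null;
    }
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-type'])) {
        return null;
    }
    $value = $headers['content-type'];
    // After a redirect there is one value per response; the last is the final page.
    return is_array($value) ? (string) end($value) : (string) $value;
}
```

Usage: var_dump(declares_utf8((string) content_type_of('http://localhost/'))); should print bool(true) once the server sends UTF-8.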
mose replied:
If you are simply displaying information that already exists in the database, it was probably corrupted when it was entered. You will need to enter it again now that a UTF-8 connection to the database has been established.
mario replied:
@c5 team: could we add the above info to the Internationalization documentation? I think it's fairly important for people who plan to support alternate character sets, now or in the future.
nova108 replied:
@Mose: I checked the database's default charset from the command line as you said. Sure enough, the program had set up the database with a non-UTF-8 charset.

After a little research, I modified your create command to:
ALTER DATABASE `concrete` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

This seems to have done the trick. I can now insert the unicode characters and have them appear. Thanks for your help!

@Mario: I support your suggestion.
harunkaraman replied (1 attachment):
I had the same problem and fixed it.

Run this script against your database and you will see it fixed. The problem is most likely that your default charset was set to .._swedish_ci. Changing it to UTF-8 later does not help, because concrete5 had already created dozens of tables that are not utf8, and you keep adding data to them. Hosting providers sometimes do not check the default charsets during upgrades.
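Before converting, you can list which tables are still affected. This is a minimal sketch with hypothetical function names; it assumes MySQL's information_schema is available (MySQL 5.0 and later) and treats any collation whose name starts with "utf8" as already converted.

```php
<?php
// Sketch: list the base tables in one database whose collation is not
// utf8-based, i.e. the tables ALTER DATABASE alone leaves unconverted.
function non_utf8_tables(PDO $pdo, string $dbname): array
{
    $stmt = $pdo->prepare(
        'SELECT TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES
         WHERE TABLE_SCHEMA = ? AND TABLE_TYPE = \'BASE TABLE\''
    );
    $stmt->execute([$dbname]);
    return filter_non_utf8($stmt->fetchAll(PDO::FETCH_KEY_PAIR));
}

// Pure part, separate so it can be checked without a database:
// keep only table => collation entries whose collation does not start with "utf8".
function filter_non_utf8(array $collations): array
{
    return array_filter($collations, function ($collation) {
        return strpos((string) $collation, 'utf8') !== 0;
    });
}
```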

After backing up your database, copy the PHP code below into a file. (I also attached the file; rename its extension to .php before running it.)

Edit the database settings.

Run it with your browser.

Done!

=== PHP CODE BEGINS ===

<?php
$host=' '; // this is the database hostname; do not change this.
$user=' '; // please set your MySQL user name
$pass=' '; // please set your MySQL user password
$dbname=' '; // please set your database name
$charset='utf8'; // specify the character set
$collation='utf8_general_ci'; // specify the collation you wish to use

$db = mysql_connect('localhost', "$user", "$pass") or die("MySQL could not connect to the database; incorrect user or password: " . mysql_error());
mysql_select_db("$dbname") or die("MySQL could not select the database; please check your database name: " . mysql_error());
$result = mysql_query('show tables') or die("MySQL could not execute the command 'show tables': " . mysql_error());
// mysql_fetch_row() returns each table name once; mysql_fetch_array() would return it
// under both a numeric and a named key, so every table would be converted twice.
while ($tables = mysql_fetch_row($result)) {
    foreach ($tables as $key => $value) {
        mysql_query("ALTER TABLE `$value` CONVERT TO CHARACTER SET $charset COLLATE $collation") or die("Could not convert the table: " . mysql_error());
    }
}
mysql_query("ALTER DATABASE `$dbname` DEFAULT CHARACTER SET $charset COLLATE $collation") or die("Could not alter the collation of the database: " . mysql_error());
echo "The collation of your database has been successfully changed!";
?>

=== PHP CODE END ===


You may contact us if you have any problems.
http://www.kordil.com
mario replied:
Thanks for the damage-control tip!
ptheis replied:
Thanks!

This worked great for me. One minor error in the script: on line 9, 'localhost' should be "$host"; otherwise the script will not work where the database is hosted on another server (as it is in my setup).

All in all, excellent, and it fixed a major problem I was having!
deum replied:
Thanks! Nice, simple script. Worked well for me and saved a lot of time!
MASoroush replied:
Hi friend, it was great. You solved my problem.
Thanks a lot.
sevaggelinos replied:
Thanks, it worked for me too!!!
kinoman104 replied:
Hi, I have similar problems and I'm trying to fix them with your directions. I'm new to all this and don't understand how to run the file.
da4kinov replied:
I don't know how to run it either...
sceva replied:
This script runs commands that modify the database, so back up the database first!

I just ran this on a site hosted with Bluehost and it worked beautifully, although I had to re-enter the non-Latin words for the ???'s to become letters, probably because they had been corrupted while the database was not UTF-8.

Since a couple of people seemed to need a bit more instruction, I will give a shot at it.

Copy the code above, starting with <?php and ending with ?>.
Paste it into a text editor like Notepad.

The following 3 lines need your login information to connect to your database:
$user='bob'; //please set your mysql user name
$pass='reallyhardtobreakpassword'; // please set your mysql user password
$dbname='bobsdatabase'; //please set your Database name

If you don't know what this information is, contact the tech support with your hosting company and they will tell you how to find it.

Save the file as change_utf.php
Upload the file to the root directory of your website - the same place your index.php file is located. You can do this via ftp or the online file manager your web hosting company provides.

Open a browser and run the script. If your site name is bob.com, you should enter http://bob.com/change_utf.php

If everything is correct you will get a message in a minute or so stating "The collation of your database has been successfully changed!" Otherwise you will have an error message - read it carefully as it will direct you to what went wrong. Most likely you entered one of the items above incorrectly.

Be sure to delete change_utf.php from the website when you are finished!! You don't want anyone reading the password info!!

Thanks to the others who helped with this post.
kinoman104 replied:
Thank you very much!
berardiinjapan replied:
Thanks for the step-by-step explanation; this was just what I was looking for. Worked great!
globaltest replied:
Thx for the script! You saved my day!
youphak replied:
I created this script and ran it with php utf8-php.php, and it shows this error message:

PHP Warning: Module 'mcrypt' already loaded in Unknown on line 0
PHP Fatal error: Uncaught Error: Call to undefined function mysql_connect()

Pls advise.
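The fatal error comes from the script itself: it uses the old mysql_* functions, which were deprecated in PHP 5.5 and removed in PHP 7.0. (The mcrypt warning is a separate configuration issue: the extension is being loaded twice.) Below is a minimal sketch of the same conversion using PDO; the function names are hypothetical, it uses $host as ptheis suggests, and it skips views. It has not been tested against a live concrete5 database, so back up first.

```php
<?php
// Build the statements: one ALTER TABLE per table, then ALTER DATABASE.
// Identifiers are wrapped in backticks with embedded backticks doubled.
function conversion_statements(array $tables, string $dbname, string $charset, string $collation): array
{
    $quote = function (string $name): string {
        return '`' . str_replace('`', '``', $name) . '`';
    };
    $sql = [];
    foreach ($tables as $table) {
        $sql[] = 'ALTER TABLE ' . $quote($table) . " CONVERT TO CHARACTER SET $charset COLLATE $collation";
    }
    $sql[] = 'ALTER DATABASE ' . $quote($dbname) . " DEFAULT CHARACTER SET $charset COLLATE $collation";
    return $sql;
}

// List the base tables (not views) and run every statement; with
// ERRMODE_EXCEPTION any failure stops the run with a PDOException.
function convert_database(PDO $pdo, string $dbname, string $charset = 'utf8', string $collation = 'utf8_general_ci'): void
{
    $tables = $pdo->query("SHOW FULL TABLES WHERE Table_type = 'BASE TABLE'")->fetchAll(PDO::FETCH_COLUMN);
    foreach (conversion_statements($tables, $dbname, $charset, $collation) as $sql) {
        $pdo->exec($sql);
    }
}

function run_conversion(): void
{
    $host = 'localhost'; // database hostname (may be a separate server)
    $user = '';          // your MySQL user name
    $pass = '';          // your MySQL password
    $dbname = '';        // your database name

    $pdo = new PDO("mysql:host=$host;dbname=$dbname;charset=utf8", $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
    convert_database($pdo, $dbname);
    echo "The collation of your database has been successfully changed!\n";
}

// run_conversion(); // uncomment after filling in the settings above
```

Fill in the four settings, uncomment the last line, and run it with php from the command line or through the browser as described above; either way, delete the file afterwards.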
xanbei replied:
Thank you, turning database encoding to UTF-8 worked!
duxferrarie replied:
Great thread, very helpful. Thanks a lot, guys!
witwag replied:
Good script! Thanks. I had the same problem with Russian (Cyrillic) characters and a database with utf8_general_ci collation: upon saving with TinyMCE, all the text appeared as question marks instead of normal Russian characters. Since it's the weekend and all my coders are resting, it's good to be able to solve this quickly.