MySQL 5.1 supports two character sets for storing Unicode data:
ucs2, the UCS-2 Unicode character set.
utf8, the UTF-8 encoding of the Unicode character set.
In UCS-2 (binary Unicode representation), every character is
represented by a two-byte Unicode code with the most significant
byte first. For example: LATIN CAPITAL LETTER A
has the code 0x0041 and it is stored as a
two-byte sequence: 0x00 0x41. CYRILLIC
SMALL LETTER YERU (Unicode 0x044B) is
stored as a two-byte sequence: 0x04 0x4B. For
Unicode characters and their codes, please refer to the
Unicode Home Page.
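
The two examples above can be reproduced with a few lines of Python. This is only a sketch of the byte layout, not of anything MySQL does internally; the helper name ucs2_bytes is hypothetical.

```python
def ucs2_bytes(ch: str) -> bytes:
    """Return the two-byte UCS-2 form of one character, most significant byte first."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code = ord(ch)
    # UCS-2 covers only the Basic Multilingual Plane, excluding surrogates.
    if code > 0xFFFF or 0xD800 <= code <= 0xDFFF:
        raise ValueError("character cannot be represented in UCS-2")
    return code.to_bytes(2, "big")


print(ucs2_bytes("A").hex())       # LATIN CAPITAL LETTER A
print(ucs2_bytes("\u044b").hex())  # CYRILLIC SMALL LETTER YERU
```

The two printed values should correspond to the byte pairs 0x00 0x41 and 0x04 0x4B given in the text.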
The MySQL implementation of UCS-2 stores characters in big-endian byte order and does not use a byte order mark (BOM) at the beginning of UCS-2 values. Other database systems might use little-endian byte order or a BOM, in which case UCS-2 values must be converted when transferring data between those systems and MySQL.
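
A conversion of the kind described above might look like the following Python sketch. The function name to_mysql_ucs2 is hypothetical, and the fallback for input without a BOM (treating it as little-endian) is an assumption about the other system, not something the manual specifies.

```python
import codecs


def to_mysql_ucs2(data: bytes) -> bytes:
    """Normalise UCS-2 bytes to big-endian order without a BOM."""
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    else:
        # Assumption: the source system writes little-endian data with no BOM.
        text = data.decode("utf-16-le")
    # UCS-2 has no surrogate pairs, so reject anything outside the BMP.
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("input contains characters outside UCS-2")
    return text.encode("utf-16-be")  # this codec name never emits a BOM
```

For characters in the Basic Multilingual Plane, UTF-16 and UCS-2 produce the same bytes, which is why the UTF-16 codecs are usable here.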
Currently, UCS-2 cannot be used as a client character set, which
means that SET NAMES 'ucs2' does not work.
UTF-8 (Unicode Transformation Format with 8-bit units) is an alternative way to store Unicode data. It is implemented according to RFC 3629. The idea of UTF-8 is that different Unicode characters are encoded using byte sequences of different lengths:
Basic Latin letters, digits, and punctuation signs use one byte.
Most European and Middle East script letters fit into a two-byte sequence: extended Latin letters (with tilde, macron, acute, grave and other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic, Syriac, and others.
Korean, Chinese, and Japanese ideographs use three-byte sequences.
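
The sequence lengths listed above can be checked with Python's built-in UTF-8 codec. The sample characters are chosen here for illustration only.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes in the UTF-8 encoding of a single character."""
    return len(ch.encode("utf-8"))


samples = {
    "A": "Basic Latin letter",
    "\u00f1": "Latin small letter n with tilde",
    "\u044b": "Cyrillic small letter yeru",
    "\u05d0": "Hebrew letter alef",
    "\u4e2d": "CJK ideograph",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")
```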
RFC 3629 describes encoding sequences that take from one to four bytes. Currently, MySQL support for UTF-8 does not include four-byte sequences. (An older standard for UTF-8 encoding is given by RFC 2279, which describes UTF-8 sequences that take from one to six bytes. RFC 3629 renders RFC 2279 obsolete; for this reason, sequences with five and six bytes are no longer used.)
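
Because four-byte sequences are not supported, it can be useful to check application data before storing it. The sketch below assumes that the check happens on the client side in Python; fits_three_byte_utf8 is a hypothetical helper name.

```python
def fits_three_byte_utf8(text: str) -> bool:
    """True if every character encodes to at most three UTF-8 bytes.

    Code points up to U+FFFF need one to three bytes; anything above
    needs a four-byte sequence.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_three_byte_utf8("\u044b\u4e2d"))  # Cyrillic plus CJK
print(fits_three_byte_utf8("\U0001F600"))    # a code point above U+FFFF
```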
Tip: To save space with UTF-8,
use VARCHAR instead of CHAR.
Otherwise, MySQL must reserve three bytes for each character in a
CHAR CHARACTER SET utf8 column because that is
the maximum possible length. For example, MySQL must reserve 30
bytes for a CHAR(10) CHARACTER SET utf8 column.
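
The arithmetic behind this tip can be sketched as follows. The CHAR figure follows the statement above; the VARCHAR figure assumes the usual layout of the encoded value plus a one-byte or two-byte length prefix, and actual storage can differ between storage engines and row formats. Both function names are hypothetical.

```python
UTF8_MAX_BYTES = 3  # longest sequence supported by utf8 in MySQL 5.1


def char_reserved_bytes(declared_length: int) -> int:
    """Bytes reserved for a CHAR(n) CHARACTER SET utf8 column."""
    return declared_length * UTF8_MAX_BYTES


def varchar_used_bytes(value: str, declared_length: int) -> int:
    """Approximate bytes used by one VARCHAR(n) CHARACTER SET utf8 value."""
    # One length byte if the column can never exceed 255 bytes, else two.
    prefix = 1 if declared_length * UTF8_MAX_BYTES <= 255 else 2
    return len(value.encode("utf-8")) + prefix


print(char_reserved_bytes(10))          # CHAR(10)
print(varchar_used_bytes("hello", 10))  # VARCHAR(10) holding 'hello'
```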

User Comments
Simple example:
CREATE TABLE `family` (
  `name` varchar(100) NOT NULL,
  `savings` decimal(3,2) NOT NULL default '0.00',
  `dob` date NOT NULL default '0000-00-00',
  -- the primary key already enforces uniqueness, so no separate UNIQUE KEY is needed
  PRIMARY KEY (`name`)
) ENGINE=InnoDB CHARACTER SET utf8  -- ENGINE replaces the deprecated TYPE keyword
;
Rob
:)
In previous versions of MySQL, getObject() on an item stored as a varchar returned a character array.
Now it returns a byte array - so if it's a string you want, be sure to call getString() and not getObject().
Keep in mind:
REGEXP / RLIKE are not multi-byte safe.
dev.mysql.com/doc/mysql/en/String_comparison_functions.html
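
What "not multi-byte safe" means in practice can be illustrated with Python's re module, which can match either on characters or on raw bytes. This is only an analogy for byte-wise matching; it does not reproduce MySQL's regular expression library.

```python
import re

word = "\u044b"                 # CYRILLIC SMALL LETTER YERU
encoded = word.encode("utf-8")  # two bytes in UTF-8

# Character-wise: '.' consumes the whole character.
char_match = re.fullmatch(r".", word)

# Byte-wise: '.' consumes a single byte, so one '.' cannot cover the character.
byte_match = re.fullmatch(rb".", encoded)
two_byte_match = re.fullmatch(rb"..", encoded)

print(char_match is not None, byte_match is not None, two_byte_match is not None)
```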
It works, but make sure you issue the command
SET NAMES 'utf8'
before you do anything else. Otherwise you will spend hours wondering why your data is slightly mangled, with MySQL giving no sign that anything is wrong.
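
The kind of mangling described above can be modelled in Python. The sketch assumes that the client sends UTF-8 bytes while the connection is treated as latin1; Python's latin-1 codec is used as a stand-in, and misread_as_latin1 is a hypothetical helper.

```python
def misread_as_latin1(text: str) -> str:
    """Model UTF-8 bytes being interpreted as single-byte latin1 characters."""
    return text.encode("utf-8").decode("latin-1")


print(misread_as_latin1("abc"))                  # ASCII survives unchanged
print(ascii(misread_as_latin1("\u00e4")))        # one character becomes two
print(len(misread_as_latin1("\u044b\u044b")))    # two characters become four
```

Plain ASCII text is unaffected, which is one reason the problem can go unnoticed until accented or non-Latin characters appear.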
Connect with the same character set as your data so that it displays correctly. This example connects to the MySQL server using UTF-8:
mysql --default-character-set=utf8 -uyour_username -p -h your_databasehost.your_domain.com your_database
If you run into trouble with a PHP-based web application, check the character set configuration of these components:
1) the MySQL database
2) php.ini
3) httpd.conf
4) your server
Thai characters are also stored as three-byte sequences in UTF-8.
If you fetch data from your MySQL database via PHP (everything in UTF-8) but still get '?' for some special characters in your browser, even with
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
in the page, try this: after mysql_connect() and mysql_select_db(), add this line:
mysql_query("SET NAMES utf8");
That worked for me.
I first tried utf8_encode(), but it only worked for characters such as äüöéè, not for Cyrillic and other scripts. (utf8_encode() converts from ISO-8859-1, which has no Cyrillic characters.)
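
Why utf8_encode() helps for accented Latin letters but not for Cyrillic can be modelled in Python. The sketch assumes that, without SET NAMES, the server converts results to latin1 and replaces anything latin1 cannot represent with '?'. Both function names are hypothetical stand-ins, not real PHP or MySQL APIs.

```python
def server_sends_latin1(text: str) -> bytes:
    """Model the server converting a utf8 column value for a latin1 connection."""
    return text.encode("latin-1", errors="replace")  # unrepresentable -> b'?'


def php_style_utf8_encode(latin1_bytes: bytes) -> bytes:
    """Model PHP's utf8_encode(): reinterpret ISO-8859-1 bytes as UTF-8."""
    return latin1_bytes.decode("latin-1").encode("utf-8")


umlaut = php_style_utf8_encode(server_sends_latin1("\u00e4"))
yeru = php_style_utf8_encode(server_sends_latin1("\u044b"))

print(umlaut == "\u00e4".encode("utf-8"))  # the umlaut is recovered
print(yeru)                                # the Cyrillic letter was already lost
```

In this model the information is lost before PHP ever sees the data, so only changing the connection character set can fix it.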