Running an Internationalization / Localization [or i18n / L10n] friendly website can be tricky, and sometimes downright maddening for those who haven’t yet delved into the world of Unicode. Letting your users post to your site in whichever language and characters they choose is crucial for any modern website.
Here are a few things I have very painfully learned over the last 5 or so years on this topic … specifically with PHP and MySQL.
There are hundreds of character sets representing most of the languages on Earth, usually one per script or region [Latin, Cyrillic, Greek, Arabic, Korean, Chinese etc...]. One that covers all of these is Unicode, and its most common encoding on the web is UTF-8. So how can you put UTF-8 to practical use? Easy … here’s how I’ve done it:
Headers! Get your headers!
The most important place to declare UTF-8 is the charset parameter of the Content-Type header sent with your HTML. This tells the browser that you have multi-byte characters in your HTML and you’d like it to display them as such [and not as the default ISO-8859-1].
To do this, put this at the very top of your PHP scripts [with the headers and before any HTML is echoed]:
<?php header("Content-Type: text/html; charset=utf-8"); ?>
And this in your HTML <head> section:
<?php echo "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n"; ?>
MySQL / UTF-8 love
The second most important thing is to make sure your database is also UTF-8 friendly. Be sure to set all your table / column collations [char / text] to utf8_unicode_ci. This tells MySQL to treat this data as UTF-8. [Note that MySQL’s utf8 stores at most three bytes per character; MySQL 5.5.3 and later also offer utf8mb4, which covers four-byte characters such as emoji.]
Once you’ve done that, you’ll need to tell PHP to connect to the MySQL daemon under a UTF-8 connection [otherwise the default is latin1 ... and your data will be stored in MySQL as such -- no good!]. Run this right after you connect to MySQL:
<?php mysql_query("SET NAMES 'utf8'"); mysql_query("SET CHARACTER SET utf8"); ?>
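The mysql_* functions used above were deprecated in PHP 5.5 and removed in PHP 7, so on a current PHP you would need a different extension. Here is a minimal sketch of the same idea with mysqli, assuming that extension is available. The helper name connect_utf8 and its parameters are made up for illustration; swap "utf8" for "utf8mb4" if your tables use it.

```php
<?php
// Hypothetical helper (not from the original post): opens a mysqli
// connection and switches it to UTF-8 before any query is run.
// All four arguments are placeholders for your own settings.
function connect_utf8($host, $user, $password, $database)
{
    $db = new mysqli($host, $user, $password, $database);
    if ($db->connect_error) {
        throw new RuntimeException("Connection failed: " . $db->connect_error);
    }

    // set_charset() changes the connection character set and also informs
    // the client library, which SET NAMES on its own does not do.
    if (!$db->set_charset("utf8")) {
        throw new RuntimeException("Could not set charset: " . $db->error);
    }

    return $db;
}
```

You would call it once, e.g. $db = connect_utf8("localhost", "user", "secret", "mydb"); and reuse $db for every query.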
Multibyte fun
Last, take advantage of PHP’s Multibyte String Functions! Oftentimes this is as easy as prefixing your string functions with mb_ [strlen becomes mb_strlen, and so on]. But before you start using these functions you’ll need to tell PHP which character set to use [once again!], because on older PHP versions the default is ISO-8859-1:
<?php mb_internal_encoding("UTF-8"); ?>
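To see why the mb_ prefix matters, here is a small sketch comparing the byte-oriented functions with their multibyte counterparts. It assumes the mbstring extension is loaded. The sample word and variable names are just for illustration, and the string is written with byte escapes so it does not depend on how the file itself is saved.

```php
<?php
// Assumes the mbstring extension is loaded.
mb_internal_encoding("UTF-8");

// "naïve" written with explicit UTF-8 bytes: ï is the two bytes C3 AF.
$word = "na\xC3\xAFve";

$byteLength = strlen($word);     // counts bytes: 6
$charLength = mb_strlen($word);  // counts characters: 5

// mb_substr() cuts on character boundaries, so ï stays intact ("naï").
$firstThree = mb_substr($word, 0, 3);

// substr() cuts on byte boundaries and splits ï in half,
// leaving a string that is no longer valid UTF-8.
$brokenCut = substr($word, 0, 3);

// mb_strtoupper() knows that the uppercase of ï is Ï (bytes C3 8F).
$upper = mb_strtoupper($word);
```

The broken cut is the kind of thing that shows up as question marks or garbage characters on a page.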
Forms
One often neglected step is ensuring that the data the server gets is UTF-8 encoded. One way to try and do this with HTML forms is to include the accept-charset attribute in your form tag. I say “try” because it’s just a suggestion to the client that submits the form. Be aware that some clients may not pay much attention to the attribute, especially older browsers. [Thanks to Alejandro for the heads up :-)]
<form action="/action" method="post" accept-charset="utf-8">
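Since accept-charset is only a suggestion, it can help to check what actually arrives on the server. Here is a minimal sketch, assuming the mbstring extension is loaded. The helper name utf8_or_null and the "comment" field are hypothetical, and what you do with rejected input [ignore it, convert it, show an error] is up to you.

```php
<?php
// Hypothetical helper (not from the original post): returns the value
// unchanged if it is a valid UTF-8 string, otherwise null.
function utf8_or_null($value)
{
    if (!is_string($value)) {
        return null;
    }
    return mb_check_encoding($value, "UTF-8") ? $value : null;
}

// Example use with a hypothetical "comment" form field:
// $comment = utf8_or_null(isset($_POST["comment"]) ? $_POST["comment"] : null);
// if ($comment === null) { /* reject or re-encode the input */ }
```

Checking before you store the data means invalid bytes never make it into your UTF-8 tables in the first place.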
If you’ve gotten this far you should see some dramatic improvements to your web site’s accessibility and usability, drawing in users from around the world.
NOTE: This is a work in progress and I fully welcome any new ideas to this cocktail of methods. If you have anything to add, PLEASE DO SO