Globalization is confusing. Everyone and everything uses different characters in different character sets. Oracle is prepared for that and offers several parameters and variables to control the behaviour. But one must be careful when setting these, which is why I want to give a rough overview of the basics.
When sending or retrieving character data to/from a database, there are three to four settings that can influence the display of that data.
- The character set (encoding) used inside the database itself (1).
- The character set which the Oracle Client is using to display character data (2).
- The character set that is used by the operating system on the client side (3).
- The character set that PuTTY (or whatever terminal emulation you prefer) assumes the remote side is using.
The picture outlines the route which character data takes during the process of reading from or writing to the database. In this post I will talk about the yellow part of the diagram.
Let’s start with the database. The character data stored inside it is encoded in the configured database character set. Now we want to retrieve data from the database. That means the Oracle Client (2) sends the SQL to the database (1) and in turn receives the data, which is automatically converted to the character set that is configured for the Oracle Client. That is typically done using the NLS_LANG variable. This is the only point where a character set conversion might happen.
Next, the character data is displayed by the operating system (3), which uses its own character set. No conversion takes place here anymore, which means our NLS_LANG setting must match the setting of our OS.
And lastly, there may be a terminal emulation like PuTTY, which also defines a character set. That one must obviously match the character set used by the OS we are connecting to.
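The read path described above can be modelled in a few lines of Python. This is only a rough sketch: the function name `fetch_and_display` is made up for illustration, and Python's `cp1252` and `cp850` codecs stand in for the Oracle character sets. It is not how the Oracle Client is implemented.

```python
def fetch_and_display(stored_text: str, nls_lang_charset: str, os_charset: str) -> str:
    """Model the read path: database (1) -> Oracle Client (2) -> OS (3)."""
    # (1) -> (2): the client receives the data converted to the character
    # set it was configured with (the NLS_LANG setting).
    client_bytes = stored_text.encode(nls_lang_charset)
    # (2) -> (3): no conversion here. The OS just interprets the bytes
    # using its own character set.
    return client_bytes.decode(os_charset, errors="replace")


# NLS_LANG and OS agree: the text arrives intact.
matching = fetch_and_display("Käse", "cp1252", "cp1252")
# NLS_LANG says Windows-1252, but the console really uses code page 850.
mismatched = fetch_and_display("Käse", "cp1252", "cp850")

print(matching)    # Käse
print(mismatched)  # Kõse
```

The second call shows why the last two stages must agree: the bytes are fine, they are just read with the wrong table.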
What can go wrong
With this process in mind, we see that wrong parameter settings may not be noticed immediately. When we insert data into the database with wrong settings and then query it with the same settings, we get seemingly correct results, because the same translation is applied in both directions. We only start seeing wrong characters when we query the data with the correct settings.
Let’s say we have a Windows client and use SQL*Plus inside CMD to insert data. The system-wide NLS_LANG variable is set to MSWIN1252 (WE8MSWIN1252 in Oracle’s naming), as Windows uses this character set. But as described in a previous post, CMD uses another character set, PC850 (WE8PC850).
So let’s create a table, insert some data and query that data:
So we see that the special characters I inserted are displayed correctly when querying the data, because the same wrong transformation happens in both directions. More or less, that is: I have no idea why the Euro sign gets messed up… Maybe because PC850 has no Euro sign…
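To see why the two wrong transformations cancel out, here is a simplified Python model. The helper names `insert_from_cmd` and `query_from_cmd` are hypothetical, and the sketch assumes the bytes typed in CMD are CP850 while the client treats them as Windows-1252. The real Oracle conversion may handle unmappable bytes differently.

```python
CMD_CODEPAGE = "cp850"       # what the console really uses
NLS_LANG_CHARSET = "cp1252"  # what the Oracle Client was told


def insert_from_cmd(typed: str) -> str:
    """Return the characters the database ends up storing."""
    raw = typed.encode(CMD_CODEPAGE)     # bytes produced by the console
    return raw.decode(NLS_LANG_CHARSET)  # client believes they are cp1252


def query_from_cmd(stored: str) -> str:
    """Return what the console shows for the stored characters."""
    raw = stored.encode(NLS_LANG_CHARSET)  # client converts to cp1252
    return raw.decode(CMD_CODEPAGE)        # console reads them as cp850


typed = "äöß"
stored = insert_from_cmd(typed)
print(stored)                  # „”á  (what a correctly configured client sees)
print(query_from_cmd(stored))  # äöß  (looks fine in the same CMD window)
```

The stored value is already wrong, but the same CMD session will never show it.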
Now I query the data from SQL Developer which is using the windows character set to display data:
Again a transformation might take place, depending on the database setting. And this time we see wrong data, because the transformation was wrong when I inserted the data.
Next step, insert data with SQL Developer:
Inserted and displayed correctly. But obviously CMD shows it as follows:
This is simply because CMD renders characters in a different character set than the rest of Windows. So when we change CMD to use the proper code page, it looks like this:
Now the first dataset is rendered differently, but the data from SQL Developer is shown properly.
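The effect of switching the console code page can be sketched like this. I assume here that the first row was stored as „ (which is what the CP850 byte for ä means in Windows-1252) and that the second row holds a proper ä. The `render` helper is made up for illustration.

```python
NLS_LANG_CHARSET = "cp1252"

rows = {
    "inserted via CMD": "\u201e",      # '„', the mis-stored 'ä'
    "inserted via SQL Developer": "ä",  # stored correctly
}


def render(stored: str, console_codepage: str) -> str:
    """Client converts to NLS_LANG, console interprets with its code page."""
    return stored.encode(NLS_LANG_CHARSET).decode(console_codepage)


for codepage in ("cp850", "cp1252"):
    print(f"console code page {codepage}:")
    for origin, value in rows.items():
        print(f"  {origin}: {render(value, codepage)}")
```

With `cp850` the first row looks right and the second one wrong. With `cp1252` it is the other way round, and only then does the display reflect what is really stored.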
Another way to display the data properly is to modify the NLS_LANG setting inside CMD:
Only the Euro sign is missing since it is not part of the PC850 character set.
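A quick way to check which character sets can represent the Euro sign at all, using Python's codecs as stand-ins for the Oracle character sets (the helper `can_encode` is just for illustration):

```python
def can_encode(char: str, charset: str) -> bool:
    """Return True if the character exists in the given character set."""
    try:
        char.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


for charset in ("cp850", "cp1252", "utf-8"):
    print(charset, can_encode("€", charset))
```

Code page 850 simply has no slot for the Euro sign, so no client setting can make it appear there.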
Now I set the NLS_LANG again to MSWIN1252 and insert a third record:
Looks good so far, but again, I should crosscheck that with SQL Developer:
Ok, the data is still displayed properly. So this is the correct setting that we should use for Windows.
But what about Linux? Linux is using UTF8 internally:
So I should set NLS_LANG to AL32UTF8 in order to get my data displayed correctly:
As expected, the data shows up as it should. But this is only because my PuTTY is using the right setting. What if I modify PuTTY to use MSWIN1252? That might seem like a valid setting, because the Windows machine that PuTTY runs on uses that character set:
What does the result look like now?
Totally messed up, since my multibyte output from Linux is being interpreted as single-byte. So that is not a good idea. The PuTTY character set setting must match the character set that is used by the OS that we connect to.
Be careful when setting NLS parameters on both the client and the server side. You might not notice a misconfiguration as long as you are using the same path for data retrieval and insertion. Just use another client to crosscheck the data that you are dealing with. It all depends on the OS and the correct NLS_LANG setting on the client side.
There is a good FAQ from Oracle that outlines the whole NLS topic.
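The multibyte versus single-byte mix-up is easy to reproduce in Python. The sample string is my own, and `cp1252` stands in for the PuTTY setting:

```python
# What the Linux side sends: UTF-8 encoded bytes.
linux_output = "ä ö €".encode("utf-8")

# PuTTY configured for UTF-8 reads the bytes as intended.
as_putty_utf8 = linux_output.decode("utf-8")
# PuTTY configured for Windows-1252 treats every byte as one character.
as_putty_1252 = linux_output.decode("cp1252")

print(as_putty_utf8)  # ä ö €
print(as_putty_1252)  # Ã¤ Ã¶ â‚¬
```

Each umlaut turns into two characters and the Euro sign into three, because that is how many bytes UTF-8 needs for them.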
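As a small aid, the following sketch maps an OS encoding name to the matching Oracle character set and builds an NLS_LANG value from it. The helper `suggest_nls_lang` and its mapping table are hypothetical and cover only the three character sets used in this post. The language and territory defaults are assumptions as well.

```python
import codecs
import locale

# Python codec name -> Oracle character set name (only the ones used here).
ORACLE_CHARSETS = {
    "utf-8": "AL32UTF8",
    "cp1252": "WE8MSWIN1252",
    "cp850": "WE8PC850",
}


def suggest_nls_lang(os_encoding, language="AMERICAN", territory="AMERICA"):
    """Return an NLS_LANG value for the encoding, or None if unknown."""
    try:
        normalized = codecs.lookup(os_encoding).name  # e.g. "UTF8" -> "utf-8"
    except LookupError:
        return None
    charset = ORACLE_CHARSETS.get(normalized)
    if charset is None:
        return None
    return f"{language}_{territory}.{charset}"


# Suggestion for the machine this script runs on (may be None).
print(suggest_nls_lang(locale.getpreferredencoding(False)))
```

Keep in mind that on Windows the console code page can differ from the encoding reported here, which is exactly the CMD pitfall shown above. So treat the result as a starting point and still crosscheck with a second client.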