NLS_LANG, character sets & more – Part 2

A few days ago I wrote about NLS_LANG, character sets & more. Today I ran into another neat effect. I was trying to do everything right, but my messages from SQL*Plus still looked scrambled:

[Screenshot: 12-win-1252]

The thing is that the characters I need are simply not present in the default font. When I change the font, the same screen suddenly looks quite different:

[Screenshot: 13-win-1252]

Maybe I should switch from German to English and 7-bit ASCII completely…


NLS_LANG, character sets & more – How to do it right

Preface

Globalization is confusing. Everyone and everything uses different characters in different character sets. Oracle is prepared for that and offers several parameters and variables to control the behaviour. But one must be careful when setting them, which is why I want to give a rough overview of the basics.

Basics

When sending or retrieving character data to/from a database, there are three to four settings that can influence the display of that data.

[Diagram: nls-kette]

  1. The character set (encoding) used inside the database itself.
  2. The character set which the Oracle Client uses to encode and decode character data.
  3. The character set that is used by the operating system on the client side.
  4. The character set that PuTTY (or whatever terminal emulation you prefer) assumes the remote side is using.

The picture outlines the route which character data takes during the process of reading from or writing to the database. In this post I will talk about the yellow part of the diagram.

Let’s start with the database. The character data stored inside is encoded using the configured database character set. Now we want to retrieve data from the database. That means the Oracle Client (2) sends the SQL to the database (1) and in turn gets the data back, converted automatically to the character set that is configured for the Oracle Client. That character set is typically defined via the NLS_LANG variable. Note that this is the only point in the whole chain where a character set conversion happens at all.
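To see which character set the database itself uses, you can query NLS_DATABASE_PARAMETERS; on the client side, the character set is the part of NLS_LANG after the dot (LANGUAGE_TERRITORY.CHARACTERSET). A minimal sketch, where the German language and territory values are just an example:

    -- character set of the database (1)
    SELECT value
      FROM nls_database_parameters
     WHERE parameter = 'NLS_CHARACTERSET';

    -- NLS_LANG format on the client (2): LANGUAGE_TERRITORY.CHARACTERSET
    -- e.g. GERMAN_GERMANY.WE8MSWIN1252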

Next, the character data is displayed by the operating system (3), which uses its own character set. No further conversion takes place, so our NLS_LANG setting must match the setting of our OS.

And lastly there may be a terminal emulation like PuTTY, which also defines a character set that obviously must match the one used by the OS we are connecting to.

What can go wrong

With this process in mind we see that wrong parameter settings may not be noticed immediately. When we insert data into the database with wrong settings and then query it with the same wrong settings, the results look correct because the wrong translation is applied in the same way in both directions. We only start seeing wrong characters when we query the data with the correct settings.

Let’s say we have a Windows client and use SQL*Plus inside CMD to insert data. The system-wide NLS_LANG variable is set to MSWIN1252 because Windows uses this character set. But as described in a previous post, CMD uses a different character set, PC850.
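Both settings are easy to check in a fresh CMD window: chcp without arguments prints the active code page, and echo shows NLS_LANG if it is set as an environment variable (otherwise it usually lives in the registry under the Oracle home key):

    REM show the code page this CMD window is using (850 on many Western European systems)
    chcp

    REM show NLS_LANG if it is set as an environment variable
    echo %NLS_LANG%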

So let’s create a table, insert some data and query that data:

[Screenshot: 01-win-1252]
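The screenshot boils down to a test along these lines; the table and column names here are made up for illustration, only the umlauts and the Euro sign matter:

    -- hypothetical test table, the real one in the screenshot may look different
    CREATE TABLE nls_test (txt VARCHAR2(50 CHAR));

    INSERT INTO nls_test (txt) VALUES ('äöü ÄÖÜ ß €');
    COMMIT;

    SELECT txt FROM nls_test;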

So we see that the special characters I inserted are displayed correctly when querying the data, because the same wrong transformation happens in both directions. More or less, at least: I have no idea why the Euro sign gets messed up… Maybe because PC850 has no Euro sign…

Now I query the data from SQL Developer, which uses the Windows character set to display data:

[Screenshot: 02-win-1252]

Here again a conversion might take place, depending on the database character set. And this time we see wrong data, because the conversion was already wrong when I inserted the data.

Next step, insert data with SQL Developer:

[Screenshot: 03-win-1252]

Inserted and displayed correctly. But obviously CMD shows it as follows:

[Screenshot: 04-win-1252]

This is all simply because CMD renders characters in a different character set than the rest of Windows. So when we change CMD to use the proper code page, it looks like this:

[Screenshot: 05-win-1252]

Now the first dataset is rendered differently, but the data from SQL Developer is shown properly.
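Changing the code page of the current CMD session is a single command; it is not persistent and only affects this window:

    REM switch this CMD session to the Windows-1252 code page
    chcp 1252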

Another way to display the data properly is to modify the NLS_LANG setting inside CMD:

[Screenshot: 06-win-1252]

Only the Euro sign is missing since it is not part of the PC850 character set.
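In other words, instead of adapting CMD to Windows, we adapt the Oracle Client to CMD. A sketch of that session setup, where the GERMAN_GERMANY part is an assumption and should be replaced by your own language and territory:

    REM tell the Oracle Client that this session talks PC850, matching the default CMD code page
    set NLS_LANG=GERMAN_GERMANY.WE8PC850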

Now I set NLS_LANG back to MSWIN1252 and insert a third record:

[Screenshot: 07-win-1252]

Looks good so far, but again, I should crosscheck that with SQL Developer:

[Screenshot: 08-win-1252]

Ok, the data is still displayed properly. So this is the correct setting that we should use for Windows.
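So for SQL*Plus in CMD on a Western European Windows, a consistent combination could look like this (again, language and territory are just examples):

    REM let CMD and the Oracle Client both use the Windows-1252 character set
    chcp 1252
    set NLS_LANG=GERMAN_GERMANY.WE8MSWIN1252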

But what about Linux? Linux uses UTF-8 internally:

[Screenshot: linux-nls]
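The screenshot shows the locale settings; on your own system you can check them roughly like this (the exact locale name will differ):

    # character set of the current shell session
    echo $LANG          # e.g. de_DE.UTF-8
    locale charmap      # e.g. UTF-8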

So I should set NLS_LANG to AL32UTF8 in order to get my data displayed correctly:

[Screenshot: 09-win-1252]
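The setting behind that screenshot is essentially a one-liner; pick your own language and territory, only the AL32UTF8 part has to match the UTF-8 locale:

    # make the Oracle Client match the UTF-8 locale of the shell
    export NLS_LANG=GERMAN_GERMANY.AL32UTF8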

As expected, the data shows up as it should. But this is only because my PuTTY is using the right setting. What if I change PuTTY to use MSWIN1252? That might seem like a valid setting, because the Windows machine that PuTTY runs on uses that character set:

[Screenshot: 10-win-1252]

What does the result look like now?

[Screenshot: 11-win-1252]

Totally messed up, since the multi-byte output from Linux is interpreted as single-byte. So that is not a good idea. The PuTTY character set setting must match the character set used by the OS we connect to:

[Screenshot: putty-nls]

Conclusion

Be careful when setting NLS parameters on both the client and the server side. You might not notice a misconfiguration as long as you use the same path for inserting and retrieving data. Use another client to cross-check the data you are dealing with. In the end, it all comes down to the client OS and a matching NLS_LANG setting on the client side.

There is a good FAQ from Oracle that covers the whole NLS topic.