Oracle Character Set
Published by: Obay Salah, November 19, 2024
Selecting the Character Set:
The database administrator must make two important decisions when creating the database.
The first decision is the size of the Database Block (DB_BLOCK_SIZE). It cannot be changed afterwards because it is used in creating the System Tablespace, so changing it would require recreating the Data Dictionary, or in other words, recreating the database.
The second decision is the Character Set. The database administrator may be able to change the Character Set later, but this is not always possible or practical.
The Database Character Set is used to store data in columns of type VARCHAR2, CHAR, CLOB, and LONG.
If the database administrator changes the Character Set, the data in columns of these types may be destroyed, so when creating the database you should choose the Character Set that will meet your current and future needs.
If you have data in French or Spanish, you need a Western European Character Set; if it is in Russian or Czech, you need an Eastern European Character Set. But what if you have data in both Eastern and Western European languages, and may also need to store data in Korean or Taiwanese?
Oracle provides two solutions to this problem:
1- National Character Set: introduced in Oracle Database version 8, this is a second Character Set that is selected when the database is created and is used to store data in columns of type NCLOB, NCHAR, and NVARCHAR2. For example, if the database administrator expects that most of the data will be in English and some in Japanese, he would choose a Western European Character Set as the Database Character Set and a Kanji Character Set as the National Character Set.
2- Unicode: in Oracle 9i the scenario changed a little, as the National Character Set was now required to be a Unicode Character Set. Unicode is a universal Character Set that can represent all the characters used on any computer. Two Unicode Character Sets can be used as the National Character Set:
Fixed-Width, Two-Byte Character Set: AL16UTF16.
Variable-Width Character Set: UTF8.
The choice between these two options comes down to performance and storage efficiency.
Both the Database Character Set and the National Character Set are selected at the moment the database is created.
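The storage side of the choice between a fixed-width and a variable-width Unicode encoding can be illustrated with a minimal Python sketch. It is not Oracle code: it assumes that Python's "utf-16-be" and "utf-8" codecs are close enough stand-ins for AL16UTF16 and UTF8 for the sample strings used here, and the function name storage_bytes is made up for the illustration.

```python
# Compare the storage cost of a fixed-width two-byte encoding and a
# variable-width encoding for the same text.
# Assumption: Python's "utf-16-be" and "utf-8" codecs stand in for
# Oracle's AL16UTF16 and UTF8; this is an illustration, not Oracle code.

def storage_bytes(text: str) -> dict:
    """Return the number of bytes each encoding needs to store text."""
    return {
        "fixed_two_byte": len(text.encode("utf-16-be")),
        "variable_width": len(text.encode("utf-8")),
    }

english = storage_bytes("Oracle")   # ASCII-only sample
japanese = storage_bytes("日本語")   # three Kanji characters

print("English :", english)
print("Japanese:", japanese)
```

In this sketch the English sample is smaller in the variable-width encoding, while the Japanese sample is smaller in the fixed-width one, which is the kind of trade-off to weigh against the data you expect to store.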
Changing the Character Set:
The database administrator may sometimes need to change the Character Set. For example, suppose the database was created with the default value US7ASCII, and the database administrator later discovered that he needed to store characters that are not included in this Character Set, such as those in a French name.
Before version 9i it was not possible to change the Character Set. Version 9i and later support changing the Character Set, but there is no guarantee that the process will succeed. It is the responsibility of the database administrator to ensure that the Character Set conversion will not damage the data. The problem is simply that the conversion may not be able to reformat the existing data in the Datafiles. For example, if the database administrator changes the Character Set from Western European to Eastern European, much of the Western European data will be displayed incorrectly, with disastrous results.
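The Western European to Eastern European case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ISO-8859-1 and ISO-8859-2 stand in for a Western and an Eastern European single-byte Character Set, and the helper names reinterpret and can_store are hypothetical.

```python
# Show what happens when bytes stored in one single-byte character set
# are read back as if they belonged to another one.
# Assumption: ISO-8859-1 and ISO-8859-2 stand in for a Western European
# and an Eastern European character set; this is not Oracle code.

def reinterpret(text: str, stored_as: str, read_as: str) -> str:
    """Encode text with one character set, then decode the same bytes with another."""
    raw = text.encode(stored_as)
    return raw.decode(read_as)

def can_store(text: str, charset: str) -> bool:
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

original = "Señor Müller à Genève"
misread = reinterpret(original, "iso8859_1", "iso8859_2")

print("stored :", original)
print("read as:", misread)
print("Western name fits Eastern set:", can_store("Señor", "iso8859_2"))
print("Eastern name fits Western set:", can_store("Dvořák", "iso8859_1"))
```

The bytes themselves never change in this sketch; only their interpretation does, which is why the damage is silent unless the data is checked before the conversion.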
Oracle provides two tools that help in determining the possibility of changing the Character Set:
1- Database Character Set Scanner: a standalone tool that connects to the database, scans the Datafiles, and issues a report of potential problems (csscan.exe on Windows, csscan on Unix).
csscan system/password full=y tochar=utf8
This command connects to the database as the SYSTEM user, scans the Datafiles, and checks whether the conversion to UTF8 may cause any problems.
The problem with converting to UTF8 is that characters that were encoded as one byte in the original Character Set may require two bytes in UTF8, so the data may no longer fit in the column after the change.
The Database Character Set Scanner will produce a comprehensive report of each row where a problem would occur in the new Character Set, and you should then take appropriate action to correct the problems before conversion, if possible.
2- Language and Character Set File Scanner: this tool tries to identify the language and Character Set used in a text file. It works on plain text only. It is very useful if you have data that you want to load into the database and you do not know its language or Character Set; the tool examines the file and guesses them.
After making sure that the Character Set can be changed without any damage, you can execute the command:
ALTER DATABASE CHARACTER SET UTF8;
You can also change the National Character Set in the same way, with ALTER DATABASE NATIONAL CHARACTER SET, but again there is no guarantee that there will be no problems; making sure of this is the job of the database administrator.
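The column-width problem described above for conversion to UTF8 can be sketched as a simple check. This is a rough illustration of the idea, not a reproduction of what the Database Character Set Scanner does: it assumes ISO-8859-1 as the current single-byte Character Set, assumes column widths are declared in bytes, and uses a hypothetical helper named find_overflows on made-up sample values.

```python
# Find values that fit a byte-sized column today but would no longer fit
# once their characters are re-encoded in UTF-8.
# Assumptions: ISO-8859-1 stands in for the current single-byte character
# set, and the column width is expressed in bytes.

def find_overflows(values, column_bytes, source_charset="iso8859_1"):
    """Return (value, old_size, new_size) for each value that fits the
    column in the source character set but not after conversion to UTF-8."""
    problems = []
    for value in values:
        old_size = len(value.encode(source_charset))
        new_size = len(value.encode("utf-8"))
        if old_size <= column_bytes < new_size:
            problems.append((value, old_size, new_size))
    return problems

names = ["Anna", "José", "Zoë", "François"]   # made-up sample data
report = find_overflows(names, column_bytes=4)

for value, old_size, new_size in report:
    print(f"{value}: {old_size} bytes now, {new_size} bytes in UTF-8")
```

Only values that currently fit are reported, since a value that is already too long for the column is a different problem from the one the conversion introduces.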