Monday, 19 June 2017

Choosing Between Char, Varchar and NVarChar Data Types

What is the best way to store a string data type into an SQL Server database? I’d probably decide not to influence your decision about choosing which one was the best among the three, but I’d demonstrate an example with which we could use to differentiate each data type from the rest.

In SQL Server, we can interchangeably use CHAR and VARCHAR or NVARCHAR to store a string value like ‘GLEN’ for example. Each of which will allocate the right space for it and the effect of how the data is stored is respective to each own.

For example, the CHAR data type can hold the string ‘GLEN’ and so does VARCHAR but these won’t tell you about the kind of strategy being used to save the word ‘GLEN’ into a field. So to give us a bird’s eye view of how the SQL Server stores data on each data type, here are the basic differences between the three:

Data is padded with blanks to fill the entire field size. For instance, if one creates a CHAR(20) field to store a string with 4 characters only,  the system will pad it with additional 16 blanks to fill in the unused storage space being allocated. This data type requires 1 byte of storage and can store up to 8000 characters.

Unlike the CHAR data type, if one creates the field of the same length, e.g. VARCHAR(20), the system will store the actual value of the string ‘GLEN’ and will not pad it with extra blanks. Using this data type will actually save us an extra space in the long run. Like CHAR, this requires 1 byte of storage and can store up to 8000 characters.

Like the VARCHAR data type, this will also store the actual value of the string and does not pad the unused allocated characters with blanks. The advantage of using NVARCHAR over the two counterparts is that it supports both Unicode and Non-Unicode Data Types. As opposed to VARCHAR, this requires 2 bytes of storage and can only store up to 4000 characters.

Difference between Unicode and Non-Unicode Data Types

An example of Non-Unicode data types is the US English characters. Data types that use 1 byte to encode each character can only represent 256 different characters. The client tools like ADO.Net, uses the collation of Non-Unicode data as a hint to choose the code page of the data, like 1252 for Latin1 (ANSI) as an example.

With Unicode, however, the client does not need to interpret the data according to a code page, because the data itself is already multi-byte and the client will just interpret it according to the UCS-2 encoding which is the actual Unicode style preferred by SQL Server.

SQL Server Example

In this example, we have created a table with three fields of different data types: CHAR, VARCHAR and NVARCHAR. We then filled it in with similar values, and for this case, it’s my name, GLEN.

To easily find out how much storage space our strings have taken up, I have used the function DATALENGTH(). The result shows the spaces the three data types have used up respectively.

CHAR = 20 bytes
VARCHAR = 4 bytes
NVARCHAR = 8 bytes

Notice that NVARCHAR uses 2 bytes of storage for each characters. It can be said that using NVARCHAR is not a wise choice because the additional byte isn’t necessary if the system we’re creating is exclusively using the US English language only. However, if the program you wish to create requires support for at least one foreign language, like the Korean Hangul or the Japanese Kanji for instance, then using NVARCHAR data type is the way to go.

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...