What is the N prefix in MSSQL all about?

I used to learn about SQL, especially MySQL. Then now my job requires me to understand Microsoft SQL Server (mssql). So I make these notes to guide me and others who want to transition to mssql. I usually summarized what I found and understood so far from browsing the internet.

My takeaways:

  • It's declaring the string as NCHAR, NVARCHAR, or NTEXT data type, rather than CHAR, VARCHAR, or TEXT.

  • This N prefix denotes that subsequent string is in Unicode (the N actually stands for National language character set).

  • To quote from Microsoft:

    Without the N prefix, the string is converted to the default code page of the database and may not recognize certain characters.

  • The NVARCHAR column can store any Unicode data. The VARCHAR column is restricted to an 8-bit.

  • By using NVARCHAR rather than VARCHAR, we can avoid doing encoding conversions every time we read from or write to the database. Some people think that VARCHAR should be used because it takes up less space. Codepage incompatibilities are pain and Unicode is the cure for codepage problems.

  • The VARCHAR data type represents the non-Unicode variable-length string data type. We can store letters, numbers, and special characters in it.

N represents string size in bytes (not a value representing the number of characters in a string).
It stores a maximum of 8000 Non-Unicode characters.
It takes 1 byte per character. If we do not explicitly specify the value for N, it takes 1-byte storage.

-- DECLARE @var AS VARCHAR(N) = 'text';
DECLARE @text AS VARCHAR(100) = 'this VARCHAR yields 33 characters';
SELECT
    @text AS Output,
    DATALENGTH(@text) AS Length;

The query above defines VARCHAR data type with 100 bytes of data.
VARCHAR DDL-TSQL
It returns the string length as 33 because of 1 byte per character, including a space character.
VARCHAR query

  • The NVARCHAR data type is for the Unicode variable-length character data type. We can store both non-Unicode and Unicode characters (Japanese Kanji, Korean Hangul, etc.).

N represents string size in bytes (not a value representing the number of characters in a string).
It stores a maximum of 4000 Unicode and Non-Unicode characters.
It takes 2 byte per character. If we do not explicitly specify the value for N, it takes 2-byte storage.

-- DECLARE @var AS NVARCHAR(N) = N'text';
DECLARE @text AS NVARCHAR(100) = N'this NVARCHAR yields 34 characters';
SELECT
    @text AS Output,
    DATALENGTH(@text) AS Length;

The query above defines NVARCHAR data type with 100 bytes of data.
NVARCHAR DDL-TSQL
It returns the string length of 68 because NVARCHAR takes 2 bytes per character.
NVARCHAR query

  • Storing example:
CREATE TABLE UserComments
(
    ID int IDENTITY (1,1),
    [Language] VARCHAR(50),
    [Comment] VARCHAR(200),
    [NewComment] NVARCHAR(200)
);

INSERT INTO UserComments ([Language],[Comment],[NewComment]) 
VALUES ('English','Hello World', N'Hello World');
INSERT INTO UserComments ([Language],[Comment],[NewComment]) 
VALUES ('Japanese','こんにちは世界', N'こんにちは世界');
INSERT INTO UserComments ([Language],[Comment],[NewComment]) 
VALUES ('Hindi','नमस्ते दुनिया', N'नमस्ते दुनिया');

SELECT * FROM UserComments;

References:

51