Eliminating Redundant Delimiters with NULL

E


Nullability is a concept that occasionally causes consternation among SQL developers, especially new ones. It is a somewhat controversial construct and some believe relational databases should not allow NULLs at all. (See C.J. Date)

Like it or not, the concept made it into the relational model and we might as well take advantage of it where possible. Imagine a table that contains parts of a mailing address split over multiple columns:

uidAddressLine1AddressLine2AddressLine3
115722 Dapin PlaceSuite 2042nd Floor
2224 Debs CircleLevel 4NULL
39 Gateway TerraceNULLUnit #9
4NULL07418 Hagan CenterNULL
528154 American Ash ParkNULLNULL
6NULLNULL601 Business Park Circle

The table has a unique identifier and three non-key columns – AddressLine1 through AddressLine3 – which contain the address, split into parts. The address columns may contain text or NULLs. Never mind that these columns violate the no repeating groups requirement of first normal form and pretend we have no control over the table structure; we just have to use it. Below is the code that creates our sample data.

DROP TABLE IF EXISTS #Address;

SELECT *
INTO #Address
FROM
(
	VALUES
	('1', '15722 Dapin Place', 'Suite 204', '2nd Floor'),
	('2', '224 Debs Circle', 'Level 4', NULL),
	('3', '9 Gateway Terrace', NULL, 'Unit #9'),
	('4', NULL, '07418 Hagan Center', NULL),
	('5', '28154 American Ash Park', NULL, NULL),
	('6', NULL, NULL, '601 Business Park Circle')
) AS x (uid, AddrLine1, AddrLine2, AddrLine3);

Our challenge is to concatenate the contents of the three address columns into a single column, delimited by a carriage/return line feed, so that each element appears on a separate line when displayed. “Concatenate” is just a fancy word for combining two or more elements to create a single new element. A delimiter should only appear between non-NULL elements and not before the first element or after the last one. Carriage return is ASCII character 13 and line feed is character 10, which in SQL become CHAR(13) and CHAR(10), respectively.

This post applies specifically to Microsoft SQL Server, but the concepts and many of the language elements will apply to any SQL-based platform. The SQL Server concatenation operator is the addition symbol (+). The default behavior of concatenation is that concatenating a NULL operand with any other operand – text, date or numeric constants, columns, variables, parameters, etc. – results in a NULL. In the example below, which concatenates a string with a NULL, both queries return NULL.

SELECT '1600 Pennsylvania Ave.' + NULL;

SELECT  NULL + '1600 Pennsylvania Ave.';

Using our example temp table created earlier, the code below will return a NULL for every row in the #Address table except the first, which has no NULL columns.

SELECT 
	uid,
	AddrLine1 + AddrLine2 + AddrLine3 AS Address
FROM #Address;
uidAddress
115722 Dapin PlaceSuite 2042nd Floor
2NULL
3NULL
4NULL
5NULL
6NULL

One way around this behavior is to turn it off with the following command: SET CONCAT_NULL_YIELDS_NULL OFF. In the example below, both queries return 1600 Pennsylvania Ave. The setting only affects the current connection to the server and stays active until the session ends or until the default is turned back on: SET CONCAT_NULL_YIELDS_NULL ON.

SET CONCAT_NULL_YIELDS_NULL OFF;

SELECT '1600 Pennsylvania Ave.' + NULL;

SELECT  NULL + '1600 Pennsylvania Ave.';

Despite the fact that this command will keep NULLs from causing the query results to be NULL, using it is generally considered bad practice and is frowned upon. Before we move on, restore the default NULL-handling behavior.

SET CONCAT_NULL_YIELDS_NULL ON;

Our next step is to address the NULLs to allow the non-NULL address elements to shine through. The ISNULL function takes two arguments and returns the first one if it is not NULL and the second one otherwise.

SELECT 
	uid,
	ISNULL(AddrLine1, '')
	+ ISNULL(AddrLine2, '')
	+ ISNULL(AddrLine3, '')
FROM #Address;
uidAddress
115722 Dapin PlaceSuite 2042nd Floor
2224 Debs CircleLevel 4
39 Gateway TerraceUnit #9
407418 Hagan Center
528154 American Ash Park
6601 Business Park Circle

We are now a step closer to our desired result – every address is non-NULL. The next step is to start adding in the delimiters. The end goal is to use carriage return/line feed as the delimiters, but for now, to make both the code and the results more readable, we will use a pair of tilde characters (~) as the delimiter.

SELECT 
	uid,
	ISNULL(AddrLine1, '')
	+ '~~' + ISNULL(AddrLine2, '')
	+ '~~' + ISNULL(AddrLine3, '')
FROM #Address;
uidAddress
115722 Dapin Place~~Suite 204~~2nd Floor
2224 Debs Circle~~Level 4~~
39 Gateway Terrace~~~~Unit #9
4~~~~07418 Hagan Center~~~~
528154 American Ash Park~~~~
6~~~~601 Business Park Circle

Now to remove the unwanted delimiters – duplicates and ones appearing before or after the address column. A common approach is to use CASE statements to conditionally add the delimiters if the columns to the left and right of the delimiters are not NULL:

SELECT
	uid,
	ISNULL(AddrLine1, '')
	+ CASE WHEN AddrLine1 IS NOT NULL AND AddrLine2 IS NOT NULL THEN '~~' ELSE '' END
	+ ISNULL(AddrLine2, '')
	+ CASE WHEN AddrLine2 IS NOT NULL AND AddrLine3 IS NOT NULL THEN '~~' ELSE '' END
	+ ISNULL(AddrLine3, '')
FROM #Address;

Another step forward. There are no longer delimiters at the beginning or end of the addresses, but the address for uid 3 is missing its delimiter because AddrLine2 is NULL and the delimiter logic only checks adjacent columns for NULLs:

uidAddress
115722 Dapin Place~~Suite 204~~2nd Floor
2224 Debs Circle~~Level 4
39 Gateway TerraceUnit #9
407418 Hagan Center
528154 American Ash Park
6601 Business Park Circle

One solution to this issue is to add additional logic to the case statement for the delimiter between the first two address lines so that it checks both line 2 and line 3 for nullity.

SELECT
	uid,
	ISNULL(AddrLine1, '')
	+ CASE WHEN AddrLine1 IS NOT NULL AND (AddrLine2 IS NOT NULL OR (AddrLine2 IS NULL AND AddrLine3 IS NOT NULL)) THEN '~~' ELSE '' END
	+ ISNULL(AddrLine2, '')
	+ CASE WHEN AddrLine2 IS NOT NULL AND AddrLine3 IS NOT NULL THEN '~~' ELSE '' END
	+ ISNULL(AddrLine3, '')
FROM #Address;

We now have a solution that produces the desired result: delimiters only between non-NULL columns and no delimiters before or after the address. However, the solution is almost worse than the problem. It’s long and difficult to read, it’s redundant – every column name appears multiple times, and the delimiter handling isn’t uniform – the first delimiter has additional logic that the second delimiter doesn’t need.

uidAddress
115722 Dapin Place~~Suite 204~~2nd Floor
2224 Debs Circle~~Level 4
39 Gateway Terrace~~Unit #9
407418 Hagan Center
528154 American Ash Park
6601 Business Park Circle

In the spirit of pithiness, it’s possible to replace the case statements in the previous query with the IIF (immediate if) function, which results in a shorter query, but it isn’t much of an improvement in terms of readability.

SELECT
	uid,
	ISNULL(AddrLine1, '')
	+ IIF(AddrLine1 IS NOT NULL, IIF(AddrLine2 IS NOT NULL OR (AddrLine2 IS NULL AND AddrLine3 IS NOT NULL), '~~', ''), '')
	+ ISNULL(AddrLine2, '')
	+ IIF(AddrLine2 IS NOT NULL AND AddrLine3 IS NOT NULL, '~~', '')
	+ ISNULL(AddrLine3, '')
FROM #Address;

The main problem thus far is that we have been fighting against the NULLs and SQL’s handling of them rather than using them to our advantage. By leveraging the fact that concatenation with NULLs yields NULL, we can embed the delimiters in the ISNULL functions we are already using. If a source column, AddressLine2 for example, may contain a null, we can concatenate the delimiter and the source field all within an ISNULL function call: 

SELECT ISNULL('~~' + AddressLine2, '') FROM #Address;

If AddressLine2 is NULL, concatenating it with the delimiter will yield a null, which the ISNULL function will convert to an empty string, no additional checking required. Applying that concept to all of the columns:

SELECT
	uid,
	ISNULL('~~' + AddrLine1, '')
	+ ISNULL('~~' + AddrLine2, '')
	+ ISNULL('~~' + AddrLine3, '') AS Address
FROM #Address;

We’ve taken a step forward in simplicity and readability, but a step backward in functionality. Every line now has a leading delimiter.

uidAddress
1~~15722 Dapin Place~~Suite 204~~2nd Floor
2~~224 Debs Circle~~Level 4
3~~9 Gateway Terrace~~Unit #9
4~~07418 Hagan Center
5~~28154 American Ash Park
6~~601 Business Park Circle

It’s easy to think that removing the delimiter from the first AddrLine1 column will resolve the issue, and it will if the first column isn’t NULL, but the issue then crops up for the second or subsequent columns – whichever column is the first non-NULL address line column because each column assumes the one prior to it is non-NULL and therefore that a delimiter is needed.

SELECT
	uid,
	ISNULL(AddrLine1, '')
	+ ISNULL('~~' + AddrLine2, '')
	+ ISNULL('~~' + AddrLine3, '') AS Address
FROM #Address;
uidAddress
115722 Dapin Place~~Suite 204~~2nd Floor
2224 Debs Circle~~Level 4
39 Gateway Terrace~~Unit #9
4~~07418 Hagan Center
528154 American Ash Park
6~~601 Business Park Circle

The solution is to use the STUFF function to remove the delimiter from the beginning of the address, knowing that without it, the address will always have a leading delimiter. STUFF takes the concatenated address as its first argument, the starting position of the replacement as its second, the number of characters to replace as its third and an empty string, which will be the replacement character, as its fourth. If every column is NULL, the concatenated string is NULL and STUFF simply yields NULL.

SELECT uid,
	STUFF(ISNULL('~~' + AddrLine1, '')
	+ ISNULL('~~' + AddrLine2, '')
	+ ISNULL('~~' + AddrLine3, ''), 1, 2, '') AS Address
FROM #Address;

We’re now back to the correct behavior with a much simpler, shorter and more readable query.

uidAddress
115722 Dapin Place~~Suite 204~~2nd Floor
2224 Debs Circle~~Level 4
39 Gateway Terrace~~Unit #9
407418 Hagan Center
528154 American Ash Park
6601 Business Park Circle

The CONCAT function, available in SQL Server 2012 and later, provides an even simpler method for concatenating strings. CONCAT takes 2 to 254 string arguments and yields a string, ignoring any NULL arguments. Rewriting the previous query using CONCAT with each column and its delimiter as the arguments, we arrive at the query below, which produces the desired results. The STUFF function is still required to remove the extraneous first delimiter.

SELECT uid,
STUFF(CONCAT( '~~' + AddrLine1, '~~' + AddrLine2, '~~' + AddrLine3 ), 1, 2, '') AS Address
FROM #Address;

Finally, in SQL Server 2017 and later, the CONCAT_WS function takes simplicity a step further. The WS means “With Separator”. The first argument to the function is the desired separator and the second and subsequent arguments are the values to concatenate. CONCAT_WS automatically places the separator between any non-NULL arguments.

SELECT uid,
CONCAT_WS('~~', AddrLine1, AddrLine2, AddrLine3)
FROM #Address;

This function meets all of our objectives – delimiters only between non-NULL values and no delimiters before or after the address. The STUFF function is no longer required.

Finishing Touches

The final step is to replace the tilde placeholders for the delimiters with the carriage return/line feed we would actually like to use. All three solutions appear below, followed by the results, which are the same.

Pre-SQL Server 2012

SELECT uid,
	STUFF(ISNULL(CHAR(13) + CHAR(10) + AddrLine1, '')
	+ ISNULL(CHAR(13) + CHAR(10) + AddrLine2, '')
	+ ISNULL(CHAR(13) + CHAR(10) + AddrLine3, ''), 1, 2, '') AS Address
FROM #Address;

SQL Server 2012 & Later

SELECT uid,
STUFF(CONCAT(CHAR(13) + CHAR(10) + AddrLine1,
			 CHAR(13) + CHAR(10) + AddrLine2,
			 CHAR(13) + CHAR(10) + AddrLine3), 1, 2, ''
) AS Address
FROM #Address;

SQL Server 2017 & Later

SELECT uid,
CONCAT_WS(CHAR(13) + CHAR(10), AddrLine1, AddrLine2, AddrLine3) AS Address
FROM #Address;

Results

uid Address


1 15722 Dapin Place
Suite 204
2nd Floor
2 224 Debs Circle
Level 4
3 9 Gateway Terrace
Unit #9
4 07418 Hagan Center
5 28154 American Ash Park
6 601 Business Park Circle

How pithy.

About the author

By admin

Recent Posts

Recent Comments

Archives

Categories

Meta