I did a lot of programming in the beginning of my carrier and always preferred encapsulating code wherever possible. For instance, whenever there is a requirement to write the SQL query to concatenate string values by a separator within a group, I’ve always favored user defined function.
Though user defined function works fine in this kind of requirement, it sometimes has a negative impact on performance. Thankfully, SQL Server 2016 and 2017 came up with several new T-SQL functions.
With the previous versions of SQL Server, developers had to write lengthy and complex logic for the string manipulations. This article illustrates the basic functionality of new T-SQL function STRING_AGG and demonstrates how this new function can be used to concatenate the string values and get better performance over user defined functions or any other alternatives.
The numeric aggregate functions like SUM, AVG, COUNT, MIN, MAX have been in use since ANSI SQL came in. Like the numeric aggregate functions, STRING_AGG function concatenates string values and places separators between them. It also allows you to organize the data within a group.
insert dbo.category(category_group_code, category_desc)select 1,’Travel’
insert dbo.category(category_group_code, category_desc)select 1,’Meals’
insert dbo.category(category_group_code, category_desc)select 1,’Lodging’
insert dbo.category(category_group_code, category_desc)select 2,’Entertainment’
insert dbo.category(category_group_code, category_desc)select 2,’Clothing’
insert dbo.category(category_group_code, category_desc)select 2,’Electronics’
insert dbo.category(category_group_code, category_desc)select 2,’Communication’
insert dbo.category(category_group_code, category_desc)select 2,’Groceries’
insert dbo.category(category_group_code, category_desc)select 2,’Home Supplies’
insert dbo.category(category_group_code, category_desc)select 2,’Misc’
select category_group_code as [Group],STRING_AGG(category_desc,’, ‘)withingroup (orderby category_desc) Categories
Let’s see how this function can be used to improve the query performance. I’m going to demonstrate examples of using this function over two old techniques to show the performance differences.
The following T-SQL script will create and populate two tables, Employee and Department.
CreateIndex IX_DeptId on Employee(DeptId)
DECLARE @cnt INT
SET @cnt = 1
DECLARE @rnd FLOAT, @EmpName varchar(50), @DeptName varchar(50)
WHILE @cnt <= 1000000
select @rnd =ROUND((RAND()* 10000),2)
select @EmpName =’Emp’+trim(str(@cnt))
INSERTINTO dbo.Employee(EmpId, EmpName, DeptId)VALUES (@cnt, @EmpName, @rnd)
if @cnt <= 10000
select @DeptName =’Department’+trim(str(@cnt))
insertinto Department(DeptId, DeptName)values (@cnt, @DeptName)
SET @cnt = @cnt + 1
Let’s write a query to get all employee names separated by commas for each department.
For this kind of requirement, the usual approach is to write a user defined function like below to get all employees separated by commas for a department. I’ve passed DeptId as a parameter in UDF and the query in the UDF concatenates all employee names separated by commas for that department. Here, COALESCE function has been used to concatenate employee names.
|— create user defined function
createoralterfunction dbo.udf_comma_delimited(@DeptId int)
declare @return_values varchar(8000)
SELECT @return_values =COALESCE(@return_values +’, ‘,”)+ EmpName
WHERE DeptId = @DeptId
dbcc dropcleanbuffers withno_infomsgs;
— Passing DeptId in user defined function to get all employee names separated by comma for the DeptId
select DeptId, dbo.udf_comma_delimited(DeptId) Employees
SQL Server Execution Times:
CPU time = 4468 ms, elapsed time = 4850 ms.
The above query with the user defined function takes 5 seconds. Several times, we see that user defined function is the culprit and could be the reason for the slowness of the query. To eliminate the user defined function from the query, let’s try another popular approach of using FOR XML query to concatenate the string with a separator.
(selectstuff((select’, ‘+ EmpName as [text()]
where DeptId = d.DeptId
from Department d
SQL Server Execution Times:
CPU time = 4984 ms, elapsed time = 5341 ms.
The time taken by both the queries (using UDF or FOR XML) is around five seconds.
Let’s try the STRING_AGG function now.
|select DeptId,string_agg(EmpName,’, ‘)withingroup (orderby EmpName) Employees
SQL Server Execution Times:
CPU time = 4375 ms, elapsed time = 1030 ms.
The query with the STRING_AGG takes 1 second compared to 5 seconds taken by other two approaches.
In the three different scenarios tested, STRING_AGG function performed significantly better than the two other older methods. Thus, it is advisable to replace old techniques of user defined function or FOR XML method with the new STRING_AGG function.
Want to learn more performance tuning techniques? Download the slide from the “Optimize SQL Server: Query Tuning Techniques” presentation.
Subscribe to Our Blog
Never miss a post! Stay up to date with the latest database, application and analytics tips and news. Delivered in a handy bi-weekly update straight to your inbox. You can unsubscribe at any time.
Most people will encounter this error when their application tries to connect to an Oracle database service, but it can also be raised by one database instance trying to connect to another database service via a database link.
Which RAID should you use with SQL Server? Learn the differences between RAID 0, RAID 1, RAID 5, and RAID 10, along with best practices.