Monthly Archives: May 2016

Improving the performance of functions

Recently I had to employ functions to perform a kind of name resolution on a column within the database. The database had a users table with usr_name and usr_id, and all other FK tables referenced usr_id. Within the procedures we could naturally do joins to fetch the username, but there were many other lookup tables similar to the users table, so we would end up repeating many different joins in every procedure where the additional details were required.

The point here is that we had to use functions, since it made a lot of sense: instead of doing joins with 8 different tables to fetch 8 different types of names, each procedure could simply call a function. However, this also posed the problem of having to work with code that is inherently inefficient. I have always asked developers to avoid using functions where possible to help improve performance, but this time we needed ease of coding along with decent performance.

As far as options go, we could use scalar-valued functions or table-valued functions. The very nature of these two types of functions is different. As its name suggests, a scalar function returns only one value (one input, one output); this usually works for performing calculations like service tax. A table-valued function, on the other hand, returns a table; it is best used for things like fetching a list of addresses when a zip code is passed. In terms of the pure logic of the operation, my scenario calls for a scalar function: it will accept a usr_id and return the full name of the user. But scalar functions are extremely slow, and they get worse as the number of rows in the base table increases. Here is the performance comparison, starting with a plain join as the baseline.
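The performance difference comes from how the two kinds of functions are executed: a scalar function is invoked once per row, while an inline table-valued function is expanded into the calling query much like a view, so the optimizer can plan it as a join. Conceptually, the APPLY query used later in this post is optimized as if it were written like this (a sketch of the inlining, not actual optimizer output):

```sql
-- What the optimizer effectively sees after inlining the table-valued
-- function: the function body becomes a correlated subquery in the plan.
SELECT *
FROM fk_table f
OUTER APPLY (
    SELECT usr_name
    FROM pk_users
    WHERE usr_id = f.usr_id
) u
```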

SELECT *
FROM fk_table f
LEFT OUTER JOIN pk_users u
ON u.usr_id = f.usr_id

 

The above numbers are after three runs, with clustered indexes on tranid and usr_id.

Now we create a scalar function to fetch the data.

SELECT dbo.fn_getusername(f.usr_id), f.*
FROM fk_table f

 


As you can see from the above screenshot, scalar functions bring a significant performance overhead, so we are now going to replace the same logic with an inline table-valued function.

SELECT *
FROM fk_table u
OUTER APPLY dbo.fn_tblusername(u.usr_id) f

As you can see from the above screenshots, replacing the scalar function with the table-valued function gave me performance comparable to the inline join, without the row-by-row overhead of a scalar function. This ultimately serves the purpose of keeping my code easy to reuse while getting an acceptable level of performance.
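A note on the join semantics: OUTER APPLY behaves like the LEFT OUTER JOIN in the baseline query, keeping fk_table rows that have no matching user. If you only want matching rows, the equivalent of an INNER JOIN is CROSS APPLY, as in this sketch against the same test objects:

```sql
-- CROSS APPLY keeps only the rows for which the function returns a row,
-- mirroring an INNER JOIN; OUTER APPLY mirrors a LEFT OUTER JOIN.
SELECT f.tranid, f.usr_id, u.usr_name
FROM fk_table f
CROSS APPLY dbo.fn_tblusername(f.usr_id) u
```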

 

Additional references

/****** CREATE THE REQUIRED TABLES ******/
CREATE TABLE pk_users
( usr_id INT  IDENTITY(1,1),
usr_name VARCHAR(100))
GO
CREATE TABLE fk_table
( tranid INT ,  
usr_id INT , 
somedate  DATETIME,
sometext VARCHAR(100)
)
GO
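The timings quoted earlier were taken with clustered indexes on tranid and usr_id, but the script does not include the index definitions. Here is a sketch of how they might be created (the index names, and the choice of which table gets which key, are my assumptions):

```sql
/****** CREATE THE CLUSTERED INDEXES (assumed definitions) ******/
CREATE CLUSTERED INDEX cix_pk_users_usr_id ON pk_users (usr_id)
GO
CREATE CLUSTERED INDEX cix_fk_table_tranid ON fk_table (tranid)
GO
```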
/****** INSERT TEST DATA ******/
INSERT INTO pk_users
SELECT CAST(newid() AS VARCHAR(100))
GO 1000
 
INSERT INTO fk_table
SELECT  ROUND(RAND()*10000,0) , ROUND(RAND()*10000,0) , GETDATE() , REPLICATE('a',100)
 
GO 10000
 
/****** CAPTURE STATISTICS ******/
SET STATISTICS TIME ON
SET STATISTICS IO ON
 
SELECT *
FROM fk_table f
LEFT OUTER JOIN pk_users u
ON u.usr_id = f.usr_id
 
GO
/****** CREATE SCALAR FUNCTIONS ******/
 
CREATE FUNCTION fn_getusername 
(@usr_id INT
)
RETURNS VARCHAR(100)
AS
BEGIN
	DECLARE @usr_name VARCHAR(100)
	SELECT @usr_name = usr_name FROM pk_users WHERE usr_id = @usr_id
	RETURN @usr_name
END
 
GO
SELECT dbo.fn_getusername(f.usr_id), f.*
FROM fk_table f
 
GO 
/****** CREATE TABLE VALUED FUNCTIONS ******/
CREATE FUNCTION fn_tblusername 
(	
	@usr_id INT 
)
RETURNS TABLE 
AS
RETURN 
(
	SELECT usr_name  FROM pk_users
	WHERE usr_id =@usr_id
)
GO
SELECT *
FROM fk_table u
OUTER APPLY dbo.fn_tblusername(u.usr_id) f
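To reset the session and drop the test objects when you are done, something along these lines should work (a cleanup sketch; it is not part of the original script):

```sql
/****** CLEAN UP ******/
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO
DROP FUNCTION dbo.fn_getusername
DROP FUNCTION dbo.fn_tblusername
GO
DROP TABLE fk_table
DROP TABLE pk_users
GO
```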

An Overview of the tools SQL Server database professionals use

Recently I was asked what tools database professionals use, so I want to clarify the topic a little. Previously we had a number of different tools for working with the database. If you look at just SQL Server 2000, we had Query Analyzer and Enterprise Manager, one for the developer and another for the DBA. That evolved into SSMS (SQL Server Management Studio), a single customized version of Visual Studio that combined everything the developer and DBA needed. However, because BI tools were still evolving at that time, we ended up with another UI just for BI projects called BIDS (Business Intelligence Development Studio). This worked well initially, since teams were broken up into DBA/developer, MSBI developer and .NET developer: three different job roles and therefore three different tools. .NET developers continued to code in C#, VB.NET and the occasional database project; the DBA had everything he needed in SSMS, such as PBM, CMS and SQL Agent; and the MSBI developer had SSRS, SSIS and SSAS in BIDS. But SSMS and BIDS are actually just customized versions of Visual Studio, so it was inevitable that at some point they would merge into a single framework where all coding can happen, be it a web page, a stored procedure or an SSIS package. Hence we arrive at the latest lineup: SSMS for the DBA, SSDT for the MSBI developer and Visual Studio for the database developer.

While these are still three different products, they are at their core the same thing: a UI containing a common toolset for what most programmers need, along with very specific project types that can be imported as needed. The challenge at this point is that Microsoft releases new versions and editions of Visual Studio frequently. A project created in one version needs to be upgraded before it can be opened in a newer version; often the upgrade is irreversible, and you end up maintaining two copies of the same code, one that opens fine in BIDS and another that has been upgraded to open in VS 2012. At some point the hope is that these teething troubles will be overcome, projects will open seamlessly across versions and we will achieve better integration of code across the different project types, but until then all we can do is wait and watch.

Here is another diagram, just from the tools perspective.