Monthly Archives: June 2015

MS SQL Server and Tick-Tock, and why I changed my mind about it being a good idea

I am not sure MS has got it right with the release cycle for SQL Server. While this debate has happened before (back then I was excited that newer versions of SQL would be available sooner), I am now older and wiser. The reason I don't think this approach will work in the long run is the comparison below.

If you have worked with databases long enough you know data is money. Where do we store money? In banks and bank vaults. Now imagine you start a bank and buy the latest and greatest safe from a provider. The safe is a big and important investment since it ties into almost all aspects of the bank, such as the architecture of the building, the processes followed by employees and the overall quality of the bank. After all this effort, imagine the provider comes to you two years later with a newer and better version of the safe. Naturally, most people would not bother to upgrade as long as the existing safe does everything the bank wants from it. But what if the safe had a few minor bugs and defects that, while not mission critical, still prevent you from using it to its fullest potential? Say, for example, the safe has a facial recognition feature that can be easily fooled with a photo, so rather than use the feature we stick to the number combination. This is what happened with SQL 2012: columnstore indexes and AlwaysOn, while good, hadn't matured enough. They matured in SQL 2014, but by then it was too late.

This is what has happened as far as I can see. So many clients upgraded to 2012 that there doesn't seem to be a market for 2014. But does 2012 do everything the client wants? No, not really. I guess what I am saying is that there needs to be more time between releases to allow the dev teams at MS to mature the new features and work out all the bugs. Otherwise SQL upgrades become a weird game of catch-up, and most clients are not going to play that game for long before switching.

Another reason that prompted me to make this comparison is that I did some BI on the type of training and consulting projects my clients asked for. For the most part, up to SQL 2012 the trend was to upskill employees with the latest and greatest even if the company still used 2005, mainly because they wanted to be able to tell clients "our employees are up to date with the latest technologies". With SQL 2014 they still haven't gotten over the hangover from the SQL 2012 training to invest in 2014, and most likely they realize there is no point in upskilling to 2014 because 2016 is just around the corner.

While I hope I am wrong, I get a bad feeling that this release cycle is going to spell trouble for MS SQL Server as a dependable, stable platform that makes commercial sense.

A classic case of "too soon of a good thing".

NULL Management, SPARSE Columns, Vertical partitioning and a query

When discussing DW design I often take a lot of time to emphasize the impact NULLs have on query performance, aggregations, design, storage and much more. While this is not new to most seasoned DW developers, it is often overlooked during the design phase and added at a later point, mostly after testing. In the interest of putting it out there, I figured I would post the topic here along with a script I created a few years back to identify suitable columns in a table.

http://enabledbusinesssolutions.com/blog/which-is-better-count-or-count-id/

The post above explains how NULLs affect COUNT behavior and performance; the default measure added for every measure group is a count, so I guess this is important.
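To make the point concrete, here is a quick sketch (the table and values are made up purely for illustration): COUNT(*) counts every row, while COUNT(column) silently skips NULLs, so the two can return very different numbers over the same data.

-- Hypothetical table, just to show how NULLs change COUNT results
CREATE TABLE dbo.CountDemo (Id INT IDENTITY PRIMARY KEY, SSN VARCHAR(20) NULL)
INSERT INTO dbo.CountDemo (SSN) VALUES ('123456'), (NULL), ('987654'), (NULL)

SELECT COUNT(*)   AS AllRows,      -- 4, counts every row
       COUNT(SSN) AS NonNullRows   -- 2, NULLs are skipped
FROM dbo.CountDemo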

Now, NULLs occupy space, and as a result a wide table with a large number of columns that are mostly NULL should ideally be carved out into its own table with SPARSE columns, which is what we call vertical partitioning. In fact, I have often seen implementations where the wide table is converted into key-value pairs so as to avoid having as many NULLs in the table.

The wide table:

ID   Name      AGE   PhNumber       SSN
1    Jayanth   4     123-456-7890   NULL
2    James     5     987-064-1234   123456

The same data as key-value pairs:

ID   FName      FValue
1    Name       Jayanth
1    PhNumber   123-456-7890
1    Age        4
2    Name       James
2    Age        5
2    PhNumber   987-064-1234
2    SSN        123456

While the second approach does in fact eliminate NULLs, it also significantly adds to the number of rows while requiring complex pivoting of the data to bring it back into a record-set format users can understand. In other words, the second approach works fine for OLTP databases, where all records for an individual might need to be fetched, but it doesn't lend itself to aggregation, which is what the DW requires.
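For completeness, here is a minimal sketch of the design I prefer instead: vertical partitioning with SPARSE columns. The table and column names simply mirror the illustration above and are hypothetical; the idea is that the mostly-NULL columns move to a narrow side table and are declared SPARSE so that NULL values take no storage (SPARSE columns must be nullable, and they only pay off when the column is NULL for the large majority of rows).

-- Hypothetical parent table: columns populated for (almost) every row
CREATE TABLE dbo.Person
(
ID       INT          NOT NULL PRIMARY KEY,
Name     VARCHAR(100) NOT NULL,
Age      INT          NOT NULL,
PhNumber VARCHAR(20)  NOT NULL
)

-- Hypothetical side table: the mostly-NULL columns carved out and declared SPARSE
CREATE TABLE dbo.PersonOptional
(
ID  INT NOT NULL PRIMARY KEY REFERENCES dbo.Person (ID),
SSN VARCHAR(20) SPARSE NULL
)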

The query below identifies nullable columns within a table and then checks whether there are actual NULL values in those nullable columns. It then either generates a view definition or actually creates a view with only those columns that have values for all rows.

USE [madworks]
GO
/****** Object: StoredProcedure [dbo].[CreateNotNullViews] Script Date: 6/23/2015 1:35:32 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[CreateNotNullViews]
(@Schema_name NVARCHAR(256),
@table_name NVARCHAR(256),
@Defonly bit
)
AS
BEGIN
/***********************************************************
Author name :- Jayanth Kurup
Created date :- 201204012
Purpose:- Creates a view definition or a view with all columns in the table
that hold non-NULL values in every row.
@Defonly parameter accepts values of 1 or 0
1 = schema definition only (it won't create the actual view)
0 = it will create the view.

***********************************************************/

SET NOCOUNT ON
;
WITH CTE AS (
SELECT * FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @TABLE_NAME
AND TABLE_SCHEMA =@SCHEMA_NAME
)

SELECT ROW_NUMBER() OVER( ORDER BY c.COLUMN_NAME) AS Colids , c.COLUMN_NAME , d.texts , null AS 'NullColumn'
INTO #tempcolstore
FROM CTE c
CROSS APPLY (SELECT ' IF EXISTS (SELECT 1 FROM '+@schema_name +'.'+@table_name +' WHERE '+c.COLUMN_NAME+' IS NULL)
BEGIN
SELECT '+CHAR(39)+c.COLUMN_NAME+char(39)+' AS NULLCOLUMN
END' as texts)d
WHERE IS_NULLABLE = 'YES'

---- THE ABOVE CODE BUILDS A NULL-CHECK STATEMENT FOR EACH NULLABLE COLUMN

DECLARE @colid INT
SET @colid = 1
DECLARE @NullLIST TABLE (Columnname nvarchar(256))
DECLARE @sql NVARCHAR(MAX)

WHILE @colid <= (SELECT MAX(Colids) FROM #tempcolstore)
BEGIN
SET @SQL = (SELECT TEXTS FROM #TEMPCOLSTORE WHERE COLIDS = @COLID)
--PRINT @SQL
INSERT INTO @NULLLIST EXEC SP_EXECUTESQL @SQL
SET @colid = @colid + 1
END
---- THE ABOVE CODE PERFORMS THE NULL EXISTENCE CHECK

SET @SQL = (SELECT 'CREATE VIEW ' + @Schema_name + '.vw_' + @table_name + ' AS SELECT ')
+ (SELECT DISTINCT REPLACE((SELECT DISTINCT u2.COLUMN_NAME + ',' AS 'data()'
FROM INFORMATION_SCHEMA.COLUMNS u2
WHERE TABLE_NAME = @TABLE_NAME
AND TABLE_SCHEMA = @SCHEMA_NAME
AND COLUMN_NAME NOT IN (SELECT COLUMNNAME FROM @NULLLIST)
FOR XML PATH('')) + '$', ',$', '') AS Roles
FROM INFORMATION_SCHEMA.COLUMNS u1
WHERE TABLE_NAME = @TABLE_NAME
AND TABLE_SCHEMA = @SCHEMA_NAME
AND COLUMN_NAME NOT IN (SELECT COLUMNNAME FROM @NULLLIST))
+ ' FROM ' + @schema_name + '.' + @table_name
--- THE ABOVE CODE CREATES THE VIEW DEFINITION

IF @DEFONLY = 1
BEGIN
SELECT @SQL AS ViewOutput
END
ELSE
BEGIN
EXEC SP_EXECUTESQL @SQL
PRINT 'SUCCESS :- View created'
END
--- PERFORMS ACTION AS PER USER INPUT

DROP TABLE #TEMPCOLSTORE ---- CLEAN UP
END

If a column is nullable but doesn't have any NULL values, the script will include that column in the view definition (for example, the PhNumber column in the table above). If a column is not nullable, it will always appear in the view definition (for example, the Name column in the same table). And if a column is nullable and has at least one NULL value, it will be left out of the view definition (like the SSN column).
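As a usage note, the procedure just needs the schema and table name; @Defonly = 1 only returns the generated CREATE VIEW statement, while 0 actually creates the view (named vw_<table> in the same schema). The table below is only an example.

-- Return the view definition only, without creating it
EXEC dbo.CreateNotNullViews @Schema_name = N'dbo', @table_name = N'Person', @Defonly = 1

-- Actually create dbo.vw_Person with only the fully populated columns
EXEC dbo.CreateNotNullViews @Schema_name = N'dbo', @table_name = N'Person', @Defonly = 0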

Therefore the query will create the structure of the parent table for vertical partitioning. Why bother with all this? Because it's better to implement proper design in retrospect than not to do it at all; a case in point is horizontal partitioning, which almost never gets implemented proactively.

Auto Update Statistics – when does it occur?

I was recently asked when the actual auto update of statistics triggers, and while I know it happens during query compilation, I wasn't exactly sure what the thresholds were, so I did some R&D. The results are well documented in the link in the references, but I have still posted them here because I went through the trouble of the R&D before googling.

The first thing to keep in mind is that statistics are not created unless the AUTO_CREATE_STATISTICS option is set on the database.
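If you prefer T-SQL over the SSMS database properties dialog, the same setting can be checked and enabled along these lines (using the madworks database from the earlier script purely as an example):

-- Check the current settings
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = 'madworks'

-- Enable them if needed
ALTER DATABASE [madworks] SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE [madworks] SET AUTO_UPDATE_STATISTICS ON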

The second thing to keep in mind is that even then, statistics still won't be created until a query is executed and actual rows have been inserted into the table.
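A minimal repro of that behaviour, assuming AUTO_CREATE_STATISTICS is ON as above and using a hypothetical scratch table: no column statistic exists until rows are in the table and a query actually filters on the column.

CREATE TABLE dbo.StatsDemo (Id INT IDENTITY PRIMARY KEY, City VARCHAR(50))
INSERT INTO dbo.StatsDemo (City) VALUES ('Bangalore'), ('Chennai'), ('Delhi')

-- At this point only the statistic backing the primary key index exists
SELECT name, auto_created FROM sys.stats WHERE object_id = OBJECT_ID('dbo.StatsDemo')

-- Filtering on a non-indexed column prompts an auto-created _WA_Sys_* statistic
SELECT * FROM dbo.StatsDemo WHERE City = 'Chennai'

SELECT name, auto_created FROM sys.stats WHERE object_id = OBJECT_ID('dbo.StatsDemo')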

The third thing to keep in mind is that you can find out when statistics were last updated using any of the following:

 

-- Available in SQL 2012 SP1 and above
SELECT *
FROM sys.dm_db_stats_properties(object_id, stats_id)

OR

SELECT name AS stats_name,
STATS_DATE(object_id, stats_id) AS statistics_update_date
FROM sys.stats
WHERE object_id = OBJECT_ID('Person.Address');

OR

the Statistics Properties page in SSMS

OR

DBCC SHOW_STATISTICS

 

The fourth thing to keep in mind is that stats get updated when the table structure changes or when roughly 500 rows are modified in the table; the width of the row plays no role in the stats update.

The fifth thing to remember is that statistics are updated asynchronously when the AUTO_UPDATE_STATISTICS_ASYNC option is set.

This means that the current query will continue to execute with stale statistics, while another thread updates the stats in parallel so that the next query gets the benefit.
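A sketch of setting it with T-SQL (madworks is again just the example database; AUTO_UPDATE_STATISTICS itself must also be ON for the async behaviour to kick in):

ALTER DATABASE [madworks] SET AUTO_UPDATE_STATISTICS ON
ALTER DATABASE [madworks] SET AUTO_UPDATE_STATISTICS_ASYNC ON

-- Verify
SELECT name, is_auto_update_stats_on, is_auto_update_stats_async_on
FROM sys.databases
WHERE name = 'madworks'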

If you create an index on a single column, it is a good idea to drop the old auto-created statistics on that column, since the two might be out of sync and there is no guarantee which one the optimizer will use.

Added after the LinkedIn chat mentioned below :- The larger the table, the longer it will take for auto stats to update, since future updates are governed by the threshold of 500 + 20% of the row count. Having said that, this logic doesn't apply to filtered stats, where the selectivity of the column dictates the threshold.
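On versions that expose sys.dm_db_stats_properties (2012 SP1 and above, as mentioned earlier), you can see how close each statistic is to that threshold by comparing its modification counter with 500 + 20% of the row count; a rough sketch, reusing Person.Address as the example object:

SELECT s.name AS stats_name,
       sp.rows,
       sp.modification_counter,
       500 + (0.20 * sp.rows) AS classic_threshold
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('Person.Address')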

Also, it might be worthwhile to look up trace flag 2371, introduced in SQL Server 2008 R2 SP1, which makes this threshold dynamic for large tables.
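If you want to experiment with it, the flag can be enabled globally as below (or via the -T2371 startup parameter); treat this as something to test, not a blanket recommendation:

-- Make the auto update threshold dynamic for large tables
DBCC TRACEON (2371, -1)

-- Confirm it is active
DBCC TRACESTATUS (2371, -1)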

Reference

http://www.mssqltips.com/sqlservertip/2766/sql-server-auto-update-and-auto-create-statistics-options/

A very good chat we had on LinkedIn is linked below

https://www.linkedin.com/grp/post/66097-6017174523202330628#commentID_discussion%3A6017174523202330628%3Agroup%3A66097

TRACE FLAG 1118 – Proportional Fill and tempdb

Often when conducting trainings I ask my participants not to take my word for it but to try things out for themselves. I was asked a question about TRACE FLAG 1118 during a recent SQL 2012 training and wanted to bring the discussion here. The participants had read in a blog that TRACE FLAG 1118 would ensure all the files in the filegroup grew simultaneously, even when the autogrowth settings were different. I argued that the way to achieve this is to make sure the autogrowth settings are the same for each file, which eliminates the need for the trace flag and leverages proportional fill; in fact, I went further and suggested that tempdb file sizes should be pre-allocated to reduce the impact of growth altogether. Here is the proof.
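For reference, here is a sketch of the configuration I was arguing for: pre-size every tempdb data file to the same size and give each one the same fixed autogrowth, so proportional fill keeps the files balanced without any trace flag. The logical file names and sizes below are placeholders; use whatever your instance actually has.

-- Pre-allocate equally sized tempdb data files with identical fixed autogrowth
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev',  SIZE = 4096MB, FILEGROWTH = 512MB)
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev2', SIZE = 4096MB, FILEGROWTH = 512MB)
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev3', SIZE = 4096MB, FILEGROWTH = 512MB)
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev4', SIZE = 4096MB, FILEGROWTH = 512MB)

-- Verify sizes and growth settings (growth_mb is only meaningful when is_percent_growth = 0)
SELECT name, size * 8 / 1024 AS size_mb, growth * 8 / 1024 AS growth_mb, is_percent_growth
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS'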

This test was run on SQL 2012.

[Screenshot: tempdb file sizes before the test]

[Screenshot: file growth after running the query with the trace flag off]

[Screenshot: file growth after shrinking the database and running the query with the trace flag on this time]

So my advice is to try and avoid trace flags when there are better options available, mainly because every trace flag enabled is one more deviation from a standard configuration. Besides, this flag is really meant for the benefit it gives elsewhere, in how pages are allocated (uniform rather than mixed extents), more than for the need to grow files together. Here is a very good blog from Paul Randal discussing the internals.

http://www.sqlskills.com/blogs/paul/misconceptions-around-tf-1118/