Monthly Archives: April 2018

Snapshot isolation level when database is set to read only.

Recently there was a question posted on #sqlhelp asking why a database switches to snapshot isolation level when you make it read only, and then reverts back to read committed when it's switched back to read write.

You can use the below script to recreate the behavior being described

-- Drop and recreate the sample database
IF EXISTS (SELECT 1 FROM sys.databases WHERE name = 'example1')
    DROP DATABASE example1
GO
CREATE DATABASE example1
GO
-- Start with the full recovery model
ALTER DATABASE example1
SET RECOVERY FULL
GO
-- Baseline: check the isolation state while the database is read_write
SELECT name, snapshot_isolation_state_desc, recovery_model_desc, is_read_only
FROM sys.databases
WHERE name = 'example1'
GO
-- Switch to read_only and check again
ALTER DATABASE example1
SET READ_ONLY
GO
SELECT name, snapshot_isolation_state_desc, recovery_model_desc, is_read_only
FROM sys.databases
WHERE name = 'example1'
GO
-- Switch back to read_write and check again
ALTER DATABASE example1
SET READ_WRITE
GO
SELECT name, snapshot_isolation_state_desc, recovery_model_desc, is_read_only
FROM sys.databases
WHERE name = 'example1'
GO
-- Repeat the same test under the simple recovery model
ALTER DATABASE example1
SET RECOVERY SIMPLE
GO
ALTER DATABASE example1
SET READ_ONLY
GO
SELECT name, snapshot_isolation_state_desc, recovery_model_desc, is_read_only
FROM sys.databases
WHERE name = 'example1'
GO
ALTER DATABASE example1
SET READ_WRITE
GO
SELECT name, snapshot_isolation_state_desc, recovery_model_desc, is_read_only
FROM sys.databases
WHERE name = 'example1'
GO

-- Inspect the transaction log records generated by the switches
SELECT * FROM sys.fn_dblog(NULL, NULL)

This behavior can be reproduced on other versions of SQL Server as well, and it doesn't really have to do with the transaction log management side of things. I did notice that the behavior didn't seem to occur for a database in which I had implemented Query Store.

So why is this question important?

If the database is in read_only mode, why does it change to snapshot isolation level? What is the advantage? For this we need to understand what snapshot isolation does. When a database has snapshot isolation enabled, the database engine implements row versioning and copies rows that are about to be modified into the version store in tempdb. But for a database that is read only there is no DML, so no row versions are ever created. In other words, enabling snapshot isolation here doesn't actually cause any row versions to be generated, which makes the row-versioning side of the behavior moot.
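One way to sanity-check this is to look at the version store while the database is read only. The query below is only a minimal sketch (the database name example1 matches the earlier script); for a read-only database it should come back with zero rows.

-- Count row versions held in tempdb for the example1 database.
-- For a read-only database this should return 0.
SELECT COUNT(*) AS version_rows
FROM sys.dm_tran_version_store
WHERE database_id = DB_ID('example1');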

So clearly we don't get any advantage from row versioning here, so maybe it has to do with the smaller number of locks that are required.

To simulate the locking behavior, try running the below script with the database in each of the two modes, read_write and read_only.
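The script reads from a sample table called abcd that isn't shown in the original post. A minimal setup sketch follows; the table definition and row count are assumptions, and it has to be run while the database is still in read_write mode.

USE example1
GO
-- Hypothetical sample table for the locking test (kept as a heap on purpose)
CREATE TABLE abcd (id INT, payload CHAR(100))
GO
-- Load a thousand or so rows so the loop below has something to read
DECLARE @i INT = 0
WHILE @i <= 1000
BEGIN
    INSERT INTO abcd (id, payload) VALUES (@i, 'x')
    SET @i = @i + 1
END
GO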

USE example1

-- Read every row one at a time inside a single transaction
-- so the locks stay visible in sys.dm_tran_locks
BEGIN TRAN

DECLARE @counter INT = 0
WHILE @counter <= 1000
BEGIN
    SELECT * FROM abcd
    WHERE id = @counter
    SET @counter = @counter + 1
END

COMMIT TRAN

While the above query is executing, run the below query in another window.

SELECT DB_NAME(resource_database_id) AS database_name, *
FROM sys.dm_tran_locks
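If the raw output is too noisy, an aggregated view of the locks for just this database makes the comparison between the two modes easier. This is a small convenience sketch rather than anything from the original post.

-- Summarize locks held against example1 by resource type and request mode
SELECT resource_type, request_mode, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('example1')
GROUP BY resource_type, request_mode
ORDER BY lock_count DESC;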

The below screenshots show the locks acquired in read_write and read_only mode.

Read_Write – As expected we acquire page level locks here to ensure ACID


Read_only – Here we see we still acquire a HoBt BULK_OPERATION lock, which can be avoided by putting a clustered index on the table, as shown in the next step.
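To try the clustered index variant, the database has to be flipped back to read_write first. A sketch along these lines would do it (the index and column names are assumptions):

USE example1
GO
-- Switch back to read_write so the index can be created
ALTER DATABASE example1 SET READ_WRITE
GO
-- Add a clustered index so the table is no longer a heap
CREATE CLUSTERED INDEX cix_abcd_id ON abcd (id)
GO
-- Return to read_only and rerun the locking test
ALTER DATABASE example1 SET READ_ONLY
GO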


Read_only with clustered index – Notice we no longer acquire any locks on individual pages or on HoBt allocations like partitions etc.

Naturally this makes lock management a lot easier. So there is an advantage to having the read only database running in snapshot isolation level.

Cannot use row granularity hint on the table

Got approached with the below error message recently and thought it might be an interesting problem to look into.

ProgrammingError(('42000', 651, '[Microsoft][SQL Server Native Client 11.0][SQL Server]Cannot use the ROW granularity hint on the table "XXXXXX" because locking at the specified granularity is inhibited.', 11159)

The client runs into this issue when they implement a clustered columnstore index. So first let's recreate the problem and then explore the reasons for it. To start off I am going to create a table with only the clustered columnstore index.
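The original post showed this step as a screenshot; a minimal sketch of the setup (table and column names are assumptions) might look like this:

-- Hypothetical demo table
CREATE TABLE dbo.CCI_Demo
(
    id   INT,
    colA VARCHAR(50),
    colB INT
);
GO
-- Clustered columnstore index: the whole table is stored in columnstore format
CREATE CLUSTERED COLUMNSTORE INDEX cci_CCI_Demo ON dbo.CCI_Demo;
GO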

Once the clustered columnstore index is created we run the below query against the table and we see the error message as shown. The error message clearly indicates it's because of a hint being applied to the table. But is this the only way the error can appear?
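The query itself appeared as a screenshot in the original post; any query that forces row-granularity locking against the columnstore table should reproduce the error, for example a sketch like:

-- Forcing row-level locks against a columnstore table raises error 651
SELECT *
FROM dbo.CCI_Demo WITH (ROWLOCK)
WHERE id = 1;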

What if we replaced the clustered columnstore index with a non-clustered columnstore index covering all columns?
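Again this was shown via screenshots; a sketch of switching the same hypothetical table over to a non-clustered columnstore index on all columns could look like this. Running the same hinted query against this version of the table reproduces the error again.

-- Drop the clustered columnstore index so the table reverts to rowstore
DROP INDEX cci_CCI_Demo ON dbo.CCI_Demo;
GO
-- Non-clustered columnstore index covering every column
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_CCI_Demo
ON dbo.CCI_Demo (id, colA, colB);
GO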

As you can see the issue still persists because we are asking the optimizer to acquire row level locks. So what happens when we remove the hint?

Notice that without the hint the query executes fine, and the results show that columnstore indexes are stored in LOB pages rather than traditional in-row data pages. As a result there is no possibility for the optimizer to acquire specific row level locks within the LOB pages, especially considering the compressed format the data is stored in within these pages.

So what is the solution? Simply remove the query hint and execute the query as needed. Add traditional non-clustered rowstore indexes as needed to get seek behavior on specific rows on a query-by-query basis.
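As a sketch, such a traditional non-clustered rowstore index on the hypothetical demo table might look like this (on SQL Server 2016 and later it can coexist with a clustered columnstore index as well):

-- Rowstore index to support singleton lookups on id
CREATE NONCLUSTERED INDEX ix_CCI_Demo_id ON dbo.CCI_Demo (id);
GO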

#sqlhelp | should you remove indexes from tables before ETL?

Starting today we are introducing a new type of blog content which covers questions from #sqlhelp. We pick random questions from the feed and try to cover them in detail, while trying to explore other options or scenarios that might not be so obvious.

The question is: should you remove the index before doing ETL? Some points to keep in mind here are:-

  • There is a PK which might indicate a clustered index.
  • They are doing ETL so we can reasonably expect at least 100K rows per batch.
  • There is no mention of Bulk insert or the way the ETL is being performed.
  • Would truncating the table and loading the data affect the speed of the ETL compared to say delete and load?
  • Does the index need to be rebuilt?

Because we don't know the type of tool being used to perform the ETL, we will simply explore the speed of the operation with respect to the database here. To start off I have a table called OntimeEtl which contains 509,519 rows. We will move the data from this table in the Madworks database to another table in the ETL database under different scenarios to measure the load times.
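The load itself is driven from outside the database, but as a rough sketch of the database-side work being timed (the database names follow the text; the target table name and column list are assumptions):

-- Time a straight load from the source table into the target heap
SET STATISTICS TIME ON;

-- Hypothetical column list; the real table has many more columns
INSERT INTO ETL.dbo.OntimeEtl (FlightDate, Carrier, Origin, Dest)
SELECT FlightDate, Carrier, Origin, Dest
FROM Madworks.dbo.OntimeEtl;

SET STATISTICS TIME OFF;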

The ETL database is in the full recovery model and has an initial size of 500 MB so that auto growth etc. doesn't impact the load times. First we load into a heap table with no index and only an identity column added, which can be used later for the PK tests. Below are the results of the test. As you can see, regardless of the nature of the underlying disk, the non-indexed scenario always performs faster than a table with an index. More interestingly, we see that using a non-clustered unique index is a better choice than the default clustered index that gets created with a PK. The below table is ordered best to worst from left to right.

Run    SSD No Index    SSD NCIX    SSD Clus Index    HDD No Index    HDD Clus Index
1      9586            10580       13334             14730           22717
2      8937            11150       15013             15220           20793
3      9700            11107       13060             15090           20560
4      9923            10417       12244             15253           19613
5      8844            10466       13843             16390           23610

So now that it's clear we should drop the index to improve ETL performance, the question of rebuilding doesn't arise. But let's assume we decided to keep the index anyway; the next question would be: should we rebuild the index?

Below are the index physical stats for the clustered index when we use the recordid column as the PK and it’s of data type int and the input data is an identity column.
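The stats shown in the original screenshots come from the index physical stats DMF; the sketch below shows the kind of query used (the database and table names are assumptions based on the text).

-- Check fragmentation and page counts for the clustered index after the load
SELECT index_id, index_level, avg_fragmentation_in_percent,
       fragment_count, avg_fragment_size_in_pages, page_count
FROM sys.dm_db_index_physical_stats(
        DB_ID('ETL'),
        OBJECT_ID('ETL.dbo.OntimeEtl'),
        NULL, NULL, 'DETAILED');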

Below are the index stats for when the recordid column data type is changed to varchar(10).

In either case we see that after we insert data into the table, the clustered index isn't really fragmented. This is true even if the incoming data arrives in random order. We can also see that an ordered column value like an identity column results in more pages per fragment, which is generally better for scans. The picture changes if we store the data as varchar(10), as shown below.

Fewer pages are stored per fragment as a result of page splits, which is evident from the slightly higher number of pages in the second scenario where we store the data as text. So now it's clear that even if we did leave the index in place, we don't need to rebuild it after the ETL. Note that dropping and re-creating the clustered index will cause the non-clustered indexes to be rebuilt as well.

So to summarize:- Drop the index on the table when performing ETL operations; this will result in the table being loaded faster, and by adding the index back afterwards you will end up with an index with fewer page splits and less fragmentation.


The-certificate-chain-was-issued-by-an-authority-that-is-not-trusted-when-conn -FIXED

An interesting little problem came up today when migrating a database from Azure SQL Database to a local server. The connection string in the web app had the following properties set:

Encrypt=True;
TrustServerCertificate=False;

The combination of these two keywords ensures that data transfer between the web site and the database server is encrypted and that the server must present a certificate issued by a trusted authority. Essentially this makes sure that the data cannot be read in transit over the Azure network. However, when we migrated to the local SQL Server database we didn't have such a certificate, so using the connection string as it was (i.e. as it was when the app used Azure SQL Database) results in the error "The certificate chain was issued by an authority that is not trusted".

The proper solution is to install a certificate issued by a trusted authority and use that when connecting to SQL Server, but in this case we were working in a dev environment, so we didn't need one. To rectify the issue you simply change TrustServerCertificate=True, which allows the connection to accept the server's self-signed certificate instead. Through this approach we still encrypt the traffic (instead of setting Encrypt=False) but are not constrained by the need to purchase a trusted certificate.
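For the dev environment, the only change to the connection string shown earlier is the TrustServerCertificate value; as a sketch:

Encrypt=True;
TrustServerCertificate=True;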

References

https://docs.microsoft.com/en-us/sql/relational-databases/native-client/applications/using-connection-string-keywords-with-sql-server-native-client

https://stackoverflow.com/questions/17615260/the-certificate-chain-was-issued-by-an-authority-that-is-not-trusted-when-conn

Does god exist? A BI guy’s dilemma

You must be wondering what the heck this means. To give you some context, I was recently asked if I believed in God. The answer is:-

As a kid yes because my parents were god fearing and I copied them when it came to concepts I didn’t have a clear understanding of.

As a teenager I couldn't have cared less about religion because I figured it didn't really affect my day to day life.

As a young adult I was confident that God is simply something Humans invented in order to have a civilized or easy to control society and that science can explain everything so God couldn’t possibly exist.

As a middle aged man now I am confused. If God exists then he/she/they/it must serve a purpose. If the purpose is to give hope, then that is sufficient reason to believe in one. If the purpose is to govern the universe, or to reward/punish good and bad deeds, then I don't buy it, because we already know the rules that govern the universe, so it doesn't require supervision (therefore making God redundant).

Either way, I am not starting a philosophical discussion here; the focus of this blog is purely technical. The good books suggest a model called creationism, where the universe came into existence out of the blue. Now we know the universe came into existence after the Big Bang. I am not going to quote the books here, because if you're reading this I would like you to do your own theological and quantum-physics research (trust me, it will open your eyes). Now the Big Bang model obviously explains everything after the Bang except the details of the bang itself. But this too is something we have already explored in detail, and it is often cited in the heated debate about the Big Bang vs creationism (God created the Big Bang and everything after that is purely physics); the same goes for black holes. So then what is my problem? The source of my confusion is how close the religions got it. Imagine this: if you asked a man in Damascus 10000 years ago to predict how humans would communicate today, do you really think his answer would resemble anything like the cell phones we have today? This amount of precision cannot be a coincidence. In BI we don't deal with coincidence; everything is based on probabilities of events occurring. So for ancient humans to pick a theory that closely resembles the truth requires them choosing it because it had the highest probability. So how could they have done that without any of the tools and technologies we have today to study the universe? They could just as easily have assumed evolution based on the information available/unavailable to them, or aliens, or that they themselves were Gods, or any number of other theories, all of which would have been equally valid given the choices they had available then.

Religions also suggest a single-family theory, where we all descend from one couple such as Adam/Eve or Manu/Satarupa. This theory is also often dismissed during scientific discussion, because just as quantum theory explains everything physical, evolution explains everything biological. However, mitochondrial DNA studies, which are used to support the Out of Africa origin theory, suggest that we all came from somewhere very close to each other. Notice I didn't say the same mother. All women share a common genetic inheritance called the mt-MRCA which is passed on from mother to daughter. While this doesn't mean that all women came from the same mother, studies have accounted for mutations that occur over time and concluded that the timeframe fits and we all originated from Africa and spread out from there. ** There is a competing theory that suggests humans evolved independently in different parts of the world and interbred. But I am not in favor of it, because in BI we abhor coincidence; for humans to evolve in a relatively close time frame but in geographically distant places with the same physiology and socio-cultural attributes reeks of chance. Besides, it goes against everything else we know about history. If the Neanderthal was made extinct by Homo sapiens, why wouldn't we repeat the same to each other based on how different we see ourselves from them? The Mayans and Spain, Americans and Native Americans, Australia and the Aborigines are all examples of how human beings from the same species couldn't adapt to the biological and technological differences between them. But I digress; the fact is, out of all the probable explanations for how we came about, how did we identify and standardize on this one? Arguably this one is a bit easier to argue, since we simply look at our parents and assume they came from theirs, and so on and so forth. So while humans, like the universe, didn't appear out of the blue, we still have a single origin, which is one of the tenets of religion.

There are other examples which I have encountered, but I am sticking to the above two because they are the most relatable to the general public. The fact is science has been proving religion in some cases and disproving religion in many others. But what does that mean? Well, it's simple really: make a hundred predictions and you are bound to get a few right, so maybe that's just it!

In BI we consider a model valid if it beats random chance with greater accuracy after every iteration. So with every iteration we should diverge further from our original assumption based on new data, but what we see here is that after the iterations we keep tending back to the original model. And that doesn't make sense.