I came across a comment on linkedin today about the difference between the actual and estimated execution plan in SQL Server and how it is used to determine performance issues. I wanted to point out that the estimated and actual execution plans are most of the time similar but there are plenty of cases where they will be different at which point using the execution plan from within the DMV doesn’t really make any sense. I have already pointed this out in another blog. But I am mentioning another example here for the sake of the reader.
If you look at the below query you will see there is conditional logic here
When viewing the estimated execution plan you see a generic plan that will work regardless of input parameter.
While the actual execution plan looks completely different based on input criteria.
So basically the estimated execution plan cannot be used to determine the exact events in all cases.
The Second thing is that the plan you view when you run the below query
SELECT q.plan_handle, h.query_plan FROM sys.dm_exec_query_stats q outer apply sys.dm_exec_sql_text(q.sql_handle) g outer apply sys.dm_exec_query_plan(q.plan_handle) h WHERE g.TEXT like '%getdatafromcondition%'
is the estimated execution plan, I understand that some DBA assume this to be the actual plan. If you need the actual plan run profiler of XEs instead.
The third thing I wanted to address was that time is not a criteria when arriving at the best plan for execution. But before that we need to address what best means. Here we refer to the best plan for now not the best possible plan. The best plan for now is determined keeping in mind a number of factors which arrive at a relative number we call cost. This number is not absolute and cannot be used to compare cost across plans either. The “cost” however is a multifaceted component which is derived by looking at a number of factors such as number of rows being fetched the type of algorithm being used , the current system overhead etc. But Time for the query to complete execution is not a factor here.
Think of it this way. I want to travel from Bangalore to Mysore (about 178 Km away). Could assume that I average about 45Km per hour so I should take 4 hours to reach there. But the fact is it doesn’t work like that. E.g. I won’t know how much I average until I actually make the drive. But we also reuse the first plan so we get into a chicken an egg situation where you won’t know how long the query will take until you actually run it and as a result can’t estimate for it.
The second thing is like the road my average speed is determined by a number of factors such as my car (system configuration), the traffic rules (resource governor) the actual traffic (system load), the weather (locking, blocking etc.) and while some of these like resource governor are fixed and can be planned for the others aren’t.
So at arrive at an execution plan which depends on a factor that can vary wildly from time to time will often result in plans that are less effective most of the time. Having said this there is a time component that is present and that is the time taken to do the planning. In this case the time taken to plan a trip to Mysore could take just as long as the trip itself. The optimizer decides to figure out the best possible plan keeping cost and the time taken to arrive at such a plan( not the time taken for the plan to actually execute). I hope this helps clarify some of the concepts.
The other bog