Today I want to address two very common issues when it comes to query plan tuning or query optimizing. At first I will have a look on implicit converts and the consequences. The second topic is all about the query optimizer and how quick one can have bad execution plans. For both issues I will five advice what to do in such a situation.
Whenever you tell SQL Server to compare or evaluate two expressions that have different incomparable data types the SQL Server will adjust one of the expressions to a fitting data type so it can evaluate the expression. A very basic example is comparing integer with float numbers. Most likely the integer number will get transferred into a floating number to then compare it with the other float number.
But in SQL this can have fatal consequences. At first the optimizer does not know how a transformed column is distributed. The underlying statistics only apply for the original data type. So if the column gets converted due to be comparable the optimizer has no information how the compared value is actually distributed within the column. This problem is called CardinalityEstimate.
Another consequence is that the optimizer also cannot use an index anymore (for seeking). So it will do a scan because the B-tree is built with the original data type. It is just incompatible now. This problem is called SeekPlan. You can find the information in the query plan.
Have a look at that easy example. Keep in mind that the Column NationalIDNumber is of data type varchar.
Query 1: SELECT BusinessEntityID, NationalIDNumber, LoginID FROM HumanResources.Employee WHERE NationalIDNumber = 112457891
Query 2: SELECT BusinessEntityID, NationalIDNumber, LoginID FROM HumanResources.Employee WHERE NationalIDNumber = '112457891'
When you have a look at the query plans you can see the situation I described. Because in the first query the constant 112457891 is changed to varchar, it is not possible for the optimizer to use the index on NationalIDNumber.
Bad Query:
Good Query:
Also you can see that the second plan using an index seek is faster (relative to the batch)
Check your plan for Warnings (yellow exclamation mark). Try to resolve the issues. If data type mismatch is in your query as constant, just change the constants. If mismatch occurs because data types mismatches in tables consider a persisted calculated column that converts the value in the correct format. Then statistics can be calculated on that persisted column.
The second issue I want to address is also quite common but still very unknown. What do you think how many tables have to be involved in a join / subquery scenario to confuse the query optimizer? What is the threshold where the optimizer starts struggling and not returning always the perfect plan?
The answer is quite complex but let me just show you this example. If three tables are involved in a query via join, the query optimizer has 3! = 1*2*3 = 6 possibilities for different join order. With ten tables you already have 10! = 3628800 possibilities.
The optimizer cannot have a look at all these tables.
SELECT Sales.SalesTerritory.TerritoryID, Sales.SalesTerritoryHistory.BusinessEntityID, Sales.SalesReason.SalesReasonID FROM Sales.SalesOrderDetail INNER JOIN Sales.SalesOrderHeader ON Sales.SalesOrderDetail.SalesOrderID = Sales.SalesOrderHeader.SalesOrderID INNER JOIN Sales.SalesOrderHeaderSalesReason ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderHeaderSalesReason.SalesOrderID INNER JOIN Sales.SalesPerson ON Sales.SalesOrderHeader.SalesPersonID = Sales.SalesPerson.BusinessEntityID INNER JOIN Sales.SalesPersonQuotaHistory ON Sales.SalesPerson.BusinessEntityID = Sales.SalesPersonQuotaHistory.BusinessEntityID INNER JOIN Sales.SalesReason ON Sales.SalesOrderHeaderSalesReason.SalesReasonID = Sales.SalesReason.SalesReasonID INNER JOIN Sales.SalesTerritory ON Sales.SalesOrderHeader.TerritoryID = Sales.SalesTerritory.TerritoryID AND Sales.SalesPerson.TerritoryID = Sales.SalesTerritory.TerritoryID INNER JOIN Sales.SalesTerritoryHistory ON Sales.SalesPerson.BusinessEntityID = Sales.SalesTerritoryHistory.BusinessEntityID AND Sales.SalesTerritory.TerritoryID = Sales.SalesTerritoryHistory.TerritoryID
It is taken from AdventureWorks2012 and basically joins everything from the sales schema.
If you have a look at the query plan you realize this hint:
It tells us that the optimizer stopped evaluating plans at timeout and just chose the best one so far.
So is there a possibility to give it more time to evaluate? Yes. Have a look at the query plan with Traceflags 2301 and 8780 turned on:
If you compare those two plans you will see that the second plan is actually better.
The tricky thing about that fact is that maybe your query runs fine for a year. But there can be a tipping point where decisions are turned around from query optimizer and with bad luck it won’t find a good plan for you in time.