Welcome to our SQL Server blog

This blog was created in order to pass on some of our knowledge and advice in the hope that the community might find it useful.

Obviously Microsoft SQL Server is our area of expertise and therefore this is our most thorough blog, with everything from the basics through to the incredibly advanced; as such, we should have something for everyone within our pages.

2014 Cardinality Estimator Exponential Backoff
Published: Dec 11, 2016
This is going to be a very basic explanation of the differences seen in my previous post (about TF 9481), in which the new and old cardinality estimators provided different estimates for what looked like a very simplistic query.

For a much more in depth explanation of this I would recommend looking at Paul White’s blog page here.

So anyway, here was the query that I was using in my last post:

use AdventureWorks2012

select *
from sales.SalesOrderHeader
where orderDate between '2005-01-01' and '2005-12-31'
and subTotal between 10000 and 20000

So why did this give different estimates when run with the SQL 2014 CE vs the older one? Well, it all comes down to the way SQL Server combines predicates on columns from the same table in a WHERE clause.

Let’s break this down and have a look (we could use DBCC SHOW_STATISTICS to see this in greater detail, but I’m going to keep things simple)…

Firstly we’ll ask the OLD CE for its estimates when we run the following:

select *
from sales.SalesOrderHeader
where orderDate between '2005-01-01' and '2005-12-31'
option (querytraceon 9481)

select *
from sales.SalesOrderHeader
where subTotal between 10000 and 20000
option (querytraceon 9481)

Okay, so what is SQL Server doing behind the scenes when we ask it to use both predicates in the WHERE clause?

Well, we know that the number of rows in the table is 31456 (you can verify that without my help), and the two queries above produce estimates of 1372.08 and 308.504 rows respectively, so SQL Server applies the following calculation:

select (1372.08 / 31456) -- proportion of rows expected to be returned for orderDate
* (308.504 / 31456) -- proportion of rows expected to be returned for subTotal
* 31456 -- number of rows in the table

Executing this gives us roughly 13.46, which matches the old CE’s row estimate for the combined query in the estimated plan.

And that’s how it’s calculated in the old CE.
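The old-CE arithmetic can be sanity-checked outside SQL Server. Here is a minimal Python sketch of the same sum (1372.08 and 308.504 are the single-predicate estimates from the two queries above):

```python
rows = 31456             # rows in Sales.SalesOrderHeader
est_orderdate = 1372.08  # old CE estimate for the orderDate predicate alone
est_subtotal = 308.504   # old CE estimate for the subTotal predicate alone

# Independence assumption: multiply the two selectivities together,
# then scale back up by the table's row count.
old_ce = (est_orderdate / rows) * (est_subtotal / rows) * rows
print(round(old_ce, 2))  # roughly 13.46
```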

So what happens in the new one? Well, there was always an issue with the old approach: columns in the same table may well be related to each other, so the old rule of treating them as completely independent is simply too harsh. However, most columns aren’t all that strongly correlated either, so the new CE takes the middle ground: it assumes there might be some correlation and therefore exponentially backs off the calculation.

We do this by taking the square root of the selectivity of the additional predicate; because a selectivity is always between 0 and 1, this has the effect of raising it, and with it the row estimate.

For example, let’s say our selectivity estimate is 0.2… the square root of 0.2 is roughly 0.447, therefore raising our selectivity estimate (hopefully you’re following this).
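A quick numeric check of that effect (plain Python, just illustrating the arithmetic):

```python
import math

# Backing off a single selectivity: the square root of a number between
# 0 and 1 is always larger than the number itself.
sel = 0.2
backed_off = math.sqrt(sel)
print(sel, "->", round(backed_off, 3))  # 0.2 -> 0.447
```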

Also note that SQL Server will always estimate using the most selective predicate first.

We know that the subTotal column produces the fewest rows and therefore SQL Server will keep that estimate and THEN apply the orderDate estimate using the Exponential Backoff method.

Anyway, we now place this into the calculation:

select (308.504 / 31456) -- proportion of rows expected to be returned for subTotal
* sqrt(1372.08 / 31456) -- proportion of rows expected to be returned for orderDate
* 31456 -- number of rows in the table
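Again this can be checked in plain Python: the subTotal selectivity (the most selective) is kept as-is, while the orderDate selectivity is square-rooted. The general rule, as described in Paul White’s write-up linked above, raises successive selectivities (most selective first) to the powers 1, 1/2, 1/4, 1/8 and stops after four predicates; `backoff_estimate` below is just an illustrative helper of that idea, not anything SQL Server exposes:

```python
import math

rows = 31456
est_subtotal = 308.504   # most selective predicate: selectivity kept as-is
est_orderdate = 1372.08  # next predicate: selectivity square-rooted

new_ce = (est_subtotal / rows) * math.sqrt(est_orderdate / rows) * rows
print(round(new_ce, 2))  # roughly 64.43

# Illustrative generalisation: sort selectivities ascending (most
# selective first) and raise each successive one to 1, 1/2, 1/4, 1/8.
def backoff_estimate(single_predicate_estimates, table_rows):
    sels = sorted(e / table_rows for e in single_predicate_estimates)
    combined = 1.0
    for i, s in enumerate(sels[:4]):  # only the first four predicates count
        combined *= s ** (1.0 / 2 ** i)
    return combined * table_rows

print(round(backoff_estimate([1372.08, 308.504], rows), 2))  # same result
```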

And looking at the Estimated Plan with the new CE we can see an estimate of roughly 64.43 rows, matching this calculation.

Therefore we can now see how the new Cardinality Estimator works for multiple ANDed column predicates in a WHERE clause.

As a note, in this example the new CE is absolutely spot on compared to the actuals, but that isn’t always the case: this is still an estimate based on a formula trying for a “best fit for all situations” and it will inevitably have flaws… but it’s worth knowing how it’s calculated when you’re testing ahead of an upgrade to the latest version.

SQL World CEO
Kevin Urquhart

I am a SQL Server DBA, Architect, Developer, Trainer, and CEO of SQL World. This is my blog in which I’m simply trying to share my SQL knowledge and experiences with the world.


© Copyright 2020 SQLTraining Ltd.