
Welcome to our SQL Server blog

This blog was created in order to pass on some of our knowledge and advice in the hope that the community might find it useful.

Obviously Microsoft SQL Server is our area of expertise and therefore this is our most thorough blog, with everything from the basics through to the incredibly advanced and, as such, we should have something for everyone within our pages.

Be Very Careful with Sampled Statistics
Published: Dec 18, 2016
Okay, we all know that Sampled Statistics are not perfect, otherwise there would be no need for a Full Scan stats update. However, there’s an issue I encountered recently that has really bothered me in regard to them.

When we perform a sampled stats refresh we know that SQL Server will sample a smaller and smaller percentage of the rows as our table grows (the sample rate is based on data size, not pure row count), so we’re all accustomed to living with that, but there’s another flaw that really drives me mad…

During some of the work I’ve been doing recently we are constantly updating statistics in order to get good plans (which is pretty standard) because we’re inserting a lot of ascending key data and, in versions older than SQL Server 2014, this can cause a lot of problems: SQL Server estimates that we’ll only obtain 1 record and therefore chooses Nested Loops when we need Hash Joins.
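As a sketch of the problem (the table and dates here are hypothetical): once rows have been inserted beyond the histogram’s top step, a predicate over those new values falls off the end of the statistics:

```sql
-- Hypothetical table: stats on OrderDate were last updated when
-- the highest value in the histogram was '2016-11-30'.
-- Under the pre-2014 cardinality estimator, this predicate is
-- estimated at ~1 row however many rows have arrived since,
-- which is what pushes the optimizer towards Nested Loops.
select o.OrderID, o.OrderDate
from dbo.Orders o
where o.OrderDate > '2016-11-30';
```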

This can be alleviated using Trace Flags (I’ll cover those separately) but it’s still not perfect if you ever do anything which causes SQL Server to think the keys may not actually be ascending.

So anyway… “You’re updating stats so what’s the problem?”, I hear you ask.

Well, these tables are upwards of 150GB in size in some cases and therefore we’re relying on sampled stats because a full scan simply isn’t feasible.

BUT, as I found out, SQL Server doesn’t seem to think that it’s necessary to bookend the sample, i.e. it doesn’t always sample the min and max values in the table.

To me this seems pretty fundamental, as you would think it would simply take the first value, the last value, and then whatever sample it wanted in between.
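If you want to check this on one of your own tables, the comparison is straightforward (object names here are placeholders): the last RANGE_HI_KEY in the histogram should, you’d hope, match the top of the table.

```sql
-- Histogram for the clustered PK statistics; look at the
-- RANGE_HI_KEY of the final row returned
dbcc show_statistics ('dbo.myBigTable', 'pk_myBigTable') with histogram;

-- ...and compare it with the actual top of the table:
select max(id) from dbo.myBigTable;
```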

When dealing with ascending keys it means that nearly all queries being run are going to want the most recent key values and SQL simply doesn’t pick them up. Therefore I was finding that even running an update stats wouldn’t help with my execution plans and in the end I had to resort to filtered stats (which can be tricky if you don’t have something nice like Date to work with) alongside query hints.

These are all viable but fiddly and wouldn’t be needed if Sampled Stats didn’t have, what seems to me, a massive flaw.
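For completeness, a filtered statistic along the lines I ended up using might look like this (the names and threshold are hypothetical). Because the filtered range is small, a FULLSCAN becomes affordable, but the filter predicate has to be moved along as the key keeps ascending:

```sql
-- Hypothetical: cover only the most recent slice of the ascending key
create statistics stat_myBigTable_recent
on dbo.myBigTable (id)
where id > 20000000
with fullscan;
```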

If you want to have a look yourself, then run the following code and have a look at the output:

use testDB
go

if object_id('myTestStatsTable') is not null drop table myTestStatsTable
go

create table myTestStatsTable
(
	id int identity,
	idINT int,
	myValue char(1000) default('a'),
	constraint pk_myTestStatsTable primary key clustered(id)
)
go

-- receives the output of DBCC SHOW_STATISTICS ... WITH STAT_HEADER
declare @statHeader table
(
	keyName varchar(50),
	updateTime varchar(50),
	noRows int,
	sampledRows int,
	steps tinyint,
	density float,
	avgKeyLength float,
	stringIndex varchar(3),
	filterExp varchar(10),
	unfilteredRows int
)

-- receives the output of DBCC SHOW_STATISTICS ... WITH HISTOGRAM
declare @statsHist table
(
	rangeHiKey int,
	rangeRows float,
	eqRows float,
	distinctRangeRows float,
	avgRangeRows float
)

-- one row per stats update: row counts and the highest key in the histogram
declare @output table
(
	id int identity,
	noRows int,
	sampledRows int,
	maxStatsID int
)

declare @rowLimit int = 1000000, @sql varchar(max)

while isnull((select max(id) from myTestStatsTable), 0) <= @rowLimit
begin
	-- add 10000 rows at a time (sys views cross-applied just to generate rows)
	insert into myTestStatsTable (idINT)
	select top (10000) 1
	from sys.objects o
	cross apply sys.columns c
	cross apply sys.allocation_units a

	-- every 50000 rows, refresh the stats and record what the histogram holds
	if (select max(id) from myTestStatsTable) % 50000 = 0
	begin
		update statistics myTestStatsTable pk_myTestStatsTable

		select @sql = 'dbcc show_statistics (''myTestStatsTable'', ''pk_myTestStatsTable'') with stat_header'

		insert into @statHeader
		exec (@sql)

		select @sql = 'dbcc show_statistics (''myTestStatsTable'', ''pk_myTestStatsTable'') with histogram'

		insert into @statsHist
		exec (@sql)

		insert into @output
		select s.noRows, s.sampledRows, h.highValue
		from @statHeader s
		cross apply
		(
			select max(rangeHiKey) highValue
			from @statsHist
		) h

		delete from @statHeader
		delete from @statsHist
	end
end

-- compare the max key the histogram knows about with the true row count
select
	noRows, sampledRows,
	(sampledRows / (noRows * 1.)) * 100 as percentSampled,
	maxStatsID
from @output
order by id

SQL World CEO
Kevin Urquhart


I am a SQL Server DBA, Architect, Developer, Trainer, and CEO of SQL World. This is my blog in which I’m simply trying to share my SQL knowledge and experiences with the world.


© Copyright 2020 SQLTraining Ltd.