Worth Corner, Crawley, RH10 7SL

Support Hours - Mon-Sat, 8.00-18.00

Welcome to our SQL Server blog

This blog was created in order to pass on some of our knowledge and advice in the hope that the community might find it useful.

Obviously Microsoft SQL Server is our area of expertise and therefore this is our most thorough blog with everything from the basic through to the the incredibly advanced and, as such, we should have something for everyone within our pages.

Lead and Lag Built In Functions
Published: Nov 06, 2016
These aren’t new functions as they were brought in for SQL 2012, but I was using them the other day in a SQL 2014 environment and the person I was talking to at the time had never heard of them, so I thought I’d make mention of them here in case other people hadn’t heard of them either.

These are basically functions along the same line as RANK and ROW_NUMBER in that they take an OVER clause.

Their purpose is to basically prevent self-join queries when comparing datasets to themselves.

Sounds confusing, but you’ll see what I mean…

Using my extended AdventureWorks (as usual) we want to compare the total sales amount for each day with the respective previous days total sales amount.

Basically we want to do something like this:

use AdventureWorks2014

if object_id('tempDB..#orders') is not null
drop table #orders

select row_number() over(order by orderDate) id,
orderDate, sum(TotalDue) totalDue
into #orders
from sales.SalesOrderHeaderEnlarged
group by orderDate

select o.OrderDate, o.totalDue dayTotalDue, o1.totalDue previousDayTotalDue
from #orders o
join #orders o1
on o.id = o1.id + 1
order by o.id

What we’re interested in is the last query which produces our results by performing a self-join on the temporary orders table.

This can be quite a heavy task (in this case not because it’s a small table of only around 2000 records, but it will still demonstrate performance well enough).

We’re joining the table to itself:

And in this case we’re scanning the orders table twice (makes sense) and we also require two internal tables to be created and used in the background…

So how else could we write this? Well the easiest way is using LAG:

select OrderDate, totalDue dayTotal,
lag(totalDue, 1) over(order by orderDate) previousDayTotalDue
from #orders
order by id

For reference, the column in the lag function itself is the column we want to align, in this case totalDue. And the number 1 means that we want to shift it by 1 row. If we placed 2 there we could compare each date’s totals to those 2 days prior.

Hopefully that makes sense.

Well this is the outcome of this query:

It’s much longer, but you can see that we only hit the #orders table once.

As we expected… we just scanned the table once, therefore halving the reads and we also only used 1 internal table. Therefore this is much more efficient.

Let’s now consider something else… let’s compare each day to the day before and the following day…

select o.OrderDate, o.totalDue dayTotalDue, o1.totalDue previousDayTotalDue,
o2.totalDue followingDayTotalDue
from #orders o
join #orders o1
on o.id = o1.id + 1
join #orders o2
on o.id = o2.id - 1
order by o.id

Now this is starting to get a little messy. The SQL is growing, joins are getting confusing, and the execution plan, as expected, shows 3 table scans:

Which is reflected in the stats:

So now let’s use our new functions… both Lead and Lag:

select OrderDate, totalDue dayTotal,
lag(totalDue, 1) over(order by orderDate) previousDayTotalDue,
lead(totalDue, 1) over(order by orderDate) followingDayTotalDue
from #orders
order by id

Much neater SQL and we only scan the table once:

The other thing to note is that the Lead and Lag functions do return the values in which there is no match… like an outer join…

In my old style query these wouldn’t appear. Therefore you would need to remove them in the where clause if you wished.

The other option is that you can simply have them listed as 0 if you want by adding a parameter to the Lag and Lead functions (you can use any value you like, I’m just choosing 0):

select OrderDate, totalDue dayTotal,
lag(totalDue, 1, 0) over(order by orderDate) previousDayTotalDue,
lead(totalDue, 1, 0) over(order by orderDate) followingDayTotalDue
from #orders
order by id

And this will solve your problem:

I would definitely recommend using this over the old style for any comparison queries like the above. It’s much more efficient and I don’t (yet at least) know of any gotchas to worry about with it.
Leave a Comment
Your email address will not be published. All fields are mandatory.
NB: Comments will only appear once they have been moderated.

SQL  World  CEO
Kevin  Urquhart

iPhone Selfie

I am a SQL Server DBA, Architect, Developer, Trainer, and CEO of SQL World. This is my blog in which I’m simply trying to share my SQL knowledge and experiences with the world.


© Copyright 2020 SQLTraining Ltd.