The debate about INSERT INTO vs SELECT INTO can be quite long and get people quite heated and therefore I’m not going to go into the entire thing right now, but what I did want to mention was something that I encountered the other day and which could be worth considering next time you’re deciding which approach to take.
Basically this is in regard to the logging done internally within SQL Server depending on which syntax you use. Although note that this will only really benefit you if you are using real tables in a database and not # tables because everything in tempDB is minimally logged and therefore this makes no difference for temp tables.
Let’s have a look at a couple of examples using the AdventureWorks2012 database.
My database is in SIMPLE recovery and therefore what we’re going to do is to firstly perform a CHECKPOINT to clear the log (which will always leave 3 records), perform a SELECT INTO, and then check the number of log records generated:
use AdventureWorks2012
go
checkpoint
go
select count(*) from fn_dblog(null, null)
select SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID,
SpecialOfferID, UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate
into dbo.temp
from sales.SalesOrderDetail
select count(*) from fn_dblog(null, null)
drop table temp
go
Now let’s perform the same thing again, but this time we’ll create the table first and then insert into it:
use AdventureWorks2012
go
checkpoint
go
select count(*) from fn_dblog(null, null)
create table dbo.temp
(
SalesOrderID int not null,
SalesOrderDetailID int not null,
CarrierTrackingNumber nvarchar(25) null,
OrderQty smallint not null,
ProductID int not null,
SpecialOfferID int not null,
UnitPrice money not null,
UnitPriceDiscount money not null,
LineTotal AS (isnull((UnitPrice*((1.0)-UnitPriceDiscount))*OrderQty,(0.0))),
rowguid uniqueidentifier not null,
ModifiedDate datetime not null,
constraint PK_temp primary key clustered (SalesOrderID, SalesOrderDetailID)
)
insert into dbo.temp
select SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID,
SpecialOfferID, UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate
from sales.SalesOrderDetail
select count(*) from fn_dblog(null, null)
drop table temp
go
As you can see, the results are impressive. The difference between INSERT INTO and SELECT INTO are vast. This is because SELECT INTO is a minimally logged operation whereas INSERT INTO is fully logged.
Noting that this is a relatively small table you can imagine how much this will hammer your log file (not to mention potential file growth lag) if you did this with a large table.
Something to consider next time you’re coding this type of thing.
Again though, as mentioned above, this is irrelevant when using # tables as tempDB is always minimally logged and therefore this applies to full tables only.
Just in case you don’t believe me:
use tempDB
go
checkpoint
go
select count(*) from fn_dblog(null, null)
select SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID,
SpecialOfferID, UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate
into #temp
from AdventureWorks2012.sales.SalesOrderDetail
select count(*) from fn_dblog(null, null)
drop table #temp
go
use tempDB
go
checkpoint
go
select count(*) from fn_dblog(null, null)
create table #temp
(
SalesOrderID int not null,
SalesOrderDetailID int not null,
CarrierTrackingNumber nvarchar(25) null,
OrderQty smallint not null,
ProductID int not null,
SpecialOfferID int not null,
UnitPrice money not null,
UnitPriceDiscount money not null,
LineTotal AS (isnull((UnitPrice*((1.0)-UnitPriceDiscount))*OrderQty,(0.0))),
rowguid uniqueidentifier not null,
ModifiedDate datetime not null,
constraint PK_temp primary key clustered (SalesOrderID, SalesOrderDetailID)
)
insert into #temp
select SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID,
SpecialOfferID, UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate
from AdventureWorks2012.sales.SalesOrderDetail
select count(*) from fn_dblog(null, null)
drop table #temp
go