


Column Compression? Compress and Decompress
Published: Dec 01, 2020
This was something which came up recently when discussing the storage of XML columns within a SQL Server database and the large amount of space they can take up. Most of us know about the compression options which have been around for years now, namely row level and page level compression… but since SQL Server 2016 there has been a new option that you may wish to try.

Also note the use of the word “MAY”. This is a great feature but, as we all know, there’s no such thing as a free lunch, which I’ll explain in due course.

So, for those who need a very brief refresher…

Row level compression is done at the index level and affects the whole table/index. It simply stores all fixed-length data types as variable length. Therefore, for example, an int which takes up 4 bytes but is only holding the number 1 will be stored in a single byte, freeing 3 bytes back to the data page.
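
For example, to try row compression yourself, a minimal sketch (assuming a table and index named like the demo ones further down) would be:

-- rebuild the whole table (heap or clustered index) with row compression
alter table dbo.person rebuild with (data_compression = row)
go

-- or rebuild one specific index with row compression
alter index cix_person on dbo.person rebuild with (data_compression = row)
go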

Page level compression is done at the index level and affects the whole table/index. SQL Server applies prefix and dictionary compression to strip out common byte patterns within the page. Think of it as normalization within the data pages.
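
Before rebuilding anything, you can also ask SQL Server to estimate what page compression would save you. A quick sketch using the built-in estimator (again assuming the demo table below):

-- estimate the space savings page compression would give on dbo.person
exec sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'person',
    @index_id = null,
    @partition_number = null,
    @data_compression = 'page'
go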

These are, as mentioned, both table/index level compression methods and, although they are both incredibly powerful and useful when used correctly, they don’t necessarily work well if you have very wide tables. Consider the following: you have a skinny table with 200 rows per data page and repetitive data across rows. For this, page level compression will work wonders. However, what if you have a wide table with XML data, meaning you may have just one row per page and nothing that you can really normalize out at the page level? In this case page level compression does very little except add CPU overhead.
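
If you’re not sure which of those camps a table falls into, the physical stats DMV will tell you roughly how many rows fit on each leaf-level page. A sketch (assuming the dbo.person demo table created below):

-- roughly how many rows fit on each leaf-level data page?
select page_count, record_count,
       record_count / nullif(page_count, 0) as avg_rows_per_page,
       avg_record_size_in_bytes
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.person'), null, null, 'DETAILED')
where index_level = 0
go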

Noting the above, one of the most common questions I hear is how to store textual, XML, or JSON data in a more compressed fashion… “Wouldn’t it be great if we could ‘zip’ the column?”. Well, SQL Server 2016+ allows you to do just that.

The T-SQL functions COMPRESS and DECOMPRESS use the Gzip algorithm to zip individual values and can make a vast difference to your data size on disk. Let’s run a quick demo:

use AdventureWorks2012
go

drop table if exists dbo.person
go

-- make a copy of the person.person table
select BusinessEntityID, PersonType, NameStyle, Title,
       FirstName, MiddleName, LastName, Suffix, EmailPromotion,
       AdditionalContactInfo, Demographics, rowguid, ModifiedDate
into dbo.person
from person.person
go

-- check the size on disk
exec sp_spaceUsed 'dbo.person'
go

-- page level compress the table
create unique clustered index cix_person on dbo.person(businessEntityID) with (data_compression = page)
go

-- check the size on disk
exec sp_spaceUsed 'dbo.person'
go

-- truncate the table
truncate table dbo.person
go

-- drop the xml column
alter table dbo.person drop column Demographics
go

-- add the column as a varbinary(max) for compression
alter table dbo.person add Demographics varbinary(max)
go

-- re-insert the data but with the xml compressed
insert into dbo.person
select BusinessEntityID, PersonType, NameStyle, Title,
       FirstName, MiddleName, LastName, Suffix, EmailPromotion,
       AdditionalContactInfo, rowguid, ModifiedDate,
       compress(convert(nvarchar(max), Demographics)) as Demographics
from person.person
go

-- create a non-compressed index
create unique clustered index cix_person on dbo.person(businessEntityID) with (drop_existing = on, data_compression = none)
go

-- check the size on disk
exec sp_spaceUsed 'dbo.person'
go



Remembering that the final table has ONLY the XML column compressed, you can see from the sp_spaceUsed output that the savings involved are quite large, even compared to page level compression on the table as a whole.

This could have a large effect for anyone trying to save disk space when storing large amounts of text, although note the slight re-architecture that is needed: the COMPRESS and DECOMPRESS functions work with the varbinary data type and therefore you will need to do a few casts or converts, but nothing too traumatic.
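
To see that round trip in miniature, here’s a throwaway sketch (the value and variable names are just for illustration):

-- COMPRESS and DECOMPRESS both deal in varbinary(max),
-- so cast the result back to the original type on the way out
declare @original nvarchar(max) =
    replicate(convert(nvarchar(max), N'<row>some repetitive xml content</row>'), 1000)
declare @zipped varbinary(max) = compress(@original)

select datalength(@original) as original_bytes,
       datalength(@zipped) as compressed_bytes,
       cast(decompress(@zipped) as nvarchar(max)) as round_tripped
go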

However, you remember I mentioned there’s no such thing as a free lunch? Well, the cost of compressing in this manner is high. Very high.

As an example, on my Surface I can populate the dbo.person copy in less than a second BUT the latter query that performs the insert with compression? 14 seconds!!!

This is also reflected in the decompression of data.

Running any query which does not require the decompressing of the column would be faster in our new, smaller version. Also, we don’t take much of a hit if we simply want to view a few rows:

set statistics time on

select BusinessEntityID, Demographics
from person.person
where BusinessEntityID between 140 and 200

select BusinessEntityID, convert(xml, decompress(Demographics)) as Demographics
from dbo.person
where BusinessEntityID between 140 and 200
go



Yes, this is slower, but it’s definitely not unbearable. BUT what happens if we want to return a lot of data:

set statistics time on

select top 10000 BusinessEntityID, Demographics
from person.person

select top 10000 BusinessEntityID, convert(xml, decompress(Demographics)) as Demographics
from dbo.person
go



As you can see from the timings, this follows the pattern of our smaller queries, but the decompression overhead suddenly becomes a lot more noticeable.

Bottom line being… if you want to crunch your textual data and only ever read it infrequently or in small batches, then use COMPRESS because it will most certainly be your friend. But if you’re going to be decompressing frequently and in large quantities, then you need to make sure that you can afford the performance hit.
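
If you do go down this route, one pattern that can soften the day-to-day friction is a non-persisted computed column over the compressed data, so readers don’t need to remember the DECOMPRESS/CONVERT dance. A sketch against our demo table (the DemographicsText column name is my own invention):

-- computed column that decompresses on read (costs CPU per query, not disk)
alter table dbo.person
    add DemographicsText as convert(nvarchar(max), decompress(Demographics))
go

select BusinessEntityID, DemographicsText
from dbo.person
where BusinessEntityID = 1
go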

