Worth Corner, Crawley, RH10 7SL

Support Hours - Mon-Sat, 8.00-18.00

Welcome to our SQL Server blog

This blog was created in order to pass on some of our knowledge and advice in the hope that the community might find it useful.

Obviously Microsoft SQL Server is our area of expertise and therefore this is our most thorough blog with everything from the basic through to the the incredibly advanced and, as such, we should have something for everyone within our pages.


Slowly Changing Dimension Designs - Basic Date Range
Published: Nov 29, 2020
In a previous post I went through what a Slowly Changing Dimension is and how it can be a pain to all and sundry when designing a database. Before I move on to implementation in regard to code, I just wanted to touch on a few design thoughts in regard to the tables themselves as there is more than a couple of ways to store a slowly changing dimension. There’s no right or wrong, therefore these are just a few ideas to consider depending on what you feel suits you best or you’re most comfortable with.

Let's get started...

Basic Table

This is the basic setup I’ll be using (single line formatting for compactness only):

drop table if exists dbo.FactSalesByMonth, dbo.DimIndustry, dbo.DimCompany
go

create table DimIndustry(industryID tinyint identity, industryName varchar(100) not null)
create table DimCompany(companyID tinyint identity, companyName varchar(100) not null)
create table FactSalesByMonth(salesID int identity, industryID tinyint, companyID tinyint, dateID int, salesAmt money)
go

alter table DimIndustry add constraint pk_industry primary key clustered(industryID)
alter table DimCompany add constraint pk_company primary key clustered(companyID)
alter table FactSalesByMonth add constraint pk_sales primary key clustered(salesID)
go

alter table DimIndustry add constraint fk_salesIndustry foreign key(industryID) references DimIndustry(industryID)
alter table DimCompany add constraint fk_salesCompany foreign key(companyID) references DimCompany(companyID)
go

insert into DimIndustry select 'Technology'
insert into DimCompany select 'Pear Computers'
insert into FactSalesByMonth select 1, 1, 201801, 1000000
insert into FactSalesByMonth select 1, 1, 201802, 1000000
insert into FactSalesByMonth select 1, 1, 201803, 1000000
insert into FactSalesByMonth select 1, 1, 201804, 1200000
insert into FactSalesByMonth select 1, 1, 201805, 1200000
insert into FactSalesByMonth select 1, 1, 201806, 1200000
go

select * from DimIndustry
select * from DimCompany
select * from FactSalesByMonth
go



We’re going to deal with the fact that Pear Computers has changed its name over the years and we need to know what it was on a Point In Time basis.

Solely Date Based

In this method we simply add To and From dates to our table in order to show when a value was active. For example, our company table changes as follows:

drop table if exists dbo.DimCompany
go

create table DimCompany
(
    
companyID tinyint not null,
    
fromDate datetime not null,
    
toDate datetime not null,
    
companyName varchar(100) not null
)
go

insert into DimCompany select 1, '2000-01-01', '2018-03-31', 'Pear Technology Ltd'
insert into DimCompany select 1, '2018-04-01', '9999-12-31', 'Pear Computers'



Summary (main points, not exhaustive)

Pros

  • Nice and simple to understand


  • Easy to implement


  • Easy to find latest record via toDate


Cons

  • Need a new Primary Key

    • Composite across CompanyID & FromDate


    • Add surrogate and use that


  • Difficulties with Primary/Foreign key relationship to Fact tables

    • Extra columns in Fact Table to support


    • Pass through Surrogate key


(Not ideal for CompanyID level aggregation)

As per the above, this is easy to implement and is a clear and clean design for the Dimension table. However, it can be a problem in terms of foreign keys and aggregation within the Fact layer which could lead to degraded performance.

For example, method 1 (using full foreign key in Fact table):


This could cause problems because now you need to aggregate over 2 columns which may require additional indexing, wider indexes, and higher reads.

Using method 2 (surrogate key) we see the following:


As the above shows, if you want to aggregate over companyID then this is less clear in the Fact table and, if there were a lot of historical values for the company in question, could make for a much less performant query as a whole.

The other option is to put companySurrogateID AND companyID into the Fact table to cater for both types of query. This is similar to the “parent” option I’ll cover in my next post.

Leave a Comment
Your email address will not be published. All fields are mandatory.
NB: Comments will only appear once they have been moderated.

SQL  World  CEO
Kevin  Urquhart

iPhone Selfie

I am a SQL Server DBA, Architect, Developer, Trainer, and CEO of SQL World. This is my blog in which I’m simply trying to share my SQL knowledge and experiences with the world.

Categories


© Copyright 2020 SQLTraining Ltd.