Trigger-maintained summary values  
Author Message
miker1999





PostPosted: Tue Jul 19 23:18:09 CDT 2005 Top

SQL Server Developer >> Trigger-maintained summary values

I am once again faced with what must be a very common design scenario, but
have for a long time been on the fence as to which way to approach it. I
don't know what the scenario / pattern is "commonly" called, so had trouble
search for opinions on the best solution.

The situation is where there is some form of header row into which several
item rows are foreign keyed. The question is whether or not to store summary
information about the item rows on the header row, and maintain those values
by using triggers. A classic example would be an invoice situation, where
each invoice line has a dollar value, and the invoice itself has a total
dollar value being the sum of the invoice line values.

The only reason I can think of to store the summary information on the
header is to make queries faster. For instance, one can find all invoices
over a particular value without having to sum line items every time.

I can think of several reasons not to store the summary information on the
header. The only one that really matters is that doing so make it possible
for the sum of the lines not to agree with the stored value on the header,
due to disabling triggers etc, raising the question of which one is correct,
and violating some important principles (consistency?).

The cons seem to vastly outweigh the pros, but every so often there is a
situation where the speed increase is so large it cannot be ignored. My
current predicament involves line items with quantities anywhere between
about 5 and 5000, and where tens of thousands of line items accumulate
against one client header each year, and where there is a need for events to
occur immediately whenever the year to date total reaches specific
threshhold values. Performing a sum(qty) where year=current for each client
every time a new line item comes in for that client brings the database to
its knees. And let's not even think about the stress of running-totals.

Is there a solution I have not considered? Indexed views are a solution we
have considered, but we found that the performance penalty for
inserts/updates/deletes was as high or higher than the performance penalty
of running the complete subquery every time.

-------------------------------

By way of DDL, here is a simple, generalised example of the technique.

create table Headers(
HeaderNbr int not null constraint PK_Headers primary key clustered,
TotalQty int not null constraint DF_Headers_TotalQty default 0
)
go


create table Items(
ItemNbr int not null constraint PK_Items primary key clustered,
HeaderNbr int not null constraint FK_LineItems_Headers foreign key
references Headers(HeaderNbr),
Qty int not null constraint DF_Items_Qty default 1
)
go

create trigger items_maintain_header_qty on Items for insert, update, delete
as
begin
set nocount on

update h
set h.TotalQty =
h.TotalQty -
isnull((select sum(Qty) from deleted where HeaderNbr =
h.HeaderNbr), 0) +
isnull((select sum(Qty) from inserted where HeaderNbr =
h.HeaderNbr), 0)
from Headers h
end
go

insert headers(headernbr) select 1 union select 2 union select 3

insert items (itemNbr, HeaderNbr, Qty)
select 1, 1, 10 union
select 2, 2, 20 union
select 3, 3, 30 union
select 4, 1, 100 union
select 5, 2, 200 union
select 6, 3, 300 union
select 7, 1, -1000 union
select 8, 2, -2000 union
select 9, 3, -3000

select * from headers

SQL Server152  
 
 
Aaron





PostPosted: Tue Jul 19 23:18:09 CDT 2005 Top

SQL Server Developer >> Trigger-maintained summary values > item rows are foreign keyed. The question is whether or not to store
> summary information about the item rows on the header row, and maintain
> those values by using triggers.

My vote is no. Partly because there is no need to store redundant data, and
partly because Ireally don't think triggers are the right answer even if I
did agree that redundant data was justified. Are you allowing users to go
in and fiddle with the line items table(s)? If you control access to the
data via an app/stored procedures then you should be able to have that
access layer deal with the summaries.

> A classic example would be an invoice situation, where each invoice line
> has a dollar value, and the invoice itself has a total dollar value being
> the sum of the invoice line values.

How often do the line items change after the day the invoice is created? To
me it would make more sense (if you really needed the extra speed for
queries) to roll up the data once a day or even throughout the day as a
background process. If your data is quite large you may consider a datamart
or data warehouse. But in either case, the performance going on in the
background should not affect the client directly; if it does, then you have
inadequately implemented your hardware.

A


 
 
Don





PostPosted: Tue Jul 19 23:45:04 CDT 2005 Top

SQL Server Developer >> Trigger-maintained summary values > My vote is no. Partly because there is no need to store redundant data,
> and partly because Ireally don't think triggers are the right answer even
> if I did agree that redundant data was justified. Are you allowing users
> to go in and fiddle with the line items table(s)? If you control access
> to the data via an app/stored procedures then you should be able to have
> that access layer deal with the summaries.

Yes, I agree, I have moved away from triggers over the last few years. When
I first moved to SQL Server they were a shiny new hammer and everything was
a nail. But the only triggers in the last database I built were there to
enforce constraint-like business rules and table level constraints.

> How often do the line items change after the day the invoice is created?
> To me it would make more sense (if you really needed the extra speed for
> queries) to roll up the data once a day or even throughout the day as a
> background process. If your data is quite large you may consider a
> datamart or data warehouse. But in either case, the performance going on
> in the background should not affect the client directly; if it does, then
> you have inadequately implemented your hardware.

In this particular situation line items change very rarely, but *can
possibly* change at any time, for a few different reasons (which means a few
different events to trap in trigger code, making long complicated triggers,
which I hate with a passion). Rolling up periodically is my current
solution, with the understanding that threshhold based events will only be
picked up every hour, not the moment the threshhold is crosssed. This is
acceptable, but not optimal. Rolling up using exact threshhold values
requires a running sum style calculation (find the total of all line items
before me), which is far too ****. And yes, the hardware is absolutely not
up to the job, but the client is an external company and a real
foot-dragger. They're happy the way they are :)
Anyway, the specifics of my current situation don't matter *that* much. It's
more the principle of the thing. I would like to believe it is *never* right
to violate your database integrity in order to improve query performance,
but boy it becomes attractive sometimes.

By the way for anyone else who reads this now or in the future, another
solution I haven't done a lot of testing with is indexed, calculated
columns.