CASE vs. JOIN lookups performance difference?  
Author Message
dkilanko





PostPosted: Wed Apr 27 17:04:46 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference?

Dear colleagues,

I have encountered an incredible difference in timings of the following
2 statements:

a) SELECT col1, col2, col3,
CASE number
WHEN 1 then 'one'
WHEN 2 then 'two'
...
WHEN 10 then 'ten'
END as number_verbatim
FROM table1

b) SELECT table1.col1, table1.col2, table1.col3,
lookup_table.number_verbatim
FROM table1
JOIN lookup_table
on table1.number = lookup_table.number_int

(where lookup_table consists of 10 rows:

number_int number_verbatim
---------- -------------------
1 one
2 two
...
10 ten
)

It turns out that variant (a) runs at least 25% (in reality, 25 seconds
out of total 90) faster than (b). Any idea why (I was hoping SQL
optimizer is smart enough to handle such tiny lookup tables by caching
them at once, or even maybe turning them into CASE-like statements on
its own) and how to fix that? I wish I could use (a) in the product,
but there exist a 0.001% possibility that eventually we'll need to add
"eleven", and thus I can't . :(

I was pondering dynamically generating an UDF containing the
aforementioned CASE statememnt, but, unsurprisingly, the SELECT
statement utilizing that UDF turned out 150% slower.

SQL Server254  
 
 
David





PostPosted: Wed Apr 27 17:04:46 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference?
> Dear colleagues,
>
> I have encountered an incredible difference in timings of the
> following 2 statements:
>
> a) SELECT col1, col2, col3,
> CASE number
> WHEN 1 then 'one'
> WHEN 2 then 'two'
> ...
> WHEN 10 then 'ten'
> END as number_verbatim
> FROM table1
>
> b) SELECT table1.col1, table1.col2, table1.col3,
> lookup_table.number_verbatim
> FROM table1
> JOIN lookup_table
> on table1.number = lookup_table.number_int
>
> (where lookup_table consists of 10 rows:
>
> number_int number_verbatim
> ---------- -------------------
> 1 one
> 2 two
> ...
> 10 ten
> )
>
> It turns out that variant (a) runs at least 25% (in reality, 25
> seconds out of total 90) faster than (b). Any idea why (I was hoping
> SQL optimizer is smart enough to handle such tiny lookup tables by
> caching them at once, or even maybe turning them into CASE-like
> statements on its own) and how to fix that? I wish I could use (a) in
> the product, but there exist a 0.001% possibility that eventually
> we'll need to add "eleven", and thus I can't . :(
>
> I was pondering dynamically generating an UDF containing the
> aforementioned CASE statememnt, but, unsurprisingly, the SELECT
> statement utilizing that UDF turned out 150% slower.

UDFs are slow. Joins are the way to go. With proper indexing, peformance
should be fine. There's no reason to compare it to a hard-coded CASE
statement since they are apples and oranges. Do you have the proper
indexing in place in the smaller table?

--
David Gugick
Imceda Software
www.imceda.com

 
 
Tom





PostPosted: Wed Apr 27 17:05:13 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? Not unexpected. In the first case, only one table is involved and there is
no correlation to do. In the second, you have to correlate the two tables.
You could experiment by placing an index on table1(number) as well as
table1(number, col1, col2, col3) and retry. I assume you have a primary key
constraint on lookup_table(number_int). Also, is table1.number an int or is
it some other datatype?

--
Tom

----------------------------------------------------
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinpub.com
.


Dear colleagues,

I have encountered an incredible difference in timings of the following
2 statements:

a) SELECT col1, col2, col3,
CASE number
WHEN 1 then 'one'
WHEN 2 then 'two'
...
WHEN 10 then 'ten'
END as number_verbatim
FROM table1

b) SELECT table1.col1, table1.col2, table1.col3,
lookup_table.number_verbatim
FROM table1
JOIN lookup_table
on table1.number = lookup_table.number_int

(where lookup_table consists of 10 rows:

number_int number_verbatim
---------- -------------------
1 one
2 two
...
10 ten
)

It turns out that variant (a) runs at least 25% (in reality, 25 seconds
out of total 90) faster than (b). Any idea why (I was hoping SQL
optimizer is smart enough to handle such tiny lookup tables by caching
them at once, or even maybe turning them into CASE-like statements on
its own) and how to fix that? I wish I could use (a) in the product,
but there exist a 0.001% possibility that eventually we'll need to add
"eleven", and thus I can't . :(

I was pondering dynamically generating an UDF containing the
aforementioned CASE statememnt, but, unsurprisingly, the SELECT
statement utilizing that UDF turned out 150% slower.

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 17:30:03 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > Joins are the way to go.

But, according to the performance I gain, CASE's are even better...

> There's no reason to compare it to a hard-coded CASE
statement since they are apples and oranges.

I'm rather comparing seconds and seconds...

> Do you have the proper indexing in place in the smaller table?

Tried with and without indexes, performance difference is well within
the margin of error.

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 17:51:45 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? Here's some real world testing I've just performed (played with join
hints).

1. All tests were run after executing DBCC DROPCLEANBUFFERS;
2. test_table is being dropped from the DB prior to executing each
statement.
3. table1 has 2,370,000 records.

---------------- baseline version -------------
SELECT
number_verbatim
INTO
test_table
FROM
table1 h
JOIN lookup_table l on h.number = l.number_int
-----------------------------------------------
CPU: 8687 ms


---------------- 'case' version -------------
SELECT
CASE h.number
WHEN 1 THEN 'One'

WHEN 2 THEN 'Two'

...
WHEN 9 THEN 'Nine'
END as number_verbatim
INTO
test_table
FROM
table1 h
---------------------------------------------
CPU: 5828 ms



---------------- LEFT JOIN version ------------
SELECT
number_verbatim
INTO
test_table
FROM
table1 h
LEFT JOIN lookup_table l on h.number = l.number_int
-----------------------------------------------
CPU: 9609 ms
(weird, I thought that LEFT JOIN should be faster then INNER JOIN...)


---------------- LEFT JOIN version ------------
SELECT
number_verbatim
INTO
test_table
FROM
table1 h
REMOTE JOIN lookup_table l on h.number = l.number_int
-----------------------------------------------
CPU: 11969 ms


--------------------- HASH JOIN ----------------
SELECT
number_verbatim
INTO
test_table
FROM
table1 h
LEFT HASH JOIN lookup_table l on h.number = l.number_int
-----------------------------------------------
CPU: 12688 ms, Elapsed: 95015 ms.



--------------------- HASH JOIN ----------------
SELECT
number_verbatim
INTO
test_table
FROM
table1 h
LEFT LOOP JOIN lookup_table l on h.number = l.number_int
-----------------------------------------------
CPU: 34812 ms

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 17:52:58 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? Here's some real world testing I've just performed (played with join
hints).

1. All tests were run after executing DBCC DROPCLEANBUFFERS;
2. test_table is being dropped from the DB prior to executing each
statement.
3. table1 has 2,370,000 records.

---------------- baseline version -------------
SELECT
number_verbatim
INTO
test_table
FROM
table1 h
JOIN lookup_table l on h.number = l.number_int
-----------------------------------------------
CPU: 8687 ms


---------------- 'case' version -------------
SELECT
CASE h.number
WHEN 1 THEN 'One'

WHEN 2 THEN 'Two'

...
WHEN 9 THEN 'Nine'
END as number_verbatim
INTO
test_table
FROM
table1 h
---------------------------------------------
CPU: 5828 ms



---------------- LEFT JOIN version ------------
SELECT
number_verbatim
INTO
test_table
FROM
table1 h
LEFT JOIN lookup_table l on h.number = l.number_int
-----------------------------------------------
CPU: 9609 ms
(weird, I thought that LEFT JOIN should be faster then INNER JOIN...)


---------------- LEFT JOIN version ------------
SELECT
number_verbatim
INTO
test_table
FROM
table1 h
REMOTE JOIN lookup_table l on h.number = l.number_int
-----------------------------------------------
CPU: 11969 ms


--------------------- HASH JOIN ----------------
SELECT
number_verbatim
INTO
test_table
FROM
table1 h
LEFT HASH JOIN lookup_table l on h.number = l.number_int
-----------------------------------------------
CPU: 12688 ms, Elapsed: 95015 ms.



--------------------- HASH JOIN ----------------
SELECT
number_verbatim
INTO
test_table
FROM
table1 h
LEFT LOOP JOIN lookup_table l on h.number = l.number_int
-----------------------------------------------
CPU: 34812 ms

 
 
--CELKO--





PostPosted: Wed Apr 27 18:21:27 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? Thsi makes sense, because the CASE is running on one and only one
table, while the JOIN has two tables to handle. As the Table1 grows
and the lookup is kept in main storage, you will get some improvements
in the numbers. Indexes will be a big help, too.

>> I wish I could use (a) in the product, but there exist a 0.001%
possibility that eventually we'll need to add "eleven", and thus I
can't . :( <<

With those odds, I would think about doing it and leaving stub "WHEN 11
THEN 'eleven' " in the expression. Right now you have -- I hope -- a
declaration that looks like "CHECK (number BETWEEN 1 AND 10)" on the
column that you will have to change, so comment it and get ready to do
an update once in a billion years.

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 18:38:27 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > Joins are the way to go.
> There's no reason to compare it to a hard-coded CASE
statement since they are apples and oranges.

I'm not comparing CASE's and JOIN's. I'm comparing the total query
execution time. That's seconds and seconds, not apples and oranges.
=^.^=

> Do you have the proper indexing in place in the smaller table?

Tried with and without index. Difference is well within the tolerance
of measurement.

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 18:41:58 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > Joins are the way to go.
> There's no reason to compare it to a hard-coded CASE
statement since they are apples and oranges.

I'm not comparing CASE's and JOIN's. I'm comparing the total query
execution time. That's seconds and seconds, not apples and oranges.
=^.^=

> Do you have the proper indexing in place in the smaller table?

Tried with and without index. Difference is well within the tolerance
of measurement.

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 18:44:08 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > Joins are the way to go.
> There's no reason to compare it to a hard-coded CASE
statement since they are apples and oranges.

I'm not comparing CASE's and JOIN's. I'm comparing the total query
execution time. That's seconds and seconds, not apples and oranges.
=^.^=

> Do you have the proper indexing in place in the smaller table?

Tried with and without index. Difference is well within the tolerance
of measurement.

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 18:47:19 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > Joins are the way to go.

But, according to the performance I gain, CASE's are even better...

> There's no reason to compare it to a hard-coded CASE
statement since they are apples and oranges.

I'm rather comparing seconds and seconds...

> Do you have the proper indexing in place in the smaller table?

Tried with and without indexes, performance difference is well within
the margin of error.

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 18:49:42 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > Joins are the way to go.

But, according to the performance I gain, CASE's are even better...

> There's no reason to compare it to a hard-coded CASE
statement since they are apples and oranges.

I'm rather comparing seconds and seconds...

> Do you have the proper indexing in place in the smaller table?

Tried with and without indexes, performance difference is well within
the margin of error.

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 18:55:17 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > Not unexpected. In the first case, only one table is involved and
there is
no correlation to do.

Understood, but again, I was thinking better of SQL query optimizer.
=^.^=

> You could experiment by placing an index on table1(number) as well as
table1(number, col1, col2, col3) and retry.

I'm afraid that'll be a serious performance hit on other operations, as
table1 contains over 2,000,000 records.

> I assume you have a primary key constraint on
lookup_table(number_int).

Correct.

> Also, is table1.number an int or is it some other datatype?

Yes, it is int, and that adds to my confusion. Operations with ints
oughtta be fast...

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 18:55:34 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > Not unexpected. In the first case, only one table is involved and
there is
no correlation to do.

Understood, but again, I was thinking better of SQL query optimizer.
=^.^=

> You could experiment by placing an index on table1(number) as well as
table1(number, col1, col2, col3) and retry.

I'm afraid that'll be a serious performance hit on other operations, as
table1 contains over 2,000,000 records.

> I assume you have a primary key constraint on
lookup_table(number_int).

Correct.

> Also, is table1.number an int or is it some other datatype?

Yes, it is int, and that adds to my confusion. Operations with ints
oughtta be fast...

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 18:57:12 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > Not unexpected. In the first case, only one table is involved and
there is
no correlation to do.

Understood, but again, I was thinking better of SQL query optimizer.
=^.^=

> You could experiment by placing an index on table1(number) as well as
table1(number, col1, col2, col3) and retry.

I'm afraid that'll be a serious performance hit on other operations, as
table1 contains over 2,000,000 records.

> I assume you have a primary key constraint on
lookup_table(number_int).

Correct.

> Also, is table1.number an int or is it some other datatype?

Yes, it is int, and that adds to my confusion. Operations with ints
oughtta be fast...

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 18:59:27 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > Not unexpected. In the first case, only one table is involved and
there is
no correlation to do.

Understood, but again, I was thinking better of SQL query optimizer.
=^.^=

> You could experiment by placing an index on table1(number) as well as
table1(number, col1, col2, col3) and retry.

I'm afraid that'll be a serious performance hit on other operations, as
table1 contains over 2,000,000 records.

> I assume you have a primary key constraint on
lookup_table(number_int).

Correct.

> Also, is table1.number an int or is it some other datatype?

Yes, it is int, and that adds to my confusion. Operations with ints
oughtta be fast...

 
 
WeshaTheLeopard





PostPosted: Wed Apr 27 19:08:00 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? (sorry about multiple messages before, it's Google Groups belch...)

> Thsi makes sense, because the CASE is running on one and only one
table, while the JOIN has two tables to handle.
> As the Table1 grows and the lookup is kept in main storage, you will
get some improvements in the numbers.

You see, thinking according to Zen Hacking, I was hoping that SQL query
optimizer would detect that the lookup table is tiny and automatically
morph it in a case-like structure, bypassing the [more] costly
operations like indexing, hashing, and the like (come on, the key is
_int_!) (on the other thought... lemme try to make it tinyint and see
what happens...)

> Indexes will be a big help, too.

Not with such a tiny table, I'm afraid. There (at low ends of the
spectrum) exists an edge where indexes stop helping and slowing things
down.

>> I wish I could use (a) in the product, but there exist a 0.001%
possibility that eventually we'll need to add "eleven", and thus I
can't

> With those odds, I would think about doing it and leaving stub "WHEN
11
THEN 'eleven' " in the expression.

First, I'm not sure it'll be "eleven" and not "elven" (you know,
because of NDA, I had to scramble all the table names and values).
Second, our Architect is against that. :)

P.S. It turns out that a major (but not overwhelming) role might be
played by the fact that "table1" has much more 1s and 2s than 9s.
Rearranging items in CASE statement slowed execution down, but not to
the JOIN time still.

 
 
--CELKO--





PostPosted: Wed Apr 27 20:26:50 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? >> Rearranging items in CASE statement slowed execution down, but not
to the JOIN time still. <<

People tend to forget that the CASE expression is executed in order, so
there are times when a re-arrangment can get you a bit faster execution
for a long statement. And by the same logic, you know that all the
predicates above you are false, so you can use that to write a simpler
predicate at the current WHEN clause.

 
 
Michael





PostPosted: Wed Apr 27 21:31:06 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? BTW - what do your query execution plans look like on the joins?



> Dear colleagues,
>
> I have encountered an incredible difference in timings of the following
> 2 statements:
>
> a) SELECT col1, col2, col3,
> CASE number
> WHEN 1 then 'one'
> WHEN 2 then 'two'
> ...
> WHEN 10 then 'ten'
> END as number_verbatim
> FROM table1
>
> b) SELECT table1.col1, table1.col2, table1.col3,
> lookup_table.number_verbatim
> FROM table1
> JOIN lookup_table
> on table1.number = lookup_table.number_int
>
> (where lookup_table consists of 10 rows:
>
> number_int number_verbatim
> ---------- -------------------
> 1 one
> 2 two
> ...
> 10 ten
> )
>
> It turns out that variant (a) runs at least 25% (in reality, 25 seconds
> out of total 90) faster than (b). Any idea why (I was hoping SQL
> optimizer is smart enough to handle such tiny lookup tables by caching
> them at once, or even maybe turning them into CASE-like statements on
> its own) and how to fix that? I wish I could use (a) in the product,
> but there exist a 0.001% possibility that eventually we'll need to add
> "eleven", and thus I can't . :(
>
> I was pondering dynamically generating an UDF containing the
> aforementioned CASE statememnt, but, unsurprisingly, the SELECT
> statement utilizing that UDF turned out 150% slower.
>


 
 
WeshaTheLeopard





PostPosted: Thu Apr 28 09:23:48 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? > People tend to forget that the CASE expression is executed in order,

Well, you see, according to Zen Hacking again, if I was to write an
optimizer for a CASE statement on a very limited set of TINYINTs, I
would've created a 256-word lookup table and simply picked up a jump
address by offset... so no sequential execution [in this particular
case].

So probably I'm thinking too high of MSSQL team if they are really
avoiding such a tricks.

 
 
Gert-Jan





PostPosted: Thu Apr 28 18:06:21 CDT 2005 Top

SQL Server Developer >> CASE vs. JOIN lookups performance difference? Hmm, and how does the CASE query perform when table1 only contains the
number 10 in column number?

You see, the CASE expression is executed top to bottom, so the
performance will depend on the distribution of the values of column
number. Once the list grows, the performance will degrade (eventually it
will degrade exponentially).

Joining with another table works differently. Cadinality and data
distribution start playing a roll.

By the way: with such a small table, you may want to remove all indexes
from table lookup_table and see how that performs.

Having said all that: I would not be surprised if the CASE query still
performs best, simply because it is hard coded, and does not involve all
the overhead and eventualities of a real table.

HTH,
Gert-Jan




>
> Dear colleagues,
>
> I have encountered an incredible difference in timings of the following
> 2 statements:
>
> a) SELECT col1, col2, col3,
> CASE number
> WHEN 1 then 'one'
> WHEN 2 then 'two'
> ...
> WHEN 10 then 'ten'
> END as number_verbatim
> FROM table1
>
> b) SELECT table1.col1, table1.col2, table1.col3,
> lookup_table.number_verbatim
> FROM table1
> JOIN lookup_table
> on table1.number = lookup_table.number_int
>
> (where lookup_table consists of 10 rows:
>
> number_int number_verbatim
> ---------- -------------------
> 1 one
> 2 two
> ...
> 10 ten
> )
>
> It turns out that variant (a) runs at least 25% (in reality, 25 seconds
> out of total 90) faster than (b). Any idea why (I was hoping SQL
> optimizer is smart enough to handle such tiny lookup tables by caching
> them at once, or even maybe turning them into CASE-like statements on
> its own) and how to fix that? I wish I could use (a) in the product,
> but there exist a 0.001% possibility that eventually we'll need to add
> "eleven", and thus I can't . :(
>
> I was pondering dynamically generating an UDF containing the
> aforementioned CASE statememnt, but, unsurprisingly, the SELECT
> statement utilizing that UDF turned out 150% slower.