Keep 1000 newest records... delete all the others  
Author Message
Windows2k3ans98Sharing





PostPosted: Sun Apr 03 20:00:56 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others

Ok... I'm trying to keep the newest 1000 records in my table,
and delete all the others.

There's got to be an easier/quicker way than this:
I create a derived table, sort it, looking at the 1000 newest
records... then ANOTHER derived table, sort it, looking at the oldest
of those 1000. Then using THAT output to delete all the records older
that record #1000 date.

This sample code keeps the 3 newest records... and deletes 2 older
records...
but the principle should be the same when I use it to keep 1000 newest
records...
and delete about 20000 older records... in my actual application.


(
MyDate DateTime
)







SELECT *

WHERE MyDate<
(
SELECT TOP 1 MyDate
FROM
(
SELECT TOP 3 MyDate

ORDER BY MyDate DESC
) AS t1
ORDER BY MyDate ASC
)

SQL Server87  
 
 
MGFoster





PostPosted: Sun Apr 03 20:00:56 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I get the same results w/ this:



Just the order is different.

To me, newest means the latest dates, not the earliest dates. E.g.:

5-Jan-2005 is newer than 1-Jan-2005

If you want the newest dates (my definition) you'd use something like
this:

select top 1000 <column list>
from <table name>
order by date_column desc

If you want the oldest date first:

select * from
(select top 1000 <column list>
from <table name>
order by date_column desc) as a
order by date_column

--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBQlCRsYechKqOuFEgEQLfJACgkvk5KEhp6oZ01Cz2CpOmKDUlux0AmgNC
zAClof1qV9omRj72uLahmLPt
=Ug3Z
-----END PGP SIGNATURE-----



> Ok... I'm trying to keep the newest 1000 records in my table,
> and delete all the others.
>
> There's got to be an easier/quicker way than this:
> I create a derived table, sort it, looking at the 1000 newest
> records... then ANOTHER derived table, sort it, looking at the oldest
> of those 1000. Then using THAT output to delete all the records older
> that record #1000 date.
>
> This sample code keeps the 3 newest records... and deletes 2 older
> records...
> but the principle should be the same when I use it to keep 1000 newest
> records...
> and delete about 20000 older records... in my actual application.
>

> (
> MyDate DateTime
> )
>





>
> SELECT *

> WHERE MyDate<
> (
> SELECT TOP 1 MyDate
> FROM
> (
> SELECT TOP 3 MyDate

> ORDER BY MyDate DESC
> ) AS t1
> ORDER BY MyDate ASC
> )
>
>
 
 
Alex





PostPosted: Sun Apr 03 22:14:13 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others So I guess you could

DELETE TableName WHERE YourKey NOT IN (SELECT TOP 1000 * FROM TableName
ORDER BY YourKey ASC (or) DESCc)
One curiosity of the TOP n is that it seems to behave differently during a
regular select clause than it does in the above!
Consider a simple table with an integer column. The values for the Column
are 1,2,3,3,3,4,5.
SELECT TOP 3 * returns 1,2,3
Whereas the above DELETE statement deletes 2 rows and you are left with
1,2,3,3,3
I must confess I don't know what is going on!!! - unless SQL is preventing a
Set overlap in which case depending on the distribution of your
Keys, you may or may not end up deleting everything but 1000 rows????





> Ok... I'm trying to keep the newest 1000 records in my table,
> and delete all the others.
>
> There's got to be an easier/quicker way than this:
> I create a derived table, sort it, looking at the 1000 newest
> records... then ANOTHER derived table, sort it, looking at the oldest
> of those 1000. Then using THAT output to delete all the records older
> that record #1000 date.
>
> This sample code keeps the 3 newest records... and deletes 2 older
> records...
> but the principle should be the same when I use it to keep 1000 newest
> records...
> and delete about 20000 older records... in my actual application.
>

> (
> MyDate DateTime
> )
>





>
> SELECT *

> WHERE MyDate<
> (
> SELECT TOP 1 MyDate
> FROM
> (
> SELECT TOP 3 MyDate

> ORDER BY MyDate DESC
> ) AS t1
> ORDER BY MyDate ASC
> )
>
>


 
 
Dean





PostPosted: Mon Apr 04 05:31:39 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others alex, i don't understand.. this does exacly what it's expected to do.

in your example, after the inner 'select top 3..' is executed, what you're
left with is something like this:

delete tablename where yourkey not in (1,2,3) --iow, in(4,5)

what you see as an anomaly is result of yourkey column not being unique.

dean



> So I guess you could
>
> DELETE TableName WHERE YourKey NOT IN (SELECT TOP 1000 * FROM TableName
> ORDER BY YourKey ASC (or) DESCc)
> One curiosity of the TOP n is that it seems to behave differently during
> a
> regular select clause than it does in the above!
> Consider a simple table with an integer column. The values for the Column
> are 1,2,3,3,3,4,5.
> SELECT TOP 3 * returns 1,2,3
> Whereas the above DELETE statement deletes 2 rows and you are left with
> 1,2,3,3,3
> I must confess I don't know what is going on!!! - unless SQL is preventing
> a
> Set overlap in which case depending on the distribution of your
> Keys, you may or may not end up deleting everything but 1000 rows????
>
>
>


>> Ok... I'm trying to keep the newest 1000 records in my table,
>> and delete all the others.
>>
>> There's got to be an easier/quicker way than this:
>> I create a derived table, sort it, looking at the 1000 newest
>> records... then ANOTHER derived table, sort it, looking at the oldest
>> of those 1000. Then using THAT output to delete all the records older
>> that record #1000 date.
>>
>> This sample code keeps the 3 newest records... and deletes 2 older
>> records...
>> but the principle should be the same when I use it to keep 1000 newest
>> records...
>> and delete about 20000 older records... in my actual application.
>>

>> (
>> MyDate DateTime
>> )
>>





>>
>> SELECT *

>> WHERE MyDate<
>> (
>> SELECT TOP 1 MyDate
>> FROM
>> (
>> SELECT TOP 3 MyDate

>> ORDER BY MyDate DESC
>> ) AS t1
>> ORDER BY MyDate ASC
>> )
>>
>>
>
>


 
 
David





PostPosted: Mon Apr 04 06:44:49 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others Why 1000 rows? Might this requirement not change if the volume of
business goes up or down? A more logical method (and much more
efficient) would seem to be to base this on a date rather than a fixed
value:

DELETE FROM YourTable
WHERE dt <= DATEADD(DAY,-28,CURRENT_TIMESTAMP)

You could schedule this as a regular job if you need to.

DISPLAYING only 1000 rows is a different matter. You don't need to
delete rows just because you don't want to display them.

--
David Portas
SQL Server MVP
--

 
 
\





PostPosted: Mon Apr 04 16:10:28 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others > Why 1000 rows? Might this requirement not change if the volume of
> business goes up or down? A more logical method (and much more
> efficient) would seem to be to base this on a date rather than a fixed

I need to keep the 1000 newest. (That's *MORE* than we'll really need. But
we *DO* want
to keep "a certain number"... not.... "delete based by dates".)

> DISPLAYING only 1000 rows is a different matter. You don't need to
> delete rows just because you don't want to display them.

I need to "delete". (My example was written to "display"... just so I
could see what I was about to delete... before
actually deleting.)



 
 
\





PostPosted: Mon Apr 04 16:22:17 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others I also tried the "NOT IN" method... and it seemed to work... but I was
assuming that might be slower to execute.

I "think" that it's doing this... but it might not be:

> Get 1000 newest records... sort them.... but return the 20000 older
> records instead.
> Make a massive list of 20000 comma separated dates. (Ugh)
> Delete them... 1-by-1.
> If the same date appears 5000 times.... search for it each time... and
> delete it each time. (Double ugh.)

Maybe MS-SQL is smart enough to know NOT to do it the hard way... and is
optimizing it automatically.
I really don't know.

What I'd *REALLY* like to do... is find the date of record number 1001
(sorted by date, newest first)... and
just do a single, quick delete... based on that 1 date.
DELETE FROM myTable

Done.

But what's the easiest/quickest way to find the date of the 1001st record
(sorted by date, newest first)?
(Hopefully with a single SQL statement.)

Thanks for everyone's good ideas.



> DELETE TableName WHERE YourKey NOT IN (SELECT TOP 1000 * FROM TableName
> ORDER BY YourKey ASC (or) DESCc)
> One curiosity of the TOP n is that it seems to behave differently during
> a
> regular select clause than it does in the above!
> Consider a simple table with an integer column. The values for the Column
> are 1,2,3,3,3,4,5.
> SELECT TOP 3 * returns 1,2,3
> Whereas the above DELETE statement deletes 2 rows and you are left with
> 1,2,3,3,3
> I must confess I don't know what is going on!!! - unless SQL is preventing
> a
> Set overlap in which case depending on the distribution of your
> Keys, you may or may not end up deleting everything but 1000 rows????
>
>
>


>> Ok... I'm trying to keep the newest 1000 records in my table,
>> and delete all the others.
>>
>> There's got to be an easier/quicker way than this:
>> I create a derived table, sort it, looking at the 1000 newest
>> records... then ANOTHER derived table, sort it, looking at the oldest
>> of those 1000. Then using THAT output to delete all the records older
>> that record #1000 date.
>>
>> This sample code keeps the 3 newest records... and deletes 2 older
>> records...
>> but the principle should be the same when I use it to keep 1000 newest
>> records...
>> and delete about 20000 older records... in my actual application.
>>

>> (
>> MyDate DateTime
>> )
>>





>>
>> SELECT *

>> WHERE MyDate<
>> (
>> SELECT TOP 1 MyDate
>> FROM
>> (
>> SELECT TOP 3 MyDate

>> ORDER BY MyDate DESC
>> ) AS t1
>> ORDER BY MyDate ASC
>> )
>>
>>
>
>


 
 
Dean





PostPosted: Mon Apr 04 16:38:12 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others

>I also tried the "NOT IN" method... and it seemed to work... but I was
>assuming that might be slower to execute.
>
> I "think" that it's doing this... but it might not be:
>
>> Get 1000 newest records... sort them.... but return the 20000 older
>> records instead.
>> Make a massive list of 20000 comma separated dates. (Ugh)
>> Delete them... 1-by-1.
>> If the same date appears 5000 times.... search for it each time... and
>> delete it each time. (Double ugh.)
>
> Maybe MS-SQL is smart enough to know NOT to do it the hard way... and is
> optimizing it automatically.
> I really don't know.
>
> What I'd *REALLY* like to do... is find the date of record number 1001
> (sorted by date, newest first)... and
> just do a single, quick delete... based on that 1 date.
> DELETE FROM myTable

> Done.
>
> But what's the easiest/quickest way to find the date of the 1001st record
> (sorted by date, newest first)?
> (Hopefully with a single SQL statement.)
>
> Thanks for everyone's good ideas.


something like this?

delete myTable
where myDate<(select min(myDate) from (select top 1001 myDate from myTable
order by myDate desc))

dean


 
 
\





PostPosted: Mon Apr 04 16:53:48 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others Dean, I couldn't get your example to work... unless I changed that SELECT
TOP x statement
to a derived table. Then it seems to do exactly what I need. Thanks.


(
MyDate DateTime
)










SELECT *

WHERE MyDate<
(
SELECT MIN(MyDate)
FROM
(
SELECT TOP 3 MyDate

ORDER BY MyDate DESC
) AS t
)







>>I also tried the "NOT IN" method... and it seemed to work... but I was
>>assuming that might be slower to execute.
>>
>> I "think" that it's doing this... but it might not be:
>>
>>> Get 1000 newest records... sort them.... but return the 20000 older
>>> records instead.
>>> Make a massive list of 20000 comma separated dates. (Ugh)
>>> Delete them... 1-by-1.
>>> If the same date appears 5000 times.... search for it each time... and
>>> delete it each time. (Double ugh.)
>>
>> Maybe MS-SQL is smart enough to know NOT to do it the hard way... and is
>> optimizing it automatically.
>> I really don't know.
>>
>> What I'd *REALLY* like to do... is find the date of record number 1001
>> (sorted by date, newest first)... and
>> just do a single, quick delete... based on that 1 date.
>> DELETE FROM myTable

>> Done.
>>
>> But what's the easiest/quickest way to find the date of the 1001st record
>> (sorted by date, newest first)?
>> (Hopefully with a single SQL statement.)
>>
>> Thanks for everyone's good ideas.
>
>
> something like this?
>
> delete myTable
> where myDate<(select min(myDate) from (select top 1001 myDate from myTable
> order by myDate desc))
>
> dean
>
>


 
 
\





PostPosted: Mon Apr 04 16:58:37 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others > I get the same results w/ this:

> Just the order is different.
> To me, newest means the latest dates, not the earliest dates. E.g.:
> 5-Jan-2005 is newer than 1-Jan-2005

I agree. (But I've never heard anyone say "the socks I bought back in 1998
are newer than the
ones I just bought in 2005.)

Newer = more recently = 05-Jan-2005 = MAX(MyDate) = DESC sorted
Older = long ago = 01-Jan=1998 = MIN(MyDate) = ASC sorted

I hope I've got that correct. (But I can't find anything in my example...
or anyone's reply... that says otherwise.)

> If you want the newest dates (my definition) you'd use something like
> this:
>
> select top 1000 <column list>
> from <table name>
> order by date_column desc
>
> If you want the oldest date first:
>
> select * from
> (select top 1000 <column list>
> from <table name>
> order by date_column desc) as a
> order by date_column
>
> --
> MGFoster:::mgf00 <at> earthlink <decimal-point> net
> Oakland, CA (USA)
>
> -----BEGIN PGP SIGNATURE-----
> Version: PGP for Personal Privacy 5.0
> Charset: noconv
>
> iQA/AwUBQlCRsYechKqOuFEgEQLfJACgkvk5KEhp6oZ01Cz2CpOmKDUlux0AmgNC
> zAClof1qV9omRj72uLahmLPt
> =Ug3Z
> -----END PGP SIGNATURE-----
>
>

>> Ok... I'm trying to keep the newest 1000 records in my table,
>> and delete all the others.
>>
>> There's got to be an easier/quicker way than this:
>> I create a derived table, sort it, looking at the 1000 newest
>> records... then ANOTHER derived table, sort it, looking at the oldest
>> of those 1000. Then using THAT output to delete all the records older
>> that record #1000 date.
>>
>> This sample code keeps the 3 newest records... and deletes 2 older
>> records...
>> but the principle should be the same when I use it to keep 1000 newest
>> records...
>> and delete about 20000 older records... in my actual application.
>>

>> (
>> MyDate DateTime
>> )
>>





>>
>> SELECT *

>> WHERE MyDate<
>> (
>> SELECT TOP 1 MyDate
>> FROM
>> (
>> SELECT TOP 3 MyDate

>> ORDER BY MyDate DESC
>> ) AS t1
>> ORDER BY MyDate ASC
>> )
>>

 
 
\





PostPosted: Mon Apr 04 17:15:07 CDT 2005 Top

SQL Server Developer >> Keep 1000 newest records... delete all the others Another thought. This table gets VERY heavy use on some days/weeks... and
then
hardly any use during other weeks.

"Delete by date" would leave us with a huge, bloated table during "heavy
periods".
and...
"Very few records" during the "light weeks".
(A table with 150,000 records... or 89... are both useless to us.)

"Delete everything, but always keep 1000 newest".... seems to solved both
problems.

===================

While we are on the subject... maybe you can answer another question:

Have you ever had a foolish Sys Admin guy.... accidentally set the server's
clock incorrectly?
Maybe get confused around DST and turn it back 1 hour... instead of forward?

Or sometime in January 2005... accidentally set it to "Jan 2004"? (He's
most likely the same guy
that continues to write "last year" on all his checks... until March!)

What happens when I DELETE based on GetDate() then?
I delete a year's worth of records... that I wanted to keep.
Or keep a year's worth... that I wanted to delete.

We started using the Atomic Clock Auto-Setting feature on our server... but
is that always correct?
(Or could the site we get our time from... also be incorrect?)

Unlikely? Impossible?

I guess the most we will EVER need to change a server's system clock would
be +/- 1 or 2 hours.
Is there a way to absolutely prevent it from EVER being changed by weeks,
months, years (accidentally)?
(A BIOS setting?)

... or have and "absolute earliest date":
As of 04-Apr-2005... we will NEVER need to set it earlier than 04-Apr-2005.
So forbid it.




> Why 1000 rows? Might this requirement not change if the volume of
> business goes up or down? A more logical method (and much more
> efficient) would seem to be to base this on a date rather than a fixed
> value:
>
> DELETE FROM YourTable
> WHERE dt <= DATEADD(DAY,-28,CURRENT_TIMESTAMP)
>
> You could schedule this as a regular job if you need to.
>
> DISPLAYING only 1000 rows is a different matter. You don't need to
> delete rows just because you don't want to display them.