SQL Blocking and Client Variable Purge

We've had continual issues with our ColdFusion applications locking up on us. I've spent the last week cleaning out huge tables, creating indexes, and tuning queries, but could never seem to track down what was causing the issue. Today I caught it in the act... As the the applications locked up, I ran the following query in SSMS:

SELECT
   st.text
   , r.blocking_session_id
   , r.session_id, r.status
   , r.command
   , r.cpu_time
   , r.total_elapsed_time
FROM
   sys.dm_exec_requests r
   CROSS APPLY
      sys.dm_exec_sql_text(sql_handle) AS st

This showed me what queries were getting blocked, and who was doing the blocking. It turns out that it was ColdFusion's Client Variable purging that was causing the blocking. This is the query that was running:

DELETE FROM
CDATA WHERE CDATA.cfid in
   (SELECT CGLOBAL.cfid
   FROM CGLOBAL
   WHERE CGLOBAL.lvisit < {date} )
DELETE FROM CGLOBAL WHERE CGLOBAL.lvisit < {date}

Unfortunately, there are no lock hints anywhere on the query. With 7,500 records to delete in a table with more than 30,000,000 records, the table was getting locked for more than 2 minutes, a complete disaster for a production application dependent on client variables.

So I went to work creating a new query that will delete the client variables at a much more reasonable pace, with only as much locking as is necessary to keep things humming along smoothly. The new query looks like this:

SET NOCOUNT ON
DECLARE @cfid char(64), @rowsaffected int, @rowsdeleted int;
SET @rowsaffected = 0;
SET @rowsdeleted = 1;
WHILE @rowsdeleted > 0 AND @rowsaffected < 100
   BEGIN
      BEGIN TRANSACTION
         SET @cfid = (
            SELECT TOP 1 CGLOBAL.cfid
            FROM CGLOBAL (NOLOCK)
            WHERE CGLOBAL.lvisit < DATEADD(d, -30, GETDATE())
            ORDER BY CGLOBAL.lvisit
         )
         DELETE FROM CDATA WITH (ROWLOCK) WHERE CDATA.cfid = @cfid
         DELETE FROM CGLOBAL WITH (ROWLOCK) WHERE CGLOBAL.cfid = @cfid
         SET @rowsdeleted = (SELECT @@ROWCOUNT);
         SET @rowsaffected = @rowsaffected + 1;
      COMMIT
      
   END
PRINT CAST(@rowsaffected AS varchar) + ' client records deleted'

The query will loop through and delete one record at a time, until it hits the limit of rows affected specified. This is set up as a scheduled task in CF so that I know exactly when it will happen, and can watch for issues if there is a failure using existing global error catching.

7 responses to “SQL Blocking and Client Variable Purge”

  1. Brad WoodBrad Wood Says:
    30 Million records-- that's a sizable chunk. How much traffic does your site see? I would think you could get away with doing more than one record at a time though. If you're on MS SQL server, it's going to try and escalate to a full table lock after it deletes 1024 rows.
    Did you find a way to keep CF from trying to purge the client records itself?
  2. Daniel ShortDaniel Short Says:
    It's definitely a busy site. So having 30 million records in 90 days isn't unheard of. I've set it to delete any records over 30 days old now and it's slowly catching up :).

    As for CF purging the records, you can change that option when you edit your client store in the CF admin. Just click on the datasource you're using on the Client Variables page and you can uncheck the purge option to keep CF from trying to do the work.
  3. Ciaran ArcherCiaran Archer Says:
    Hi there - I'm looking into some problems we're having with our CV trim job, and our DB would be equally busy, so I'd love to hear thoughts on the following:

    Brad: if you delete it one row at a time as Daniel suggests, and each delete is in it's own transaction block, why does MS SQL Server try and escalate it to a full table lock after 1024 rows?

    Daniel: can I ask why delete it one by one, why not just specify the rowlock in the bulk delete statement above? E.g.

    /* do deletion */
    delete from cdata with (rowlock) where cfid in (
    select cfid from cglobal (nolock) where lvisit < @timeThreshold
    )


    delete from cglobal with (rowlock) where lvisit < @timeThreshold

    What's the difference in doing it one at a time?

    Thanks!
  4. Daniel ShortDaniel Short Says:
    Then problem is that you can't guarantee that SQL server will honor the RowLock statement. If it feels it needs a TABLELOCK to do what it needs to do, it will escalate the lock on it's own. Deleting it one at a time ensure that a TABLELOCK won't happen and freeze up your entire app...
  5. Brad WoodBrad Wood Says:
    @Ciaran: Sorry if my comment was confusing. If you delete one record at a time it their own transaction, a lock escalation should NOT occur. In my original comment I had said "I would think you could get away with doing more than one record at a time". The main problem with that like Daniel explained is lock escalation. Locking hints are just that-- hints. If SQL Server decides it's tired of keeping track of row locks and that it would be cheaper to just lock the entire table, it will try and request an exclusive table lock.

    One could probably use rowcount and delete maybe a few hundred at a time, but you would have to try it and see how well it worked.
  6. Daniel ShortDaniel Short Says:
    Yep, I tried a couple hundred at a time when I first hit the issue and continued to get unwanted locks on the tables. I really wish I could say "no really SQL, I mean ROWLOCK damnit!" Don't they know I'm in charge? :)
  7. Ciaran ArcherCiaran Archer Says:
    Thanks for that feedback guys.

    Is there a way you would suggest to measure the amount of lock escalation in any given minute? We have a job that runs currently every 15 mins and the job trys to delete rows with a last hit older than 2 hours. Before I implement a row-by-row delete I was hoping I would measure the amount of lock escalation going on right now. I suspect I will see an increase in lock escalation every 15 minutes, as my job runs. If I can then implement the new method I should see the amount of lock escalation decrease overall - or at least see no more 'spikes' over 15 mins.

    I guess doing something via perfmon might be an option? I'd be interested to know your thoughts. We are getting periodic site slowness which I think is related to this client variables trimming job, but I would love to prove it :)

    Thanks in advance!

Leave a Reply

Leave this field empty:

Powered by Mango Blog.