Batch Updates

I have a table with 1 billion rows. I need to update at least 10 million of these rows without affecting concurrency or general database performance.

I cannot run an update on all the rows at once, as this would lock millions of rows, use up server memory, and cause a lot of disk I/O. The transaction log would also fill up, which could cause a service interruption.

The solution is to run the update in manageable batches of 1,000 rows using the TOP clause, wrapped in a WHILE loop that keeps running batches until @@ROWCOUNT = 0; at that point every row has been updated and the loop stops. The full code is shown below.

Before running the code, I would also create a backup of the data I am updating, and I would use explicit transactions.

Before I ran my update, I not only backed up the original data but also turned that backup into a lookup table mapping each old value to its new value, and the update then joins to this lookup table.
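As a rough sketch of that step (ref_lookup, old_3_ref, and new_3_ref are the names used in the update code further down; the data type, mapping rule, and predicate are placeholders I made up):

-- Sketch: build a lookup table that doubles as a backup of the old
-- values. A primary key on old_3_ref also speeds up the batched join.
CREATE TABLE ref_lookup
(
    old_3_ref VARCHAR(50) NOT NULL PRIMARY KEY,
    new_3_ref VARCHAR(50) NOT NULL
);

INSERT INTO ref_lookup (old_3_ref, new_3_ref)
SELECT DISTINCT
       d.[3_ref],
       'NEW-' + d.[3_ref]          -- placeholder for your real mapping rule
FROM DOCUMENT AS d
WHERE d.[3_ref] LIKE 'OLD%';       -- placeholder predicate for the 10 million rows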

You can use this batch approach for UPDATE, INSERT, and DELETE statements.
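For example, the same pattern applied to a DELETE might look like the sketch below; the WHERE clause (archived = 1) is a made-up predicate, and the TOP/WHILE structure is the point:

-- Sketch: batched DELETE using the same TOP + WHILE pattern.
-- The WHERE clause is a hypothetical example; substitute your own.
WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION;

    DELETE TOP (1000)
    FROM DOCUMENT
    WHERE archived = 1;          -- hypothetical predicate

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION;
        BREAK;
    END;

    COMMIT TRANSACTION;
END;
GO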

To speed up the process even more, you could also drop any nonclustered indexes on the affected columns beforehand and recreate them afterwards; otherwise those indexes are maintained on every update operation. Whether this is practical depends on whether front-end users need the indexes while the update runs, so weigh the cost of index maintenance against their benefit to the users of the database/application. A sketch of this step follows.
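As a sketch, assuming a single nonclustered index on the column being updated (the index name IX_DOCUMENT_3_ref and its definition are hypothetical; script out your real index definitions before dropping anything):

-- Sketch: drop the nonclustered index before the batch run and
-- recreate it afterwards. Name and definition are hypothetical.
DROP INDEX IX_DOCUMENT_3_ref ON DOCUMENT;

-- ... run the batched update here ...

CREATE NONCLUSTERED INDEX IX_DOCUMENT_3_ref
    ON DOCUMENT ([3_ref]);

With all of that in place, here is the batched update itself: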

WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION;

    -- Update the next batch of 1,000 rows, mapping each old
    -- reference to its new value via the lookup table.
    UPDATE TOP (1000) doc_table
    SET [3_ref] = b.new_3_ref
    FROM DOCUMENT AS doc_table
    INNER JOIN ref_lookup AS b
        ON doc_table.[3_ref] = b.old_3_ref;

    /* If no more rows were updated, commit the transaction and break out of the WHILE loop */
    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION;
        BREAK;
    END;

    -- Commit the last batch of 1,000 rows
    COMMIT TRANSACTION;
END;
GO

When I have run this, I have also used a lookup table to record the last row modified. That way, if the update gets interrupted, we have a record of the last row that was updated and can pick up where we left off.
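A minimal sketch of that idea, assuming a single-row progress table (update_progress and last_old_3_ref are hypothetical names): updating it inside the same transaction means the checkpoint commits or rolls back together with its batch.

-- Sketch: a single-row checkpoint table. Update it inside the loop,
-- after the batch UPDATE and before COMMIT TRANSACTION, so the
-- checkpoint is committed atomically with the batch it describes.
CREATE TABLE update_progress
(
    last_old_3_ref VARCHAR(50) NULL
);
INSERT INTO update_progress (last_old_3_ref) VALUES (NULL);

-- Inside the loop body, before the COMMIT:
-- UPDATE update_progress
-- SET last_old_3_ref = <last old reference processed in this batch>;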