DB Corruption Simulator

One of the big problems with administering/supporting databases, applications or any sort of IT infrastructure is that sometimes you can’t learn how to deal with Disaster Recovery scenarios until you experience them in a real life situation. No amount of reading or practice makes up for dealing with a real disaster. But how often do these scenarios actually happen? How can someone who wants to become a DBA get that kind of experience?

This post is designed to look at how to recover actual corrupted databases.

Before we start, here is my disclaimer: DO NOT TRY THIS ON A PRODUCTION DATABASE OR EVEN ON AN INSTANCE THAT IS USED IN A LIVE PRODUCTION ENVIRONMENT. I only ever simulate database corruption in my own isolated test environment.

For this simulation, you can use the AdventureWorks database, a copy of one your own test databases, or you can create a new test one with a table or two (with clustered indexes). For my example, I’m using a copy of a database I created called ‘Wine’.

1) First We want to make sure we have backed up the database and log file of the database we want to deliberately corrupt.

2) Now comes the corruption part. There are two different ways of doing this. We can either manually edit a .mdf file with a HEX editor. I use HxD for this. I simply open the file with this, highlight a number of ’00’ rows and use Find and Replace to change a load of ’00s’ with ‘0X’. Save the file and voila, you have a corrupted database file. You should be able to see in SSMS that your database has now been corrupted and is unusable.

3) Take a tail-log backup of your database with the NO_TRUNCATE option.

USE [MASTER]
BACKUP LOG [Wine] TO  DISK = N'F:\MSSQL\Backup\WineLogTail.bak' 
WITH DESCRIPTION = N'Daily Transaction Log Backup', NO_TRUNCATE, NORECOVERY, INIT, 
NAME = N'Wine-Transaction Log Backup', CHECKSUM,  STATS = 10
GO

4) Now Restore the Database with NORECOVERY OPTION

RESTORE DATABASE [WINE]
FROM DISK = N'F:\MSSQL\Backup\Wine.bak'
WITH NORECOVERY
GO

5) Restore your log files with NORECOVERY OPTION in order with the oldest first.

RESTORE LOG [WINE]
FROM DISK = N'F:\MSSQL\Backup\WineLog.bak'
WITH NORECOVERY
GO

6) Now restore your tail-log back up.

RESTORE LOG [WINE]
FROM DISK = N'F:\MSSQL\Backup\WineLogTail.bak'
WITH NORECOVERY
GO

7) Restore your database with the RECOVERY option then run your DBCC and T-SQL checks.

RESTORE DATABASE [WINE]
WITH RECOVERY
GO

8) in a real life situation, after a successful restore, I would take another full backup. 🙂

PAGE RESTORES

The example above simulates a full database restore which, depending on the size of your database, could take some time. What about Page, File or Filegroup Restores?

The following example will show you how you can corrupt individual pages to allow you to simulate this DR scenario. Again, DO NOT TRY THIS ON A PRODUCTION DATABASE OR EVEN ON AN INSTANCE THAT IS USED IN A LIVE PRODUCTION ENVIRONMENT.

1) Back up your database and log file

2) Now choose a table in your database which has a clustered index and run the following DBCC command which will output the page summary for the clustered index on this table (as long as the index id = 1):

DBCC IND (Wine, 'categories.Grape', 1)
GO

3) For this example, I am going to corrupt page 517. The command DBCC WRITEPAGE ({dbname | dbid}, fileid, pageid, offset, length, data [, directORbufferpool]) allows you to edit individual pages so PLEASE BE CAREFUL and DO NOT RUN THIS ON A LIVE PRODUCTION DATABASE. There is a reason why this command isn’t published by Microsoft. To run this command, you will have to put the database into SINGLE_USER mode first.

DBCC WRITEPAGE('Wine', 1, 517, 60, 1, 0x00, 1)

4) If you now run a ‘SELECT * FROM’ on the table you corrupted, you should receive an error.

Msg 824, Level 24, State 2, Line 3
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x9dc57c00; actual: 0x9dc57c37). It occurred during a read of page (1:517) in database ID 8 at offset 0x0000000040a000 in file ‘D:\MSSQL\DATA\Wine.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

5) If you run DBCC CHECKDB (which you would in a real DR situation) you can identify the pages that are corrupted.

6) As before, Perform a tail-log backup of your log file with the NO_TRUNCATE option

USE [MASTER]
BACKUP LOG [Wine] TO  DISK = N'F:\MSSQL\Backup\WineLogTail.bak' 
WITH DESCRIPTION = N'Daily Transaction Log Backup', NO_TRUNCATE, NORECOVERY, INIT, 
NAME = N'Wine-Transaction Log Backup', CHECKSUM,  STATS = 10
GO

7) Now Restore the database pages with the NORECOVERY OPTION. If you have more than one page corrupted, you would separate each page you want to restore with commas.

RESTORE DATABASE [WINE]
PAGE = '1:517' --Separate pages by commas if you have more pages to restore
FROM DISK = N'F:\MSSQL\Backup\Wine.bak'
WITH NORECOVERY
GO

8) Restore your log files in order with the oldest first

RESTORE LOG [WINE]
FROM DISK = N'F:\MSSQL\Backup\WineLog.bak'
WITH NORECOVERY
GO

9) Restore your tail-log backup

RESTORE LOG [WINE]
FROM DISK = N'F:\MSSQL\Backup\WineLogTail.bak'
WITH NORECOVERY
GO

10) Restore your Database with the RECOVERY option

RESTORE DATABASE [WINE]
WITH RECOVERY
GO

11) Run your DBCC and T-SQL checks and once completed, set back to MULTI_USER

--RUN Checks
ALTER DATABASE [WINE] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO

USE [Wine]
DBCC CHECKDB;
GO
 
SELECT * FROM Categrories.Grape
GO
 
ALTER DATABASE [WINE] SET MULTI_USER
GO

Congratulations! – You have averted disaster. Hopefully.

One other thing, if you were completing a page restore in the Enterprise version of SQL Server, you could complete the restore with the database still online. In this scenario, you would complete your tail-log backup after you have restored your other transaction logs and then restore your tail-log before recovering the database.