SQL Agent and AlwaysON Availability Groups

So we all know that SQL Agent isn’t part of an availability group and therefore (like logins, operators, alerts, linked servers etc) your SQL jobs won’t sync. But even when they are synced, I’m sure there are plenty that should only run on the Primary Node and therefore shouldn’t run on the secondaries.

Using the scripts and steps from this blog, it’s possible to add in an extra step to your jobs to check to see whether the agent job is currently running on a Primary or Secondary and then either carry on or stop the job depending on which node the job should be running on (most non-maintenance jobs will probably be running on the primary).

I’ve dealt with a legacy database/application recently where the SQL agent is linked to the application so that when someone creates a scheduled job in the application, it then creates a corresponding SQL job. The application obviously will only copy these to the current primary so I need to be able to copy them across to the other nodes and make them aware of the node they are currently running on. Which is where the above scripts on the aforementioned blog came in very handy. The only problem is. The scheduled jobs are added and amended constantly by the business and I mean CONSTANTLY. The previous workaround was that the job schedulers would have to raise a ticket to get their jobs cloned on each server. It didn’t take long for this to become a really, really tedious task.

If there’s a will, it can be automated (mostly).

I’ve been using the Copy-DBAAgentJob function available as part of the dbatools Powershell module to move these jobs across after I’ve set up the get_availavbility_group_role step in my jobs but it’s still pretty tedious having to complete this task day in, day out so this is how I’ve ended up automating the  process.

  1. I’ve created the function ‘fn_hadr_group_is_primary’ which can live in the master database or you can put it in your own monitor database (like I do).
-- fn_hadr_group_is_primary
USE master;
GO
IF OBJECT_ID('dbo.fn_hadr_group_is_primary', 'FN') IS NOT NULL
  DROP FUNCTION dbo.fn_hadr_group_is_primary;
GO
CREATE FUNCTION dbo.fn_hadr_group_is_primary (@AGName sysname)
RETURNS bit
AS
BEGIN;
  DECLARE @PrimaryReplica sysname; 

  SELECT
    @PrimaryReplica = hags.primary_replica
  FROM sys.dm_hadr_availability_group_states hags
  INNER JOIN sys.availability_groups ag ON ag.group_id = hags.group_id
  WHERE ag.name = @AGName;

  IF UPPER(@PrimaryReplica) =  UPPER(@@SERVERNAME)
    RETURN 1; -- primary

    RETURN 0; -- not primary
END; 

2) Taking the script to update multiple jobs with the ‘get_availability_group_role step. I have turned this into a Stored Procedure called ‘CreateAGJobStep’.  Basically this script looks for jobs without the ‘get_availability_group_role’ step and puts them in a holding table (this lives in my monitoring database but this could live in master, if you wanted to) and updates the jobs with the new step.  Now the caveat; This currently only works if you have 1 availability group set up. If you have multiples you may need to remove the part that places the availability group name into a variable and hard code the group names in. I’ll let you know when I’ve created a script that deals with multiple AGs.

You can also add in any names of jobs you want permanently excluded from this update. I generally like to leave all my maintenance and monitoring jobs as they are as they are either already aware of the Availability Group (indexing, statistics) or I require them to run on all or other nodes ( CheckDBs, backups etc).  I always prefix these jobs with ‘DBA: ‘ so I can easily identify them in the job list but you can also add individual ones if required.

USE [Master] -- I normally keep this in my own monitoring database
GO


SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

IF OBJECT_ID('CreateAGJobStep') IS NULL
  EXEC ('CREATE PROCEDURE CreateAGJobStep AS RETURN 0;')
GO

ALTER PROCEDURE CreateAGJobStep
AS
BEGIN
SET NOCOUNT ON


SET NOCOUNT ON; SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; 

TRUNCATE TABLE tools.dbo.AgentJobsToCopy --This is a table to hold the information for the next step in the agent job.


IF OBJECT_ID(N'tempdb.dbo.#data', N'U') IS NOT NULL 
BEGIN
DROP TABLE dbo.#data;
END

CREATE TABLE dbo.#data 
(id int IDENTITY PRIMARY KEY, 
name sysname);


-- Get all job names exclude jobs that already have a step named 'get_availability_group_role'
INSERT INTO #data (name)
SELECT DISTINCT j.name--, s.step_name 
FROM msdb.dbo.sysjobs j
    EXCEPT
SELECT DISTINCT j.name
FROM msdb.dbo.sysjobs j
INNER JOIN msdb.dbo.sysjobsteps s ON j.job_id = s.job_id
WHERE s.step_name = N'get_availability_group_role';

-- Remove jobs that need to run on any replica (This will be mainly maintenance jobs that are already AG Aware)
DELETE FROM #data WHERE name LIKE 'DBA:%'; --All my maintenance jobs start with 'DBA:' so I can easily identify them



--Insert data and create table in Tools to use with PowerShell Script later
INSERT INTO Tools.dbo.AgentJobsToCopy (ID, Name)
SELECT ID, Name
FROM #data

DECLARE @command varchar(max), @min_id int, @max_id int, @job_name sysname, @availability_group sysname;
SELECT  @min_id = 1, @max_id = (SELECT MAX(d.id) FROM #data AS d);

SELECT @availability_group = (SELECT ag.name FROM sys.availability_groups ag);

-- If this is instance does not belong to HA exit here
IF @availability_group IS NULL 
BEGIN;
    PRINT 'This instance does not belong to AG. Terminating.';
    RETURN;
END;


DECLARE @debug bit = 0; --<------ print only 

-- Loop through the table and execute/print the command per each job
WHILE @min_id <= @max_id
BEGIN;
        SELECT @job_name = name FROM #data AS d WHERE d.id = @min_id;

        SELECT @command = 
        'BEGIN TRAN;
        DECLARE @ReturnCode INT;
        EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_name=''' + @job_name + ''', @step_name=N''get_availability_group_role'', 
                @step_id=1, 
                @cmdexec_success_code=0, 
                @on_success_action=3, 
                @on_success_step_id=0, 
                @on_fail_action=2, 
                @on_fail_step_id=0, 
                @retry_attempts=0, 
                @retry_interval=0, 
                @os_run_priority=0, @subsystem=N''TSQL'', 
                @command=
        N''-- Detect if this instance''''s role is a Primary Replica.
-- If this instance''''s role is NOT a Primary Replica stop the job so that it does not go on to the next job step
DECLARE @rc int; 
DECLARE @jobid uniqueidentifier
SET @jobid = $(ESCAPE_NONE(JOBID))
EXEC @rc = master.dbo.fn_hadr_group_is_primary N''''' + @availability_group + ''''';

IF @rc = 0
BEGIN;
    DECLARE @name sysname;
    SELECT  @name = (SELECT name FROM msdb.dbo.sysjobs WHERE job_id = @jobid);
    
    EXEC msdb.dbo.sp_stop_job @job_name = @name;
    PRINT ''''Stopped the job since this is not a Primary Replica'''';
END;'', 
        @database_name=N''Master'',
        @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) 
BEGIN; 
    PRINT ''-- Rollback: ''''' + @job_name + ''''''' ROLLBACK TRAN; 
END;
ELSE COMMIT TRAN;' + CHAR(10);

        PRINT @command;
        IF @debug = 0 EXEC (@command);

    SELECT @min_id += 1;
END;

END

3) Next I created a holding table for my agent jobs that I wished to copy. You can put this in the Master Database or in another monitoring type database which is what I do:

USE [Tools]
GO
If Object_id ('AgentJobsToCopy','u') IS NOT NULL
BEGIN
DROP TABLE [dbo].[AgentJobsToCopy]
END

GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[AgentJobsToCopy](
  [ID] [int] NOT NULL,
  [Name] [nvarchar](128) NOT NULL
) ON [PRIMARY]
GO

4) Next I decided to put my very shaky PowerShell skills to use. I created the following script which reads the SQL jobs names from the holding table and places them in an array, it also gets a list of all the secondary nodes in the availability group. The script then iterates through the array and executes the Copy-DBAAgentJob for each SQL job for every instance taking part in the Availability Group.

$DBServer = $env:computername
$databasename = "master"
$Connection = new-object system.data.sqlclient.sqlconnection #Set new object to connect to sql database
$Connection.ConnectionString ="server=$DBServer;database=$databasename;trusted_connection=True" # Connectionstring setting for local machine database with window authentication
Write-host "Connection Information:"  -foregroundcolor yellow -backgroundcolor black
$Connection #List connection information

### Connect to Database and Run Query

$SqlCmd = New-Object System.Data.SqlClient.SqlCommand #setting object to use sql commands

$SqlQuery = "
SELECT Replica_server_name FROM sys.availability_replicas WHERE Replica_Server_name NOT IN (select primary_replica FROM sys.dm_hadr_name_id_map nim inner join sys.dm_hadr_availability_group_states ags on nim.ag_id = ags.group_id)
"
$SQLQuery2 ="
SELECT name FROM tools.dbo.AgentJobsToCopy
"

$Connection.open()
Write-host "Connection to database successful." -foregroundcolor green -backgroundcolor black
$SqlCmd.CommandText = $SqlQuery
$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
$SqlAdapter.SelectCommand = $SqlCmd
$SqlCmd.Connection = $Connection
$DataSet = New-Object System.Data.DataSet
$SqlAdapter.Fill($DataSet)


$SqlCmd.CommandText = $SqlQuery2
$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
$SqlAdapter.SelectCommand = $SqlCmd
$SqlCmd.Connection = $Connection
$Jobs = New-Object System.Data.DataSet
$SqlAdapter.Fill($Jobs)
$Connection.Close()

foreach ($Job in $Jobs.Tables[0].Rows)
{ 
$AgentJob = $($Job[0])
foreach ($Row in $DataSet.Tables[0].Rows)
{ 
  $DestServer = $($Row[0])
  Copy-DBAAgentJob -Source $DBServer -Destination $DestServer -Job $AgentJob -Force
}

}

I saved the above script and called it ‘CopyAgentJob_AG.ps1’.

Now to automate the whole task, I created an SQL Job to run everything. But BEFORE we do that there are a few other things you need to get in place:

Download and Install the DBAtools PowerShell module to each server participating in the Availability Group (I now do this as a part of my standard build).

You’ll need a windows account that can access and execute the powershell script. This then needs to be set up as a credential in SQL Server and then an Agent Proxy under ‘Operating System (CMDExe)’.

You’ll also need to copy the powershell script to a UNC path or a location on the server where it can be accessed and executed. If you copy it to the server, it will have to be in place on all nodes (so when your server failsover, the script can used by the new primary node).

Agent Job

  1. Create the first step so that runs and executes the stored procedure ‘CreateAGJobStep’. Ironically, before I run the stored procedure  I run the code to check that I’m running this job on the Primary Node.

2. Create a second step that executes your PowerShell script with the command:

C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe "& 'E:\DBA\PowerShell\CopyAgentJob_AG.ps1'"

There are different ways to execute the PowerShell script in SQL Server but that is beyond the scope of this blog. But you can read about it here and here.

Once this jobs is set up, you’ll need to test it. I created a number of jobs with a single T-SQL step on the Primary node and then ran the job (on a non-production server, of course!) to confirm that the jobs had the ‘get_availability_group_role’ inserted as the first step and the jobs then got copied across to all secondary nodes.

Once I was happy that it worked as expected, I could then add it to production and scheduled it to run nightly (or hourly if you prefer) or as and when required.