dbo.sp_MSdistribution_cleanup BUG

There’s a good chance that if you’ve ever looked after a production SQL environment you’ll have come across replication before. It’s incredibly handy for getting data from A to B (or C, D, etc.), and one of the best descriptions I’ve heard of it is “oh, so it’s copy and paste then?” And yes, in a way it is!

One of my big bugbears with this technology is its maintenance. A job is added after the initial setup which is due to run every 15 minutes and clears up any commands/transactions that match the retention criteria you specify; in most cases these default to 0 and 72 (@min_distretention and @max_distretention respectively). Recently we’ve had an issue with this job: its run-time far exceeds its schedule, and it causes huge amounts of blocking which eats all the scheduler time, meaning no new rows can be taken from the log reader and added to the distribution DB. Not ideal! If you do a quick scan around the internet for information on how to reduce your distributor size or increase your throughput you’ll see a lot of mentions of the immediate_sync option. Massive warning here: if you have more than one publication on one database and you have a mix of this flag and different types of replication (snapshot and transactional in my case), this will break the clean-up job without telling you it’s actually broken! What ends up happening is that you have loads of rows in your distribution DB which aren’t being removed past their retention period.
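
If you suspect you’re in that state, a quick sanity check is to count how much has outlived the retention window. This is just a sketch, assuming the default distribution database name and the usual 72-hour @max_distretention:

-- Sketch: transactions in the distributor older than the retention window.
-- If the clean-up job is healthy this should be close to zero.
USE distribution;

SELECT  publisher_database_id
       ,COUNT_BIG(*)    AS trans_past_retention
       ,MIN(entry_time) AS oldest_entry
FROM    dbo.MSrepl_transactions WITH (NOLOCK)
WHERE   entry_time < DATEADD(HOUR,-72,GETDATE())
GROUP BY publisher_database_id;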

So, why does this cause an issue? To get to the bottom of that you’ll need to dig into Microsoft’s code. Thankfully, since the distribution DB is a system DB which acts more like a user DB, you can read through all the code and create your own versions of the procs (if you really want). The issue comes from the following proc, which sits about three levels down inside the clean-up sproc.

[repl-2: screenshot of the relevant section of the clean-up proc]
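
If you’d rather read the code yourself than trust my screenshots, the procs can be scripted straight out of the distribution database like any user proc, for example:

-- The clean-up procs live in the distribution database and can be read
-- with sp_helptext (or OBJECT_DEFINITION) like any user proc.
USE distribution;
EXEC sp_helptext N'sp_MSdistribution_cleanup';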

At this point in the clean-up process the code is inside a cursor looking to obtain @max_cleanup_xact_seqno. This tells the clean-up job the last seqno it can clean up to, based on the time parameters you pass in; as mentioned earlier these are usually 0 and 72, which are treated as hours. What was happening in my environment was that 0x00 was being returned, which caused the next proc in the chain (the delete of commands and transactions) to not process anything, as the code at the bottom of the proc states.

Thanks MS!!
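
For reference, the clean-up Agent job step is just a call along these lines, with the 0 and 72 being the defaults mentioned above:

-- Default clean-up call: retention window of 0 to 72 hours.
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0
                                  ,@max_distretention = 72;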

But why would it return 0x00 (effectively NULL, going off the comments)? Well, this is down to the cursor declared at the top of the sproc. If you have any immediate_sync = false publications, it enters a loop for each agent that exists, pulling out the minimum xact sequence number for each one. This later affects the final assignment of @max_cleanup_xact_seqno = xact_seqno, because the minimum for this publisher DB (ID 1 in my case) is not related at all to the immediate_sync = true publications; their minimum sequence numbers are way off versus the rest of my publications for the DB. In my opinion this is a massive bug: if you read the advice online and switch the immediate_sync setting on a few of your publications but not the rest (to test the water, say), you may well end up with a very large distribution DB, very quickly! In my case we were at the billion row mark before I found this. Be cautious!
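
A quick way to spot the mix is to list the immediate_sync flag for every publication on the published database. A sketch, run in each published database (the database name here is a placeholder):

-- Sketch: a mix of 0s and 1s in immediate_sync on the same published database
-- is the scenario described above.
USE YourPublishedDatabase;   -- placeholder name

SELECT  name
       ,immediate_sync
       ,allow_anonymous
       ,repl_freq           -- 0 = transactional, 1 = snapshot
FROM    dbo.syspublications;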

It’s also worth noting there are some problems with how this procedure runs which can cause you real grief, so I’d personally recommend a few tweaks to help your system along:

Change (or create a new copy of) dbo.sp_MSsubscription_cleanup (I called all mine dbo.sp_[whatever]_MAXDOP) and:
* Change REPEATABLE READ to NOLOCK on the first update statement
* Give the query OPTION (MAXDOP 1); this stops the blocking and ensures you’re not killing your server for resources, this is maintenance after all. (A rough sketch of the pattern is below.)
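
To be clear, I’m not reproducing Microsoft’s proc here; the snippet below is just the shape of the change in your own copy, with made-up table and column names standing in for the real ones.

-- Illustrative pattern only; the tables, columns and predicate are placeholders,
-- not the actual Microsoft code.
UPDATE  t
SET     t.some_flag = 1                                    -- placeholder logic
FROM    dbo.SomeCleanupTable AS t
        INNER JOIN dbo.SomeLookupTable AS l WITH (NOLOCK)  -- was WITH (REPEATABLE READ)
            ON l.id = t.id
WHERE   t.some_filter = 1                                  -- placeholder predicate
OPTION  (MAXDOP 1);                                        -- keep maintenance work serial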

Cheers,
Rik

slow I/O diagnosis with sys.dm_io_virtual_file_stats

Diagnosis is probably one of the most important parts of being a DBA/DB dev: if you’ve got a slow-running system you need to use the tools you’ve got to hand, and if you’ve got a slow-running query you need to do the same. Out of the box, SQL Server ships with well over 100 handy tools in the form of DMVs (and DMOs) to help surface some of that key information for your day-to-day work. One of them, sys.dm_io_virtual_file_stats, I’ve had to dig into recently to validate that the I/O stats being pulled back by a 3rd party monitoring system were tallying correctly.

The first thing I noticed when trawling the web beyond the MSDN articles was that a lot of people haven’t really given anything useful beyond telling you what the returned values are. Since this information is forever accumulating, you need to establish a baseline reading and then apply a delta to get information over time. I decided to set up a job to run every minute to skim the latest figures into a table; from that point I could establish minute-on-minute stats for the performance of each file.

Setting up the table first:

CREATE TABLE dbo.VirtualFileStats
	(
		ID INT IDENTITY(1,1)
		,DatabaseName sysname	
		,FileType VARCHAR(4)
		,DiskLocation CHAR(1)
		,SizeOnDiskMB DECIMAL(20,2)
		,FileName sysname
		,io_stall_read_ms BIGINT
		,io_stall_write_ms BIGINT
		,num_of_reads BIGINT
		,num_of_writes BIGINT
		,TimeCaptured TIME(0)
		,CONSTRAINT PK_VirtualFileStats_ID PRIMARY KEY CLUSTERED (ID)
	)

Now set up a job to capture the information you need. In my case I was looking at one specific database, and I also wanted to see if there was anything odd going on with the system databases:

INSERT  INTO dbo.VirtualFileStats
        SELECT  DB_NAME(a.database_id) AS dbname
               ,db_file_type = CASE WHEN a.file_id = 2 THEN 'Log'  -- log files get file_id 2 by default; assumes one log file per DB
                                    ELSE 'Data'
                               END
               ,UPPER(SUBSTRING(b.physical_name,1,1)) AS disk_location
               ,CAST(( ( a.size_on_disk_bytes / 1024 ) / 1024.0 ) AS DECIMAL(20,2)) AS size_on_disk_mb
               ,b.name
               ,a.io_stall_read_ms
               ,a.io_stall_write_ms
               ,a.num_of_reads
               ,a.num_of_writes
               ,CAST(GETDATE() AS TIME(0)) AS DS
        FROM    sys.dm_io_virtual_file_stats(NULL,NULL) AS a
                INNER JOIN sys.master_files AS b
                    ON a.database_id = b.database_id
                       AND a.file_id = b.file_id
        WHERE   ( a.database_id <= 4    -- the system databases
                  OR a.database_id = 5  -- the specific database I was interested in
                )
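
If you’d rather script the Agent job than click through it, a minimal once-a-minute schedule looks roughly like this. It’s only a sketch: the job name and target database are mine, and you’d paste the INSERT above into the step command:

-- Sketch: Agent job running the capture insert every minute.
USE msdb;

EXEC dbo.sp_add_job         @job_name = N'Capture_VirtualFileStats';

EXEC dbo.sp_add_jobstep     @job_name = N'Capture_VirtualFileStats'
                           ,@step_name = N'Insert snapshot'
                           ,@subsystem = N'TSQL'
                           ,@database_name = N'YourDBADatabase'                  -- wherever dbo.VirtualFileStats lives
                           ,@command = N'INSERT INTO dbo.VirtualFileStats ...';  -- the INSERT above goes here

EXEC dbo.sp_add_jobschedule @job_name = N'Capture_VirtualFileStats'
                           ,@name = N'Every minute'
                           ,@freq_type = 4            -- daily
                           ,@freq_interval = 1
                           ,@freq_subday_type = 4     -- minutes
                           ,@freq_subday_interval = 1;

EXEC dbo.sp_add_jobserver   @job_name = N'Capture_VirtualFileStats';             -- defaults to the local server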

With the job set up and configured to capture the data every minute for the next day, I’m now able to report on the data as it comes in. I’ve made use of the LAG window function introduced in SQL Server 2012 to give me the preceding row’s values, which makes the maths a bit easier and the code a little less convoluted:

DECLARE @MinsBack TINYINT = 10;

WITH    CTE
          AS ( SELECT   ID
                       ,DatabaseName
                       ,FileType
                       ,DiskLocation
                       ,SizeOnDiskMB
                       ,FileName
                       ,io_stall_read_ms
                       ,io_stall_write_ms
                       ,num_of_reads
                       ,num_of_writes
                       ,TimeCaptured
                       ,LAG(num_of_reads,1,0) OVER ( ORDER BY DatabaseName, FileName, ID ) AS LAG_num_of_reads
                       ,LAG(num_of_writes,1,0) OVER ( ORDER BY DatabaseName, FileName, ID ) AS LAG_num_of_writes
                       ,LAG(io_stall_read_ms,1,0) OVER ( ORDER BY DatabaseName, FileName, ID ) AS LAG_io_stall_read_ms
                       ,LAG(io_stall_write_ms,1,0) OVER ( ORDER BY DatabaseName, FileName, ID ) AS LAG_io_stall_write_ms
                       ,ROW_NUMBER() OVER ( PARTITION BY DatabaseName,FileName ORDER BY ID ) AS RN
               FROM     dbo.VirtualFileStats
               WHERE    TimeCaptured > DATEADD(MINUTE,-@MinsBack,CAST(GETDATE() AS TIME(0)))
             )
    SELECT  A.ID
           ,A.DatabaseName
           ,A.FileType
           ,A.DiskLocation
           ,A.SizeOnDiskMB
           ,A.FileName
           ,A.TimeCaptured
           ,( A.num_of_reads - A.LAG_num_of_reads ) AS D_num_of_reads
           ,0 AS D_num_of_writes
           ,CAST(( ( A.io_stall_read_ms - LAG_io_stall_read_ms ) * 1.0 ) / ( A.num_of_reads - A.LAG_num_of_reads ) AS DECIMAL(20,4)) AS D_IO_Stall
           ,( A.io_stall_write_ms - LAG_io_stall_write_ms ) AS D_io_stall_write_ms
           ,'Reads' AS IOType
    FROM    CTE AS A
    WHERE   RN > 1
            AND ( A.num_of_reads - A.LAG_num_of_reads ) > 0
			AND CAST(( ( A.io_stall_read_ms - LAG_io_stall_read_ms ) * 1.0 ) / ( A.num_of_reads - A.LAG_num_of_reads ) AS DECIMAL(20,4))>2
    UNION ALL
    SELECT  A.ID
           ,A.DatabaseName
           ,A.FileType
           ,A.DiskLocation
           ,A.SizeOnDiskMB
           ,A.FileName
           ,A.TimeCaptured
           ,0 AS D_num_of_reads
           ,( A.num_of_writes - A.LAG_num_of_writes ) AS D_num_of_writes
           ,CAST(( ( A.io_stall_write_ms - LAG_io_stall_write_ms ) * 1.0 ) / ( A.num_of_writes - A.LAG_num_of_writes ) AS DECIMAL(20,4))
           ,( A.io_stall_write_ms - LAG_io_stall_write_ms ) AS D_io_stall_write_ms
           ,'Writes'
    FROM    CTE AS A
    WHERE   RN > 1
            AND ( A.num_of_writes - A.LAG_num_of_writes ) > 0
			AND CAST(( ( A.io_stall_write_ms - LAG_io_stall_write_ms ) * 1.0 ) / ( A.num_of_writes - A.LAG_num_of_writes ) AS DECIMAL(20,4))>2
    ORDER BY D_IO_Stall DESC

Personally I was only interested in reads averaging over 2 milliseconds; Microsoft’s general guidance is that under 20ms is acceptable, but that can depend on your hardware.

A few things to note: change @MinsBack if you want to look further back, and remember that in this instance the data is collected minute on minute, so any average is based on that interval alone. The more frequently you collect, the more accurate your readings, but then where do you draw the line?!

Query Stats – Long Runners

My most recent activities have found me digging into bad performance (or sub-optimal performance, if you’re being polite), but for all the posts I read I sometimes find myself asking what “bad” actually is, and that will always depend on your environment. I’ve found a lot of use in Glenn Berry’s 911 scripts, which give good insight into certain aspects of performance, but I still think there’s room for more analytical scripts; and why not, understanding the scripts is more important than just running them.

The first script I’ve decided to create helps identify the longest running queries (top 20) on a given instance. This should help you pin down, if you don’t already know, where you have long running queries; in my case it’s almost always overnight jobs.

/*Query Stats - Longest running queries on average*/
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET NOCOUNT ON;
 
WITH CTE_AVGRun
 AS ( SELECT TOP 20
 last_execution_time
 ,DATEDIFF(MINUTE,last_execution_time,GETDATE()) AS MinsSinceLastCall
 ,execution_count
 ,CAST(DATEADD(MILLISECOND,( ( DEQS.total_elapsed_time / 1000 ) / execution_count ),CAST(CAST(GETDATE() AS DATE) AS DATETIME)) AS TIME(2)) AS AvTimeToRun
 ,CAST(( total_logical_reads * 1.0 / execution_count ) AS DECIMAL(20,2)) AS AvLogicalReads
 ,CAST(( total_physical_reads * 1.0 / execution_count ) AS DECIMAL(20,2)) AS AvPhysicalReads
 ,CAST(( total_logical_writes * 1.0 / execution_count ) AS DECIMAL(20,2)) AS AvWrites
 ,plan_handle
 ,sql_handle
 ,statement_start_offset
 ,statement_end_offset
 FROM sys.dm_exec_query_stats AS DEQS
 ORDER BY ( DEQS.total_elapsed_time / execution_count ) DESC -- average elapsed time per execution, matching AvTimeToRun
 )
 SELECT DB_NAME(DEQP.dbid) + '.' + OBJECT_SCHEMA_NAME(DEQP.objectid,DEQP.dbid) + '.' + OBJECT_NAME(DEQP.objectid,DEQP.dbid) AS FQN
 ,last_execution_time
 ,MinsSinceLastCall
 ,execution_count
 ,AvTimeToRun
 ,AvLogicalReads
 ,AvPhysicalReads
 ,AvWrites
 ,SUBSTRING(T.text,( statement_start_offset / 2 ) + 1,
 ( ( CASE WHEN statement_end_offset = -1 THEN DATALENGTH(T.text)
 ELSE statement_end_offset
 END - statement_start_offset ) / 2 ) + 1) AS QueryText
 ,DEQP.query_plan
 FROM CTE_AVGRun
 CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS T
 CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS DEQP

PowerShell File Archive

Administration with PowerShell is, put simply, brilliant! I generally use PowerShell at least once or twice a day for various tasks, and any opportunity to do a bit of coding I take gladly. My most recent endeavour has been an archive job: we were using volumes on a virtual machine, backing up SQL databases to them and then removing files based on their age and name, roughly 5 days old and “.BAK”. The old way of doing this was a little annoying in my opinion, as the version written back in the day had to have a line configured for each folder, but since I was fairly new in the company when I first encountered the script I let it be.

One of the best things about PowerShell is that you can start simple and build, so without any further detail, here’s “Get-ChildItem” (a posh DIR):

Get-ChildItem -Path "C:\"

A very handy little bit of code, as it forms the base of what we’re after: we need to identify all files within a given location and evaluate their age and type, so using GCI with the following switches lets us extend its use.

Get-ChildItem -Path C:\* -Include *BAK -Recurse

I’ll be honest, I was a little surprised at how many results came back from this; I obviously don’t keep an eye on my local backups! I had a little muck around with the properties at this point, as you can dig into various aspects of each object by simply specifying them within the pipe via a Select-Object, or by piping the object to Format-List so that all properties are exposed (be warned though, it can be information overload). My own personal preference when making scripts like this is to make them as flexible as possible, so I started to change things up by adding variables to make the script more portable; I’m a bit of a nerd when it comes to formatting, specifically tabulation.
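
As a quick illustration of poking at those properties, nothing cleverer than the select and Format-List just mentioned:

# Pick out a few useful properties per file...
Get-ChildItem -Path C:\* -Include *BAK -Recurse |
    Select-Object FullName, CreationTime, LastWriteTime, Length

# ...or dump every property for one file (information overload warning applies)
Get-ChildItem -Path C:\* -Include *BAK -Recurse |
    Select-Object -First 1 |
    Format-List *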

The next step was to get the GCI working with those variables; simple enough: declare the variable, set it, and go. After that I wanted to expose information about each item which met the criteria, so I used a foreach loop.

New-Variable -Name strLocation -Value 'C:\' -scope script
New-Variable -Name strFileType -Value "BAK" -scope script

foreach ($file in Get-ChildItem -Path $strLocation* -Include *$strFileType -Recurse)
    {
    write-host $file
    }

In the snippet above I’m just proving to myself that it’s working: if it enters the loop, write out the name of the object currently in context.

To work out the age of the file we expose one of the backup’s properties. I create a variable holding a New-TimeSpan between the current object’s CreationTime and the current date (Get-Date); once I’ve got that I can do something with the file, in this case delete it. However, I like to test things first, so I’ll start by writing the matches out to the current PS window.

New-Variable -Name strLocation -Value 'C:\' -scope script
New-Variable -Name strFileType -Value "BAK" -scope script
New-Variable -Name intAge -Value 7 -scope script

foreach ($file in Get-ChildItem -Path $strLocation* -Include *$strFileType -Recurse)
    {
    $DaysOld = (New-TimeSpan -Start $file.CreationTime -End (Get-Date)).Days  # pass Get-Date as a date, not a formatted string
    if (($DaysOld -ge $intAge) -eq $true)
        {
            write-host $file.FullName
        }
    }

Once happy with the results you can then replace the Write-Host output with the actual delete, shown below.

Remove-Item -LiteralPath $file.FullName 

And that’s it for a basic archive. You can tinker with that as a base, but essentially it gives you a simple delete with limited constraints; personally I might use it on my own PC at home for removing any video files other users create. Handy. Below is the final script I ended up producing, with a few extra features (logging and debug) introduced; enjoy.

################################################################################################################################################
## 	Name		: FileArchive.ps1
## 	Function	: Written to remove files from a location
##	Version		: 1
##	Author 		: RT
##	Usage		: Admin/Advanced only
################################################################################################################################################
clear 

$error.clear()

New-Variable -Name strLocation -Value 'C:\' -scope script
New-Variable -Name strFileType -Value "BAK" -scope script
New-Variable -Name strLogFile -Value "" -scope script
New-Variable -Name strLogFileLocation -Value "C:\" -scope script
New-Variable -Name intAge -Value 7 -scope script
New-Variable -Name debug -Value 1 -scope script

$strLogFile = Get-Date -uformat "%d%m"
$strLogFile = "Log_"+$strLogFile + (Get-Date -UFormat "%Y") +".txt"

if (($debug -eq 0)-eq $true)
    {
        new-item -Path $strLogFileLocation -Name $strLogFile -ItemType file
    }

foreach ($file in Get-ChildItem -Path $strLocation* -Include *$strFileType -Recurse)
    {
        $DaysOld = (New-TimeSpan -Start $file.CreationTime -End (Get-Date)).Days  # pass Get-Date as a date, not a formatted string
        if (($DaysOld -ge $intAge) -eq $true)
            {
                $info = @{}
                $info.DateNow=(get-date -Format g)
                $info.DaysOld=$DaysOld
                $info.Created=$file.CreationTime
                $info.DIRLoc=$file.FullName
                $object = New-Object -TypeName PSObject -Property $info

            if (($debug -eq 0)-eq $true)
                {
                    $object | format-list | out-file -FilePath $strLogFileLocation$strLogFile -append
                    Remove-Item -LiteralPath $file.FullName
                }
            if (($debug -eq 1)-eq $true)
                {
                    $object | format-list
                }
            }
    }

write-host $error

Remove-Variable strLocation
Remove-Variable strFileType
Remove-Variable intAge
Remove-Variable debug
Remove-Variable strLogFile
Remove-Variable strLogFileLocation

“Free” Space

I’ve done a bit of work recently on compression and various aspects of space saving, and what I generally found is that my team and I aren’t really thinking about space and what it means. Given our current environment, anything live is replicated across 10+ other environments using tools provided by our storage vendor, NetApp; although these tools are great and give us ways of reducing our 2TB estate, we’re still having to reserve X amount of space even with thin provisioning.

With all that said, I went about looking at two things: compression (another post) and actual database file usage, specifically .LDF files. I found some pretty decent posts out there about how people approach this, but ultimately decided to write my own version, because I like collecting data and it was easier to write something which fit into my collection agent (SSIS into a repository database). Here’s the code.

SET NOCOUNT ON
GO
IF EXISTS ( SELECT  *
            FROM    sys.objects
            WHERE   name LIKE 'tmp_FileSpaceSummary' ) 
   BEGIN
         DROP TABLE dbo.tmp_FileSpaceSummary
   END
GO
CREATE TABLE dbo.tmp_FileSpaceSummary
       (
       ServerName VARCHAR(50)
	   ,DatabaseName VARCHAR(50)
       ,FILE_ID INT
       ,DatabaseFileName VARCHAR(50)
       ,FileType TINYINT
       ,FileTypeDesc VARCHAR(50)
       ,Physical_Name VARCHAR(400)
       ,TotalSpaceMB DECIMAL(20, 2)
       ,UsedSpaceMB DECIMAL(20, 2)
       ,FreeSpaceMB DECIMAL(20, 2)
       )

DECLARE @s_SQL VARCHAR(600)
       ,@s_DB VARCHAR(50)

SELECT  @s_DB=MIN(name)
FROM    sys.databases
WHERE	database_id<100

WHILE @s_DB<=(SELECT    MAX(name)
              FROM      sys.databases
			  WHERE	database_id<100) 
      BEGIN
            SET @s_SQL = 
				'USE ['+@s_DB+'];'+CHAR(13)
				+'INSERT INTO master.dbo.tmp_FileSpaceSummary'+CHAR(13)
				+'SELECT  @@SERVERNAME'+CHAR(13)
				   +',DB_NAME(DB_ID()) AS DatabaseName'+CHAR(13)+CHAR(9)
                   +',FILE_ID'+CHAR(13)+CHAR(9)
                   +',name'+CHAR(13)+CHAR(9)
                   +',TYPE'+CHAR(13)+CHAR(9)
                   +',type_desc'+CHAR(13)+CHAR(9)
                   +',physical_name'+CHAR(13)+CHAR(9)
                   +',CAST((size/128.0) AS DECIMAL(20, 2)) AS TotalSpaceMB'+CHAR(13)+CHAR(9)
                   +',CAST((CAST(FILEPROPERTY(name,''SpaceUsed'') AS DECIMAL(20,2))/128.0) AS DECIMAL(20,2)) AS UsedSpaceMB'+CHAR(13)+CHAR(9)
                   +',CAST((size/128.0-CAST(FILEPROPERTY(name,''SpaceUsed'') AS DECIMAL(20,2))/128.0) AS DECIMAL(20,2)) AS FreeSpaceMB'+CHAR(13)
				   +'FROM    sys.database_files WITH (NOLOCK)'+CHAR(13)

			EXEC (@s_SQL)

            SELECT  @s_DB=MIN(name)
            FROM    sys.databases
            WHERE   name>@s_DB
		AND database_id<100
      END

In my scenario I initiate a foreach loop in SSIS over a config table of the servers I want to collect from; each server has this query run against it and the data is copied into my repository database every hour or so, which lets me see what the actual usage of each file looks like on an hourly basis. In short, I’ve been able to see that the actual amount of space required for an .LDF is a lot less than we originally thought, although this may have been down to an application change; for example, an archive table which was being written to incorrectly may now be using a more efficient method and no longer needs all of its 50GB of log.
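
For what it’s worth, the question I usually ask of the repository copy is “what’s the most log space each file has actually used?”. The query below is a sketch against my own repository design: it assumes the SSIS load lands the rows in a table of the same shape (here called dbo.FileSpaceSummaryHistory), which is not something the script above creates.

-- Sketch: peak observed usage per log file across all captures.
SELECT  ServerName
       ,DatabaseName
       ,DatabaseFileName
       ,MAX(TotalSpaceMB) AS AllocatedMB
       ,MAX(UsedSpaceMB)  AS PeakUsedMB
       ,MIN(FreeSpaceMB)  AS MinFreeMB
FROM    dbo.FileSpaceSummaryHistory    -- placeholder repository table
WHERE   FileTypeDesc = 'LOG'
GROUP BY ServerName
       ,DatabaseName
       ,DatabaseFileName
ORDER BY PeakUsedMB DESC;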

Get going with PowerShell

I was scanning through my PowerShell scripts the other day and came across one I was quite fond of back in the day. I can’t remember for the life of me why I wrote it, but I do remember using it recently for some mischief! It’s basically a tool for restarting a service on any given host, which, as you can imagine, was used a few times for restarting/stopping MSSQLSERVER on my colleagues’ PCs!

So, without further ado, the code!!

##################
## FUNCTIONS
##################
function FuncMail
	{#param ($strTo, $strFrom, $strSubject, $strBody, $smtpServer)
	param($To, $From, $Subject, $Body, $smtpServer)
	$msg = new-object Net.Mail.MailMessage
	$smtp = new-object Net.Mail.SmtpClient($smtpServer)
	$msg.From = $From
	$msg.To.Add($To)
	$msg.Subject = $Subject
	$msg.IsBodyHtml = 1
	$msg.Body = $Body
	$smtp.Send($msg)
	}

function FuncRestartService ($server, $service)
{
	$intPingError = 0
	$intError = 0

	$ping = new-object System.Net.NetworkInformation.Ping
	try
		{
			$rslt = $ping.send($server)
		}
	catch
		{
			$intPingError = 1
			$strError = $Error
		}
	if ($intPingError -eq 0 -and $rslt.Status -ne 'Success') #Ping.Send only throws for things like DNS failures; a timed-out ping just returns a status
		{
			$intPingError = 1
			$strError = "Ping status: $($rslt.Status)"
		}
	if ($intPingError -eq 0) #success: ping
		{
	        write-host "...ping returned, running service restart"
			try
				{
					Restart-Service -InputObject $(Get-Service -Computer $server -Name $service ) -force
				}
			catch
				{
					$intError = 1
					$strError = $Error
				}
			if ($intError -eq 1) #failure: restart - fully exit program
				{
					Write-Host "...an error occurred, notifying by email"
					FuncMail -To $emTo -From $emFrom  -Subject "Server: $server - Error" -Body "$server\$service restart was attempted but failed. Details: $strError" -smtpServer $emSMTP
					break
				}
			else #success: restart
				{
					write-host "...finished restarting service, email sent"
					FuncMail -To $emTo -From $emFrom  -Subject "Server: $server - Restart" -Body "$server\$service has been restarted" -smtpServer $emSMTP
				}
		}
	else #failure: ping - fully exit program
		{
			Write-Host "...ping failed, notifying via email"
			FuncMail -To $emTo -From $emFrom  -Subject "Server: $server - Status" -Body "$server is not responding to ping, please investigate. Details: $strError" -smtpServer $emSMTP
			break
		}
}
##################
## EMAIL Variables
##################
$emTo 		= 'Richard.Thwaites@SomePlace.com'
$emFrom		= 'TMServiceRestart@SomePlace.com'
$emSMTP		= 'AnExchangeServer.com'

##################
## RUN Program
##################
Write-Host "Starting Program..."
Write-Host "...To execute run ""CheckService ServerName"""

FuncRestartService "APCNameHere" "MSSQLSERVER"

Write-Host "...program complete"

If you wanted to, you could call the function repeatedly: suppose you wanted to restart a set of services across an array of servers. A few posts I’ve seen tend to read these in from text files and loop over them, but I prefer the all-in-one way, something like the sketch below.
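
For completeness, looping the function over a hard-coded list (the server names below are made up) looks like this:

# Restart the same service across a list of servers using the function above
$servers = @("SQLBOX01", "SQLBOX02", "SQLBOX03")

foreach ($server in $servers)
    {
        FuncRestartService $server "MSSQLSERVER"
    }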

Cheers,
Rik

re-send emails from msdb

One of the few issues I’ve had with msdb has been the configuration and use of mail; going back to SQL 2000 it was a bit of a mare, but thankfully it went through a fairly handy rehash from 2005 onward. That said, I was recently asked a.) could I check whether a mail had failed to send from SQL, and b.) if so, could I re-send it? Yes and yes, although in fairness I always say yes until I find out I’m horribly wrong, then the headphones go in and I go recluse for a week or two!

OK, the first step is to find anything relating to failed mail, which is where IntelliSense came in handy for me. If you’re not accustomed to SQL and you start using it, one thing you’ll notice is the plethora of system objects and functions, which can be a bit of a mare; with IntelliSense you can start your logical query, i.e. SELECT * FROM …, and as long as you know roughly what you’re looking for you should be able to narrow it down without having to resort to Google and SQLServerCentral!

Code a.)

USE msdb;
SELECT TOP 10 *
FROM msdb.dbo.sysmail_faileditems WITH(NOLOCK)
ORDER BY mailitem_id DESC

This should return a list of the last 10 failed mails. mailitem_id is the key column to note here, as it’s also used within sysmail_allitems, which is handy if you fancy taking a look at the full history of mail from the server. Either way, that’s a.) done with!
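
If you do want the wider history rather than just the failures, something like this works against the same key (only the columns I tend to look at):

-- Recent Database Mail history, failed or not; sent_status shows the outcome.
SELECT TOP 10
        mailitem_id
       ,recipients
       ,subject
       ,sent_status
       ,send_request_date
       ,sent_date
FROM msdb.dbo.sysmail_allitems WITH(NOLOCK)
ORDER BY mailitem_id DESC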

The second request was a little trickier than the first; however, with a little Google and a Profiler trace you can see which stored procedure SQL uses to send mail out.

Code b.)

USE msdb;
DECLARE @sendmailxml NVARCHAR(MAX)
DECLARE @mailitem_id VARCHAR(11) = '52119' -- VARCHAR needs a length here; without one it defaults to a single character
DECLARE @rc INT

-- Create the primary SSB xml message
    SET @sendmailxml =	N'<requests:SendMail xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"' 
						+N' xsi:schemaLocation="http://schemas.microsoft.com/databasemail/requests RequestTypes.xsd"'
						+N' xmlns:requests="http://schemas.microsoft.com/databasemail/requests">'
						+N'<MailItemId>'
                        + @mailitem_id 
						+N'</MailItemId></requests:SendMail>'

    -- Send the send request on queue.
    EXEC @rc = sp_SendMailQueues @sendmailxml
    IF @rc <> 0
    BEGIN
       RAISERROR(14627, 16, 1, @rc, 're-send mail failed')
    END 

Note in the code above that @mailitem_id has been set to 52119; in your own scenario that will be whatever failed mailitem_id you’ve come across. And that should be that.

Cheers,
Rik