My frustraition with a product that is, in theory, action packed, but faultering in execution continues - this time it is incoming email. A well behaved mail gateway should return a non-delivery receipt if it is unable to deliver email, it's common courtesy. SharePoint, both Windows SharePoint Services and Office SharePoint Service, seems to take exception to this idea. As an IT guy it is the most frustraiting thing to see a module that was slapped together with little regard to proper protocol behaviour. My frustration started when email enabled lists just stopped capturing email randomly. I am not alone in this frustration either, there are clearly others yet not a peep from Microsoft:
The SPTimer service polls the SMTP drop folder on a regular basis to look for incoming messages. When the message is picked up it is read in and the alias appears to be cross referenced against an EmailEnabledLists table. I was tipped off to this by a post on the SharePoint newsgroup. If the logging verbosity is set and the alias is not found it will log an Unknown Alias error to both the event log and the trace logs:
Type: Warning
Date: 2/26/2008
Time: 8:42:30 AM
Event: 6873
Source: Windows SharePoint Services 3
Category: E-Mail
User: N/A
Description:
An error occurred while processing the incoming e-mail file
D:\SMTP\Drop\54f3fb0601c8787d0000024f.eml. The error was: Unknown alias..
Along with the following event:
Type: Information
Date: 2/26/2008
Time: 8:42:30 AM
Event: 6871
Source: Windows SharePoint Services 3
Category: E-Mail
User: N/A
Description:
The Incoming E-Mail service has completed a batch. The elapsed time was 00:00:00.0156256.
The service processed 1 message(s) in total.
Errors occurred processing 1 message(s):
Message ID: <SERVERxiDd9D9DQVXwR100000a19@domain>
The following aliases were unknown:
mylist
After poking around in the database to see if I could find any hints as to why this functionality decided to stop all of a sudden. Running the following query on each content database gave me a dump of the email aliases that each list was configured against:
SELECT tp_EmailAlias as EmailAlias,
SiteId,
tp_WebId AS WebId,
tp_Id as ListId
FROM AllLists
INNER JOIN Webs ON AllLists.tp_WebId = Webs.Id
WHERE tp_EmailAlias IS NOT NULL
Cross referencing the results with the EmailEnabledLists table I was able to find that there were some, no, a lot of missing rows. Something was deleting my email enabled lists. At this point I don't know what, but it smells like a bug. In the mean time I have created a script to re-populate the configuration database table based on the content databases.
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#EmailEnabledLists') IS NOT NULL
BEGIN
DROP TABLE #EmailEnabledLists
END
CREATE TABLE #EmailEnabledLists (
Alias NVARCHAR(128) NOT NULL,
SiteId UNIQUEIDENTIFIER NOT NULL,
WebId UNIQUEIDENTIFIER NOT NULL,
ListId UNIQUEIDENTIFIER NOT NULL)
-- TODO: Insert your SharePoint Content Database Names into the Set
DECLARE DatabaseCursor CURSOR FOR
SELECT [Name]
FROM master.[sys].databases
WHERE [State] = 0 AND -- State = Online
[Name] IN ('SharePoint_ContentDatabase1', 'SharePoint_ContentDatabase2')
ORDER BY [Name]
OPEN DatabaseCursor
DECLARE @DatabaseName VARCHAR(128)
FETCH NEXT FROM DatabaseCursor INTO @DatabaseName
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @Command NVARCHAR(4000)
SET @Command =
N'USE [' + @DatabaseName + '];' +
N'INSERT INTO #EmailEnabledLists (Alias, SiteId, WebId, ListId) ' +
N'SELECT ' +
N' tp_EmailAlias as Alias, ' +
N' SiteId, ' +
N' tp_WebId AS WebId, ' +
N' tp_Id as ListId ' +
N'FROM ' +
N' AllLists ' +
N' INNER JOIN ' +
N' Webs ' +
N' ON AllLists.tp_WebId = Webs.Id ' +
N' WHERE tp_EmailAlias IS NOT NULL'
EXEC sp_executesql @Command
FETCH NEXT FROM DatabaseCursor INTO @DatabaseName
END
CLOSE DatabaseCursor
DEALLOCATE DatabaseCursor
-- TODO: Insert your SharePoint Configuration Database Name
USE [SharePoint_Config]
DECLARE AliasCursor CURSOR FOR
SELECT [Alias]
FROM #EmailEnabledLists
ORDER BY [Alias]
OPEN AliasCursor
DECLARE @AliasName NVARCHAR(128)
FETCH NEXT FROM AliasCursor INTO @AliasName
WHILE @@FETCH_STATUS = 0
BEGIN
IF(NOT EXISTS (SELECT Alias FROM dbo.EmailEnabledLists WHERE Alias = @AliasName))
BEGIN
DECLARE @SiteId UNIQUEIDENTIFIER
DECLARE @WebId UNIQUEIDENTIFIER
DECLARE @ListId UNIQUEIDENTIFIER
SELECT @SiteId = SiteId,
@WebId = WebId,
@ListId = ListId
FROM #EmailEnabledLists
PRINT 'Inserting ' + @AliasName
INSERT dbo.EmailEnabledLists
(Alias, SiteId, WebId, ListId, [Deleted])
VALUES (@AliasName, @SiteId, @WebId, @ListId, 'FALSE')
END
FETCH NEXT FROM AliasCursor INTO @AliasName
END
CLOSE AliasCursor
DEALLOCATE AliasCursor
IF OBJECT_ID('tempdb..#EmailEnabledLists') IS NOT NULL
BEGIN
DROP TABLE #EmailEnabledLists
END
Now the million dollar question - what is the root cause? If I had time I would setup a trigger on the EmailEnabledLists table to notify me when something is deleted to get an idea of time frame and then use SQL Profiler to watch that process to see what the calls looks like and where they are coming from. Better yet, maybe some folks from the SharePoint team can chime in and share some more to help us understand the quirks of a product struggling for stability in the face of mass adoption.
UPDATE (24-Mar-08): Linked to my TechNet forums post as well, but no further progress. Looking into the Microsoft.SharePoint.Administration.SPEmailEngine class with Reflector I cannot see any code to generate NDRs either.