Introduction
Administrators who haven't worked with BizTalk much rarely seem to appreciate its power and flexibility, and almost never respect its demands. Yes, I used the words "respect" and "demands"; BizTalk requires both. Allow me to demonstrate with a recent example. A client of mine had a rather sophisticated four-node SQL cluster with three active nodes and one spare. The nodes were configured to keep failing over in the event of a major catastrophe (that's the good part, and it comes later). They ran BizTalk's databases (all of them, against my advice) on one node, along with SSIS for the entire enterprise (again against my advice). Their business followed a pronounced peak-and-trough pattern, and floodgate scenarios were common.
At one point this customer also wanted to create a database to hold all incoming messages. It functioned almost like the message box, but without Microsoft's extensive R&D behind it. I was unable to talk them out of it. That said, we did a pretty good job with it. BizTalk itself would log messages to this database, and different orchestrations and messaging solutions would process them in sequence until their enrichment was complete and they were sent out to other systems. The solution was modular, easily extensible, and worked well even under moderate stress testing.
The Issue
When it was deployed to production, everything ran fine until the first major floodgate. This particular incident involved months of backlogged transactions queued up and awaiting processing. Due to strict, appropriate, and well-enforced security procedures, no one on the development team was aware of the volume awaiting our solution. This wouldn't be a minor flood; we were talking Noah and the Ark. Worse still, for some reason, and unlike in every other environment, this new database had been created on the same SQL instance as the BizTalk databases. This was the real epicenter of the issue, and one we never quite got to the bottom of. As mentioned at the beginning of this post, some organizations just don't give BizTalk the respect it deserves.
All the servers in the group eagerly began processing tens of thousands of transactions, and we think far more. As BizTalk and the adapters (all spread across hosts and instances on multiple servers) began hammering this new database, they were hammering the message box as well, on the same SQL instance. We didn't know it yet, but we had hit the iceberg. The server began blocking connections, and many of the database counters that throttle BizTalk were not getting the access they needed either. Soon SQL Server pulled a Cartman ("Screw you guys, I'm going home") and shut down; that is, it exited the SQL process. I had never really seen this before, but it happened once the deadlock queue was past 1000. The failover worked flawlessly, and within minutes the second node pulled a Cartman just like the first. The third node followed suit, and an administrator decided not to let the fourth go the same way and took the offending node offline. And the band played on. It took a while to clear the backlog and get everything processing well. This was done mostly by not turning on all the BizTalk servers at once and by putting some more aggressive throttling in place.
Conclusion
As we attempted to figure out where our brilliant ideas went astray, we tried replicating the issue in other, much less powerful environments that did not have this new database installed on the same SQL instance as the message box. No matter what load we used, we could not replicate it. The solution would slow, it would throttle, but SQL would never pull a Cartman. The moral of the story is that, despite what your teacher told you, sharing isn't always caring. When it comes to BizTalk databases, especially the message box, do not put other databases on the same SQL instance, least of all a database that is an endpoint of a BizTalk solution.