This month’s T-SQL Tuesday – hosted by Kennie T Pontoppidan(@KennieNP) – is called “The daily (database-related) WTF“. He asked us to be inspired by the IT horror stories from http://thedailywtf.com, and tell our own daily WTF story.
One day, a manager asked me if I could help on an urgent matter: the application suddenly could no longer execute transactions on the production database and the database connection was intermittently failing. The system admin was busy with other duties, so I was the closest thing they had to a DBA. All they could tell me was the production database had crashed and they got an error message about insufficient disk space.
I logged on to the server that housed the database to see what was going on. The server itself had been set up appropriately and seemed to have sufficient memory and CPU to support the load of this application. I saw 3 volumes on the server: a C volume for application and system files, a large F volume for data, and a large G volume for logs.
I connected to the database with Management Studio to do some more digging. The first thing I noticed is that the dev, test, and prod databases for this application were all on the same SQL Server instance. The dev and test databases weren’t very large, so while that wasn’t what I would have recommended, that didn’t seem to be the main problem. As I looked at the prod database, I noticed that the MDF and LDF files were sitting on the C volume rather than the spacious F volume that was made for them! The person who configured the server hadn’t made the C volume very large since user databases weren’t supposed to be there.
Then I looked at the size of the log file. It was huge! A bit more digging revealed that they had left all the defaults on the database for full recovery and autogrowth of the log file, but they had never done a transaction log backup. (Sidenote: You can check the Log_Reuse_Wait_Desc column in sys.databases to verify the database is waiting on a transaction log backup.) The developers had worked long and hard to get the application up and running and hadn’t quite finished up the maintenance and disaster recovery tasks.
Once I knew what I was dealing with, I was able to fix the problem. A full backup and a log backup later we were back in business. I went ahead and shrunk the log file back to a reasonable size (please remember this is reserved for special occasions). I took the database offline (which was acceptable since the application was currently unusable anyway), moved the MDF and LDF files to their rightful home, and brought it back online. A lesson on recovery models and setting up SQL Agent jobs that scheduled such backups ensured this didn’t happen again anytime soon.
This should be a good reminder to have a healthy respect and understanding for your database settings and to make sure you have (and test) your backups (both full and transaction logs) for your production databases.