Insufficient Disk Space (T-SQL Tuesday #88)

TSQLTuesdayThis month’s T-SQL Tuesday – hosted by Kennie T Pontoppidan(@KennieNP) – is called “The daily (database-related) WTF“. He asked us to be inspired by the IT horror stories from http://thedailywtf.com, and tell our own daily WTF story.

Years ago in a previous job, I worked at a company that had no DBAs. I am/was a BI developer, so I know my way around a database, but I wasn’t dedicated to keeping all databases in good health. There were several application developers at this company (mostly focused on .NET and Javascript) who built applications with SQL Server databases as the back end. And there was a guy who acted as a system admin among his many other duties. The application developers had built a web app that was to be used by users around the world. The application had been launched and things were fine for several weeks. I wasn’t involved with the project, but I was aware of it.

One day, a manager asked me if I could help on an urgent matter: the application suddenly could no longer execute transactions on the production database and the database connection was intermittently failing. The system admin was busy with other duties, so I was the closest thing they had to a DBA.  All they could tell me was the production database had crashed and they got an error message about insufficient disk space.

I logged on to the server that housed the database to see what was going on. The server itself had been set up appropriately and seemed to have sufficient memory and CPU to support the load of this application. I saw 3 volumes on the server: a C volume for application and system files, a large F volume for data, and a large G volume for logs.

I connected to the database with Management Studio to do some more digging. The first thing I noticed is that the dev, test, and prod databases for this application were all on the same SQL Server instance. The dev and test databases weren’t very large, so while that wasn’t what I would have recommended, that didn’t seem to be the main problem. As I looked at the prod database, I noticed that the MDF and LDF files were sitting on the C volume rather than the spacious F volume that was made for them! The person who configured the server hadn’t made the C volume very large since user databases weren’t supposed to be there.

Then I looked at the size of the log file. It was huge! A bit more digging revealed that they had left all the defaults on the database for full recovery and autogrowth of the log file, but they had never done a transaction log backup. (Sidenote: You can check the Log_Reuse_Wait_Desc column in sys.databases to verify the database is waiting on a transaction log backup.) The developers had worked long and hard to get the application up and running and hadn’t quite finished up the maintenance and disaster recovery tasks.

Once I knew what I was dealing with, I was able to fix the problem. A full backup and a log backup later we were back in business. I went ahead and shrunk the log file back to a reasonable size (please remember this is reserved for special occasions). I took the database offline (which was acceptable since the application was currently unusable anyway), moved the MDF and LDF files to their rightful home, and brought it back online. A lesson on recovery models and setting up SQL Agent jobs that scheduled such backups ensured this didn’t happen again anytime soon.

This should be a good reminder to have a healthy respect and understanding for your database settings and to make sure you have (and test) your backups (both full and transaction logs) for your production databases.

2 thoughts on “Insufficient Disk Space (T-SQL Tuesday #88)

  1. The other mistake they made was not to have their TempDB data and log on separate faster drives. TempDB data should be contained on it’s own drive with the amount of files equal to the number in your MAXDOP whereas the TempDB log should be one file on it’s own fast drive. In total a good server setup has 5 drives with each having their own controllers. 🙂
    A good rule of thumb for speeds is TempDB should be on the fastest, TempDB data and DB log should be on faster, and then DB Data should be on the slowest of all of them. I am in consistent arguments with my IT staff over the need for speed on a DB that they don’t understand what it does, until I tell them that TempDB is similar to the Windows Page File (and even then they don’t understand the necessity for those anymore).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s