Tuesday at 11:37 I was able to finally get email up and running. It was a long hard couple of nights but what an incredible relief it was to see it working again. Thanks to everyone for the prayers and kind words. For those of you who would like some details about what happened, read on, but let me warn you.... it's really long.
Out of room
Our Exchange server is a virtual server and it has several virtual hard drives. On each of the hard drives are Exchange stores or logs. On Sunday I started seeing a bunch of NDRs and went to investigate the issue. It didn't take long to see that one Exchange store had filled the virtual drive completely. So I took the back end server offline and used the vmware-vdiskmanager tool to expand the hard drive. THIS is NOT the first time I've had to do this, and in the past I have found it quite easy.
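For anyone who wants to catch a filling drive before the NDRs start, here is a minimal sketch of a free-space check in Python. It is not what I was running at the time. The 10 percent threshold and the path checked at the bottom are assumptions for illustration, so point it at whatever drive holds your stores or logs.

```python
import shutil


def percent_free(total_bytes, free_bytes):
    """Return free space as a percentage of the total size."""
    return 100.0 * free_bytes / total_bytes


def is_low(total_bytes, free_bytes, threshold_percent=10.0):
    """True when free space drops below the threshold (10% is an assumed default)."""
    return percent_free(total_bytes, free_bytes) < threshold_percent


def check_path(path, threshold_percent=10.0):
    """Look up real usage for a path and report whether it is running low."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free in bytes
    return is_low(usage.total, usage.free, threshold_percent)


if __name__ == "__main__":
    # "." is a placeholder; substitute the drive that holds the Exchange store.
    if check_path("."):
        print("Low on space, time to grow the disk before it fills")
    else:
        print("Plenty of room for now")
```

Run on a schedule, something like this would at least give a warning while there is still room to work with.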
Less room than you think
In order to add room to the disk I had to pull the last free space from my SAN. There was no possible way to do a clone or snapshot of the LUN on the SAN and no space for a snapshot on the VM either. Knowing that I had backups, I pulled the trigger on the vmdk expansion and it claimed the disk grew correctly.... right now I should be all smiles but I never celebrate until it's over.
Won't start
So I press start on the Exchange machine and I start getting an error about the disk. When I investigate, the host can't read the geometry of the expanded disk at all.... not good.... after a quick panic attack, I head into work.
After 5 hours on the project and a couple of calls to VMware, Jason makes the call to wait for VMware sales in the morning so that we can purchase support from VMware.
Can't Sleep
So if you know me at all, you know that I'm NOT feeling good about my Exchange server being down, but I trust Jason, and I was then and am now certain it was the right call. Nonetheless I'm thinking this isn't going well, so I start doing some work around the periphery of the issue. At 2:00am my 20-year-old daughter admonishes me into going to bed... up early and back at work at 6:30.
Platinum support far from golden
You can read about the VMware support debacle (same as the link above) on Jason's blog. The biggest woe was the amount of time that passed only to get utterly no help from VMware. It's now 4:30pm on Monday and I'm no closer to resolving this issue.
Visit from an old friend
In the midst of the chaos, Jack Chen and PJ Ryan from Equallogic came by to talk about Equallogic and GCC.... sadly I was unable to spend much time with them. But the short time I got to hang out with Jack was the highlight of my day.
Running Again
After copying the vmdk of the failed drive I went into the VMware Console and removed the drive from the email guest server, and it fired right up. This is a good sign and makes me happy. I power the server back down and add a replacement virtual drive, and then bring it back up again. My plan is to restore the Exchange database onto this drive. Knowing that growing an 80 GB drive on the fly while restoring a backup would slow the process down, I go ahead and pre-allocate the storage.
Break out the Backup
So now I have a running Exchange server and a valid backup, time to put the two together. I browse the data and tell it to do a loss-less restore to the nice new virtual disk. By now it's 6:30pm and I've been trying to get Jason to leave for at least an hour. I was glad he stayed, but I felt bad that he and his family had to endure this as well. I stopped and prayed for my wife and kids who wouldn't be seeing me tonight. I fire up the restore, it's looking very good, I'm beginning to dream that this might just work.
The Good The Bad and The Ugly
The restore failed to complete. It seems that when the restore tried to replay the transaction logs (TLogs), it filled up the C:\ drive by processing them in a temp directory on C:\, even though the database is being restored on another drive. I call Dustin and he's seen this before. He knows there is a fix for it and suggests we call Dell (our Commvault software is Dell branded and we must call them before they'll ever let me talk to CommVault). The tech from Dell had never been inside of an Exchange box before and had only heard of Commvault. So I try to explain it all to him. And he decides the only way he can help is if I let him establish a webex session to give him access to my Exchange server so he can poke around long enough to get a feel for it. Although guys like me might develop feelings for an Exchange server, feeling your way around in one is not for newbies, and it's certainly not going to happen on MY Exchange box.
So while trying to be nice to the tech who is insistent about playing with my Exchange box, Dustin finds the CommVault KB about this issue and we tell the Dell guy the solution... finally I persuade the tech that we have the case number and will give him a call if it doesn't work.
So I make the registry edits and start the restore again.
The restore takes about an hour and a half and it finishes with errors. The errors seem to be around processing the TLogs, but we run the restore once more with TLogs anyway. This time Dustin spends some time looking through the database with eseutil while Jason and I munch chicken wings and pizza. Yes, dinner at midnight.
Dustin Goes to bed
You know it's late when you work later than Dustin will stay up, but, sometime in the 1:00am hour Dustin heads to bed and tells me to call him if the restore without TLogs fails.
1:50am the restore fails. 2:00am I head down a different avenue: using eseutil I begin a repair on the database I've just restored. Everybody says this is a last-ditch effort, but I'm ready to end this ordeal. This is a long process. Microsoft says it will process about 5 to 8 gig per hour.... this is a 70 gig store. I start playing chess online.
Jason has stayed with me all this time.... I don't know that I could have kept going so long if it wasn't for Jason being in the boat with me.
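For the curious, the back-of-the-envelope math on that repair looks something like the sketch below. The only inputs are the numbers already mentioned (a 70 gig store and a quoted 5 to 8 gig per hour); the function name is just something made up for illustration, and real repair speed will depend on the hardware and the state of the database.

```python
def repair_hours(store_gb, slow_gb_per_hour, fast_gb_per_hour):
    """Return (best_case_hours, worst_case_hours) for a repair pass.

    Best case uses the faster quoted rate, worst case the slower one.
    """
    best = store_gb / fast_gb_per_hour
    worst = store_gb / slow_gb_per_hour
    return best, worst


# 70 GB store at the quoted 5 to 8 GB per hour
best, worst = repair_hours(70, 5, 8)
print(f"{best:.2f} to {worst:.2f} hours")  # prints "8.75 to 14.00 hours"
```

So a 2:00am start meant somewhere between late morning and late afternoon before there was any chance of mounting the store.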
It's 3:00am and Jason's on his way home to bed.
I crank up the Pearl Jam and consistently lose every game of chess I play. Then I decide to blog (that kills about half an hour) and then it's back to losing at chess. Babysitting a process like this can be mind altering, but I'm determined.
My Chess rating Drops 300 points
Heading home
It's 5:00am and the process has yet to finish. I pack up my laptop and head home, I drive slowly and am thankful that I'm the only guy on the road. I get home and set up my laptop, process still running.
I take a shower and get dressed. At 6:15am I head off to my small group that meets Tuesday morning..... I'm the leader. By 7:00 the guys send me home. The process is still running.
Bed
I head to bed, setting my alarm for 9:00am. It's 7:15am.
9:00am I drag outta bed.... process still running... back to bed.
10:20am process still running - my head feels as if it will explode. Back to bed, but this time the re-leaf trucks are slowly moving through my tree-filled neighborhood, and the vacuums sound like jet engines... no sleep now. I take a hot bath and my headache becomes a dull rumble. I pray as I head to my computer. When I get there I find the process has finished, 11:15am. Praising God, tears well up in my eyes from joy and fear as I make an attempt to mount the store. I pray more as the store begins to mount. It takes longer than I can stand, but finally it mounts. I unfreeze the queues and watch the mail flow. It's 11:38am.
Cleanup and pancakes
I do some cleanup work on the server and call my wife to celebrate. I ask her to come by and get me.... we go out for lunch and I have pancakes that were subliminally planted in my mind by Kem Meyer. It was a good celebration.