I've had a few people ask me what the scoop was on the Exchange server from this post.
So here's the story...
While troubleshooting this problem I found that the SAN had been registering an alert (that I wasn't getting - more on that very soon) for an error on drive 7. So I called EqualLogic, who led me through the process of pulling a diag on the SAN. I mailed it off to them and they went over it and got right back to me, saying to pull the drive and replace it with our spare. I did, and they shipped another drive.
So in the meantime I dork around with email alerts and I start getting them again; all is well.
The next day the drive showed up and I headed to the server room to put the spare drive in a nice safe spot and there are red lights on two drives, 7 and 10.
Freaked out, I call EqualLogic again, pull another diag, and mail it to them, and they call me back. Looks like drive 10 failed over to drive 7 (which was the hot spare at the time), and then drive 7 failed too. So I took the drive that had just come in the mail and replaced drive 10, then dorked around with email alerts... whew, they're working again (wish I had more time to keep an eye on this).
The next day two more drives show up, I replace drive 7 and put the spare on the shelf.
This past Tuesday Dustin (awesome volunteer) calls me into the server room and there on the SAN is a red blinking light... drive 9 has died. Yep, that makes 4 drives in 16 days. Stay tuned for more about the email alerts!
Well, on the bright side, that is what RAID is there to do: protect ya. Wonder if there is another underlying issue? Weird power issues? Temps in the room? Temps in the rack? Glad you didn't suffer any serious downtime!
Posted by: deannie | February 08, 2007 at 09:21 PM
Ed, my name is Marc Farley and I work for EqualLogic. I was alarmed to see your post - probably not as alarmed as you were to have multiple drive failures, but plenty alarmed just the same. It's extremely unusual to have something like this happen. If it's OK, I'll try to contact you tomorrow to see how things are going. If you want to call me, my phone # is 408-210-7931. I'm on the west coast.
Posted by: MarcFarley | February 09, 2007 at 02:28 AM
I had a similar experience about a year ago with another vendor's SAN unit. Testing showed that the drives were not dying; the real culprit was a major problem with the particular version of the firmware the SAN was running. We received a "fixed" version of the firmware 6 weeks later that finally resolved the issue.
Posted by: allen madding | February 10, 2007 at 08:28 PM