WD RED 4TB DRIVES + DELL POWERVAULT MD1000 + ATTO R680 or R644 = Headache
Hi All, hoping this post manages to catch the attention of gurus like Bob Zelin et al., but I would welcome feedback and thoughts all round.
I'm about to start as the DIT on a 3-month feature film. One of the things I've learnt with projects like this is to always make the supply of hard drives, and RAIDs especially, someone else's responsibility (aka someone else's problem, or SEP). I find that on set, if something niggly happens with a drive or RAID, it's far better to be able to call someone who has money invested in the device and get them to come and troubleshoot it while I continue to do my 'normal' work on a backup device.
I usually try to give this responsibility to the post house responsible for finishing the job, editorial, etc. They are usually all too happy, because it helps them pay off their relatively expensive, high-end hardware, and it works great as a motivation for cash-strapped productions to get the "right stuff." Often, the moment I try to punt my own personal hardware, I'm simply told there's "no budget for that" and "can't we just use and recycle a few rugged drives". The moment I'm not making money on drives and I'm saying "this is what we need, phone these guys...", it's instantly taken more seriously. I back this "give me the right hardware" thing up with a no-overtime deal because, well, there shouldn't be any overtime if all the right toys are in place.
With this in mind, I requested, amongst several other things, a large 30-40TB rackmount RAID I could connect via my R680. The post guys, who are great and for whom I have huge respect, supplied me with a Dell PowerVault MD1000 kitted out with 10x WD RED 4TB drives + 2 hot spares and space for another 3 drives should I wish to add more later...
I lent them my R680 so that they could build the RAID while I was busy with gear check. However, the initialisation failed twice upon reaching 96% (they updated the card to the most recent firmware before the second attempt).
They decided to switch over to their R644 and try again. Due to a few hiccups with power in their new building, I ended up taking home the chassis over the weekend and left it to finish initialising. It failed at 4:30 this morning at 96%. The beeping alarm woke me up. Having now given up Sunday to troubleshooting what should have been SEP, I'm having trouble letting go of the challenge I've been presented.
I've been at the ATTO Config Tool since then, trying and testing different options as well as trawling the depths of the interweb for ideas and solutions.
Here's what I think so far:
1) Drive 7 might be faulty and has caused the three initialisation failures
- This seems to be confirmed by the fact that I started two separate RAID0 setups, drives 1 and 2 in one pair and drives 8 and 7 in another. Drives 1+2 are sitting at 85% after a few hours and drives 7+8 are sitting at 49%.
PS All other diagnostics except for the initial error report on the initial failure have turned up clean for drive 7.
2) The PowerVault MD1000 might not like the R680 OR the R644? Can't really confirm this in any way, as it seems to be an older chassis (only purchased in 2012 though, according to the Dell website) and the most recent firmware I found seems to be dated 2007.
3) The PowerVault or R680 might not like the WD40EFRX 4TB drives. They are very new to the market. Can't find any firmware, write-ups or web-based troubleshooting for them directly.
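To put a number on the comparison in point 1, here is a rough sketch. It assumes both RAID0 pairs were started at the same time and that progress is roughly linear, neither of which is stated above, so treat the result as indicative only. The function name is made up for illustration.

```python
def relative_rate(suspect_pct: float, reference_pct: float) -> float:
    """Ratio of initialisation progress, suspect pair vs. reference pair.

    Assumes both pairs started together and progress is linear.
    """
    if reference_pct <= 0:
        raise ValueError("reference progress must be positive")
    return suspect_pct / reference_pct


ratio = relative_rate(49, 85)  # drives 7+8 vs. drives 1+2
print(f"Pair 7+8 is running at about {ratio:.0%} of the speed of pair 1+2")
```

A pair running at a little over half speed points at drive 7 or drive 8 (or their slots), but on its own it does not say which of the two is responsible.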
All the express initialisations I've run seem to work beautifully. It's the advanced one that crashes.
Once this last test on drive 7 completes, I intend to eject it from the system and try to stripe up a RAID6 with 10 of the drives (maybe 11, using the replacement I get for drive 7 as a hot spare later).
I'll probably use a 256K interleave size with the system set to high priority on rebuilds. I have a spare 16TB RAID of my own on standby in case the big guy gives me any hiccups, so my wrangler will continue with that should the Dell need any separate TLC.
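For reference, the capacity arithmetic behind the RAID6 plan can be sketched as below. This is only a back-of-envelope check, assuming nominal 4 TB (decimal) drives and the standard RAID6 cost of two drives' worth of parity. The function name is hypothetical, and the figures ignore any formatting or file system overhead.

```python
def raid6_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of a RAID6 group: two drives' worth of space goes to parity."""
    if drive_count < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (drive_count - 2) * drive_tb


for n in (10, 11):
    usable = raid6_usable_tb(n, 4.0)
    tib = usable * 1e12 / 2**40  # decimal TB converted to binary TiB
    print(f"{n} x 4 TB in RAID6: {usable:.0f} TB usable (about {tib:.1f} TiB)")
```

Both layouts land inside the 30-40TB originally requested, with the 10-drive group at 32 TB and the 11-drive group at 36 TB.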
Thoughts? Mr. Zelin?
PS. come Tuesday, if this isn't working I'm giving it back to the post guys and asking for a newer box, sorry J ;p
Your post guys gave you this drive array because it's not good enough for them to use. I know I am lecturing you, but are you really doing a DIT job and NOT charging your client for the drive storage? Are you out of your mind? How do you know that these drives are good? Did you test them? Are they new? Did anyone test anything, other than to stick the drives into the array and hope for the best?
What if you successfully RAID the array, and then have a catastrophic failure on the set of the shoot, and lose all the media? What will you do then? What will your client do to you?
Why don't you ask your post house to configure the array for you, since you trust them so much, and they are "the experts" - after all, it's their RAID array - they should know if it's working.
I know that all you want me, or Alex to say is "change this setting on the ATTO Config Tool, and everything will work". But that's not the way it works, and there is no magic setting. This is why we CHARGE PEOPLE to do this, because it's a PAIN IN THE A$$ to get things working properly. Things just don't plug in and work - this is not a G-Tech Firewire or USB3 drive. Things go wrong, and there are lots of variables.
Test your drives one at a time. I assume you have a "toaster" (like from OWC for $75) so you can test the drives - especially drive #7.
And then replace drive #7.
Here is a $200 brand new WD RED drive from B&H. Don't have the $200? THEN DON'T DO THE JOB, because if you show up on a job shoot, even with a brand new RAID array, and you don't have spare drives in case you have a failure during the shoot, then you are simply out of your mind, and should not be doing the job. And if you say "I know that makes sense, but the client is paying me ZERO for this" - then both you and the client are out of your minds. It costs money to do this stuff, and if you or the client feel like playing Russian roulette with the production shoot - then all of you deserve to lose all of your data.
The ATTO R680 should do an advanced initialization (256k is fine) in one shot. If it's failing, you don't say "gee, let me try this again, maybe I will get lucky this time", you say "I HAVE A PROBLEM with either these drives or the Dell box", and you troubleshoot it by TESTING THE DRIVES, and if they work, and you still can't create the RAID, then you change out the Dell. If I were in your shoes, I would bring the entire rig over to your best buddies at the post facility that you trust, set up the RAID, and leave for the day. When it fails (which it will), you can say to them - "hey, what's going on with this DELL array box you gave me?". Don't want to be disrespectful to them? Well, maybe you should not be doing this.
You must confront people all the time, if you are having issues.
It's not you, you are not stupid, you have a problem with the equipment - be it the drives, or the Dell box, and you have to confront someone.
There is no quick answer for you.
Rescue 1, Inc.