Drop-frame issues persist... What did I miss??
Greetings to you all. I've been lurking here for a long time, but I've decided that I need to ask for a bit of help from the community.
I run a small system integrator in NYC, and I usually build render farms, high-performance workstations, and storage servers for 3D companies, as well as for more mundane non-media clients.
I got a referral to a client that was setting up a Final Cut shop on a tight budget, and so I went about the calculations.
He wanted a centralized storage server with at least 16TB capacity, and he needed to connect 3 edit suites to it.
Here's what I built out:
3x 8-core Nehalem Mac Pros as workstations
each with a Myricom 10Gb CX4 PCIe x8 NIC
a 6-port 10Gb HP managed Ethernet switch capable of (and set to use) 9K jumbo frames
and a PC server.
I used a PC for a few reasons, primary among them being price and familiarity, and the availability of chassis which would allow me to direct-connect all the drives in my array.
The server is:
CPU: Xeon W3520 - 2.66GHz hyperthreaded quad core
RAM: 6GB DDR3 1066 ECC
NIC: Intel Dual Port 10Gb CX4 adapter (the same hardware as the Smalltree dual CX4 adapter)
RAID controller: Adaptec 51245 - 12 port SAS with dual core 1.2GHz engine and 512MB ECC battery backed cache
RAID HDDs: 12x Seagate Barracuda XT ST32000641AS 2TB 7200 RPM 64MB Cache SATA 6.0Gb/s (on the controllers supported hardware list)
System Drive: Mirrored 60 GB SSDs
OS X 10.6.4.
Based on the Myricom card's readme file, the jumbo frame size has been set to 8244 bytes:
"For better TCP performance, it is necessary to increase the TCP window size beyond the default value. To make this change permanent, edit (or create) the file /etc/sysctl.conf with this line in it: [...]
On MacOSX, as with most BSD based stacks, restricting the TCP maximum segment size (MSS) to an even multiple of the mbuf cluster size keeps things nicely aligned, and results in improved performance when using jumbo frames. Unfortunately, the MacOSX TCP stack does not do this (Apple Bug Id #4919145), and the only way to do this is by adjusting the interface MTU by hand. The most common TCP packets will have an rfc1323 timestamp option, making for a header size of 52 bytes. Therefore, setting the MTU to 8192 + 52 (= 8244) results in optimal performance."
...Who knew. Well, I did as they suggested, setting both the MTU and the maxsockbuf.
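For reference, here's roughly what that looks like on the Mac side. This is a sketch: the interface name en2 and the 2 MB socket-buffer value are my assumptions (the readme's actual sysctl line didn't survive my paste above), so substitute whatever your NIC and the readme actually call for.

```shell
# Client-side tuning sketch -- assumes the Myricom NIC shows up as en2
# and a 2 MB socket-buffer cap; adjust to the readme's actual values.
sudo ifconfig en2 mtu 8244   # 8192-byte payload + 52 bytes of TCP/IP header w/ timestamps

# Persist the socket-buffer bump across reboots
echo 'kern.ipc.maxsockbuf=2097152' | sudo tee -a /etc/sysctl.conf
```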
Windows Server 2008 R2 Standard
RAID: 12x 2TB drives in RAID 6 = 20TB capacity
2x 10Gb links from the Intel NIC to the switch, configured into a 20Gb static trunk
Jumbo MTU manually set to 8244 bytes so the jumbo packets stay aligned and don't fragment
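On the Windows side, the matching MTU change can be made from an elevated prompt with netsh. A sketch only: the interface name "Team0" is an assumption (use whatever your trunk interface is called), and the Intel driver also exposes a jumbo-packet property in Device Manager that should agree with it.

```shell
:: Set the MTU on the trunk interface to match the Macs (elevated prompt)
netsh interface ipv4 set subinterface "Team0" mtu=8244 store=persistent

:: Verify the active MTU per interface
netsh interface ipv4 show subinterfaces
```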
Initial internal benchmarks of Server:
System Drive: 200 MB/s sustained, 0.12 ms access time
RAID: Read: 683 MB/s sustained, 693 MB/s average, 809 MB/s peak; 12 ms access time
Write: 558 MB/s sustained, 599 MB/s average, 641 MB/s peak
At first I tried benchmarking SMB. Its performance on OS X is notoriously atrocious, but I figured that with so much horsepower on both ends and such a fat pipe, everything would be fine.
I set up a 6GB RAM drive on one of the Mac Pros and started hauling huge files back and forth.
My total transfer speed never got above 135 MB/s Server -> Mac, and 120 MB/s Mac -> Server.
While this is a tiny fraction of the available bandwidth, I thought it would be OK. But when testing projects on the FCP workstations, I get dropped frames from time to time (admittedly not too often), even when dealing with 1080i ProRes 422 HQ (~31 MB/s).
Strangely, when doing tests with 10-bit YUV 1080i footage (~166 MB/s), I get a very similar frequency of dropped frames.
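To put those numbers in perspective, here's the back-of-envelope math, using only the rates measured above:

```shell
# Per-stream data rates vs. the measured SMB throughput, all in MB/s.
prores_hq=31   # 1080i ProRes 422 HQ
yuv10=166      # 1080i 10-bit uncompressed YUV
smb=120        # slower of the two measured SMB directions (Mac -> Server)

echo "SMB headroom over one ProRes HQ stream: $((smb - prores_hq)) MB/s"
echo "SMB headroom over one 10-bit YUV stream: $((smb - yuv10)) MB/s"
```

So one ProRes HQ stream has nearly 90 MB/s of headroom over even the worst SMB direction, while 10-bit YUV genuinely exceeds it; yet both drop frames at a similar rate, which suggests the problem is latency or jitter rather than raw bandwidth.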
So... I brought in a more novel approach, at least for testing.
I decided to try out iSCSI.
The Mac clients are using the globalSAN iSCSI initiator.
The Server is running the free version of the StarWind iSCSI target, version 5.4
I set up two 2TB slices on the server, one for each of the two primary edit bays. These are static size image files.
I mounted one slice on each of the two FCP workstations and formatted the slices HFS+.
This means the Windows server is now serving the Macs' HFS+ volumes as raw iSCSI block devices, cutting out a ton of the intermediary nonsense.
This dramatically boosted transfer performance. Server -> Mac (RAM drive) is now at around 320 MB/s, and Mac -> Server is around 280 MB/s
These rates drop to about 250 MB/s and 230 MB/s respectively when you hit the server from both workstations simultaneously. (hitting a random-read bottleneck on the RAID array I think)
These performance numbers should be pretty awesome for most things, except that the Macs are still dropping frames, especially when they're both trying to pull 1080i 10-bit YUV (166 MB/s), but even occasionally when a single system is pulling 1080i ProRes 422 HQ @ 32 MB/s, which is just stupid...
I got a utility that lets me watch TCP/IP stats in real time. That allowed me to troubleshoot a fragmentation error that was initially causing really poor network performance, back when I was trying to run everything with standard 9K jumbo frames. At this point it doesn't look like I'm suffering from debilitating TCP fragmentation any more.
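For anyone wanting to check the same counters without a third-party utility, the stock BSD/OS X netstat statistics cover both symptoms (exact field names vary a bit between releases):

```shell
# IP-level fragmentation counters -- these should sit near zero once the MTU is right
netstat -s -p ip | grep -i fragment

# TCP retransmits / out-of-order / duplicate segments, which surface as latency spikes
netstat -s -p tcp | egrep -i 'retransmit|out-of-order|duplicate'
```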
So what the heck is going on??
Has anyone experienced this sort of issue before?? Is there some sort of magic bullet in FCP that I missed that creates bigger buffers and is more tolerant of jitter? Am I missing some configuration issue? Did I not properly spec some piece of hardware?
What am I missing???
What sort of performance should I be expecting out of this setup? Shouldn't I at least be able to do 1080p ProRes422HQ through SMB?? Shouldn't I be able to do 4K ProRes422HQ through iSCSI?
If I have to use iSCSI permanently will MetaSAN allow me to share the same volume amongst the three workstations??
ALL YOUR HELP IS GREATLY APPRECIATED!
ARC Systems Consulting - Brooklyn, NY
This is a low-budget system? 10 Gig and you can't do ProRes without dropped frames! You could have done all of this with a simple Mac Pro as the server, and just Ethernet. Only 3 clients: you didn't even need a switch. You could have just used a Small Tree PEG6 card.
And why did you use a PC server for an all-Mac system that uses ProRes? You do know that SMB from Apple doesn't work, right?
ProRes 422 HQ is only about 30 MB/sec. 10 Gig Ethernet with jumbo frames enabled will give you 350 MB/sec (using Small Tree hardware). And you get drop-frame errors just from ProRes? My advice: you have a nightmare there. Return the equipment and build a SIMPLE Ethernet-based Mac system (you don't even need OS X Server), and everything will work. Damn, you just have 3 clients that want to do ProRes 422 HQ? Why did you build this yourself, when there are countless solutions on the market for under 20 grand that do exactly what you want?
I know I put a lot out there, and I am very impressed with your quick response time, but I think you missed a couple of things I said, such as knowing that SMB in OS X is ***expletive deleted***, to put it mildly. I think you also overestimated the budget.
The whole shebang cost a little more than $9,000... That includes all the 10Gb networking hardware, cabling, storage, and even a nifty little slide-out rack-mountable keyboard & mouse. The server can have another 20TB added with just another card and drives dropped in, and as it is, its internal benchmarks are fast enough to sustain 4K 24p 8-bit YUV uncompressed.
Whenever I benchmark the network throughput from one system to another I *do* get numbers in the 250 - 350 MB/s range (using iSCSI). If things must stay iSCSI in the production environment then the appropriate license will be purchased and that will be that. What confounds me is why it will still occasionally drop frames.
My client asked for a system that could do 4K ProRes422HQ to two suites if pushed, or 1080p30 ProRes422HQ to up to 8 workstations if they have a rush. When I run benchmarks everything looks good, and then when I try to stream stuff from the timeline in FCP I get dropped frames, even with extremely low bandwidth demands, but only very occasionally, and otherwise it's not even close to struggling. CPU usage on the server stays around 9% even under heavy load, and all the RAID array statistics available in Adaptec's monitoring software are totally in the green.
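For what it's worth, the aggregate math on those two scenarios works out fine on paper. I'm using the ~31 MB/s figure from my own tests; the 4K rate here is a rough four-times-the-1080-rate assumption on my part, not a measured number:

```shell
# Aggregate demand for the client's two scenarios, in MB/s.
prores_hq_1080=31   # 1080 ProRes 422 HQ per-stream rate (from my tests)

echo "8 suites of 1080p ProRes HQ: $((8 * prores_hq_1080)) MB/s"
echo "2 suites of 4K ProRes HQ (rough 4x-1080 assumption): $((2 * 4 * prores_hq_1080)) MB/s"
```

Both scenarios land around 248 MB/s, comfortably under the ~320 MB/s I measure over iSCSI, which is exactly why the occasional drops are so confounding.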
Granted, if we had to swap out for a Mac Pro only a fraction of the initial hardware wouldn't be transferable, but I was posting here hoping for something a little more constructive than "gut it".
It's close. I just need to nail down the gremlins. Fixing the TCP fragmentation got me a good part of the way there, for instance.
ARC Systems Consulting - Brooklyn, NY
Check your FCP settings. Apple has a list of settings you can adjust when experiencing dropped frames. It's on their website.
SANmp (from the same people who make that free globalSAN iSCSI initiator you're using) or MetaSAN will allow you to share the same volume(s) amongst your 3 rooms. That's the easy part! Keep us posted...
Studio Network Solutions
replies below -
When I run benchmarks everything looks good, and then when I try to stream stuff from the timeline in FCP I get dropped frames, even with extremely low bandwidth demands, but only very occasionally, and otherwise it's not even close to struggling.
REPLY - now you know the difference between crappy host controller cards and good host controller cards for shared storage environments. You can have great performance (the Areca 1680x does about 700 MB/sec) but crappy latency. One little hesitation, and you get dropped frames. Use an ATTO R380 SAS host adapter for your drives - NO PROBLEM. How on earth you chose the Adaptec, when everything including the Highpoint will outperform it, is beyond me (maybe it was built into the computer).
It's close. I just need to nail down the gremlins.
REPLY - there are no gremlins. You will find, as you do this, that certain products work well in shared storage environments and others do not. You already found out that Apple's SMB is useless. Want to know why I get cranky at you? Because, just like you, I sit there and MAKE MISTAKES, and try stuff that should work, and guess what - it doesn't, more than half the time. I try different switches, different drives, different host adapters, etc, etc, etc. Some things work, some things don't. I was one of the early people that got stuck with the Seagate drives when Seagate went bad. So when people now tell me "I've used Seagate for years, you don't know what you are talking about" - I call them a moron, because I've used Seagate longer than anyone, and I got screwed bigger than anyone, so I don't need to hear someone crying to me, telling me what they know.
REPLY - hire a contractor or consultant to help you the first time you do this - OR be prepared to make lots of mistakes and be embarrassed in front of your clients. With all of my background, there are plenty of things that I don't know, and I still make mistakes, and we still have products fail, and we still have limitations. Just because you know computers doesn't mean that you know this stuff. It's hard, it's confusing, and with every new firmware release things stop working for no reason, and you have to suffer through it. Don't want this aggravation? Bring in someone with more experience.
Brandon Yates in NY can help you - Yates Parks.
I'm with Bob on this. I would much rather have had a Mac server (since it natively supports AFP and uses a BSD kernel). Then you could profile the I/O using DTrace and see what's happening with the Samba or AFP daemon. Are the I/Os small or large? What's the latency on each of them? Depending on the drives, I'll wager you'd see "lumpy" numbers that are very good most of the time and very bad some of the time.
Most vendors focus on drag race benchmarks (which is what you posted as well) and those are not relevant when dealing with a shared realtime load.
I'm curious as to what your TCP statistics look like. I'll wager you are seeing a lot of SACK events on one side or the other. That will be another problem, and probably explains your near-1Gb performance numbers.
CTO, Small Tree Communications