10GbE woes on OSX
I have been struggling with 10GbE networking on OSX for over a year now and several pros (including the wonderful Chris Duffy from Smalltree) haven't been able to help me with this issue yet. So maybe someone has a clue:
I am running a Synology DS3612xs NAS with an Intel 10GbE card installed. The NAS is connected to a Netgear XS712T switch.
I have several clients connected on the switch:
- 2011 Mac Pro with a Smalltree 10GbE PCI-Card
- MacBookPro with a Smalltree 10GbE PCI-Card in a Sonnet Thunderbolt chassis
- 2013 MacPro in a Sonnet xMac Pro chassis with a Smalltree 10GbE PCI-Card
- 2013 MacPro with a Promise SANLink 2 10GbE
- iMac connected via Gigabit
From each 10GbE client I can get great speeds to and from the NAS (around 600MB/s). So far, so good. But several times a day the network interface on the client suddenly seems to stall completely: where I was getting around 600MB/s before, I suddenly get around 1-2kb/s. This is more likely to occur when I browse directories in Finder, and it happens for sure when I let the client run overnight and try to work on the NAS in the morning.
The only fix I have found so far to get the interface working properly again is to issue sudo ifconfig en# down followed by sudo ifconfig en# up in Terminal. This works on all clients except the MacBook Pro. With that one, only a restart helps.
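For anyone who wants to script the down/up workaround described above, here is a minimal sketch. The function name bounce_iface, the RUN switch and the 2-second pause are my own assumptions for illustration, not anything from Apple or Small Tree. By default it only prints the commands so you can check the interface name first.

```shell
#!/bin/sh
# Hypothetical helper: take an interface down, pause, bring it back up.
# Default is a dry run that only prints the commands.
# Set RUN=1 to actually execute them (needs sudo).
bounce_iface() {
    iface="$1"
    if [ -z "$iface" ]; then
        echo "usage: bounce_iface <interface>" >&2
        return 1
    fi
    if [ "${RUN:-0}" = "1" ]; then
        sudo ifconfig "$iface" down || return 1
        sleep 2   # pause length is an arbitrary choice
        sudo ifconfig "$iface" up
    else
        echo "sudo ifconfig $iface down"
        echo "sudo ifconfig $iface up"
    fi
}
```

Calling bounce_iface en4 prints the two commands; RUN=1 bounce_iface en4 runs them. Replace en4 with whatever interface name your 10GbE card shows up as.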
I suspected the switch to be the source of the problem so I established a direct connection between the NAS and the client. Same problem.
So my next step was to rule out the NAS. Currently I am testing the great flow:rage system from toolsOnAir. This NAS is great, unfortunately I still have the same problem.
All my clients are running with net.inet.tcp.delayed_ack=0 set in /etc/sysctl.conf.
Does anyone have a clue?
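When comparing clients, it can help to confirm that the delayed_ack setting is really present in the file. The sketch below is only an illustration: conf_has is a hypothetical helper name, and whether /etc/sysctl.conf is read at boot at all depends on the OS X version, so it is worth comparing against the value the running kernel reports as well.

```shell
# Hypothetical helper: succeed if a sysctl.conf-style file sets key to
# the expected value. Commented-out lines are ignored; the last
# assignment wins. Dots in the key act as regex wildcards, which is
# loose but adequate for a quick check.
conf_has() {
    file="$1"; key="$2"; want="$3"
    got="$(grep -E "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null \
        | tail -n 1 | cut -d= -f2 | tr -d '[:space:]')"
    [ "$got" = "$want" ]
}

# Usage on a client:
#   conf_has /etc/sysctl.conf net.inet.tcp.delayed_ack 0 && echo "set in file"
#   sysctl -n net.inet.tcp.delayed_ack   # value currently in effect
```

If the file says 0 but the running value differs, the setting is not being applied at boot on that machine.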
I'm in the middle of speccing a 4+ user system, expandable to 6 or so: mostly offline work, with one or two heavy-use clients. I've been looking at Ethernet-based NAS-like stuff. The appeal of these NAS systems is that they are less expensive and can also be built DIY or bought value-added from vendors like SmallTree, InfoTrend, ACNC, etc.
From what I'm learning, the protocol used on these GigE systems requires tuning, and when it does go out of whack, it needs tinkering that often goes deeper than the end user can manage. Meaning, you can't just download an updated driver and be off and running. I'm hearing this is due to the underlying protocol Ethernet uses, which was not designed for big file moves the way direct storage-to-client protocols like Fibre Channel were. It seems there will always be tuning needed along the way. It's not set it and forget it, to the extent that any OS update can meaningfully alter how that protocol behaves.
I'm wondering what the VARs here say to this.
4222 santa monica blvd
los angeles, ca 90028
323 906 9700
Hi David -
You know exactly what the VARs think here. As I have stated over and over again on this forum, there are countless vendors, and COUNTLESS VARs right near you in LA and Burbank, that can provide you with exactly what you want. And there are countless vendors that advertise right here on Creative Cow that you can choose from, and that compete with each other for your business. And you can buy solutions (that I am aware of) for as low as $4700. And I have stated all of these exact things over and over again on this forum, showing LISTS of companies to choose from.
But obviously, that is simply not acceptable to many people. For example - Chris Tilley above could certainly have purchased a turnkey system from Small Tree, and it would have worked perfectly, without any of the issues that he is seeing. You have companies that compete with you that own shared storage systems - I am sure that you know who they are. So why not choose one of those? I KNOW WHY. Because like Mr. Tilley, you think that you can go out and save a couple of bucks by trying to piece it together yourself. And look what happens. There are recently LOTS of threads on this forum that show exactly the disappointment when people try to piece it together themselves.
When Solventdreams produces a show, or edits a show, I am sure that your clients don't say "we don't need these idiots from Solventdreams, my son has a Canon 5D and a Scarlet, and we can shoot it ourselves, and he can use iMovie to edit it". No, what happens is that you do the work, you do PROFESSIONAL work, and you get to charge for your professional work, and the clients of Solventdreams says "wow, you guys did a great job - thank you". And for the clients that let their children in film school do the job - well, it comes out like crap.
I could go on and say "you should use brand x and brand y", but the real bottom line is that I am currently not aware of ANY brand of shared storage that is a piece of junk. THEY ALL WORK. And let me assure you that if Mr. Tilley had not been astute enough to call Small Tree and get Chris Duffy's assistance on the phone, his Synology would be a boat anchor right now, because I will bet money that Mr. Tilley simply went to a mail order company, ordered the Synology without any other hardware, and said "I can do this myself". Well, he failed, and now, with Small Tree's help, he has a system that is working with a few minor issues. Had he purchased the entire turnkey Small Tree, he would not have ANY ISSUES.
I don't want to make it seem like Small Tree is the only vendor - THEY ARE NOT - there are lots of them, and they advertise right here on Creative Cow, and participate on Creative Cow. YOU NEED HELP, and you have to pay for that help - even if it's only a labor fee, and you buy the stuff yourself from mail order companies. And when you say "you are full of crap" - well, you fail.
Rescue 1, Inc.
Listen to Mr. Zelin. He is wise. And happens to have installed my flawlessly working 10GE set up.
I believe you've misinterpreted my question.
In any case, those not already professionally mature and well-mannered as you are, may in fact, benefit.
I'm asking a very specific question because another highly respected VAR who advertises here as well expressly pointed out the issue raised by this specific question--that the Ethernet protocol is not well suited to a scalable 4+ user situation in a mission critical environment, as it requires fundamental VAR-oriented stack tuning and the like, and it was not designed for heavy video use, thereby requiring tweaks. So, I'd like to learn more about why people who sell the stuff think it is suitable, and whether there are limits compared to the fibre channel ecosystem (with which I'm familiar), which is designed for this direct access heavy lifting.
It may interest you that I'm not putting these together like a VAR or trying to save money and roll my own. And, of no minor consequence, I've had a major VAR provide an Ethernet-based shared storage solution which failed to perform in a mission critical environment. The vendor graciously apologized--and who knows, it may have been multiple systems--but it was a disaster and the solution was not robust enough.
I do know for certain, that if I'd used a fibre-based solution, those problems would not have presented--at least from my experience. The SAN I used for years was set it and forget it--across many OS updates, different programs, file types. It just worked 99% of the time. The 1% was the usual stuff, user operation things.
In any case, it's made me think twice about ethernet-based solutions. And I'd love to know more about the inner issues of 10G systems.
Your question is applicable and I think a good one to explore. Although the TCP/IP protocol was designed to carry traffic over wide distances, it can be tuned to work for video. The VARs that have successfully done this have been able to create mission critical systems that perform well for video. Some of these systems can even outperform traditional fibre channel SAN based solutions.
To your question, one of the reasons an Ethernet solution can be successful in our environment is that in many ways it has fewer moving parts than a traditional block-mode access SAN. The ability to perform ‘file-locking’ through the higher-level protocol (SMB / NFS) can have a great many benefits over the overhead of a metadata controller. This is the key difference between the two technologies, and they both have their positives and negatives. That said, a properly constructed NAS with 10GbE optical connections and the correct protocol stack can blow away many SANs at a fraction of the price.
Of course the key is getting a system (and a system builder) that understands these technologies at the protocol layer and is capable of creating the unique capabilities required for streaming video. It’s not rocket science but it can be challenging. As for it changing all the time, I don’t believe that is true. Apple and Microsoft tend to throw a curve ball at us every once in a while, but we don’t find that happening frequently.
Thanks for the reply Jess.
This discussion interests me. What are the great many benefits you mention for a NAS over a fibre SAN aside from cost? 10GbE has a huge price appeal, obviously. But what are the trade-offs?
From my limited understanding, the metadata part on fibre systems is handled on a separate network (like ethernet) and by a separate controller. I thought this didn't add overhead and if so, it was marginal.
My concern is that a TCP/IP protocol based NAS ties one to human, pick-up-the-phone vendor support should that relatively complex and proprietary server build go south or do something weird with the client OS--which, in the case of Apple, gets altered without letting anyone know. This is what I'd like to avoid.
In my limited experience, and in the opinion of this other highly regarded SAN vendor, the points of failure on a fibre SAN are relatively modular, standardized and largely field user serviceable with spares as one such anticipation. The only complex knowledge required was switch zoning, and save two failed Qlogic switches, the user available updates (without picking up a phone or having someone log in), and two brief support sessions, it was solid day-in day-out for 8 years.
This was not my experience with a value-added mainstream vendor NAS server which appears to be inextricably tied to support to rejigger the server & clients, should something fail. Also, everything inside is proprietary. Are there any user-friendly spares on these systems? I'd imagine largely not, as the storage is sold as one piece inside the NAS.
I'm not diminishing the valuable support angle, I'm attempting to illuminate the distinction in the kind of future support required for one of these units.
And asking to understand, in detail.
I would love to know who the vendor is that is trying to talk you out of a 10G system. I can only assume that it is a vendor that is trying to shove StorNext down your throat, to maintain a highly complex system that requires them to support you.
As you know, AVID Unity was 4G fibre; now ISIS is Ethernet (both 1G and 10G) over iSCSI.
Studio Network Solutions was at one time only Fibre - now they are 1G and 10G (also iSCSI), as well as Fibre.
ProMax offers 1G and 10G, but can offer fibre - but why should they.
Facilis at NAB 2014, was proud to show 10G networking, and while they offer even 16G fibre, the most cost effective and efficient method is with 10G, with the ATTO cards, and Netgear switch.
EditShare has always been Ethernet, and now offers both 1G and 10G.
Small Tree offers 1G and 10G in high-speed NAS environments.
and then there is the usual list of vendors that I list here on Creative Cow all the time, that offer wonderful solutions.
But I have to assume that you have a vendor that is pushing StorNext, and is telling you that everything else on the market is crap. Fibre systems were never "set it and forget it". I worked on plenty of AVID Unity systems, and like with everything else in the history of linear and non linear video, right until today with all of our software that we edit with - there are always issues, from every manufacturer. That's why we have new products, that's why we have NAB, that's why Creative Cow is here - to discuss problems. But all the vendors I listed, and all the vendors that I didn't list don't sell unstable systems.
You can certainly build a system with "user friendly parts", like the one that I build for people, and you can certainly use off the shelf mail order NAS products with 10G interfaces from the usual (QNAP, Netgear, Synology, Thecus), but they won't give you the same performance as the ones mentioned in this post.
In my opinion, 10G is where it's at right now, and if that's not fast enough for you (speeds that exceed the bandwidth requirement for a 6K RED Dragon at 5:1) - then knock yourself out with a nice Facilis 16 gig fibre system, and have an excellent vendor like Keycode in LA put it in for you.
Just stop thinking about StorNext, please.
Although you may not have had the same results, in my experience, and I think that of a great many of my colleagues, SANs are by no means problem-free technology. There are pros and cons to both SAN and NAS, yet the differentiation is disappearing year after year. There are applications where SANs are a better fit; however, that is changing. Other than a few remaining holdouts, almost all editing applications now support NAS connectivity for shared storage. And as Bob mentioned, most shared storage vendors in this space now have a NAS offering. If it wasn’t what the market wanted, these vendors wouldn’t go there.
Regarding your concern that the tuned protocol ties you to vendor support, it’s true. That’s the trade-off you get when you buy instead of build yourself. For most people, it is a reasonable trade-off. For some, they would prefer to tune the system themselves. In essence, these individuals prefer to replicate what most of the vendors do with a staff of engineers who understand the core protocols. Most of the time it’s costly and frustrating.
That said, I doubt that any SAN solution you purchase lessens the burden of vendor reliance. If the SAN goes down and requires support, you’re in the same position. Unless you build a system yourself and have the technical capabilities to fix it, it seems to come down to which systems are inherently more stable.
In my experience the TCP/IP protocol is no more volatile than any other technology (Fibre Channel with a SAN layer included). We have been building and supporting both SAN and NAS solutions for decades and the failure and problem rates are about the same. Frankly, we had more problems with Xsan permissions than we ever had with our tuned NAS solutions.
User serviceable parts are available for almost all of these solutions. And for some, including ours, drives as well. Spare kits can be purchased for key hardware components including power supplies, controllers etc.
I think the key point here is that the vendors that play in this space collectively have many thousands of specialty NAS based video systems in place and that number is growing. We are not seeing the same trend in the SAN field. And the key is specialty. A generic NAS system cobbled together will perform poorly and inconsistently. That’s why companies like ours are in business.
I only wish that David would simply say "vendor X in LA says that this StorNext solution is the only reliable solution out there today". Then I could call that vendor on Monday, and find out the whole story, because almost NO ONE today can survive on selling SAN only - I can only assume that this vendor also sells alternate solutions, but makes the most profit (particularly on their support contract) from their StorNext based SAN, and is trying to shove it down David's throat (don't listen to idiots like Bob Zelin - we have major clients, and StorNext is the ONLY reliable solution for you).
We are moving away from Xsan/StorNext toward a NAS based infrastructure. Our facility is very large and does the full range of workflows, editing, massive HD-SDI ingest/playout, and very aggressive encoding.
We deploy Xsan volumes ourselves, so we see Xsan as a DIY system which gives us a choice as to which storage to use in any particular volume. So far our experience with NAS is that it is much more rigid, with the NAS "head/server" and storage being supplied by a single vendor. For us, zfs has the potential of being the "DIY" platform which would allow more storage choices.
Some drawbacks of SAN are the dual connectivity required (ethernet and fiber) and the scale up limits after the initial volume is built. We also found that any maintenance that requires us to stop a volume also requires us to shut down every client. We have 130 clients! With a SAN, each client does the heavy filesystem work which is good for distributed performance but it's very low level and means any volume can adversely affect the client/application.
With NAS, the filesystem is sustained by the NAS head/server and the client has a simple Ethernet connection to it. The drawback is that performance is limited to that single NAS head/server. Systems like Isilon and Harmonic get around this limit with a distributed proprietary filesystem where compute and storage are added together. You can get around that by deploying additional NAS heads/servers, but then each one is a new namespace. We found zfs very good at scaling up as long as you build it with that in mind.
I'm not sure how this conversation got into a discussion about turnkey network solutions when the question was a very basic problem about OS X slowing down 10GbE connections after a while. I actually believe this to be a bug in OS X, as we too are experiencing random slowdowns on 10GbE adapters. This has nothing to do with the server or NAS connected on the other side. It's about the client side.
I had a hunch that it had something to do with 10GbE over Thunderbolt, but it seems Chris is also experiencing this with regular PCI cards.
Chris, have you made any further discoveries since December? It would be nice to get to the bottom of this and report the issues to Apple.
Bumping this topic up, because I am facing the exact same problem in our edit suite. Mac Pro Late 2013 connecting to a Windows 2012 R2 server over 10GbE with the Sonnet Twin 10G TBL2 Adapter. Speeds are great but will randomly drop to miserable lows after a short time. Lower than 1G even.
The only thing that solves this is to restart the Sonnet adapter, which brings the connection back up to speed. It will, however, lose speed again after a couple of minutes. Very frustrating, especially when you are not a networking pro...
Ask for the previous generation of the driver for this card. It's not on the website.
Thanks for the fast heads up! I have contacted Sonnet Support and will report back if this solves the issue.
So Tobias, what was your solution?
Online Editor / Colourist