
Is link aggregation the solution?

COW Forums : SAN - Storage Area Networks

Jesse, DiJiFi
Is link aggregation the solution?
on Jun 2, 2011 at 8:15:20 pm

Hi there,

I run a consumer film transfer studio (for 8mm, 16mm), and the way we transfer film requires that the digital file (1080p Motion-JPEG, BlackMagic codec, ~40GB/hr) be captured on one system and then transferred to a different system for editing to keep things efficient. Timing is of the essence for us, so the transfer stations need to be transferring and the editing stations need to be editing all day, non-stop. So once the file is captured, we want to move it onto another system ASAP. We used to do this with SATA II drives using an external SATA II dock. The speed was okay, but the SATA II drives died too often, and we eventually upgraded to Gigabit Ethernet, which outpaced the SATA II dock system. We need it to be faster, though, as 100 GB of transfers still takes close to an hour.

I've read a lot of articles and threads and feel that maybe we could use link aggregation between the 3 stations (2 for transferring and 1 for editing). The editing station has a fast 12 TB setup spanning 2 G-Speed eS units in RAID 5 with eight 2 TB disks. The transfer stations are simpler setups of two 1 TB disks in a RAID 0 stripe. We simply share the editing drive (12 TB) over the network to receive the transferred files while we are still editing off the drive.

Anyone have experience with this?



Steve Modica
Re: Is link aggregation the solution?
on Jun 2, 2011 at 8:22:46 pm

Link aggregation won't work in a scenario like this. It's basically a socket balancing thing. When clients connect to servers, they open a socket. That socket gets assigned to a port. As more clients come in, they get randomly assigned and are load balanced.
On the client side, there's only ever one socket, so you won't see any additional bandwidth. Does that make sense?

Steve Modica
CTO, Small Tree Communications



Jesse, DiJiFi
Re: Is link aggregation the solution?
on Jun 2, 2011 at 8:40:57 pm

Oh, okay. I think so.

From what I read before it seemed that link aggregation could double or triple the bandwidth of gigabit ethernet between a switch and a storage array. I thought maybe that connection could be made between a switch and multiple desktops (and their attached storage arrays), but I guess it has to be a standalone storage array designed for link aggregation?

So just to be clear, there is no way to use link aggregation on the client end in order to increase bandwidth?

Thanks for your help!





Alex Gerulaitis
Re: Is link aggregation the solution?
on Jun 2, 2011 at 9:39:52 pm

[Steve Modica] "On the client side, there's only ever one socket, so you won't see any additional bandwidth. Does that make sense?"

Does this still hold true in a hypothetical scenario where the links are trunked directly from the server to the client (no switch) via LACP?

I read that LACP allows multiple physical links to be presented as a single logical channel. I am not sure how it works, of course, i.e. whether something like a file transfer will be able to use multiple physical links.

Thanks!

Alex (DV411)



Steve Modica
Re: Is link aggregation the solution?
on Jun 2, 2011 at 10:04:41 pm

The links are presented as a single logical channel. That is true.
The 802.3ad spec requires that a "conversation" must exist on one port. (This is to maintain TCP ordering. TCP stacks cannot deal with lots of out-of-order packets; that's an exception condition.)

So what happens is a socket opens and gets assigned to a port. The only time it will ever hit another port is if the first port fails.

Many people think LACP acts like a striping utility with the packets. This can't work because packets 1 2 3 and 4 would end up on different ports and would arrive "out of order". The stack would go bonkers.
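Steve's point about conversations pinning to one port can be sketched in a few lines. This is an illustrative Python model of an 802.3ad-style transmit hash (the real policies hash MACs, IPs, and/or ports, and vary by bonding mode); the addresses and port numbers are made-up examples:

```python
# Model of how an 802.3ad-style bond picks a physical port for a flow.
# Every packet of one "conversation" (here, a TCP 4-tuple) hashes to the
# same link, so a single file copy can never use more than one link's
# bandwidth -- only many concurrent flows spread across the links.

def select_port(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                num_links: int) -> int:
    """Map a flow's 4-tuple to one link index (layer3+4-style policy)."""
    flow_key = hash((src_ip, dst_ip, src_port, dst_port))
    return flow_key % num_links

# A single file copy is one flow, so it always lands on the same link:
links_used = {select_port("10.0.0.1", "10.0.0.2", 51000, 445, 2)
              for _ in range(1000)}
assert len(links_used) == 1  # never striped across both links
```

Many distinct client flows would scatter across both link indices, which is why aggregation helps a busy server but not one big client-side copy.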

At SGI, we did this experiment. We actually wrote a driver to do this. It took 3 CPUs to handle 2 striped gigabit ports, and if we added a 3rd port, it couldn't go any faster (the re-ordering caused a bigger slowdown than the new port's additional bandwidth).

This is a problem people have wanted to solve forever. SGI's NUMA and several other protocols that use "scheduled transfers" and RDMA-like buffer splitting were created. However, they were all etherNOT, and etherNOT never wins. It required special hardware, rewritten stacks, etc.

Steve Modica
CTO, Small Tree Communications



Alex Gerulaitis
Re: Is link aggregation the solution?
on Jun 2, 2011 at 10:13:13 pm

Thanks Steve, that cleared things up.

As far as helping the original poster speed things up - will Jumbo frames help him? Faster GigE switch? A direct 10GigE connection between the two stations that need a faster file transfer? (I.e. install 10GigE NICs in both machines and cable them up?)

(Of course an 8Gb/s SAN will help, but that's an order of magnitude more expensive than any of the options above.)

Alex (DV411)




Steve Modica
Re: Is link aggregation the solution?
on Jun 3, 2011 at 12:02:40 am

Jumbo frames could help, although that will mostly be a reduction in CPU overhead.

10Gb can probably help with a couple provisos:

1. Using a normal copy, there's only one CPU thread running to push the data across. This one thread will top out at around 300MB/sec. This is a function of how fast a single CPU can turn the TCP crank. Faster cores could make this a little faster.

(FCP and QuickTime use AIO routines, which work in parallel. Servers get the advantage of having many open sockets, so more cores are brought to bear.)

2. You might not get 300MB/sec if the storage on both sides can't handle that bandwidth. You won't go faster than the slowest element.
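Steve's two provisos amount to a min-of-the-stages calculation. A quick sketch, where all the MB/s figures are illustrative assumptions taken from this thread, not measurements:

```python
# Back-of-the-envelope: an end-to-end copy runs at the speed of its
# slowest element -- source disks, the single-threaded TCP copy, the
# network link, or the destination disks.

def effective_mb_per_sec(*stages_mb_s: float) -> float:
    """The pipeline can't go faster than its slowest stage."""
    return min(stages_mb_s)

source_raid0 = 140.0        # two striped SATA disks (assumed best case)
single_thread_tcp = 300.0   # one-thread copy ceiling Steve mentions
dest_raid5 = 500.0          # 8-disk RAID5 (assumed)
link_10gbe = 1250.0         # 10 Gb/s line rate in MB/s

speed = effective_mb_per_sec(source_raid0, single_thread_tcp,
                             dest_raid5, link_10gbe)
print(f"{speed:.0f} MB/s -> 100 GB in {100_000 / speed / 60:.0f} minutes")
# -> 140 MB/s -> 100 GB in 12 minutes
```

Under these assumptions the source RAID0, not the 10GbE link, sets the ceiling, which matches Steve's follow-up below.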

Steve

Steve Modica
CTO, Small Tree Communications



Steve Modica
Re: Is link aggregation the solution?
on Jun 3, 2011 at 12:05:14 am

In answer to my own question:
The 2 RAID0 striped drives in the destination stations will be the bottleneck. If they are SATA devices I think you'll be topping out at 140MB/sec best case.

Steve Modica
CTO, Small Tree Communications



Alex Gerulaitis
Re: Is link aggregation the solution?
on Jun 3, 2011 at 12:11:49 am

[Steve Modica] "1. Using a normal copy, there's only one cpu thread running to push the data across. This one thread will top out at around 300MB/sec. This is a function of how fast a single cpu can turn the TCP crank. Faster cores could make this a little faster. "

Great info - thanks Steve. I had no idea TCP had that much of a computational overhead. Are there ways to reduce it and speed up 10GigE to get it closer to its 10Gb/s ceiling?

Alex (DV411)




Steve Modica
Re: Is link aggregation the solution?
on Jun 3, 2011 at 12:20:05 am

10Gb itself goes line rate. You can fire up a benchmark like iperf and show that pretty easily. The problem lies in pulling data off of disk, segmenting it and getting it out to the network. There's significant overhead in that (and reversing the process on the other side).

Our 10Gb products are doing segmentation offload now and receive side coalescing, so that helps. I have not run new benchmarks to see if things have gotten a lot better.

I think the main code improvements need to be in the Samba/AFP code. This isn't an Apple issue either; they all have these limitations.

10Gb running a block protocol like FCoE or even iSCSI goes pretty close to line rate.

Steve Modica
CTO, Small Tree Communications



Steve Modica
Re: Is link aggregation the solution?
on Jun 3, 2011 at 12:22:03 am

One more thing:
AFP (and Samba) do a lot of consistency checking since they are shared protocols. So they stat files and directories very frequently. When we put together blazeFS a long time ago, reducing that was one of our primary goals. That's a big contributor to the overhead.

You can watch all that happen with tcpdump.

Steve

Steve Modica
CTO, Small Tree Communications



Jesse, DiJiFi
Re: Is link aggregation the solution?
on Jun 3, 2011 at 1:01:27 am

Wow, thanks so much Steve and Alex. This is really interesting information, though a lot of it is over my head.

From what I can tell, 10 Gbps products are extremely expensive, so it seems this is not an option for me. I was hoping to spend less than $1,000 on simply speeding up my network between stations, but it seems this is currently too new a technology.

I will certainly try the jumbo frames, and will consider the other options for a while.

Thanks again, you're heroes!





Alex Gerulaitis
Re: Is link aggregation the solution?
on Jun 3, 2011 at 1:28:54 am

[Jesse, DiJiFi] "From what I can tell, 10 Gbps products are extremely expensive, so it seems this is not an option for me. I was hoping to spend less than $1,000 on simply speeding up my network between stations, but it seems this is currently too new a technology."

From what I understand, you could do it for $1110: two Intel 10Gigabit AT2 Server Adapters for about $550 each, a cross-over Cat6 cable ($10 or so), and you are all set - as long as you are only interested in speeding up a file transfer between two stations. According to Steve, you should see 300MB/s if the drives can handle it. Like Steve said, the weak link (after you upgrade to 10GigE) will be your dual-drive RAID0 (200-280MB/s are most common speeds).

Or, you could go with Small Tree Comms for about $1000-2000 more and get even higher speeds.

Alex (DV411)



Steve Modica
Re: Is link aggregation the solution?
on Jun 3, 2011 at 1:37:36 am

[Alex Gerulaitis] "From what I understand, you could do it for $1110: two Intel 10Gigabit AT2 Server Adapters for about $550 each, a cross-over Cat6 cable ($10 or so), and you are all set - as long as you are only interested in speeding up a file transfer between two stations. According to Steve, you should see 300MB/s if the drives can handle it. Like Steve said, the weak link (after you upgrade to 10GigE) will be your dual-drive RAID0 (200-280MB/s are most common speeds). Or, you could go with Small Tree Comms for about $1000-2000 more and get even higher speeds."

Two comments:
The Intel cards won't work in a Mac without a driver, so the Small Tree cards are the only option there.
Crossover cables are no longer required (since gigabit).

I think his limitation would be the RAID0 stripe. I think before moving to a faster network, they should consider more drive spindles on the edit machines.

Steve Modica
CTO, Small Tree Communications



Alex Gerulaitis
Re: Is link aggregation the solution?
on Jun 3, 2011 at 1:48:58 am

[Steve Modica] "The Intel cards won't work in a mac without a driver. So the small tree cards are the only option for those
Cross over cables are no longer required (since gigabit)"


Again, great info. I've been using patch cables for direct GigE transfers but never knew until now why they worked. :)

[Steve Modica] "I think his limitation would be the raid0 stripe. I think before moving to a faster network, they should consider more drive spindles on the edit machines."

There might be more to it:

[Jesse, DiJiFi] "We need it to be faster, though, as 100 GB of transfers will still take close to an hour."

If my calculations are right, 100GB/hr is about 28MB/s which is roughly 25% of the GigE line rate; he should be able to get much higher speeds by optimizing his NICs, possibly using jumbo frames and a faster switch.
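Alex's arithmetic checks out. A quick verification (using decimal GB and the 125 MB/s theoretical line rate of gigabit Ethernet):

```python
# Checking the claim: 100 GB per hour, expressed in MB/s, compared
# against gigabit Ethernet's theoretical line rate of 1 Gb/s.

gb_per_hour = 100
mb_per_sec = gb_per_hour * 1000 / 3600   # decimal GB -> MB, per second
gige_line_rate = 1000 / 8                # 1 Gb/s = 125 MB/s

print(f"{mb_per_sec:.1f} MB/s, {mb_per_sec / gige_line_rate:.0%} of line rate")
# -> 27.8 MB/s, 22% of line rate
```

So roughly 28 MB/s against a 125 MB/s ceiling, i.e. somewhere between a fifth and a quarter of line rate, well below what a healthy GigE link delivers for large files.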

Alex (DV411)




Steve Modica
Re: Is link aggregation the solution?
on Jun 3, 2011 at 1:59:25 am

[Alex Gerulaitis] "If my calculations are right, 100GB/hr is about 28MB/s which is roughly 25% of the GigE line rate; he should be able to get much higher speeds by optimizing his NICs, possibly using jumbo frames and a faster switch."

Good catch. I actually scanned that and just read it as "100MB/sec" assuming he was talking about gigabit speed.

He should be seeing 90MB/sec pretty solid if he's moving a large file.
30MB/sec is really horrible.

Steve Modica
CTO, Small Tree Communications



Jesse, DiJiFi
Re: Is link aggregation the solution?
on Jun 3, 2011 at 3:23:30 am

Again, thanks to you both!

Interesting, regarding the direct connection (though I need to connect 2 'media ingest' stations to a 3rd editing station, not just one-to-one here). We are on Windows machines only, by the way.

And yes, it seems I must have a pretty horrible speed then, even though our switches and desktops support 1 Gbps. I'll have to try jumbo frames and see what kind of improvement takes place. 90 MB/s might be enough for our purposes, though 100-200 MB/s for a $550 investment in each of the 3 stations may pay for itself in the long run. I am also able to add extra drives I have available to increase the RAID 0 arrays of the two ingest stations, which I will do soon.




Andrew Richards
Re: Is link aggregation the solution?
on Jun 4, 2011 at 12:11:52 am

[Alex Gerulaitis] "Great info - thanks Steve. I had no idea TCP had that much of a computational overhead. Are there ways to reduce it and speed up 10GigE to get it closer to its 10Gbs ceiling?"


Aside: has TOE ever been attempted on OS X? I know on Linux it requires kernel extensions as well as drivers...

Best,
Andy Richards

VP of Product Development
Keeper Technology




Steve Modica
Re: Is link aggregation the solution?
on Jun 4, 2011 at 12:32:37 am

[Andrew Richards] "Aside: has TOE ever been attempted on OS X? I know on Linux it requires kernel extensions as well as drivers.."

Not really. Neterion tried and had a large hardware abstraction layer. They kept telling me it was done, but the driver never released and was always "alpha" and you had to ask them to mail it to you. They never mailed it to me :)

Chelsio did a slow path driver (no TOE), and claimed they would have TOE later. From my understanding, the software management within Apple has no intention of putting in the kernel TCP stack hooks to let that happen. (I'll explain that in a sec.)

We tried to do it with Intel's "quickdata" dma offload engine. That was an exciting project. We got the design all completed and then Intel decided not to move forward with the chip, so we were screwed. But even in that case, we needed some Apple help to make that work.

So what's needed?
Ultimately, when you transmit data out of the machine, the OS puts the data into mbufs. It has to keep these around in case they get lost and have to be retransmitted. However, you are writing to your socket buffer and couldn't care less that the OS needs to keep those buffers; you'll happily overwrite them. So the OS has to copy them off to a safe place in the kernel, meaning all your data is memcopied before it hits the network. This uses a *lot* of CPU: each word has to go through a load/store operation.
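The copy semantics Steve describes are visible from ordinary socket code: send() can return while the data is still in flight, and the sender is free to clobber its buffer immediately, so the kernel must have stashed its own copy. A minimal demonstration with a local socket pair (loopback, not a real NIC, but the buffering contract is the same):

```python
# Demonstrating why the kernel must copy outgoing socket data: after
# sendall() returns, we immediately overwrite our buffer, yet the
# receiver still sees the original bytes -- because the kernel copied
# them into its own buffers (the memcopy described above).

import socket

a, b = socket.socketpair()
buf = bytearray(b"hello")
a.sendall(buf)                # kernel copies the bytes out of our buffer
buf[:] = b"XXXXX"             # clobber our copy right away
received = b.recv(5)
assert received == b"hello"   # the wire data was unaffected
a.close()
b.close()
```

Zero-copy schemes like TOE or DMA offload try to skip that kernel copy, which is exactly where the stack hooks Steve mentions would be needed.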

TOE and quickdata were supposed to subvert this. However to do that in the BSD kernel stack, there's a check to see if an offload path exists. If so, it's called rather than the normal memcopy. Apple does not have this check and has not wanted to put it in. Hence, no TOE.

(Historical note: Dave Miller hated TOE and refused to let it into the Linux stack. So those guys all had to release kernel patches to support their stuff. That was a huge hassle and was yet another reason TOE failed)

Steve Modica
CTO, Small Tree Communications



Andrew Richards
Re: Is link aggregation the solution?
on Jun 6, 2011 at 11:34:52 pm

Thanks, Steve! I figured you'd have the dirt on that one.

Best,
Andy Richards

VP of Product Development
Keeper Technology



Bob Zelin
Re: Is link aggregation the solution?
on Jun 2, 2011 at 10:56:40 pm

REPLY -
Does your mother call you DiGiFi ?

Here is the straight answer to your question -

you state:

The editing station has a fast 12 TB setup over 2 G-Speed eS units in RAID 5 with 8 2 TB disks. The transfer stations are simpler setups of 2 1 TB disks in a RAID 0 stripe. We simply share the editing drive (12 TB) over the network to receive the transferred files while we are still editing off the drive.

REPLY - your G-Speed eS unit will not work as a drive array for a shared storage server. You want shared storage? You need a dedicated server, you need a professional VERY FAST RAID array, and THEN you can put a multiport ethernet card or 10Gig card into your server, tie that to a matching ethernet switch, and accomplish what you want (to tie your 3 editing systems together, so they can all share the same media). AND you can't use the server computer as one of your editing systems. Why? Because you will get dropped-frame errors.

You started this thread thinking that you would buy an $800 ethernet card, stick it into one of your Macs, set up link aggregation, and use your existing equipment to have a shared storage system. It's not going to work.

Do you want to see a drive array from G-Tech that will be suitable for your work?
http://www.g-technology.com/products/g-speed-es-pro-xl.cfm
You'll also need a dedicated Mac Pro as a server computer, AND a switch, AND a multiport ethernet card, so you can accomplish what you want to do.

Compared to an Apple XSAN system, there are LOTS of shared storage systems that will do exactly what you want for a fraction of the price of a full blown Apple XSAN system. But simply putting a multiport ethernet card in your Mac Pro, and using your existing drive array, will not do what you want.

Companies like AVID, Facilis, Small Tree, Apace, EditShare, Cal Digit, JMR, Studio Network Solutions, and Maxx Digital can all provide you with a working solution. Will you pay over 10 grand for a working system - YES YOU WILL! Can you do it for 800 bucks, and use your existing G-Speed eS as the storage - ABSOLUTELY NOT.

Bob Zelin





Alex Gerulaitis
Re: Is link aggregation the solution?
on Jun 2, 2011 at 11:07:18 pm

[Bob Zelin] "your G-Speed eS unit will not work as a drive array for a shared storage server."

Bob, I don't think Jesse needs shared storage for editing, or at least he didn't seem to ask for it:

[Jesse, DiJiFi] "the way we transfer film requires that the digital file be captured on one system and then transferred to a different system for editing to keep things efficient."

Jesse appears to be only searching for ways to speed up file transfer over Ethernet, and he asked a specific question about LAG.

Alex (DV411)



Jesse, DiJiFi
Re: Is link aggregation the solution?
on Jun 3, 2011 at 12:55:42 am

Thank you Bob and Alex,

Yes, I was just wanting to increase our network bandwidth. Capturing and editing to an actual SAN is supposedly not supported by the machine we use and its accompanying software (which requires that we first capture a raw file, then run frame pulldown on the captured file to produce the final file that we edit): http://moviestuff.tv/8mm_sniper_hd.html

Plus, we don't necessarily need a SAN since we are only editing on one machine, not multiple.

And DiJiFi is the name of my company! (http://www.dijifi.com)




Bob Zelin
Re: Is link aggregation the solution?
on Jun 3, 2011 at 2:23:06 am

Jesse writes in the original post -
"I've read a lot of articles and threads and feel that maybe we could use link aggregation between the 3 stations (2 for transferring and 1 for editing). The editing station has a fast 12 TB setup over 2 G-Speed eS units in RAID 5 with 8 2 TB disks. The transfer stations are simpler setups of 2 1 TB disks in a RAID 0 stripe. We simply share the editing drive (12 TB) over the network to receive the transferred files while we are still editing off the drive."

REPLY - to extract your text - "we could use link aggregation between the 3 stations (2 for transferring and 1 for editing)." Maybe I am missing something, but it sounds like you want to have THREE STATIONS that are sharing information. This is called SHARED STORAGE. You have a G-Speed eS. You ain't gonna do nothing with the G-Speed eS other than what you are doing right now. You don't need a "faster switch", as you will not get better speeds with regular ethernet unless you simply enable jumbo frames. A "faster switch" is a fantasy. If you want to spend money (which you don't) - you buy 10Gig cards for direct connection to get MUCH faster speeds, but you STILL don't have shared storage (sharing 3 systems with one drive array). If you want SHARED STORAGE, then follow my other post in this thread.

Jesse - this is what you want - you want to have a single drive array that all three of your systems can access, at fast speeds. Simple, right ? It costs money to do this.

Bob Zelin





Jesse, DiJiFi
Re: Is link aggregation the solution?
on Jun 3, 2011 at 3:47:11 am

Thanks Bob.

You are right, although I have a unique situation in which (according to the manufacturer of my capture machine and accompanying software) I must first capture these huge files to an internal drive of what I will call the 'ingest' station (of which there are two). The capture of 8mm/16mm film to digital files will apparently not work if I attempt to capture directly to shared storage, or anything not directly connected to the motherboard of this ingest system (the reasoning I don't completely understand, but the manufacturer assures me it is so).

So, I at least have to have internal arrays for the initial capture. And since I only need to edit on one station, and in the past have tolerated the time it takes to transfer from one station to the other, I just went with the more affordable G-Speed eS array, which achieves up to 500-600 MB/s on the local system. I figured if the network were fast enough, it could talk to the other two machines at over 100 MB/s, since they are RAID 0.

So this direct connect method is very interesting to me, though I don't know if it can direct-connect two stations to a third station, rather than one to one.




Alex Gerulaitis
Re: Is link aggregation the solution?
on Jun 3, 2011 at 4:10:13 am

[Jesse, DiJiFi] "So this direct connect method is very interesting to me. Though I don't know if it can direct connect two stations to a third station, rather than one to one."

You can with either two 10GbE cards or one dual-port 10GbE card in the edit station.

This will also likely require a static IP setup on each card (no DHCP server in the segment) - which isn't anything to worry about; it's fairly straightforward.
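To make the static-IP setup concrete, here is one possible addressing plan, sketched with Python's standard `ipaddress` module. All the addresses are made-up examples: each direct 10GbE link gets its own small subnet, distinct from the other link and from the regular office LAN, so the OS can route each transfer unambiguously (see Steve's notes on subnets below):

```python
# Example addressing plan for two direct 10GbE links plus the existing
# LAN. The key constraint: no two networks may overlap, or the routing
# table cannot tell the links apart.

import ipaddress

office_lan = ipaddress.ip_network("192.168.1.0/24")  # existing GigE LAN (assumed)
link_a = ipaddress.ip_network("10.10.1.0/30")        # ingest 1 <-> edit station
link_b = ipaddress.ip_network("10.10.2.0/30")        # ingest 2 <-> edit station

nets = [office_lan, link_a, link_b]
for i, n1 in enumerate(nets):
    for n2 in nets[i + 1:]:
        assert not n1.overlaps(n2), f"{n1} and {n2} must not overlap"
print("addressing plan is consistent")
```

A /30 gives exactly two usable host addresses, which is all a point-to-point link needs; any non-overlapping private ranges would work just as well.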

Alex (DV411)



Jesse, DiJiFi
Re: Is link aggregation the solution?
on Jun 3, 2011 at 2:24:18 pm

Oh cool! Okay, I will look into this.

Alex, do you think someone with pretty basic knowledge of networking could set this up without too many problems? I build our computers from parts, so I know a fair amount, but am no expert in networking.

Again, infinite thanks for your input on this. All of you.




Alex Gerulaitis
Re: Is link aggregation the solution?
on Jun 3, 2011 at 10:35:13 pm

[Jesse, DiJiFi] "Alex, do you think someone with pretty basic knowledge of networking could set this up without too many problems?"

Yes, I do think so. The only thing I am not sure about is how to prioritize the 10GbE link over the standard one when you do those file transfers - but with Steve on our side, I'm pretty sure we can figure this out. :)

Alex (DV411)



Steve Modica
Re: Is link aggregation the solution?
on Jun 3, 2011 at 11:15:05 pm

Routing is handled automatically depending on how you do the mounts.
The ports cannot be on the same subnet (or the kernel routing table will get confused).

When you mount the other machine, use "Connect to Server" and give the 10Gb address. That will automagically route over the 10Gb link. Then you can drag the mount down to the toolbar to save it.

Steve Modica
CTO, Small Tree Communications



Alex Gerulaitis
Re: Is link aggregation the solution?
on Jun 3, 2011 at 11:24:39 pm

These sound like OSX instructions - I think they can be adapted to Windows. Thanks Steve!

Alex (DV411)



Steve Modica
Re: Is link aggregation the solution?
on Jun 3, 2011 at 11:33:01 pm

Right! It's the same deal, though. You connect to the share using the 10Gb IP address, and the routing issues probably apply. If you have two ports on the same subnet, you are essentially lying to the OS: since they are on the same subnet, the OS *thinks* that both ports can hear the same packets, but in real life they can't. So it will send packets out one port for *both* IP addresses. (Old problem.)

Steve Modica
CTO, Small Tree Communications



Jesse, DiJiFi
Re: Is link aggregation the solution?
on Jun 4, 2011 at 4:13:26 pm

Okay, at some point this summer I am giving this a try. Thank you again for all the helpful information. I really appreciate it guys.




© 2019 CreativeCOW.net All Rights Reserved