Update on our SAN - looks like os X.3.9 on G4's are no go.

Francois Stark
Update on our SAN - looks like os X.3.9 on G4's are no go.
on May 28, 2005 at 4:39:53 pm

For those of you following my SAN installation testing from the XSAN and OS X forums:

Yesterday we figured out that the secondary controller in the ADTX was faulty.

Last night I realised that when we had bypassed the switch and connected straight to port 3 (on controller 2) to test it, we only had an LU 0 mapped on controller 1. Controller 2's LU's started at 4. So the macs could not see any LU 0, and thus would not see ANY LU.

So today I remapped the LU's on the secondary controller to also include LU 0 and suddenly it started working. I'm still not sure if the macs need to see LU 0 on every port, or only every controller. So I mapped the LU's to have LU 0 on every port, in such a way that every distinct LU is only seen on one port.

Fine. Both G5's worked (Os X.3.8). I reformatted the SANMP volumes and stripe sets and did some speed testing on the G5's: getting about 200 MB/s write and 150 MB/s read. OK, I guess. On two 5 drive raid 0 arrays. Getting 3 streams SD on one machine and 4 streams on the other G5. If I add an 8th stream everything starts stuttering.

Checking the data streams on the qlogic 5200 fibre switch, I could see that the dual link machines (G5's) are splitting their data streams over both links. Good.

BUT both the G4's (os X.3.9) would still not mount most LU's. I'll post some pictures when back at work. I could see all 8 LU's in the system profiler, but only 2 LU's would show up in the disk utility.

I tried Tiger X.4.1 on the G4 - and it also saw all 8 LU's in system profile, but it saw three other LU's in Disk Util. Then I tried a raw os X.3 installation, and saw all the LU's in the Disk util! That's when my time ran out.

I'll update you when I'm back at the office - for now the plan is to go to X.3.5 on the G4 and see if Disk util and SANMP works ok.

Regards
Francois



Francois Stark
Proof
on May 29, 2005 at 2:18:53 pm

Our G4 was running on os X.3.9:



All 8 LU's in the system profile but only 2 in the disk utility.


And here I tried it in Tiger X.4.1:



In Tiger also all 8 LU's in the system profile, but only 2 in disk util.


So here I did a raw installation of X.3:



And suddenly I could see all the LU's in the disk util.


Finally I went up to os X.3.4 and this is what I saw:



All the LU's in disk util! I installed sanMP and could mount and read and write to the shared fibre arrays.

Conclusion: Os X.3.9 and X.4.1 are not able to see LU's properly on an ADTX array on a G4 mirrordoor. Os X.3.8 works in G5 machines.

At the moment I'm getting 6 streams SD from the 10 drive raid 0 array (two 5 drive raid 0's striped by apple disk util). I can play 6 streams on one G5 in FCP real time, or two streams each on three machines (two G5's and one G4). This is quite reliable.

However I have had three crashes on our G5 while capturing SD PAL to the SANMP fibre raid. FCP stops with disk IO error, and then finder also does not see the SanMP disk anymore. Everything freezes and I have to do a hard reboot. SanMP recovers after its 15 second delay because I did not unmount properly.

So there is still a reliability problem when capturing - more tests to follow.

Regards
Francois






Bart Harrison
Re: Proof
on May 29, 2005 at 3:07:41 pm

Hi Francois.

I wish I had time for a quick trip to SA. I'd love to get to the bottom of this problem you're having. I've got many ADTX/SanMP installations all over North America, some running perfectly for more than eighteen months. For me it's been the real winning combination. Other than the fact that I refuse to use the Apple/LSI HBA's I don't see that you're doing anything different. Why don't you see if someone can loan you an Astera Rhino 3000 or an Atto Celerity 22XH to test out. FYI: the Astera card uses its own proprietary set of drivers while the Atto card uses parts of the Apple FCAL library. Either way it'd be a good test. By the way, which version of the Apple cards do you have? Rev 2 has a /B in the part number and Rev 4 has a /D in the part number. The /D cards were a nightmare of unexplainable problems for me, LUNS showing up sometimes and not showing up at other times (sound familiar?). I switched to the Astera cards and now everything works.

Bart

- - - - - - - - - - - - -
Bart Harrison
MPA - The HD Suite

America's VAR
TurnKey Editing Systems, Storage Area Networks
HD Consulting, Production & Post, Exhibition & Distribution
http://www.hdsuite.com
954-894-1221




Francois Stark
Re: Proof
on May 29, 2005 at 5:13:00 pm

Hi Bart

Thanks for the encouragement - I seem to be sorting the problems out one by one. Sad about having to go back to an earlier version of os X, but someone said apple changed fibre port discovery from os X.3.7. So it seems I can go back to X.3.6.

I don't know what's going to happen with tiger - I know SanMP is not ready for tiger yet, but that cannot be far off. And if this problem is not solved, I'll be stuck in Panther on the G4's. And FCP 5 does not run on X.3.6; it needs at least X.3.9...

Q: Did you notice any difference between G4 and G5 machines on the Astera cards? What version of os X are you running?

I'm at home now, so I'm not sure about the rev of apple fibre card. Will check tomorrow.

On the G5 that froze during capture: I'm running the Apple fibre card (LSI) in the top slot (PCI-X 133MHz by itself), and the decklink extreme with the Atto UL4D in the second and third slots (sharing PCI-X 100MHz). I copied some files off the Huge 320S array to the fibre array and got 135MByte/s read from Huge, and 135MB/s write to the Fibre array. No crashing.

The only time it froze was during capture. I'm going to try the decklink card in the top slot (to isolate it from other cards), and the UL4D and Fibre cards in the second and third slots.

Hopefully, if I can get this fibre array working very reliably, I can sell the UL4D and Huge array - which means I will only have the fibre card and Decklink installed.

struggling on...
Francois



Bart Harrison
Re: Proof
on May 29, 2005 at 6:05:59 pm

[Francois Stark] "Q: Did you notice any difference between G4 and G5 machines on the Astera cards? What version of os X are you running?"

Sorry, we retired/repurposed the G4's at all my SAN installations so far.

[Francois Stark] "Hopefully, if I can get this fibre array working very reliably, I can sell the UL4D and Huge array"

What I've done at a number of SAN installations is to set up one of the old G4's and the Huge array as a support server tied to all the editing/graphic systems via an inexpensive Gbit Ethernet switch. I set them up as an NFS volume which appears on each of the editing systems as an "internal" folder on their system drive. At one site this server is hosting 700MB of music (about six music libraries) plus all of their still graphics assets. The music and graphics play right off of the support server without consuming any of the valuable SAN resources.

Bart

P.S. When running a dual controller ADTX, a QLogic 5200 switch, and a dual channel Astera 3000 (with dual 2Gbit fibre) we usually see well over 300MB/sec read and write at every workstation. If you're only seeing 150MB/sec then something is definitely wrong!


Francois Stark
Re: Proof
on May 29, 2005 at 6:21:07 pm

Hi Bart

I found where Jeff Bernstein talks about the change from os X.3.7:
------------------------------------------------------------------

Name: Jeff Bernstein
Date: May 27, 2005 at 04:44 gmt
Subject: Re: Not solved yet - seems G5 is better.

Okay, this is starting to sound a little familiar. I have a feeling you are running OS X 10.3.7 or greater. This could mean you are not running the latest firmware on the ADTX in which half the LUNs would not mount due to the way OS X changed the behavior in LUN discovery.

Another thing to note is that there is a nasty bug on the ADTX when used with SANmp. If your ADTX needs to rebuild, you may not be able to recover the data. We just discovered this the hard way. ADTX is working on it and I welcome you to apply some pressure for I would hate for this to happen to you.

Let us know how it goes.

Can you tell us which OS and which rev of the firmware on the ADTX?

Jeff

Jeff Bernstein

Digital Desktop Consulting
Apple Pro Video VAR
XSAN Certified

323-653-7611


[Bart Harrison] "set them up as an NFS volume"

Why NFS? Maybe NTFS? for PC graphics machines? (we all make typo's sometimes...) or is it really NFS?

[Bart Harrison] "we usually see well over 300MB/sec read and write at every workstation"

Today I got about 180MB/s on the decklink disk speed meter. Remember, only 10 drives (two 5 drive raid 0's striped using apple disk util), not a fully populated ADTX yet.

On your ADTX settings: What is your setting for cache pre-read streams? Mine is set to 32 at the moment, default is 8. If I remember right it can go up to 128? I'm not interested in HD streams (yet), I need as many SD (20MB/s) uncompressed streams as possible.
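
As a rough sanity check on stream counts, the figures in this thread can be turned into a simple budget: at about 20 MB/s per uncompressed SD stream, a measured sustained read rate sets an upper bound on simultaneous streams. The sketch below is only that arithmetic. The function name `max_streams` and the `headroom` parameter are illustrative assumptions, and real playback will likely fall short of the bound because of seeks and contention between machines.

```python
def max_streams(throughput_mb_s: float, stream_mb_s: float = 20.0,
                headroom: float = 0.0) -> int:
    """Upper bound on simultaneous streams for a sustained data rate.

    headroom is the fraction of throughput held in reserve (0.0 = none).
    It is a hypothetical knob for this sketch, not an ADTX or SANmp setting.
    """
    if stream_mb_s <= 0:
        raise ValueError("stream_mb_s must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable = throughput_mb_s * (1.0 - headroom)
    return int(usable // stream_mb_s)


if __name__ == "__main__":
    # Rates quoted in the thread: 150 MB/s read in the first speed test,
    # 180 MB/s on the Decklink disk speed meter.
    for label, rate in [("speed test read", 150), ("Decklink meter", 180)]:
        print(f"{label}: {rate} MB/s -> at most {max_streams(rate)} SD streams")
```

With the 150 MB/s read figure this gives 7, which lines up with the seven streams (3 + 4) that played before an eighth caused stuttering.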

Thanks
Francois






Bart Harrison
Re: Proof
on May 29, 2005 at 7:16:48 pm

[Francois Stark] Jeff: "half the LUNs would not mount due to the way OS X changed the behavior in LUN discovery."

With the Astera 3000 it's the card/driver that does the LUN discovery not OS X. When using the Apple/LSI cards you're subject to the whims of Apple who as you know are continuously changing things (driver revs, OS revs, hardware revs). This is why I've always preferred the Astera card. Unfortunately I'm not sure if they're going to be making them any more.

[Francois Stark] "Why NFS? Maybe NTFS? for PC graphics machines? (we all make typo's sometimes...) or is it really NFS?"

Yes, NFS (the Unix Network File System). Because of its Unix core NFS networking is already built into OS X. Of course if there are a lot of XP systems I'd go with SMB, but in a primarily OS X or Unix/Linux environment nothing beats NFS for convenience. Please be mindful this has nothing to do with the underlying disk format. You can share a Mac OS Extended, FAT 32, or NTFS volume via any networking topology you have available. If you excuse the expression, "they are apples and oranges".

[Francois Stark] "Remember, only 10 drives (two 5 drive raid 0's striped using apple disk util), not a fully populated ADTX yet."

I have never recommended any 10 drive array in a shared-storage SAN environment. Every SAN installation I have is set up with 7 drives on one controller (Raid 5) and 7 drives on the other. I create 12 LUNS on each side and stripe them together to create 12 OS X volumes. The 15th drive is used as a hot spare.
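
The drive budget described here can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a 15-drive enclosure (implied by "the 15th drive") and the standard RAID 5 property that one drive's worth of capacity per set goes to parity. `Raid5Set` and `layout_summary` are illustrative names, not part of any ADTX tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raid5Set:
    drives: int  # physical drives in the set

    def data_drives(self) -> int:
        # RAID 5 spends one drive's worth of capacity on parity.
        if self.drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return self.drives - 1


def layout_summary(total_drives: int, sets: list[Raid5Set],
                   luns_per_set: int) -> dict:
    used = sum(s.drives for s in sets)
    if used > total_drives:
        raise ValueError("more drives assigned than installed")
    return {
        "data_drives": sum(s.data_drives() for s in sets),
        "parity_drives": len(sets),
        # Unassigned drives; in the layout above the single leftover
        # drive serves as the hot spare.
        "hot_spares": total_drives - used,
        # LUN i on one side is striped with LUN i on the other side,
        # so the volume count equals the LUN count per side.
        "striped_volumes": luns_per_set,
    }


if __name__ == "__main__":
    print(layout_summary(15, [Raid5Set(7), Raid5Set(7)], luns_per_set=12))
```

For the 7 + 7 + 1 layout this reports 12 drives' worth of data capacity, 2 of parity, 1 hot spare, and 12 striped volumes.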

[Francois Stark] "On your ADTX settings: What is your setting for cache pre-read streams? Mine is set to 32 at the moment, default is 8. If I remember right it can go up to 128?"

I set the ADTX with Cache Mirroring off, Write Cache Enabled, and the Look Ahead Table to 128 Entries. I arrived at these settings through extensive performance testing and consultation with ADTX and Astera engineering. (Of course, your mileage may vary.)

Hope this helps !

Bart


Francois Stark
Re: Proof
on May 30, 2005 at 4:53:00 am

Hi Bart
thanks for this discussion.

[Bart Harrison] "You can share a Mac OS Extended, FAT 32, or NTFS volume via any networking topology you have available."

When sharing over SMB to PC's we can only access the user;desktop etc. We can not see any non-boot drives. So we'd have to get os X server software.

[Bart Harrison] "7 drives"

Working towards that.

[Bart Harrison] "Look Ahead Table to 128 Entries"

I'll try that today.

Regards
Francois




Bart Harrison
Re: Proof
on May 30, 2005 at 3:33:15 pm

[Francois Stark] "When sharing over SMB to PC's we can only access the user;desktop etc. We can not see any non-boot drives. So we'd have to get os X server software."

I generally use Thursby DAVE if there's a need to share other OS X drives/folders via SMB (as opposed to SAMBA). Whenever possible, though, I prefer using NFS.

Bart



Bart Harrison
Re: Proof
on May 29, 2005 at 6:11:36 pm

[Francois Stark] "What version of os X are you running?"

I have versions of OS X ranging from 10.3.4 to 10.3.7. I'll be upgrading the 10.3.4 installation (my oldest) to FCP 5 next week. We might try 10.4.1 just as a test although we'll probably go with 10.3.9 for the time being. I'll keep you posted.

Bart


Francois Stark
Proof (fixed post)
on May 29, 2005 at 4:42:20 pm

Here I re-did the post because the previous links worked for our internal LAN only. The text is the same as in the "Proof" post above, with one addition: the six-stream playback is quite reliable - I looped these six playing streams for about 1 hour.

Regards
Francois





chrispy
Re: Proof (fixed post)
on May 30, 2005 at 2:20:19 pm

Hey Bart,

Since Astera is no longer making the Rhino FC cards, which alternative will you be looking at ?

Have you tried the ATTO Celerity series ?

-chrispy




Bart Harrison
Re: Proof (fixed post)
on May 30, 2005 at 3:17:25 pm

[chrispy] "Have you tried the ATTO Celerity series ?"

It's my first choice. The Celerity writes faster to stripe sets than the Astera (ala dual-link RGB HD). My concern is that Atto is using parts of the Apple API, meaning it might work fine today and then stop working altogether with a point upgrade of the OS.

Bart


chrispy
Re: Proof (fixed post)
on May 30, 2005 at 3:31:10 pm

Bart,

I think I may have asked you this before but anyhow...in your SAN installs, do you usually use both FC ports on the HBAs ? Does it speed up the throughput if you use both FC ports ?

And if the two FC ports go into a switch and the switch in turn goes into both the ADTX RAID controllers (4 FC cables in total, 2 per controller), how do you manage the LUNs so that it does not duplicate or get confused ? This is the part that I'm still trying to figure out.

-chrispy



Bart Harrison
Re: Proof (fixed post)
on May 30, 2005 at 3:41:23 pm

[chrispy] "how do you manage the LUNs so that it does not duplicate or get confused ?"

What I like to do is deploy two FC switches. If that's not in the cards then you need to set up appropriate LUN masking. The unique WWPN of each port on every HBA can be seen and masked by the ADTX GUI utility. It takes some organizational planning but works fine once you get it all setup.
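
One way to plan the masking before touching the array is to write the intended table down and check it for overlaps. The sketch below is a hypothetical planning aid, not the ADTX GUI or any vendor API: the host names, WWPNs, and LU labels are all made up, and it assumes the goal discussed in this thread, namely that each LU reaches a given host through only one HBA port.

```python
from collections import Counter

# Hypothetical hosts and the WWPNs of their HBA ports.
HOST_PORTS = {
    "g5-edit1": ["10:00:00:00:00:00:00:01", "10:00:00:00:00:00:00:02"],
    "g4-edit2": ["10:00:00:00:00:00:00:03"],
}

# Hypothetical masking plan: initiator WWPN -> LUs it is allowed to see.
MASK = {
    "10:00:00:00:00:00:00:01": {"LU-A1", "LU-A2"},
    "10:00:00:00:00:00:00:02": {"LU-B1", "LU-B2"},
    "10:00:00:00:00:00:00:03": {"LU-A1", "LU-B1"},
}


def duplicated_lus(host_ports, mask):
    """Return {host: [LUs visible through more than one of its HBA ports]}."""
    result = {}
    for host, wwpns in host_ports.items():
        counts = Counter(lu for wwpn in wwpns for lu in mask.get(wwpn, set()))
        dups = sorted(lu for lu, n in counts.items() if n > 1)
        if dups:
            result[host] = dups
    return result


if __name__ == "__main__":
    print(duplicated_lus(HOST_PORTS, MASK) or "no LU is visible twice on any host")
```

A host that runs a multipathing driver may want the same LU on both ports; in that case the check would need to be relaxed for that host.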

Bart



Francois Stark
Update - excuse the ramblings of a madman (long post)
on May 31, 2005 at 1:12:43 pm

Hi Bart and chrispy

I'm just gonna ramble on about my experience because I find that typing it up makes you think deeper about the problem...

Well, last night I thought I had messed around enough with the Raid 0 test LU's. I deleted all the LU's and restarted the ADTX array. Created two new 5 drive arrays, one on each controller. Created 6 LU's on each array - all raid 5.

While the first LU's were being initialised (the others were "waiting to be initialised") I could access all the LU's from the G5's! I tried creating stripe sets in the disk utils, but that did not work properly, so I went home to let the raid 5 LU's finish initialising.

This morning I found the LU's had finished initialising. I started up in 10.4.1 Tiger on the G5, but Tiger seems to exhibit similar problems on the G5 as os X.3.9 did on the G4: All the LU's are visible in the system profiler, but only the LU's on one port of the ADTX array are visible in the disk utility. So I went back to os X.3.8 on the G5 and could see all the LU's in the disk util.

Bad news for Tiger migration... The Astera cards might work but the apple fibre card is really not working well yet in Tiger.

Then it took me about 30 minutes just to figure out which LU's must be striped together, because most of them are the same size! I had to format them as separate drives, copy stuff over and see where the data goes using the fibre switch software - one by one. Eventually I figured out how to identify which array and which LU corresponds to which disk in the disk utility.

Then I striped the corresponding LU's from each array together and started doing some tests (journalling off - yesterday I had it on). Yesterday I could copy stuff from one stripe array to another at 55MB/s (raid 0 stripes), today it ran at 50MB/s (raid 5 stripes). OK, I guess. Raid 5 makes it about 10% slower.
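
The 10% figure is a rounded version of the measured drop between the two copy tests. As a quick worked check (the helper name `slowdown_percent` is just for illustration):

```python
def slowdown_percent(before_mb_s: float, after_mb_s: float) -> float:
    """Relative drop in throughput, as a percentage of the earlier rate."""
    if before_mb_s <= 0:
        raise ValueError("before_mb_s must be positive")
    return (before_mb_s - after_mb_s) / before_mb_s * 100.0


if __name__ == "__main__":
    # 55 MB/s on the raid 0 stripes, 50 MB/s on the raid 5 stripes.
    print(round(slowdown_percent(55, 50), 1))  # 9.1
```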

Setting up the LU mapping in the ADTX: I set it up so that
1) every port has one LU 0 - so the G4's can see all the LU's
2) every real LU only shows up on one ADTX port
3) every real LU only shows up on the port of its own controller
4) I have 12 LU's in total, 6 on each array
The LU mapping looks like this:
Port 1: 0 - 1 - 2 - - - - - - -
Port 2: - 0 - 1 - 2 - - - - - -
Port 3: - - - - - - 0 - 1 - 2 -
Port 4: - - - - - - - 0 - 1 - 2
(PS Only people who have an ADTX array will recognize this LU mapping)
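
The mapping rules listed above can be checked mechanically. The following is a minimal sketch that parses the table exactly as typed and tests rules 1 to 3. It rests on assumptions read out of the thread rather than ADTX documentation: each column is one real LU, the first six columns belong to controller 1 and the last six to controller 2, and ports 1-2 sit on controller 1 while ports 3-4 sit on controller 2. The function names are illustrative.

```python
MAPPING_TEXT = """\
Port 1: 0 - 1 - 2 - - - - - - -
Port 2: - 0 - 1 - 2 - - - - - -
Port 3: - - - - - - 0 - 1 - 2 -
Port 4: - - - - - - - 0 - 1 - 2
"""

# Assumed wiring: ports 1-2 on controller 1, ports 3-4 on controller 2.
PORT_CONTROLLER = {1: 1, 2: 1, 3: 2, 4: 2}


def parse_mapping(text):
    """Return {port: [presented LU number or None, one entry per real LU]}."""
    mapping = {}
    for line in text.strip().splitlines():
        label, cells = line.split(":", 1)
        port = int(label.split()[1])
        mapping[port] = [None if c == "-" else int(c) for c in cells.split()]
    return mapping


def check_mapping(mapping, port_controller, lus_per_controller=6):
    """Return a list of rule violations (empty list means the plan is clean)."""
    widths = {len(row) for row in mapping.values()}
    if len(widths) != 1:
        return ["rows have different lengths"]
    columns = widths.pop()
    problems = []
    # Rule 1: every port presents an LU 0.
    for port, row in sorted(mapping.items()):
        if 0 not in [n for n in row if n is not None]:
            problems.append(f"port {port} presents no LU 0")
    for col in range(columns):
        ports = [p for p, row in sorted(mapping.items()) if row[col] is not None]
        # Rule 2: every real LU shows up on exactly one port.
        if len(ports) != 1:
            problems.append(f"real LU {col + 1} is mapped on {len(ports)} ports")
        # Rule 3: that port belongs to the LU's own controller.
        elif port_controller[ports[0]] != col // lus_per_controller + 1:
            problems.append(f"real LU {col + 1} is on the wrong controller's port")
    return problems


if __name__ == "__main__":
    print(check_mapping(parse_mapping(MAPPING_TEXT), PORT_CONTROLLER) or "mapping OK")
```

Renumbering the port 3 row to start at 4 instead of 0 makes the check report that port 3 presents no LU 0, which is the situation that originally hid the LU's from the macs.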

Then I set up FCP to capture to the one Sanmp disk (stripe raid 50), SD PAL uncompressed using the Decklink Xtreme on a G5. Checking on the fibre switch performance graphs, I could see the G5 is splitting the data over both fibre links, 10MByte/s each, and the ADTX is also getting the data on two ports at 10Mbyte/s each, one on each controller. I captured about 55 minutes in 10 minute clips without problems.

Then I started some playback tests. Yesterday I could play back 8 streams uncompressed on the raid 0 stripe, today I could do 6. I continued capturing, while playing back 4 streams on the next G5 machine.

Then something went wrong. I don't know what. The capturing started dropping frames - abort capture. At random times, between 1 min and 5 minutes from start of capture. The SanMP capture disk, mounted as read-only on the second G5, started acting strangely. It would show the folders on the drive, without icons. As I click on the folder name, it would vanish. Freaky. I restarted the read-only machine and it restored disk access to the sanMP disk.

I restarted the capture machine three times. Continued dropping frames on capture to the SAN disk. Trashed preferences. I even changed capture drive to the boot drive and it continued dropping frames as long as I had the sanMP disk mounted. Only after another reboot and by not mounting the sanMP disk at all, would the system capture to the boot drive without dropping frames.

I'm sitting here on the read-only machine and it can not even play back four streams SD. It starts off fine, but after about 2 minutes it drops frames. Badly. And the capture machine does not even have the SanMP disks mounted. No other machine is mounted. Yesterday it also dropped frames on capture, but I thought that could be related to the journalling that I left on when creating the stripe sets. It seems I was wrong.

On the playback machine I am playing four of the captured SD files (apple unc422 8 bit) in quicktime at the same time - looped. Plays at 80MByte/s. Then after a random time it drops frames and the datarate momentarily drops, whereafter it shoots up to 130MByte/s as quicktime tries to catch up. It stays in this state indefinitely, reading from the array at 130MByte/s while continuing to drop frames in all 4 windows. When I pause one window, the others start playing normally after about 10 seconds - datarate drops to 60MB/s. Bottom line, it can not handle 4 streams continuously, even though it has much higher datarate capability.

I checked the drive lights on the front of the ADTX to look for "sticky" lights indicating a drive with many retries slowing the ADTX down. Nothing strange - all lights blinking similarly.

I wonder if this has to do with the SanMP automatic sync. I set all my drives to ON for read and write, with a 360 second volume sync interval.

So I'm stuck again...

Regards
Francois






Francois Stark
Update 2
on May 31, 2005 at 1:18:06 pm

I just realised the playback problems happen with even three streams. While I was typing, the fourth stream was paused. It was playing three streams reading 60MB/s for a random time, when it would drop frames, data rate drops and then shoots up to 120MB/s where it stays for about three minutes. Then they just continue like before.

It's like something gets stuck on the array, and causes this dropped frame on playback. It could be the same thing that is causing the problem for capturing from next door. But then, why did the first 55 minutes capture OK this morning?

Regards
Francois



Francois Stark
Solved
on May 31, 2005 at 5:50:21 pm

I think I found the last remaining problem with the SAN:

I tested how long it takes between the dropped frames and it was exactly 6 minutes - the time I set up for SanMP's volume sync.

So I disabled SANMP's volume automatic sync, and the problem went away. Since then, I've captured long pieces without any problems.
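
The timing check that exposed the culprit can be written down as a small test: log when the dropped frames occur and see whether the gaps match a configured interval. This is a sketch under stated assumptions. The event times below are invented for illustration (the thread only reports that the gap was 6 minutes), and `matches_interval` and its tolerance are hypothetical, not a SANmp or FCP feature.

```python
def matches_interval(event_times_s, interval_s, tolerance_s=5.0):
    """True if every gap between consecutive events is within
    tolerance_s of interval_s. Needs at least two events."""
    times = sorted(event_times_s)
    if len(times) < 2:
        return False
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return all(abs(gap - interval_s) <= tolerance_s for gap in gaps)


if __name__ == "__main__":
    SYNC_INTERVAL_S = 360  # the 360 second volume sync interval from the thread
    # Hypothetical timestamps (seconds since start) of dropped-frame events.
    drops = [100, 461, 819, 1180]
    if matches_interval(drops, SYNC_INTERVAL_S):
        print("drops line up with the volume sync interval")
    else:
        print("drops do not follow the sync interval")
```

A match like this only points at a suspect; disabling the periodic task and re-testing, as done here, is what confirms it.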

This morning's first 55 minutes must have captured fine because the SAN was empty. When it's empty, the volume sync must be fast enough not to cause dropped frames.

So now we will start using the SAN properly - I hope.

Regards
Francois

