Bru PE Crash on verify
I am getting very annoyed with the last Archive that I have now run 4 times!
9.75TB, and using LTO 6 tapes, all new. Takes around 20 hours 30 minutes to run the archive before verify.
Archive runs through fine, but when the verify starts, it will randomly crash with the following error:
2019-09-25 13:58:50: TapeControl.Ready: Exit
2019-09-25 13:58:50: TapeControl.Seek: Enter
2019-09-25 13:58:50: TapeControl.Seek: ntape0, 0
2019-09-25 13:58:50: TapeControl.Seek: ntape0 returned 0
2019-09-25 13:58:50: TapeControl.Seek: Exit
2019-09-25 13:58:50: TapeControl.Tell: ntape0
2019-09-25 13:58:50: TapeControl.Tell: at block 0
2019-09-25 13:58:50: TapeControl.Tell: Exit
2019-09-25 13:58:50: ccBackupProgress.RunVerify - JobNum 99: The verify will run on ntape0
2019-09-25 13:58:50: ccBackupProgress.RunVerify - JobNum 99: Verify Command is export BRUTAB=/Library/Application\ Support/BRU\ PE/etc/brutab ; /usr/local/bin/bru -ivvvvjf ntape0; sleep 2
2019-09-25 13:58:54: ccBackupProgress.thProcessVerifyLines.Run - JobNum 99: ProcAll set
2019-09-25 14:39:46: App.UnhandledException: thread 887340592 - Error occurred.
2019-09-25 14:39:46: App.UnhandledException: An unexpected condition occurred.
2019-09-25 14:39:46: App.UnhandledException: >> RaiseExceptionClass
2019-09-25 14:39:46: App.UnhandledException: End error stack dump
I am really at my wits' end with this and need to get this archive done and dusted.
OH! I have tried to run a manual verify on the archive which freezes up BruPE, even though it is still running???
Nothing hanging, stopped or crashed in the "Force Quit" menu either?!?!?
I can hear the tape drive is still reading, but Bru PE is totally frozen up....
No crash or errors in the logs.
Please help if you can. I unfortunately don't have a service contract with Bru anymore, so this is my only hope.
Thanks in advance.
I have had this same problem with Bru Server 2.0.4, so I don't know if this is your exact problem, but it sounds exactly the same. Bru Server would crash on verify almost every time if you did a verify immediately after the write. (The problem, as I gleaned from Tolis support, was some bug in the 'database' portion of the program.) Good news is there is a workaround that reliably works. Try it for Bru PE.
1) Uncheck "Verify" in the back up job.
2) Run the back up.
3) After completion, Quit Bru.
4) Reopen Bru.
5) Run a Verify manually. The archive should be at the bottom of the Verify list, but double check the Archive ID#. I use the Data Tools & Library Manager & Tapes to cross reference it all. (Sorry, I forget my sequence to locate that data.) It's not intuitive, but all the data is there; you just have to cross reference.
** The key here is to quit out of Bru and re-start the program. This does something to avoid the problem.
I think the crash also loses the "archive" in the database, like it's never saved. So there is no simple way to verify. Maybe you can reload the tape, but I think it's best simply to re-run the job as described above. I suggest you only back up a few files and test, so you don't waste 20 hours.... like I was doing for a few months.
Thanks for this, and I will give it a go if this archive fails again.
I have just updated the software now to the latest 220.127.116.11. I was using 18.104.22.168 before. It worked flawlessly, but this 1 single 9.75TB archive is giving me problems.
What I did try before was the manual verify, which also locked up BRU PE when it ran which was very odd.... After upgrading, I tried the verify again and it ran without any issues for about 30 minutes before I stopped it.
So I am now re-running the whole archive again (LOL! Not in any rush. Just want it right!) and we'll see if it goes through.
Problem with the last few tries was that the archive did complete and save, but I didn't get the actual files that it gives you on completion because it crashed in the verify.
Even though the full archive is there and I can see and access everything, I still don't trust it until I get that much needed "ARCHIVE COMPLETED SUCCESSFULLY" message.
That's just me and how I work. With the data I'm archiving, I can't afford to take any shortcuts.
But if this one fails again, I will do as you suggested above. This will give me a successful archive, then I can manually verify.
Thanks again for the help.
Just wish somebody could speak Apple and tell us more about why this crashes....
So after 2 weeks of struggling with this issue, I have finally solved it!!!!!!!!!!!!
Long story short, LEAVE THE ARCHIVE BE! Let it run!
When I saw that there was an app crash in the log, I could hear that the actual verify was still running on the drive and also that the app had not actually crashed or frozen up!
I could still stop the archive by clicking the button.
So I basically left the archive to run, then the verify and left that to also run.
Lo and behold, it eventually finished and gave me the much needed ARCHIVE COMPLETE window!
I have had this on 4 archives now, all "crashing" but completing fully.
So if you encounter this error, just leave it and let it run. Might take a few extra hours to finish, but it eventually will without any errors.
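For anyone who wants to check this rather than listen to the drive, here is a rough Python sketch. It takes two lines from the log posted earlier in this thread and works out how long the verify had been running when the exception was logged. The `bru_still_running` helper is an assumption on my part: it assumes the verify shows up as a process named exactly `bru` (the log shows `/usr/local/bin/bru` being launched) and that `pgrep` is available on the machine.

```python
import subprocess
from datetime import datetime

# Two lines copied from the log earlier in this thread.
LOG_LINES = [
    "2019-09-25 13:58:50: ccBackupProgress.RunVerify - JobNum 99: The verify will run on ntape0",
    "2019-09-25 14:39:46: App.UnhandledException: thread 887340592 - Error occurred.",
]


def stamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def verify_and_exception_times(lines):
    """Return (first RunVerify time, first UnhandledException time); None if absent."""
    verify_start = exception_at = None
    for line in lines:
        if verify_start is None and "RunVerify" in line:
            verify_start = stamp(line)
        if exception_at is None and "App.UnhandledException" in line:
            exception_at = stamp(line)
    return verify_start, exception_at


def bru_still_running():
    """Assumption: the verify runs as a process named exactly 'bru'.

    pgrep exits with status 0 when at least one process matches.
    """
    result = subprocess.run(["pgrep", "-x", "bru"], capture_output=True)
    return result.returncode == 0


start, failed = verify_and_exception_times(LOG_LINES)
elapsed = failed - start
print(f"UI exception logged {elapsed} after the verify started")
```

If the log shows the exception but `bru_still_running()` returns True on the archive machine, the verify itself is most likely still working through the tape, which matches what I saw.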
I hope this can help someone out there! Drove me insane for the past 2 weeks!
Please check in the restore pane. Are the archives that have crashed available to be searched? Can you select a few files and restore from them? Please let us know the results.
Yes, I did check that and they are good. The problem occurred on the verify.
But as I mentioned, it doesn't actually look like a crash though as it was still running in the background.
But with all my archives I am VERY pedantic. I will not chance it if I don't get the "Completed Successfully" window.
So I delete that archive, change a few things then try it again until it works.
Just the way I work. Can't afford to chance a dodgy archive even if it looks good.
I am about to run a new archive with different files so I will see how that goes.
That's good. The old problem was Bru would complete the back up, then "crash" on verify. However the real impact was that the "archive" was never written to the Bru database. Maybe my issue was I never let the full Verify continue. (This problem was years old so I don't recall all my tests.) For you, as long as it is on Tape, Verified, and you can search and restore... awesome. Thanks for letting us know.
Now I need to find an LTO specialist group. I have LTO8 and would love to get the write speeds up closer to 300MBs. Currently I am getting 156MBs. (13.289TB are written in about 24.8 hours.) Verify is at about 300MBs (the same 13.289TB is verified in 12.9 hours). (However, this is not an accurate test of speed because I am not writing to disk or a network share, but at least I know the LTO8 can go that fast.) If I could just get 225MBs write speed..... the write job could be done in a little over 17 hours. Combine that with a verify, and I could get a backup done in two human days instead of 3. (Run backup 12 hours + 5, quit, manual verify 12 hours.)
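The arithmetic behind those numbers, as a small Python sketch. One assumption: a "TB" here is counted as 1024 x 1024 MB, because that is the conversion that reproduces the 156 and 300 figures from the sizes and times quoted above.

```python
# Assumption: 1 TB = 1024 * 1024 MB, which reproduces the quoted rates.
MB_PER_TB = 1024 * 1024


def rate_mb_per_s(size_tb, hours):
    """Average throughput in MB/s for a job of size_tb that took `hours`."""
    return size_tb * MB_PER_TB / (hours * 3600)


def hours_needed(size_tb, mb_per_s):
    """Hours to move size_tb at a sustained rate of mb_per_s."""
    return size_tb * MB_PER_TB / mb_per_s / 3600


ARCHIVE_TB = 13.289

write_rate = rate_mb_per_s(ARCHIVE_TB, 24.8)   # the write pass
verify_rate = rate_mb_per_s(ARCHIVE_TB, 12.9)  # the verify pass
target_hours = hours_needed(ARCHIVE_TB, 225)   # the hoped-for write speed

print(f"write:  {write_rate:.0f} MB/s")
print(f"verify: {verify_rate:.0f} MB/s")
print(f"write at 225 MB/s: {target_hours:.1f} hours")
```

So at 225MBs the same write would take a bit over 17 hours instead of almost 25.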
I think the initial issue with the verify "crashing" or "freezing" is maybe because of the type of archive.
I was archiving 13 years of audio backups totalling 9.7TB. Some of the files are very small, some less than 1MB, and there were millions and millions of small files. So I am just assuming that the actual graphical interface or even Apple itself got a bit overwhelmed trying to keep up with the counting...?
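If you want to know up front whether an archive is in this "millions of tiny files" category, something like the Python sketch below will count it for you. The 1MB cut-off is just the figure I mentioned above, and whether the file count is really what upsets the interface is still only my guess.

```python
import os


def survey(root, small_limit=1024 * 1024):
    """Walk `root` and return (file_count, small_file_count, total_bytes).

    A file counts as "small" when it is under small_limit bytes (default 1MB).
    Files that cannot be read (for example broken links) are skipped.
    """
    total = small = total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue
            total += 1
            total_bytes += size
            if size < small_limit:
                small += 1
    return total, small, total_bytes
```

Call it as `survey("/path/to/archive/source")` (the path is a placeholder) before starting the job. On a tree with millions of files the walk itself will take a while.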
All I can really think of.
As for your other speed issue, can't help you there, but I am assuming the drives you are pulling the info from, or archiving from can deliver the content faster than needed...?
Had a similar issue in the past and was on a 1Gb network.
Upgraded a few select machines to 10Gb and I can now copy from our storage at a constant 300-450MB/s.
Helps with archiving also as it all runs much faster.
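A quick Python sketch of why the network matters here, assuming the links in question are 1 and 10 gigabit Ethernet. It only converts the raw line rate to MB/s and ignores protocol overhead, so real copies will come in lower.

```python
def link_mb_per_s(gigabits_per_s):
    """Raw line rate in MB/s: 1 gigabit/s = 1000 megabits/s, 8 bits per byte."""
    return gigabits_per_s * 1000 / 8


TAPE_TARGET_MB_PER_S = 300  # the LTO8 write speed being aimed for in this thread

for gbit in (1, 10):
    ceiling = link_mb_per_s(gbit)
    verdict = "can" if ceiling >= TAPE_TARGET_MB_PER_S else "cannot"
    print(f"{gbit}Gb link: {ceiling:.0f} MB/s ceiling, {verdict} feed a 300 MB/s drive")
```

A 1Gb link tops out at 125MB/s before any overhead, so it can never keep a drive busy at 300MB/s, while a 10Gb link has plenty of headroom.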
Quick question, are you perhaps using High Sierra for your archiving?
Reason I ask is that I was running Sierra for ages even when we moved to the 10Gb network, and speeds were horrible! Max network was just over 100...
Upgraded to High Sierra and EVERYTHING came to life.
Just a thought.
My crashes were with larger files (video, which is all I am backing up), so I am not convinced that is where their bug is. Some memory issue in Bru, going back to old code, is my guess. I am using the previous iteration of Bru (Bru Server 2.0.5). Maybe it's two different errors, maybe not.
Veering into an LTO and software rant.....
Regardless, I am not here to solve Bru's issues, as they have their own timeline of things and seem content with how it all works. That might sound like a knock against Bru; I guess it kind of is, but it also kind of isn't. It is more of an annoyance, combined with the fact that there simply is not a more reliable, affordable, trustworthy LTO program out there. As an engineer friend of mine says, and Tim at Tolis shares the sentiment, "It's not the backup that is important, but the restore." And Bru has never let me down.
What I like about Bru, from my understanding of all their white papers, is that Bru is basically writing a checksum at every block, making it much more reliable than LTFS. Now is this critical? Is Bru over-engineered? I really don't know, but I also don't want to find out in 15 years that I have a problem. This is an archive, with only two tape copies in existence. I need to be able to restore 15 years from now reliably.
Another thing about Bru that I like is that it is easily searchable to restore individual files. My archiving needs are simpler than most big companies', I imagine, but I think they are similar to the creative, project-based needs that people at Creative COW encounter. (I don't do incremental updates, so tracking tapes is easier in my workflow.)
I perform an "archive" when a folder on my NAS fills to a certain size, 12TBs for LTO 8. I make a Bru archive (named Pass1) to LTO 8, run it, verify it, then duplicate the "archive" in Bru (renamed Pass2) and run a second archive. Then I delete the original media on the NAS.
LTO tapes are already barcoded uniquely, but I also label each tape box with the tape's barcode number, archive name, date, the other tape numbers in the archive, and the Bru hex name. This way, if a tape is misfiled, I can figure out what archive it belongs to. I then put the multiple tapes of an archive into one plastic box (cheap plastic ammo cases that hold about 13 tapes; silly, I know, but I figure they might be watertight enough in case of "Sharknado"). If the ammo case has room, I put multiple archives in the case, labeling the outside of the case as well so I can quickly locate an archive.
Bottom line, my archive procedure seems to scale fine for now. I have about 440 tapes done over 4 years (220 are the 2nd copies and stored elsewhere) and they sit on 3 shelves. I imagine when I get to over 1000 tapes per copy, this will become another kind of scaling problem.
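For what it's worth, the label fields I described could be kept as a small record, sketched here in Python. The field names are my own and every value in the example (barcodes, archive name, date, hex name) is made up for illustration; nothing here comes from Bru itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TapeLabel:
    """One tape-box label: the same fields that get written on the box."""
    barcode: str
    archive_name: str
    date: str
    bru_hex_name: str
    other_tapes: List[str] = field(default_factory=list)

    def text(self) -> str:
        """Render the label as plain text, ready to print and stick on the box."""
        others = ", ".join(self.other_tapes) or "none"
        return (
            f"Tape: {self.barcode}\n"
            f"Archive: {self.archive_name}\n"
            f"Date: {self.date}\n"
            f"Other tapes in archive: {others}\n"
            f"Bru hex name: {self.bru_hex_name}"
        )


# Hypothetical example values, not real tapes.
label = TapeLabel(
    barcode="EX0001L8",
    archive_name="Example_Pass1",
    date="2019-10-01",
    bru_hex_name="0a1b2c3d",
    other_tapes=["EX0002L8"],
)
print(label.text())
```

Keeping the same records in a spreadsheet or text file would also give a second way to find an archive if a box label goes missing.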
Anyway, I have explored YoYotta and P5, and both are better and worse than Bru. YoYotta seems to copy at near 300MBs; however, it's doing LTFS (a non-starter) and its database interface (tape tracking) is not at all clear to me. P5 is a real alternative, but basically is going to cost me 10K to get it to do what I want.
Pros: P5 is very well supported, updated to work with later macOS versions, widely used, and I think is doing block checksums like Bru. Also, its interface lets you just grow an archive so you don't have to track "completed" archives. Just search for a file, locate the tape, and restore from that one tape. No need to locate and load all the tapes in that archive.
BEST PRO: You can make 2 tapes that are clones of each other at the same time, meaning the tapes are completely interchangeable. No tracking of Pass1 and Pass2; just put one of the two tapes in for a restore... and poof, it works. (FYI, I haven't tested whether this actually works; only their videos say it works. Also, I don't know if P5 will verify both tapes at the same time.)
Cons: That powerful interface is also not intuitive, but this can be learned. The software is more expensive: $4600 (Library version) versus $600. And to get the clone feature, I need to buy an extra LTO 8 drive, ~$5500. Oh, and frustratingly, P5 only writes at 150MBs as well.
*** Note this is not due to my hardware specifically, because in my testing of YoYotta on this exact same hardware, it writes at ~300MBs. I think it's an issue of LTFS not doing checksums while Bru and P5 do them. Or it could be that both P5 and Bru were written a long time ago and YoYotta much more recently. Hence why I am seeking a full LTO 8 forum.
Bottom line, there is no happy LTO answer. I just want Bru to be a little less orphaned (easy and clear support) and to have a clear and reproducible way to get Bru to write quicker. But Bru is affordable and works.
Answer to your previous questions:
1) 10Gb network. With Aja, I can reliably get 350MBs reads across the network.
2) I am still using El Cap. I will at some point try High Sierra and see. (Whole bunch of install issues with BRU Server and High Sierra, but let's let that sleeping dog lie.)
3) What file protocol were you using on the Mac to connect to shares, AFP? SMB? CIFS?
Well OK then... Looks like you also take more than needed precautions when it comes to archiving... 😂
I perhaps have too much faith in LTO and only have a single copy, almost 300 tapes, all full and locked up in a big safe. Yea, I know, but let's not go down that road.............
As for High Sierra, I also had serious issues when it first came out, but after all the fixes and new BRU PE versions, it is very stable.
I would suggest grabbing a spare drive, slap High Sierra on and the latest BRU PE and give it some test runs. This is what I did earlier this year and have had no issues since. (Touch Wood!) Well apart from what happened earlier in this thread...
As for network, we use a mixed environment at the office, with Mac, PC and Linux boxes so I use SMB and CIFS (for Linux) which is more friendly between all the machines.
Being in Production, we use external drives on shoots. These and ALL flash drives get formatted to ExFAT for compatibility... Slightly slower, but worth it for the compatibility side.
My main machine is a PC, which I'm using now, but all copies and archives run through a Mac Pro 2012 sitting next to me.
I'm pretty old school and prefer a simpler network and setup.
Being the only IT/Engineer guy, it just makes it easier and faster to sort out issues when something goes wrong.