LTO file size mismatch in Finder

COW Forums : Archiving and Back-Up

Joe Procopio
LTO file size mismatch in Finder
on Dec 9, 2014 at 8:30:26 pm

when I copy 31GB to LTO-5 with LTFS Formatting, and get info on the tape after it's copied, it shows that only 21GB are used on the tape...

The folder size matches...the available space in the finder window looks to be accurate....

so if I copy 1TB to the drive, according to the example stated above, it will show 1/3 used on the tape when I get info on the tape...

Is there any way to get an accurate reading from Get Info? Or do I suck it up and do the math so I don't overfill the tape?

Mac OSX 10.7.5, MacPro, SAS connected Quantum LTO-6 external drive


any words of wisdom?

Joe Procopio
Broadway Video, NYC
AVID/Premiere/FCP editor/engineer


Joe Procopio
Re: LTO file size mismatch in Finder
on Dec 9, 2014 at 9:02:14 pm

When I look at an LTO-6 tape, the Finder shows less than what should be available....

if my calculations are correct...I have copied 1774 GB to an LTO-6 tape (that totals 2560 GB), and should have 786 GB left...but my available space in the finder window shows 678 GB left. did I forget to carry a 1, or dot an i somewhere?

Joe Procopio
Broadway Video, NYC
AVID/Premiere/FCP editor/engineer


Tom Goldberg
Re: LTO file size mismatch in Finder
on Dec 11, 2014 at 7:45:52 pm
Last Edited By Tom Goldberg on Dec 12, 2014 at 2:59:20 am

There are many reasons why space remaining on LTO tapes may not add up the way you think:

The reason for the biggest amount of lost space I've seen is a result of the fact that LTO drives always do read-verified-writes at the hardware level. This means that, if a block of data did not get written properly, the tape drive will mark it as bad and rewrite the data. As tapes get old and as heads get worn or dirty, this happens more and more to the point where marked bad blocks can take up more room than the data itself. This will even happen to some extent on new drives and new tapes (more with some brands than others).

LTO tape drives all come with a built-in hardware lossless compression engine and will save an unpredictable amount of space depending upon the data. Since your case shows lost space, this is not likely to have been a big factor but will always impact to some extent the exact amount of tape space taken by any data set.

LTFS partitions each tape to keep a space for its file system (where it puts the index files). All LTO tapes are written in a serpentine manner, and LTO-5 tapes have a total of 80 tracks -- 40 going from the beginning to the end and another 40 going back from the end to the beginning. The index partition is one track down plus one track back, so it is 2/80ths of the tape's capacity or 1500GB/40 = 37.5 GB. The partition also uses another track down and back as a “guard band,” leaving 1500 - (37.5 *2) = 1425 GB useable space remaining. With LTO-6, it is 136 tracks with 2500GB capacity which works out to about the same overhead leaving 2425 GB capacity.
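The partition arithmetic above can be sketched in a few lines. This is only a rough model under the post's own assumptions: capacity is spread evenly across the tracks, and LTFS reserves two tracks for the index partition plus two for the guard band. The function name `ltfs_usable_gb` and the parameter `reserved_tracks` are made up for illustration.

```python
def ltfs_usable_gb(native_gb, tracks, reserved_tracks=4):
    """Estimate the usable data-partition capacity in GB.

    Assumption (from the post above): capacity is divided evenly across
    the tracks, and 4 of them are reserved (2 for the index partition,
    2 for the guard band).
    """
    per_track_gb = native_gb / tracks
    return native_gb - reserved_tracks * per_track_gb


print(ltfs_usable_gb(1500, 80))             # LTO-5: 1425.0
print(round(ltfs_usable_gb(2500, 136), 1))  # LTO-6: 2426.5
```

The LTO-6 result lands within a couple of GB of the 2425 GB figure quoted above, which was described as approximate.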

The size of the index will vary with the number of files but it will always be relatively small. The index is written to both the index partition and the data partition for redundancy so will take up some space. Note that each time you add to an LTFS volume, it creates a new index file and the old index files also all still reside on tape, thus many write sessions will use more space for these indices than a tape filled in one session.

Also, be aware that Macs do not represent data using the same numbers as Windows or Linux (and LTFS). A gigabyte, or GB, is now defined as 1,000 bytes cubed, or 1,000,000,000 bytes. A gibibyte, or GiB, is equal to 1024 bytes cubed, or 1,073,741,824 bytes - Mac OSX is using the decimal representation while the rest of the computing world is still using the base2 version.
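To see how much the decimal versus binary units can account for, here is a minimal conversion sketch. The helper names are illustrative only; the constants are the standard definitions given above.

```python
GB = 1000 ** 3   # decimal gigabyte, in bytes
GIB = 1024 ** 3  # binary gibibyte, in bytes


def gb_to_gib(gb):
    """Convert decimal gigabytes to binary gibibytes."""
    return gb * GB / GIB


def gib_to_gb(gib):
    """Convert binary gibibytes to decimal gigabytes."""
    return gib * GIB / GB


# A nominal 2500 GB (decimal) tape expressed in GiB:
print(round(gb_to_gib(2500), 1))  # 2328.3
```

The two units differ by a little over 7% at this scale, so a mismatch much larger than that has to come from one of the other causes listed above.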

LTO is very robust - if you archived without errors, the chances are extremely high (one in 10^17th) that your data is all there. The only way to know for sure however is to restore every file and compare checksums.

Tom Goldberg
TGCS
30201 Rainbow Hill Rd.
Evergreen, CO 80439
mailto:tomgoldberg@gmail.com
http://tomgoldberg.net




Tim Jones
Re: LTO file size mismatch in Finder
on Dec 11, 2014 at 7:58:58 pm

[Tom Goldberg] "LTO is very robust - if you archived without errors, the chances are extremely high (one in 10^17th) that your data is all there. The only way to know for sure however is to restore every file and compare checksums.
"


This statement just makes me sad...

I invite you all to examine this white paper in relationship to Tom's error rate figure and the need to restore to verify:

Reliable Verification

Tim
--
Tim Jones
CTO - TOLIS Group, Inc.
http://www.tolisgroup.com
BRU ... because it's the RESTORE that matters!


Tom Goldberg
Re: LTO file size mismatch in Finder
on Dec 12, 2014 at 12:13:05 am
Last Edited By Tom Goldberg on Dec 12, 2014 at 3:00:00 am

Tim,

I tried to provide some good general information to the forum about LTO. I'm dismayed that you focus in on one statement I made in order to reinforce your parochial outdated value proposition.

Yes, BRU has the biggest checksums in use on tape systems today... but no one ever mentions the overhead for that - LTFS can get 2.4TB on an LTO-6; how much can BRU get? How come no one else uses such large checksums?

The checksumming system BRU uses was developed to solve the problems with early tape systems which were unreliable and even with read-after-write and hardware ECC, still did indeed have unrecoverable errors. Issues like tape contact, edge pack, significant media inconsistency, and a range of mechanical flaws in early tape solutions (even as recently as LTO-1 and 2) were all good reasons to need that.

But LTO is now really reliable enough that you don't need to verify tapes - LTO errors really are about 100 times less likely than what you get from an enterprise HDD. As your paper notes, garbage in, garbage out, and the only real checksum to compare to would be one generated on the source media, not running checksums on source data residing on any hard disk but the one that was in the camera. How many users run verification on data copied from one HDD to another (again, 100x more likely to have a bit error than copying to LTO tape)?

In my discussions with users on film sets filling multiple LTO tapes every day, they still run checksum verifications religiously but most will admit that if the tape was made without errors, they never see checksum discrepancies.

Tom Goldberg
TGCS
30201 Rainbow Hill Rd.
Evergreen, CO 80439
mailto:tomgoldberg@gmail.com
http://tomgoldberg.net




Tim Jones
Re: LTO file size mismatch in Finder
on Dec 12, 2014 at 4:11:49 am
Last Edited By Tim Jones on Dec 12, 2014 at 4:23:46 am

Simply put, your statement that "the only way to know for sure" is to completely restore and then reverify all of the checksums for each file may be fine if you're dealing with 500GB, but try suggesting that to someone with 57 TB of data.

That's the part that makes me so concerned with the statement that you made. And calling my concern for our users' data parochial and outdated was a bit flippant in light of the real volume and time involved in production backup and archival. My perspective is neither parochial nor outdated. On the other hand, the fact that you believe that no verification is required because people that you have spoken with haven't run into problems, is definitely not a safe position.

Regardless of how new the technology, the device is still mechanical, and mechanical items fail. How many tape devices did you work on during design phases? How many software implementations did you work through from initial spec to finished product firmware to get the drive working just right? How many tape formats did you actually design and deliver? Me? I've done all of these things from the original QIC-27 drives to the later side of the LTO designs including numerous QIC, DAT, DLT, AIT, and VXA devices.

And don't get me started about the number of your alma mater's product's tapes that we recovered for customers that had no verification when they were made - there were just no errors reported when they made the backup, so it must have been good, right?

With LTO-6 tapes in the $65 per tape range and, unlike tar and LTFS, no need to worry about over-filling a tape, BRU's 18% overhead is a small price to pay for backups that can be verified with no need to restore the data.

Tim
--
Tim Jones
CTO - TOLIS Group, Inc.
http://www.tolisgroup.com
BRU ... because it's the RESTORE that matters!


Tom Goldberg
Re: LTO file size mismatch in Finder
on Dec 14, 2014 at 5:38:13 pm

Tim,

I stand by my statement that the only way to know for sure is a full restore and compare. I did not mean to imply that has to be done manually, automated means such as your Autoscan certainly meets that criteria. And I don't dismiss checksum comparison as a powerful way to verify without a full restore - only that it should be done against the source data set. Copied data IS more likely to have bit errors going between hard drives than going onto tape.
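A minimal sketch of checking a copy against the source data set with per-file MD5 digests, as described above. The function names and the manifest layout (relative path mapped to hex digest) are illustrative assumptions, not the behavior of any product mentioned in this thread.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root to its MD5 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): file_md5(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(source, copy):
    """Return relative paths that are missing from the copy or whose digest differs."""
    return sorted(rel for rel, digest in source.items() if copy.get(rel) != digest)
```

The idea is to build the manifest once from the source media and keep it, then rebuild it from any later copy or restore and compare the two. A matching manifest only shows that the copy equals whatever was read from the source at the time the first manifest was made.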

I was being flippant, but not about users' concern for their data. I was making my comments about the need for the level of checksumming in BRU, the significant overhead required for that, and the fact that everyone else uses much more efficient checksums such as the popular MD-5. Of course you can't use MD-5s to reconstruct data the way BRU can, but my point is that the hardware has become so reliable you no longer need to. In saying you've recovered data made on other products that didn't use verification, you have in fact reinforced that point.

If the overhead for BRU checksums is 18%, that's not just overhead in tape cost, it's overhead in write times, bit level verify times, and overhead in just plain having to swap, handle and store more tapes.

Tom Goldberg
TGCS
30201 Rainbow Hill Rd.
Evergreen, CO 80439
mailto:tomgoldberg@gmail.com
http://tomgoldberg.net




Tim Jones
Re: LTO file size mismatch in Finder
on Dec 14, 2014 at 6:40:33 pm

Apologies to readers as this gets a bit long winded.

[Tom Goldberg] "I stand by my statement that the only way to know for sure is a full restore and compare. I did not mean to imply that has to be done manually, automated means such as your Autoscan certainly meets that criteria. And I don't dismiss checksum comparison as a powerful way to verify without a full restore - only that it should be done against the source data set. Copied data IS more likely to have bit errors going between hard drives than going onto tape.
"

Tom, that's simply not valid and I suspect that you make that statement from never having tested BRU's resiliency in this situation. Because BRU's checksum is calculated as the data is read from the disk, it is as valid as the original data. If the original data is bad on the disk, it will be bad on the tape, but even a bit-by-bit comparison (which BRU also supports) or restoring that data to an alternate location and comparing the files will report the compared bad data as good - so your example is not a good one to promote or refute the reliability of any of these methods. And our AUTOSCAN pass is not comparing the data on the tape to the source data - it's rereading each block, recalculating the CRC and then comparing the calculated CRC to the CRC in the header of the block.
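The kind of pass described above (reread each block, recalculate the CRC, compare it with the CRC stored in the block header) can be sketched generically. BRU's actual on-tape format is not described in this thread, so the block layout here is a hypothetical one: a 4-byte big-endian CRC-32 followed by a fixed-size payload. The names `pack_block` and `scan_blocks` are likewise made up for illustration.

```python
import struct
import zlib

# Hypothetical block header: a single 4-byte big-endian CRC-32 of the payload.
HEADER = struct.Struct(">I")


def pack_block(payload):
    """Prefix a payload with the CRC-32 computed while it is being written."""
    return HEADER.pack(zlib.crc32(payload)) + payload


def scan_blocks(stream, payload_size):
    """Return the indices of blocks whose recomputed CRC differs from the stored one."""
    block_size = HEADER.size + payload_size
    bad = []
    for index, offset in enumerate(range(0, len(stream), block_size)):
        block = stream[offset:offset + block_size]
        if len(block) < HEADER.size:
            bad.append(index)  # truncated block: no complete header to check
            continue
        (stored_crc,) = HEADER.unpack_from(block)
        if zlib.crc32(block[HEADER.size:]) != stored_crc:
            bad.append(index)
    return bad


archive = b"".join(pack_block(bytes([i]) * 8) for i in range(3))
damaged = bytearray(archive)
damaged[17] ^= 0x01  # flip one bit inside the payload of block 1

print(scan_blocks(archive, 8))         # []
print(scan_blocks(bytes(damaged), 8))  # [1]
```

Note that this checks the archive against itself, with no access to the source data, which is the distinction being drawn above.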

On the other hand, this does mean that BRU's checksum is a completely reliable mechanism for checking the validity of the data in the archive container. And, with BRU's CRC mechanism and our Any Time Verify, you can verify AND audit a tape 6 months from now or even 10 years from now with no knowledge of the original data source, filesystem or platform, unlike what you state above about needing the original source data set. Everyone else uses MD5 checksums because it's the only way for them to provide ANY sort of checksum; they can't modify the tar, MTF, or LTFS formats to add stream-based checksums because it would break the interchange capability.

Comparing our in-stream CRC algorithm against an external MD5 (or an even stronger SHA1) file-based checksum is not even close. BRU calculates the CRC in the stream, not as a separate process or pass. Additionally, we have shown that the BRU I/O engine running on an OS X system with a high-performance RAID source drive is still not topping out BRU's performance capability. We have sustained performance of over 3.2GB/sec using BRU to create archives of the data on the array (faster on other platforms where PCIe-3 is available). This is far faster than any modern tape drive can handle, so the overhead relating to speed is nonexistent using BRU's methodology.

Also, BRU has no issues archiving data sets with extremely deep folder structures, folders containing hundreds of thousands of files, or filenames with special characters or international languages, so the user does not need to modify their workflow to take those things into consideration. We have one customer that recently left another LTFS solution because, rather than rework their international workflow to fit LTFS, they were using ZIP to create archive containers with generic names so that the data could be backed up to the LTFS format used by the other product. Then they realized that they couldn't restore a single file without first restoring the ZIP container. They bought a lot of BRU licenses.

Finally, the ~18% overhead that BRU generates is not simply the CRC (that's literally 4 bytes - 32bits per block), but also includes all of the other filesystem and low-level metadata that we save with relation to the file. Things that LTFS simply throws away.

Tim
--
Tim Jones
CTO - TOLIS Group, Inc.
http://www.tolisgroup.com
BRU ... because it's the RESTORE that matters!


Tom Goldberg
Re: LTO file size mismatch in Finder
on Dec 17, 2014 at 5:07:22 pm
Last Edited By Tom Goldberg on Dec 17, 2014 at 5:13:35 pm

Well Tim, we can agree at least on one thing, that you are long winded ;-)

Actually, I don't take issue with any of the points you make in your post above.

I was in particular making the same point as you:
[Tim Jones] "If the original data is bad on the disk, it will be bad on the tape"

And then I was taking that point further - We saw many users doing extensive verification of copies of source data or even copies of copies of source data, and that is IMO a waste of time as the disk-to-disk copies are more likely to have errors than disk-to-tape.

I don't doubt that BRU checksums are better than SHA1 or MD-5, just that they are bigger and slower. I note that no one else seems to think it is worth the overhead. And by no one else, I include IBM, HP, Quantum, Oracle, Spectra, and so on.

Tom Goldberg
TGCS
30201 Rainbow Hill Rd.
Evergreen, CO 80439
mailto:tomgoldberg@gmail.com
http://tomgoldberg.net




Tim Jones
Re: LTO file size mismatch in Finder
on Dec 17, 2014 at 6:51:07 pm

[Tom Goldberg] "I don't doubt that BRU checksums are better than SHA1 or MD-5, just that they are bigger and slower. I note that no one else seems to think it is worth the overhead. And by no one else, I include IBM, HP, Quantum, Oracle, Spectra, and so on.
"


Then you either did not fully read what I posted or are not understanding it - BRU's in-stream CRC method is neither slower nor larger. In fact, it's faster than MD5 and is in-stream rather than a sidecar to the process. An MD5 sidecar "file" is usually around 57 bytes each and requires a separate file entry in the data written (if they are included in the archive at all) while BRU's 32bit CRC is 4 bytes and in-stream. You can't "lose" BRU's checksums, but you could lose the MD5 sidecar files that so many are promoting as "good enough". As for speed, I'd love to see any tar or LTFS-based solution try to match BRU's native CRC-inclusive I/O speeds while including all filesystem metadata, extended attribute information, ACLs, long paths, and full international character set support - even without including the MD5 sidecar files.

So far as mentioning hardware vendors, these companies aren't responsible for what the software does with their devices. You attempt to imply that the fact that they don't include such a solution is indicative that it's not needed - two important points with regards to that:
  • They don't think it's worth it because to do so would imply that their solutions are potentially prone to errors.
  • Those that include software with their hardware provide software tools that are based on archive container formats that can't provide in-stream checksums.

I can't comprehend why you are having a problem following this point. They don't offer or recommend it because they can't. Neither TAR, MTF (Microsoft Tape Format), NDMP, nor LTFS has a way to include the checksum mechanism within its format without breaking the existing format.

BRU has proven for more than 29 years to be as fast or faster, more reliable, more compatible and with broader platform support than any other solution (except maybe raw tar in that last point).

Tim
--
Tim Jones
CTO - TOLIS Group, Inc.
http://www.tolisgroup.com
BRU ... because it's the RESTORE that matters!


Tim Jones
Re: LTO file size mismatch in Finder
on Dec 10, 2014 at 12:22:41 am

From the LTFS Caveats list - item 11:

Do not depend on OS X or Windows system disk tools for information about an LTFS volume. Because system tools like Finder's "Get Info...", "du", and "df" do not have logic for dealing with the compression on an LTO drive or the space lost to file rewrites on an LTFS volume, the values returned will be estimates that will become less accurate as you write more data to an LTFS volume or replace existing files with new versions. While the available space numbers will be correct, if added to the used space in such situations, the resulting value will be less than the actual stated capacity for a tape.


The whole list is available here:

LTFS Use Caveats

Tim
--
Tim Jones
CTO - TOLIS Group, Inc.
http://www.tolisgroup.com
BRU ... because it's the RESTORE that matters!


Tim Jones
Re: LTO file size mismatch in Finder
on Dec 10, 2014 at 4:27:42 am

One more thought - how did you originally format the tape? By default, compression is enabled, so if your files are (data) compressible at all, you will see the system reporting less data written to the tape.

If you do an ls -lR on the folder that you copied the files into on the LTFS volume, does the reporting information match the numbers for the same files on your disk?
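For a large folder, eyeballing two ls -lR listings gets tedious. A minimal sketch of the same comparison in Python, using only the logical file sizes the filesystem reports (the same numbers ls -l shows), follows. The function names are illustrative.

```python
import os


def tree_sizes(root):
    """Map each file's path relative to root to its logical size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            sizes[os.path.relpath(full_path, root)] = os.path.getsize(full_path)
    return sizes


def size_mismatches(source_root, tape_root):
    """Return {relative path: (source size, tape size)} for every file that differs.

    A size of None means the file is missing on that side.
    """
    source = tree_sizes(source_root)
    tape = tree_sizes(tape_root)
    return {
        rel: (source.get(rel), tape.get(rel))
        for rel in sorted(set(source) | set(tape))
        if source.get(rel) != tape.get(rel)
    }
```

A call might look like size_mismatches("/Volumes/Source/Project", "/Volumes/LTFS_TAPE/Project"), where both paths are hypothetical. An empty result means every file reports the same logical size on both sides, whatever Get Info says about space used on the tape.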

Tim
--
Tim Jones
CTO - TOLIS Group, Inc.
http://www.tolisgroup.com
BRU ... because it's the RESTORE that matters!


Goce Shamakoski
Re: LTO file size mismatch in Finder
on Dec 11, 2014 at 6:58:15 pm

Tom,

Many times BRU has told me that the data is 2.19 TB and will take only one LTO-6 tape, only to run out of space during the write and ask for another tape.

Just to be safe, I started making all my folders an even 2 TB. It's not ideal, but it saves me valuable time.


Tim Jones
Re: LTO file size mismatch in Finder
on Dec 11, 2014 at 7:40:04 pm

Why worry about the size? Let BRU handle the tape swaps and you don't need to worry about it. Spanning files across tapes with BRU is not the same as it was with older backup formats (or LTFS where you CAN'T span). BRU has checks and rewrites to protect both sides of the split.

If you're worried about losing a file from one side of the split to the other because you've physically lost a tape, just keep in mind that losing a tape will lose the file on the lost tape anyway...

Tim
--
Tim Jones
CTO - TOLIS Group, Inc.
http://www.tolisgroup.com
BRU ... because it's the RESTORE that matters!


Tom Goldberg
Re: LTO file size mismatch in Finder
on Dec 14, 2014 at 5:40:48 pm

Every LTFS implementation (except the open-sourced free versions) supports tape spanning.

Tom Goldberg
TGCS
30201 Rainbow Hill Rd.
Evergreen, CO 80439
mailto:tomgoldberg@gmail.com
http://tomgoldberg.net




Tim Jones
Re: LTO file size mismatch in Finder
on Dec 14, 2014 at 5:46:52 pm

There's a difference between spanning tapes and breaking the data stream up to fit on tapes. The key is that the LTFS platforms that support automatic multi-tape operations do the data splitting for you.

Since you are concerned about BRU's checksumming wasting tape, this type of data splitting also wastes tape since with few exceptions, the data is split into segments based on the "approximate" expected storage size of the tape used. It doesn't (and really can't) take into account any compression that occurs while writing to the tape.
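To illustrate the kind of splitting being described, here is a minimal sketch that assigns whole files to tapes in order, using a fixed nominal capacity. It is not the algorithm of any particular LTFS product (none is described in this thread); the function name, the file list, and the 2400 GB figure are all illustrative assumptions.

```python
def split_into_tapes(files, tape_capacity_gb):
    """Assign (name, size_gb) pairs to tapes in order, never splitting a file.

    The capacity is a fixed nominal figure; any compression that happens
    while writing is ignored, as in the post above.
    """
    tapes = [[]]
    used_gb = 0
    for name, size_gb in files:
        if size_gb > tape_capacity_gb:
            raise ValueError(f"{name} does not fit on a single tape")
        if used_gb + size_gb > tape_capacity_gb:
            tapes.append([])  # start a new tape rather than split the file
            used_gb = 0
        tapes[-1].append(name)
        used_gb += size_gb
    return tapes


files = [("a", 900), ("b", 900), ("c", 900), ("d", 400)]
print(split_into_tapes(files, 2400))  # [['a', 'b'], ['c', 'd']]
```

In this example the first tape is closed with 600 GB of its nominal capacity unused, which is the sort of waste being pointed out.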

Tim
--
Tim Jones
CTO - TOLIS Group, Inc.
http://www.tolisgroup.com
BRU ... because it's the RESTORE that matters!


Goce Shamakoski
Re: LTO file size mismatch in Finder
on Dec 17, 2014 at 7:31:29 pm

The reason I don't let BRU span tapes is because the catalog is not as robust as I would like it to be. At the moment I use cataloging software that allows me to see detailed information about a file or folder (thumbnails of videos and pictures, metadata, comments, creation and modification dates, original XSAN path, tape number, column view, list view, icons...). Having to use the little arrow to collapse the folders is a real pain when I have 30 subfolders.

Another reason is that I have heard people say it's not a good idea to split video files.

