# [Theory + Math] Long Term Storage Strategy

 [Theory + Math] Long Term Storage Strategy by David Gagne on Sep 30, 2011 at 3:36:41 pm

Hypothesis: Long-term digital storage is not cost effective when working with a data set that is growing significantly every year.

Up front: I am not claiming that LTO or RDX disks are useless. I'm just trying to show that cost-effectiveness is not a good reason to go with those solutions. Also, my research is very incomplete, and I'm hoping to get some feedback and more info from those who have been around for a while.

I created a spreadsheet with these assumptions:
Data storage cost per GB (consumer grade), declining by year * : Power(10,-.2502*(C1-1980)+6.304), where C1 is the spreadsheet cell holding the year
Data storage cost (enterprise), as a multiple of the consumer cost ** : 22x
Data growth per year *** : 30%
Data storage current need *** : 100TB
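
For anyone who would rather poke at these assumptions outside a spreadsheet, here is a minimal Python sketch of the same inputs. The function and parameter names are made up for illustration, and reading C1 as the calendar year is an assumption.

```python
def consumer_cost_per_gb(year):
    """Consumer cost per GB from the fit 10^(-0.2502*(year-1980)+6.304)."""
    return 10 ** (-0.2502 * (year - 1980) + 6.304)


def enterprise_cost_per_gb(year, multiple=22.0):
    """Enterprise cost per GB as a fixed multiple of consumer cost (22x is the post's guess)."""
    return multiple * consumer_cost_per_gb(year)


def storage_need_tb(years_from_now, current_tb=100.0, growth=0.30):
    """Storage need in TB after compounding yearly growth from today's need."""
    return current_tb * (1 + growth) ** years_from_now


if __name__ == "__main__":
    # Consumer cost per GB in 2011 comes out at roughly 0.035 under this fit.
    print(round(consumer_cost_per_gb(2011), 4))
    # Need after one year of 30% growth on 100TB.
    print(storage_need_tb(1))
```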

With those assumptions, I ran the numbers for a few different cases:
1. Completely replace storage every year to meet new growth need of current year
2. Completely replace storage every two years to meet new growth need of next two years
3. Completely replace storage every three years to meet new growth need of next three years
4. Completely replace storage every four years to meet new growth need of next four years
5. Completely replace storage every five years to meet new growth need of next five years
6. Completely replace storage every ten years to meet new growth need of next ten years
7. Completely replace storage every fifteen years to meet new growth need of next fifteen years
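
The seven cases above can be sketched in Python roughly as follows. This is one reading of the model, not the spreadsheet itself. The 2011 start year, the 1 TB = 1000 GB conversion, buying at the start of each cycle, and sizing each purchase for the need in the last year it covers (capped at the 15-year horizon) are all assumptions, and `total_cost` is a made-up name.

```python
def total_cost(interval_years, horizon_years=15, start_year=2011,
               current_tb=100.0, growth=0.30, multiple=22.0, gb_per_tb=1000):
    """Total spend over the horizon when all storage is replaced every N years."""
    total = 0.0
    # One purchase at the start of each replacement cycle (year offsets 0, N, 2N, ...).
    for t in range(0, horizon_years, interval_years):
        # Assumption: size the purchase for the need in the last year it must cover.
        last_year_covered = min(t + interval_years, horizon_years) - 1
        capacity_tb = current_tb * (1 + growth) ** last_year_covered
        # Enterprise price per GB in the purchase year, using the post's fit and 22x multiple.
        price_per_gb = multiple * 10 ** (-0.2502 * (start_year + t - 1980) + 6.304)
        total += capacity_tb * gb_per_tb * price_per_gb
    return total


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 10, 15):
        print(f"{n:>2} yr : {total_cost(n):,.0f}")
```

Because the sizing and timing conventions here are assumptions, the totals it prints should not be expected to match the spreadsheet's figures exactly. The longer cycles in particular are sensitive to which year each purchase is sized for.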

Can you guess which of these is most cost effective? (Results at bottom)

The cheapest total cost over 15 years is to replace every 3rd year (2nd year was close too).
Even more interesting is that it is cheaper to completely replace every year than to replace every four years. This is because in order to last four years, you need to buy four years' worth of storage at today's high prices as opposed to the future's cheaper prices.
And if you try to buy enough storage to last you ten or fifteen years, you are a fool: it will cost you 4-10 times more just for the "convenience" of never upgrading. Meanwhile, your storage will be slower and more difficult to access.

Now there are of course some flaws in this model -- most people don't COMPLETELY replace their storage when they upgrade. We typically have two or three generations of storage at a time. I haven't run the numbers on that, but I hope to figure it out soon.

Also, I didn't factor the lower cost of LTO into the equation because I didn't have solid info on how LTO costs decline and how that relates to the cost of disk, but LTO would have to be SIGNIFICANTLY cheaper, which is not always the case.

If anyone wants a peek at my spreadsheet, let me know :)
Please send me your corrections or questions, just please don't flame me too hard (looking at you Zelin!)

RESULTS (total cost over 15 years)
Replace every : Total cost
1 yr : 285795
2 yr : 215199
3 yr : 213276
4 yr : 307036
5 yr : 360946
10 yr : 823593
15 yr : 3057945

* Cost Per Gig Consumer Taken from http://www.mkomo.com/cost-per-gigabyte
** This is a total guess, can someone provide reliable research on this?
*** Just made this up as a test case

 Re: [Theory + Math] Long Term Storage Strategy by Bob Zelin on Sep 30, 2011 at 8:48:46 pm

David writes -
The cheapest total cost over 15 years is to replace every 3rd year (2nd year was close too).

REPLY - Hey, I agree with you. What do people do with all those DLT tapes, and SCSI and PATA drives (not to mention those Iomega ZIP and JAZ drives)? REPLACE every third year.

You know what the problem is of course - no one does this. Just like no one dumped their 1" Tape libraries to Digi Beta (and then HDCam). They just sit on the shelf and rot.

Bob Zelin

 Re: [Theory + Math] Long Term Storage Strategy by David Gagne on Sep 30, 2011 at 9:31:06 pm

Good to hear from someone who has been around long enough to see many generations of this... It's helpful to have both experience as well as the math behind it showing that it makes sense to do this.

Of course the other piece of the puzzle is having good data-retention policies, but that's a tough one to sell as well.

Bob, my thought is to make sure to be up-front with purchasers about this -- and maybe even push for leased storage so that they get in the routine of renewing every third year. It's helpful if they already have it marked on their calendar, "THIRD YEAR, TIME TO GET NEW STORAGE TO SAVE MONEY." Also it's helpful when purchasing -- this purchase doesn't have to last you forever, just for 3 years.

 Re: [Theory + Math] Long Term Storage Strategy by David Gagne on Sep 30, 2011 at 9:32:32 pm

Here's the link to my spreadsheet:

https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0Ao_hE0x5R7SZ...

 Re: [Theory + Math] Long Term Storage Strategy by Chris Gordon on Oct 1, 2011 at 1:00:33 pm

When you're thinking about long-term archival storage, don't forget that you need equipment and software that can actually read and use the data you've archived. Do you keep your own inventory of parts and equipment to access the older formats, or do you migrate the data to new formats? Both can be expensive. Also, don't forget that you need to periodically check the media and make sure everything on it is still OK. Over time, various things will happen that can render a digital archive useless. Guarding against that is expensive too: you have to go through the media periodically, verify it, and rewrite data to new media (possibly the same type, just media that isn't worn out from reads/writes) to stay ahead of failures. Archiving can be very expensive, well beyond just the cost of the media used.

As for 3 years, that is a very common "lifespan" for equipment. You find that after 3 years, failure rates will start increasing as things wear out. For equipment you buy support for, 3 years is often the initial term you can buy. After that you're buying another round of support that is sometimes more expensive than buying new gear (not blaming a vendor selling support -- if failures are far more likely, they need to cover that cost). At work, we put a 3 year life on the vast majority of the servers we buy. After 3 years of operation, we replace them. This helps avoid failures of aging machines, keeps with modern hardware that is supported by newer software, and is tied to our financials (depreciation schedules, etc). For some select gear, we use a 5 year schedule. These numbers are very common throughout the IT industry, so it's not surprising that the numbers work out that way for your scenario.

But to your initial assertion: archiving is expensive, especially when you have a constantly growing archive requirement. There's no way around that. Storage has gotten far, far cheaper over time, but if the rate of your archive growth is greater than the rate of storage cost reductions, you're on the expensive side of the equation.
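
A rough way to see which side of that equation a given growth rate falls on, assuming the cost-per-GB fit quoted at the top of the thread and a policy of re-buying the full need every year (both assumptions; the names are illustrative):

```python
# Yearly price multiplier implied by the fit 10^(-0.2502*(year-1980)+6.304).
PRICE_FACTOR_PER_YEAR = 10 ** (-0.2502)


def annual_spend_factor(growth):
    """Ratio of this year's purchase cost to last year's when the full need is re-bought yearly."""
    return (1 + growth) * PRICE_FACTOR_PER_YEAR


def break_even_growth():
    """Yearly data growth at which falling prices exactly cancel the growth."""
    return 1 / PRICE_FACTOR_PER_YEAR - 1


if __name__ == "__main__":
    print(round(annual_spend_factor(0.30), 2))  # below 1: yearly spend shrinks
    print(round(break_even_growth(), 2))        # growth above this makes yearly spend rise
```

With the thread's 30% growth the factor is about 0.73, so yearly spend shrinks under this fit. Growth would need to reach roughly 78% a year before spend starts rising.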

Chris

