It’s time to think differently about how you measure the efficiency of your Data and the technology that underpins it!
For many years we Storage Vendors have continuously challenged each other for supremacy in the amount of Storage efficiency we could give you: starting with Snapshots as a way of reducing backup overheads, then moving on to Deduplication, Compression, Compaction and Cloning. There have been some incredible innovations here, and we will continue to see incremental improvements as these techniques mature and new ones are developed.
As with all things, though, at some point you reach diminishing returns and need to start considering Data efficiency much more broadly. In a world where you're working with ever larger numbers of applications, and combining your own resources with those of applications running with SaaS providers or Hyperscalers, your ability to make your Data available across all of them simply, with the smallest footprint, to as many apps as you could possibly need, becomes a very important measure.
I call this your Data Amplification Ratio (DAR).
Think about it this way. When you consider efficiency, there are what I call 'below the data' capabilities, of which all the techniques I mentioned above are examples. They answer one question: how do I pack the greatest amount of Data into the smallest amount of space on whatever technology I'm using to store it? But when you think about your Data Amplification Ratio, it's the 'above the data' efficiencies that become more important.
Cloning was probably the first of these technologies to start this crossover. Sure, Cloning enables you to efficiently store multiple virtual copies of data with almost zero overhead, and that's efficiency, but it's not the real value. The real value is what you can now use those virtual copies for: from one full copy of Data you can create clones for Test, Dev, Backup and Disaster Recovery. Historically this has been very 'Sys Adminy', if such a term exists: it creates value for Admins by accelerating what they need to do while reducing the cost of doing it. But it also starts to help Application Developers, who can now use it to accelerate Test and Dev workflows. This is the first step toward understanding, implementing and realising the value of 'Data Amplification'.
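To make the "virtual copies with almost zero overhead" point concrete, here is a toy copy-on-write model in Python. It is an illustrative sketch only, not how any vendor's cloning is actually implemented: a clone shares the parent's blocks by reference and pays for storage only when it writes.

```python
# Toy copy-on-write model: illustrates why clones cost almost nothing until
# they diverge from the base data. Not a real storage implementation.

class Volume:
    def __init__(self, blocks):
        self.blocks = blocks      # base blocks, shared by reference with clones
        self.overrides = {}       # blocks this volume has rewritten (its only real cost)

    def clone(self):
        # A clone shares the parent's blocks; no data is copied up front.
        return Volume(self.blocks)

    def write(self, block_id, data):
        # Only on write does the clone consume its own space.
        self.overrides[block_id] = data

    def read(self, block_id):
        return self.overrides.get(block_id, self.blocks[block_id])

base = Volume({i: f"block-{i}" for i in range(1000)})
test = base.clone()   # clone for Test
dev = base.clone()    # clone for Dev
dev.write(0, "patched")

print(len(dev.overrides))   # 1 — the Dev clone stores only what it changed
print(test.read(0))         # block-0 — untouched clones still see the base data
```

One full copy of the data can serve Test, Dev, Backup and DR this way, which is exactly the crossover the paragraph above describes: the efficiency is real, but the amplification is the point.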
So how do you measure your DAR?
Well, let's say you store Data to support an IoT project, with a large amount of data flowing into your Data Collection platform in real time. This is your 'Base Data set'. So how can we amplify it? If your platform is open (Hybrid rather than just block), other Apps can easily be pointed at this data: your SAP system, your Hadoop platform, Spark and so on. If your storage platform is open enough for these different Apps to connect, then simply by using Snapshots and Clones you can have all of them amplifying a single core set of data. As you bring new applications into the environment to create new opportunities, these also feed off, and therefore continue to amplify, the value of your core set of Data.
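The post deliberately stops short of a formula, so treat this as one back-of-envelope reading of the idea rather than a definitive metric: count how many application workloads a single physically-stored data set serves.

```python
# Illustrative back-of-envelope DAR sketch. The ratio, the app list and the
# copy counts here are assumptions for demonstration, not a defined metric.

def data_amplification_ratio(apps_served, physical_copies):
    """Workloads served per physical copy of the base data set."""
    return apps_served / physical_copies

# One IoT base data set, snapshotted and cloned out to other workloads:
apps = ["SAP", "Hadoop", "Spark", "Test", "Dev", "Backup", "DR"]
print(data_amplification_ratio(len(apps), physical_copies=1))          # 7.0

# The siloed alternative: every app keeps its own full copy of the data.
print(data_amplification_ratio(len(apps), physical_copies=len(apps)))  # 1.0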
The future for most organisations is Hybrid: a mixture of on-premises capabilities and those delivered by Cloud providers. So how does this fit into Data Amplification? Here are a couple of NetApp examples…
1. If my IoT data is feeding into my on-premises Data platform, but I am able to simply replicate all or parts of it straight into AWS or Azure, where I have all of the same Data capabilities (Snapshots, Clones and so on), then how many Applications in that Cloud environment are now available to amplify this data further? That's right: hundreds, if not thousands, all amplifying from a highly efficient core set of data. This is what our ONTAP and ONTAP Cloud software delivers.
2. What about synchronising my on-premises data to the Cloud, using any of the hundreds or thousands of tools there to interrogate it and manipulate it, basically extracting more and more value from it, and then replicating the results back again? This is what our Cloud Sync software enables for you.
Your 'Data Amplification Ratio' is not something that's easily measured, and you know what? Measuring it is less important than being aware of it, and considering it when you think about how you'll build your Data Fabric for the future.
The siloed approaches of the last generation will continue to become more efficient, but just making them more efficient simply isn't the answer to the problems you're trying to solve for the future. Don't be misled by a vendor trying to get you to fixate on efficiency as your biggest challenge and opportunity. It isn't; it's important, but keep in context what the real difference in capacity or cost actually is. You need a Data Fabric that allows you to be efficient in the way you store and manage your data, but that also enables you to start amplifying the value of your Data both on premises and across the Hybrid Cloud.
Storage Efficiency is useful, and we can and will keep making improvements, but 'Data Amplification' is where the real value and innovation lie.