Data deduplication has been a hot topic in the storage world for the past year or so, and with good reason. Any technology with the power to tame the relentless growth of user data is bound to get an IT manager’s interest, as more data inevitably means a greater management burden and higher operational costs – both unwelcome in the current economic climate.
Productive and sometimes profligate users are not the only factors driving exponential data growth – growth is also partly due to the way that data is protected. Most protection schemes are based on holding multiple copies of the data: daily back-ups, snapshots, replication, archive copies and RAID overhead all play their part. The result is a high level of data repetition, duplication and redundancy, adding to the management burden and making less efficient use of storage assets.
Data deduplication is the process of identifying duplicate blocks of data and replacing each duplicate with a pointer to a single master copy of that block. Comparing each incoming data block, or rather its hash code, with the hashes of all previously written blocks is a highly compute-intensive task, which has limited adoption mainly to enterprise-class back-up target appliances from Data Domain, EMC, Quantum and others.
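As a rough illustration of the principle, the following Python sketch splits data into fixed-size blocks, hashes each block and keeps only one master copy per unique hash. The 4KB block size, the choice of SHA-256 and the function names are assumptions made for the example; they do not describe how any particular vendor's product works.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products differ


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and keep one master copy of each unique block."""
    store = {}     # hash -> master copy of the block
    pointers = []  # one hash per logical block, in original order
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block   # first time this block has been seen
        pointers.append(digest)     # a duplicate costs only a pointer
    return store, pointers


def restore(store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[digest] for digest in pointers)


# Three identical blocks plus one different block: only two are stored.
sample = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store, pointers = deduplicate(sample)
stored_bytes = sum(len(block) for block in store.values())
```

The hash lookup in `store` stands in for the comparison against previously written data. At real back-up volumes that index becomes very large, which is where much of the processing cost arises.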
There is no doubt that the appliance vendors have made much of the early running, but the storage market dynamics are evolving rapidly. Back-up application vendors are starting to embed deduplication functions into their software to make fully integrated solutions that are potentially easier to manage and at a more accessible price level than the appliance-based approach.
Deduplication for SMEs
IDC research frequently shows that IT managers in small and midsize enterprises (SMEs) face many of the same storage management challenges as their counterparts in larger enterprises, but generally without the benefit of large IT budgets and staff resources. Most SMEs are looking for ways to maximise their return on existing IT infrastructure, preferably without a major capital investment for additional hardware.
Deduplication is one of the key technologies that can drive storage efficiencies for SMEs, and it is now maturing to the extent that it is a viable option at the price points that SMEs are looking for.
The key benefits of data deduplication for an SME are:
- Save on storage: The ability to store 10 times as much back-up data, or more, on existing disk storage means delayed capacity upgrades and better storage utilisation. The deduplication ratio when backing up VMDK server images can be considerably higher, making it particularly effective in virtualised environments.
- Ease of management: With the back-up application taking care of the deduplication, it becomes almost transparent to the user and saves the added complexity of managing a dedicated hardware appliance.
- Faster restore: Many more months’ worth of back-up data can be held on disk, meaning fewer restores from tape, less management overhead and higher performance.
- Less network traffic: If the back-up application deduplicates the data before it is sent across the network, back-up traffic is reduced and has less impact on the production systems. If back-up data is being replicated to a remote site via a wide area network, a similar benefit will be realised.
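To put a deduplication ratio into perspective, the short Python sketch below converts a ratio into the fraction of disk space saved and the physical capacity needed. The function names and the 20TB figure are purely illustrative assumptions, not measurements.

```python
def space_saving(ratio: float) -> float:
    """Fraction of disk space saved at a given deduplication ratio."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1 - 1 / ratio


def physical_capacity(logical_tb: float, ratio: float) -> float:
    """Disk capacity (TB) needed to hold logical_tb of back-up data."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return logical_tb / ratio


# A 10:1 ratio means 20TB of back-ups (hypothetical figure) occupies 2TB.
saving_at_10 = space_saving(10)
disk_needed = physical_capacity(20, 10)
```

The relationship flattens quickly: moving from 10:1 to 20:1 raises the saving only from 90% to 95%, so very high ratios add less extra capacity than the headline numbers suggest.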
The potential downsides should also be considered:
- Possible throughput restrictions caused by deduplication processing overhead.
- Remote replication may be delayed until the deduplication process has completed.
- Capital expenditure, including the extra storage used in a hardware back-up target appliance.
- Additional library licence costs may be incurred if a virtual tape library (VTL) appliance is used.
The range of deduplication options available to the user is going through a significant change. Over the past couple of years, the focus has been primarily on network attached storage or VTL back-up targets, often with a high entry price point. The “target-based” approach is now increasingly complemented by the host-based approach, in which the deduplication function is integrated into the back-up application software, and no extra hardware is required.
Such an approach has been announced by Acronis, which is offering host-based block-level deduplication as an option for its Backup & Recovery 10 product family – a suite that is directly targeted at the SME sector.
The significance of this for the market could be profound: the Acronis Backup & Recovery software sells in the sub-€500 price range, which is a fraction of the cost of a dedicated deduplication appliance. The integrated approach should appeal to users looking to minimise complexity and thereby save on management costs.
An interesting aspect of the Acronis solution is that it can execute the deduplication process either at source level, on the server where the files are located, or at the target level, which is the Acronis Storage Node or media server that controls the back-up disk and tape devices. This provides some useful flexibility in that a server holding a large number of duplicate files can be configured to remove the redundant data before sending it across the network to the storage nodes, thereby minimising the impact on network bandwidth.
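The difference between the two modes can be sketched in Python as follows. This is a simplified model under stated assumptions, not a description of the Acronis implementation: the `StorageNode` class, the function names, the 4KB block size and SHA-256 hashing are all hypothetical, and the small amount of traffic needed to exchange hashes is ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for the example


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Map each unique block hash to its block."""
    blocks = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        blocks[hashlib.sha256(block).hexdigest()] = block
    return blocks


class StorageNode:
    """Hypothetical back-up target holding one copy of each unique block."""

    def __init__(self):
        self.blocks = {}

    def missing(self, digests):
        """Return the hashes the node does not yet hold."""
        return [d for d in digests if d not in self.blocks]

    def receive(self, blocks: dict):
        """Store incoming blocks, keeping any copy already held."""
        for digest, block in blocks.items():
            self.blocks.setdefault(digest, block)


def source_side_backup(data: bytes, node: StorageNode) -> int:
    """Deduplicate on the server; return the block bytes sent to the node."""
    blocks = split_blocks(data)
    payload = {d: blocks[d] for d in node.missing(blocks)}
    node.receive(payload)
    return sum(len(block) for block in payload.values())


def target_side_backup(data: bytes, node: StorageNode) -> int:
    """Send everything; the node deduplicates on arrival."""
    node.receive(split_blocks(data))
    return len(data)  # all of the data crossed the network


sample = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
node = StorageNode()
first_run = source_side_backup(sample, node)    # only unique blocks sent
second_run = source_side_backup(sample, node)   # nothing new to send
target_run = target_side_backup(sample, StorageNode())
```

In this model both modes leave the same two blocks on the storage node. The source-side run moves far less data across the network, but the hashing work is done on the production server rather than on the node.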
Data deduplication has undoubtedly matured into an effective means of mitigating the impact of data growth, improving data protection and increasing storage utilisation. With its latest announcement, Acronis has taken a clear initiative in the SME market and accelerated the evolution of deduplication into a mainstream technique for storage optimisation.
This was first published in September 2009