Feature

In-depth: What's in store for our data?

Storage is becoming a pivotal resource in IT, as data volumes continue to rise. Danny Bradbury charts the latest happenings in this crucial area.

Things must have been very tense in the dry, airless corridors of HP over the past few months. Not only was the firm suffering from the departure of CEO Mark Hurd in a sex scandal that damaged its already tarnished ethics image, but it was also locked in a painful battle for thin storage provisioning expert 3PAR that was being played out in public. HP eventually won, trouncing competitor Dell - but why was it so anxious to snap up the firm for such an inflated price ($2.35bn, up from $1.6bn barely a week earlier)?

"Storage spend has been on the rise for a number of years now," says Andrew Reichman, senior analyst at Forrester Research. "Vendors have realised that it's a category that they really have to take notice of. So having a comprehensive line-up of those technologies is really important for them."

A vibrant future for solid state

With rising interest in digital video, a focus on driving energy efficiency into the data centre, and the continued growth of virtualisation, it is no wonder storage is becoming an increasingly hot topic for IT departments. Back in January, IDC published a selection of trends that it expected for the storage market in 2010. According to experts in the sector, many of these trends are already playing out.

One of the IDC predictions was an increased set of use cases for solid state drives (SSDs). "We've been talking about being on the edge of the hockey stick curve in growth terms for a year or two now," says Mark Peters, senior analyst at the Enterprise Strategy Group, of SSD. "And it's the adoption of automated tiering that will help us unlock that door. Until now, you've needed to know what to put on SSD to make it viable."

SSD is fast, and consumes little power, making it perfect for space- and power-constrained datacentres. But it is also very expensive, meaning that only very specific information should be stored on it.

Tiered storage

Tiering of data involves moving it from premium storage to lower-performance media, and then off to archival space. It is difficult to do if you attempt it manually.

"Tiering has to be automated. This should be taken away from human administrators, and decisions should be made on the fly," says Phil Jones, CTO of storage reseller Shoden.

Automated tiering can be linked to information lifecycle management - the concept pioneered by EMC, which steers data from inception to death based on attached metadata. That concept hasn't reached very far into the corporate world yet, because of its complexity and the need for human involvement.

'Server huggers' in business management, who are highly protective of their digital assets, would need to be involved in helping IT to understand more about their data and what it means. Who has the time for that, and who can let go of the political baggage involved?

Consequently, most algorithms today rely on simple ageing of data - the older it gets, and the less frequent its use, the more likely it is to be deprecated, moving to slower media. Those calculations aren't particularly difficult, but determining how to break data into chunks while maintaining the performance of the whole system is a big challenge, says Reichman. "People are working out how to avoid using too many CPU cycles in the breaking down and movement of the data. With block storage systems, it's much harder. That's where the main interest is," he muses.

"Age is one we still see. There is an 80-20 rule, in terms of 20% of the data being active and 80% being inactive," explains Mark Govan, partner and alliances manager at F5 networks, which specialises in capacity balancing and storage tiering. "Customers like the fact that we can dynamically place content. We can isolate PST files and put them into a different area, for example."

Redefining the storage value chain

If an organisation is lucky enough to have SSD storage, then when data moves down to the disk layer, it is likely to be loaded onto high-speed drives. From there, it may move to tape - or even to low-cost SATA drives, which are making significant inroads into storage architectures. Tape used to be de rigueur for both archival and backup purposes, but now SATA drives are replacing tape in an increasing number of backup environments. Not only are disks seen by many as more reliable, but they are faster, meaning that backup data can be accessed more quickly. Finally, points out Jones, disks take up less space.

"We are replacing tape in mainframe environments with deduplicated disks," he says. "We sold 150 Data Domain appliances, which takes us from us from 80 square metres to six."

Deduplication on tape isn't easy to do, given its linear nature, but deduplication technologies for disk are dramatically reducing the cost of storage. Tape is increasingly being relegated to pure archival applications. Deduplication on disk can yield significant savings, even when dealing with lower-cost hardware.

In a world that operates on email, duplicate mails get sent to multiple recipients every day. Many of them have large attachments. Stripping all of these out of stored data and simply using an index to point to a single master copy can drive huge efficiencies into data centre architecture.
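
A minimal sketch of the idea in Python: each stored object is fingerprinted, duplicates are collapsed into a single master copy, and an index maps every logical name to that copy. The class and hash choice here are illustrative assumptions, not any vendor's implementation.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: one copy per unique blob, plus pointers."""
    def __init__(self):
        self.blobs = {}     # fingerprint -> single master copy
        self.index = {}     # object name -> fingerprint

    def put(self, name: str, data: bytes) -> None:
        fp = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(fp, data)   # store the payload only once
        self.index[name] = fp             # every duplicate is just a pointer

    def get(self, name: str) -> bytes:
        return self.blobs[self.index[name]]

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())

store = DedupStore()
attachment = b"Q3 forecast spreadsheet" * 1000   # the same large attachment...
for recipient in range(50):                      # ...sent to 50 mailboxes
    store.put(f"mailbox{recipient}/forecast.xls", attachment)

# 50 logical objects, but only one physical copy on disk
print(len(store.index), "logical objects,", store.physical_bytes(), "bytes stored")
```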

"The next horizon will be extending that deduplication capability to primary storage," says Reichman. "NetApp is leading right now and it is the only vendor to have deduplication built into its primary storage."

Storing more while spending less

This is significant given one of IDC's other drivers: the decapitalisation of IT. Companies may find themselves having to refresh IT equipment after the recession, but they are doing their best to rein in capital expenditure.

Thin provisioning is a key element in reducing capex, points out Jones. Traditionally, business unit 'customers' asking for more storage would often be allocated a single physical disk or array. Much of the storage capacity those users asked for may go unfilled for long periods of time, meaning that the IT department ends up paying for large amounts of unused inventory. Thin provisioning takes advantage of storage virtualisation, separating logical storage capacity from the physical storage device. Users are allocated exactly what they need at any particular time, and can be given more capacity (or have it taken away from them) just as they need it.
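
The principle can be sketched in a few lines of Python. The pool and volume classes below are hypothetical, intended only to show the separation of logical from physical capacity: the volume advertises its full logical size up front, but draws physical extents from a shared pool only as data is actually written.

```python
class ThinPool:
    """Toy shared pool of physical capacity backing many thin volumes."""
    def __init__(self, physical_gb: int):
        self.physical_gb = physical_gb
        self.used_gb = 0

    def allocate(self, gb: int) -> None:
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted - time to buy more disk")
        self.used_gb += gb

class ThinVolume:
    """A volume that advertises its full logical size but consumes
    physical capacity only when data is actually written."""
    def __init__(self, pool: ThinPool, logical_gb: int):
        self.pool = pool
        self.logical_gb = logical_gb   # what the business unit was promised
        self.written_gb = 0

    def write(self, gb: int) -> None:
        self.pool.allocate(gb)         # physical space drawn on demand
        self.written_gb += gb

pool = ThinPool(physical_gb=1000)
vol = ThinVolume(pool, logical_gb=500)   # user "gets" 500GB immediately
vol.write(20)                            # but only 20GB of disk is consumed
print(pool.used_gb, "GB of physical capacity in use")   # -> 20
```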

"Thin provisioning saves not only capital costs, but also operational costs," says Jones.

Thin provisioning - a key element in 3PAR's portfolio that fuelled HP's interest in the company - is particularly important to data centre density. The less storage needed, the smaller the impact on data centre footprints, both in terms of power and space.

Storage area networks (SANs) are of course the poster child of storage virtualisation. Fibre channel has traditionally been the high-performance SAN protocol of choice, but it comes with a significant capital and operational overhead. Not only must companies purchase host bus adaptors to make the protocol work, but they must also invest in human capital. Administering a fibre channel network is a rarefied skill, for which customers will pay a princely sum.

SANs for the masses

Fibre channel networks operate at up to 8Gbit/sec today, although 16Gbit versions will be offered next year. In the meantime, there are other options. Fibre Channel over Ethernet (FCoE) is an alternative for those that don't wish to deploy two separate network protocols, explains Henry Baltazar, senior analyst for storage and systems at The 451 Group.

FCoE promises to simplify the deployment of fibre channel, at least from a capital expenditure perspective. Using this technology, resellers can run fibre channel protocols over their customers' existing 10Gbit Ethernet networks, simplifying their architectures.

However, traditional Ethernet, with its CSMA/CD collision-detection approach, was not good enough to support fibre channel. "Fibre channel wasn't good at lossy technology, so they made a class of Ethernet with guarantees that you won't drop packets," Baltazar explains.

Converged Enhanced Ethernet, a standard put together by the Data Center Bridging (DCB) Task Group of the IEEE 802.1 Working Group, guarantees lossless transmission. Called Data Centre Ethernet (DCE) by Cisco, it is the protocol over which FCoE runs.

Yet another SAN solution is iSCSI, which runs the SCSI storage protocol over IP networks. Traditionally, SCSI is a short-range protocol designed for direct-attached storage, but making it available over conventional network infrastructure makes it usable across LANs or long-range WAN environments.
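
The appeal is that block-level commands travel over the same TCP/IP plumbing as everything else. The Python sketch below is a toy illustration of that idea only: the eight-byte request format is invented for the example, whereas real iSCSI wraps SCSI command blocks in its own PDUs and adds login, authentication and error handling.

```python
import socket, struct, threading

# Toy illustration of block reads carried over an ordinary TCP/IP connection.
BLOCK_SIZE = 512
DISK = bytearray(BLOCK_SIZE * 16)        # a tiny in-memory "LUN"
DISK[0:5] = b"hello"

srv = socket.create_server(("127.0.0.1", 3260))   # 3260 is the iSCSI port

def target() -> None:
    conn, _ = srv.accept()
    with conn:
        lba, count = struct.unpack("!II", conn.recv(8))       # read request
        start = lba * BLOCK_SIZE
        conn.sendall(bytes(DISK[start:start + count * BLOCK_SIZE]))

threading.Thread(target=target, daemon=True).start()

with socket.create_connection(("127.0.0.1", 3260)) as initiator:
    initiator.sendall(struct.pack("!II", 0, 1))   # "read 1 block at LBA 0"
    print(initiator.recv(BLOCK_SIZE)[:5])         # -> b'hello'
```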

This is doubtless why iSCSI players have also been the subject of multiple acquisitions. Dell purchased iSCSI firm EqualLogic for $1.4bn in 2007, while HP bought iSCSI and storage virtualisation pioneer LeftHand in October 2008. 3PAR - which bases its technology on fibre channel - gives HP a broad portfolio, enabling it to cater to both enterprises and small businesses wanting a SAN partner.

Storage for SMBs

As a lower-performance SAN technology that plays happily over lossy network links, iSCSI is an ideal protocol for SMBs that have been wary of the traditional complexity associated with SANs. Not only do they not need to deploy two separate networking protocols, but they don't need specialist staff.

However, be warned, says Reichman. "What's lacking is best practice for configuration. It's still a SAN, and you still have to pay attention. So as companies get more mature with that, and vendors get clearer, we'll see adoption."

This, of course, is where resellers come in. Selling solutions designed to run over commodity infrastructure may mean slightly lower margins, but the real value comes in consulting work for configuration.

SMBs are generally seen as a growth market in technology. Perhaps it is unsurprising, then, that another of IDC's predictions for storage trends in 2010 revolved around this community. The firm suggested that personal storage vendors would begin to find more opportunities for their products in small businesses. One example is Data Robotics, which started off with the Drobo, a basic RAID storage system that protected consumer data, but quickly moved up to full iSCSI devices aimed at small business users.

Against this fast-moving backdrop of technology and business development, storage is becoming an increasingly important topic for customers as data volumes increase. Perhaps the final word is best left to one of the best-known stewards of data: Google. CEO Eric Schmidt unveiled this little nugget at the company's Atmosphere 2010 conference in August, where he spoke before 400 CIOs:

"Between the birth of the world and 2003, there were five exabytes of information created," he said. "We [now] create five exabytes every two days. See why it's so painful to operate in information markets?"

Solid state disks in disaster recovery
One area in which solid state drives (SSDs) excel is reliability. Because there are no moving parts, there are no heads to crash, which makes them suitable for sensitive operating environments. Axxana sells a disaster recovery storage system designed to integrate into a data centre and protect data that has not yet been replicated.

Traditionally, data centre replication has enabled organisations to replicate data between servers, so that if one physical site is taken down, the other still runs. The problem, according to CEO Eli Efrat, is that this synchronous backup becomes expensive with large amounts of data.
 
"These are fibre channel lines. It is very expensive infrastructure. This is why in New York's financial district, you have datacentres so close by across the Hudson," he says.
 
Axxana posits an alternative: asynchronous replication, in which a buffer of a couple of gigabytes of information is stored at the primary site.

"You take the buffer," says Efrat, "And you put it in a black box. You save that through the disaster." The black box in question consists of SSD storage encased in a highly protective cover, designed to withstand intense heat and pressure. The black box sends data asynchronously to a backup site, enabling customers to use slower, wider-area links rather than relying on very high-speed metropolitan area links. It also protects data centres in the event that a link goes down, he points out.


This was first published in September 2010
