by Chris Marsh
This article is the third in a three-part series looking at the role of tape in the cloud, data integrity verification in the cloud, and archiving in the cloud.
When I first considered using cloud services for archive data, I have to admit I was skeptical about the wisdom and practicality of this approach. Cloud services can certainly offer many advantages to an IT organization. However, the storage requirements for each type of data (primary, backup, disaster recovery, and archive) differ considerably. Before you choose the cloud for archive data, or for any data for that matter, make sure the service you select actually protects your data the way it needs to be protected.
Of course, the kind of protection required depends on the kind of data you are storing. The types of data stored in the cloud include:
- Backup data, which is used to recover a system after a crash;
- Disaster recovery data, which protects the business and maintains continuity in the event of a natural disaster or an unexpected outage at a site; and
- Archive data, which is stored and accessed over the long term for the purpose of preservation.
Not surprisingly, archive data requires the most nurturing: it needs continuous, proactive data management over the long term to ensure that it remains both retrievable and readable whenever the organization needs it. Before considering the cloud as a target for archive data, an organization needs to understand what makes a true archival system, as well as the benefits and risks of turning to the cloud for its archival needs.
What Is An Archival System?
The archive market is one of the fastest-growing markets in the storage industry today. Perhaps because of this, the term ‘archive’ has taken on a life of its own and is often stretched to fit newly announced products. According to archive purists, many of these are not true archival systems. An archival system combines storage with data management practices, all focused on ensuring that the data it manages remains both retrievable and readable regardless of its age. The media type, the vendor, the hardware and software used for retrieval, and even the particular management techniques matter less than the steps taken to ensure data integrity and accessibility. These steps include:
- Data integrity monitoring;
- Media integrity monitoring;
- Technology migration strategies for both hardware and software;
- Intelligent indexing (so you can find data when you need it); and
- Proactive health monitoring with enough redundancy to prevent data loss.
Storing an AIT-1 tape at an offsite location without access to a working AIT-1 drive and the software to retrieve its data is not an archival system: it is simply improper data management. The data on that tape is neither retrievable nor readable, and it is unlikely that any steps have been taken to validate its integrity since the tape was removed from the drive or library. In contrast, a true archival system ensures both the health and the accessibility of archived data by reliably monitoring and migrating that data as the system ages and changes.
A common misperception about archives is that the data stored in an archival system is the last and only copy of that data. This is simply not the case. Archive applications typically keep multiple copies of data within a system. As mentioned in the second article in this series, this matters because data integrity can be monitored by comparing the hash values of two copies of data. With multiple copies, data management software can proactively detect corruption in an archive and correct it from an intact copy.
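To make the hash-comparison idea concrete, here is a minimal Python sketch of a periodic "scrub" over the copies of one archived object. It is an illustration under stated assumptions, not how any particular archive product works: the function names `sha256_of` and `scrub`, the choice of SHA-256, and the practice of recording a reference digest when the data is first archived are all assumptions. The reference digest matters because when two copies disagree, their hashes alone do not reveal which copy is damaged; checking each copy against a digest captured at ingest (or taking a majority vote across three or more copies) resolves that.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scrub(copies: list[Path], reference_digest: str) -> list[Path]:
    """Check every copy against the digest recorded at ingest (an assumption
    of this sketch) and overwrite corrupted copies from a healthy one.

    Returns the copies that were repaired; raises if no intact copy remains.
    """
    healthy = [p for p in copies if sha256_of(p) == reference_digest]
    if not healthy:
        raise RuntimeError("no intact copy left; data cannot be recovered")
    repaired = []
    for p in copies:
        if p not in healthy:
            shutil.copyfile(healthy[0], p)  # restore from a verified copy
            repaired.append(p)
    return repaired


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        a, b = Path(d, "copy_a.bin"), Path(d, "copy_b.bin")
        a.write_bytes(b"engineering schematic v1")
        b.write_bytes(b"engineering schematic v1")
        digest = sha256_of(a)                        # recorded at ingest
        b.write_bytes(b"engineering schematic v!")   # simulate silent corruption
        fixed = scrub([a, b], digest)
        print([p.name for p in fixed])               # ['copy_b.bin']
```

Running a scrub like this on a schedule is what turns passive storage into proactive monitoring. In a real archive the copies would normally sit on separate media or in separate locations rather than in one directory, as they do in this toy example.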
Technology exists today to proactively monitor the health of media and the data on the media to ensure that data can be migrated off to a healthy location in the event that the original platform shows signs of distress. However, this technology can only monitor data on tapes that reside in the library. This monitoring is not available for tapes that are stored outside the library unless a strategy is implemented to periodically cycle media back through the library.
Given the extensive requirements involved in protecting archive data, establishing an archival system can be daunting. Even if the storage system can retrieve the data, there is still the challenge of rebuilding an environment that can read it. Virtual environments can play a key role in rebuilding older systems capable of reading archived data. For example, consider an engineering schematic designed 10 years ago. Without the right operating system and the software to view the file, a modern system cannot open it. A virtualized environment, however, can recreate that older system for the purpose of retrieving and reading the archived data.
Where Does Cloud Fit In?
Cloud is a business model for storing data. Whether the motivation is accessibility, flexibility, application hosting, or any of the other common reasons for turning to the cloud, it gives organizations a way to meet their needs without large upfront capital investments or major overhauls of their own IT infrastructure. Every day, individuals and organizations decide to pay third parties to handle tasks they either lack the expertise to do themselves or simply choose not to do. Consider a simple analogy: many people hire cleaning services for their homes or businesses, not because it is less expensive, but because they lack the time or the desire to do the work themselves. Such decisions always carry risks (e.g., the possibility of something going missing from your house after it is professionally cleaned). The previous article in this series (Data Integrity in the Cloud) discussed some of the risks buried within the SLAs of some of the major cloud providers.
It is with these risks and requirements in mind that an organization must assess a cloud archival service. If you can find a cloud service built around the principles of an archival system, with SLAs that guarantee proper management of the data, the cloud can be an appropriate alternative. Keep in mind, though, that archive data grows steadily every year. Given today’s data growth trends, even as retention periods expire, the volume of new data sent to the cloud will exceed the volume being expunged. From a pricing perspective, this means the cost of a cloud archive will only keep rising. In addition, elastic storage capacity is less valuable for an archive, because its storage requirements are far more predictable than those of other types of data.
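The cost argument above is simple arithmetic, sketched below in Python. All figures are hypothetical inputs, not real cloud prices, and the model is deliberately crude: yearly ingest grows at a fixed rate, data is expunged once it is older than a fixed retention period, and each year is billed on the capacity stored at year-end. The function name `project_archive` and its parameters are illustrative assumptions.

```python
def project_archive(first_year_ingest_tb: float, ingest_growth: float,
                    retention_years: int, price_per_tb_month: float,
                    years: int) -> list[tuple[int, float, float]]:
    """Return (year, stored_tb, annual_cost) for each simulated year.

    Hypothetical model: each year's ingest is (1 + ingest_growth) times the
    previous year's, data older than retention_years is expunged, and the
    whole year is billed at the year-end stored capacity.
    """
    if retention_years < 1:
        raise ValueError("retention_years must be at least 1")
    ingests: list[float] = []
    rows = []
    for year in range(1, years + 1):
        ingests.append(first_year_ingest_tb * (1 + ingest_growth) ** (year - 1))
        stored = sum(ingests[-retention_years:])  # only unexpired data remains
        rows.append((year, stored, stored * price_per_tb_month * 12))
    return rows


if __name__ == "__main__":
    # Hypothetical inputs: 50 TB ingested in year one, 20% more each year,
    # 7-year retention, $4 per TB per month.
    for year, stored_tb, cost in project_archive(50, 0.20, 7, 4.0, 10):
        print(f"year {year:2d}: {stored_tb:8.1f} TB stored, ${cost:,.0f}/year")
```

Once the retention period is reached, each year's stored volume is the sum of the most recent `retention_years` of ingest, so as long as ingest keeps growing, the data added each year outweighs the data that expires and the bill rises every year. The output is also highly predictable, which is the point about elastic capacity being less valuable for an archive.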
Above all, archive data must be properly managed. If you cannot be sure that a cloud service will provide the necessary data management, proceed cautiously. After all, the data is vital to its owner, not to the service provider. Whether an archive is maintained by a cloud service or managed internally, the emphasis belongs on the data nurturing process, not on the specific solution. As long as archive data is properly managed, using a cloud archive is a business decision based on pricing and staffing needs, and its practicality will vary from organization to organization.
Chris Marsh is the IT market and development manager at Spectra Logic (Boulder, CO). www.spectralogic.com