Information security – protecting the contents of information stored on standard disks, tape, or the newer flash-based solid state disks – is an ever growing concern among enterprise IT organizations, in companies of all sizes. To keep up with this growing concern, more and more IT organizations are turning to encryption techniques to protect their valuable information assets. In addition to concerns over securing stored information, IT organizations are faced with ever-increasing costs to ensure that there is enough capacity in their storage solutions to meet the organization’s present and future demands. To keep up with the challenges associated with tight IT budgets and increasing storage capacity needs, some companies are beginning to turn to lossless data compression techniques.
This article discusses the benefits of deploying lossless data compression and encryption in enterprise storage controller applications. It describes which algorithms are best to use, and how to effectively combine encryption with compression and decompression into a unified solution. A specific focus on the differences between two popular compression algorithms, LZRW3 and GZIP, will highlight which of the two algorithms is best suited for particular storage applications, from both the technical and business perspectives.
Transformation of data through either compression or encryption raises some specific application concerns in the handling of block-based addresses and the placement of such processing within the storage stack. These issues are also addressed in this article.
The capacity and security challenges
Data storage has enjoyed a period of unprecedented growth in recent years. As storage densities have grown exponentially, the costs per gigabyte have steadily dropped. Yet the increase in available data-storage capacity has not kept up with the growth in storage demand. As a result, IT departments are trying to maximize utilization of their existing data-storage systems. More importantly, security concerns have become quite prevalent, and there is a critical need to secure all data at rest.
In the backup domain, the solution to both the density and the security problems has been to simply use software-based compression techniques to reduce the number of tapes (effectively increasing storage density) and encryption to secure the data on the tapes. Software-based encryption and compression are slow and rather expensive in terms of CPU usage. Since backup is done offline, this is usually not as issue. However, as the scope of encryption and compression is extended to online storage, including disks, the software-based approaches do not achieve the performance and latency characteristics that these applications require.
Lossless compression fundamentally depends on eliminating redundancies in any given set of data. Usually this is implemented by searching for repeated data patterns, retaining one instance of each pattern and replacing duplicates with short tokens referring to the retained patterns. Conversely, encryption randomizes data, thus removing all redundancies in the given data set including repeated patterns. As a result, encrypted data can’t be compressed. Therefore, when both encryption and compression are deployed, compression must be done before encryption to be effective.

Figure 1: Compression and Encryption processing flow
Order of execution, as highlighted in Figure 1, is one of several factors that influence the placement of compress-and-encrypt processing within the storage stack. Another important variable is the effect such transformations have on the original data size. Any injected processing (which changes the data size) either up or down, creates a wrinkle in the architecture of the storage stack. Disk devices (unlike tape devices that are at the lowest level of the online storage stack) typically work with a fixed physical block size. The fixed block size affects higher levels of the storage stack, such as the file system, which needs to process data in blocks sizes that are multiples of this physical block size. In fact, one of the key responsibilities of the file system is to map files of varying size, as seen by applications, into the fixed block size needed by the storage stack.
Encryption alone does not generally change the size of the data being protected (though some encryption techniques do add extra metadata to the encrypted stream thus expanding the size). Therefore, encryption can be placed anywhere in the stack. The most transparent location for encryption is very near or within the disk device itself. Placement at the device level minimizes disruption of the storage stack to key management tasks. On the other hand, compression’s sole purpose is to change the size of the data. The compression ratio is unpredictable and sensitive to data pattern. The size of the compressed data can even grow, given highly-incompressible source data.
As a result, compression presents unique challenges for the storage stack. Since file systems already do mappings of variable file size to fixed block sizes, compression’s most logical placement is near the top of the storage stack, above the mapping function, where the post compression size is known. However, it should be noted that most file systems use memory-mapped, block-oriented IO, completed discretely in fixed-block sizes. Consequently, such placement is usually non-trivial and requires significant changes to block mapping.
Finally, there is motivation to keep the compress and encrypt functions close together in order to minimize the number of times data makes performance-degrading system-bus crossings. In short, compression and encryption have conflicting placement needs. Since compression placement is non-trivial and will be disruptive to the storage stack wherever it is placed, the overriding goals of performance and of a generalized solution across platforms should take precedence. Thus, a compress and encrypt solution should be unified in the controller along with the mapping function.
Encryption and compression solutions
There are many encryption algorithms to choose from. Encryption for storage purposes, as in other domains, is measured in two dimensions – how strong and how fast. Encryption is a “survival of the fittest” technology. Secret communication is ancient and although an encryption technique is often invented by a small number of brilliant minds, it becomes instantly public with unimaginable numbers of minds attempting to crack the codes. Encryption science is mature and the algorithms that have endured are strong.
The networking domain began using encryption long before it was thought to be required in storage. Networking has stretched encryption in both throughput and encryption strength. Among the top survivors is the Advanced Encryption Standard (AES). Unlike many encryption algorithms which came before AES and which were intended to be executed by single core microprocessors, AES was invented with hardware offload in mind. Like other encryption algorithms, AES consists of rounds of cyclic processing. However, each round in AES may be executed independently of the previous round without a feedback path, lending itself to hardware parallelization and deep pipelining. Pipeline depth is chosen based on the throughput required and the achievable clock frequency. Also of consideration is the silicon area consumed. The deeper the pipeline, the more encryption table instances required to support parallel accesses. Hardware AES embodiments surpass the throughput performance requirements of modern disk technology with ample headroom, thus such tradeoffs are generally quite painless.
A requirement placed on the application of encryption in storage technology, as opposed to networking technology, is random access. In order to strengthen a given encryption algorithm, a technique is employed which makes proper decryption of the current data block dependent on proper decryption of the prior block. This is accomplished through a means of feed-forward seeding in the encryption algorithm. Such a technique is entirely feasible in networking where data streaming of significant length is the norm. Disk access on the other hand is highly randomized, dictating that any block of data be standalone decrypt-able. The smaller the standalone entity, however, the more vulnerable is the encryption to deciphering and cut-and-paste attacks. Once one block is cracked, it is often easier to decipher the rest.
Another factor in the choice of encryption is storage overhead. The data on a disk is packed in such a way that there is no room to store anything but the data. An encryption technology, when applied to disk storage, cannot depend on any additional data that might be stored as part of the algorithm. AES-XTS is an approach to deployment of AES in storage systems that address the need of random access without adding storage overhead. AES-XTS is known as a tweakable cipher. In a tweakable cipher, all that is required to encrypt or decrypt the data is the data itself, a key, and a tweak value. Neither the key nor the tweak are stored on a per sector basis. The tweak is tied to the address of the data on the disk in some way, and it need not be secret. AES-XTS facilitates a non-repeating algorithmic process that can be inferred based on disk block address. IEEE p1619 specifies AES-XTS in a two-instance deployment as shown in Figure 2. Two tweaks are used which correspond to the Sector and Block numbers of the data being accessed.

Figure 2: IEEE p1619 specifies AES-XTS deployment
The choice of compression is influenced by several factors: compression ratio, throughput, captive endpoints, standardized software/hardware compatibility, lossless or not, and even patent status. Given the storage domain addressed in this article, lossy algorithms acceptable for some video and audio applications are clearly not applicable. Many popular lossless compression algorithms use a technique called run length encoding, replacing large runs of data with a simple code indicating the data pattern and the length of the data run. Gzip and PKZIP are among the most well known algorithms employing run length encoding of the Lempel-Ziv (LZ) method.
Generally speaking, the decoding process (decompression) is quite trivial compared to the encoding process (compression), and is reflected in the compute horsepower required to execute these algorithms. Hardware offload effectively increases the throughput of both while reducing CPU load. Gzip is a patent-free compression algorithm that has been in use in Unix OS variants since the 90’s. Gzip compression is a combination of LZ run length encoding and a post-processing technique called Huffman encoding. Its use is so prevalent that the Gzip function has been built into tape archiving programs such as tar. Gzip allows archives which may have been compressed years ago to be decompressed by any hardware- or software-compliant Gzip implementation. Gzip is highly configurable; accepting flags which tell it how hard to work and how much history to process in order to compress data. A pass-through mode exists allowing incompressible data to be sent straight through without wasting computational processing on an otherwise futile effort. Lastly, Gzip is adaptive. It can “learn” as it processes data and order the most frequently used codes in the most efficient manner. Gzip’s configurability and adaptive behavior can be reflected in hardware. Although the decompression component is quite simple to implement in hardware, the full implementation of the compression component is more challenging.

Figure 3: Gzip functional blocks
The major Gzip internal processing steps are highlighted in Figure 3. Gzip performs compression by keeping a linked list of history addresses that contain matching 3-byte strings. Each new character arrival is rolled into the current 3-byte string when combined with the previous two bytes. A 3-byte hashing function indexes a “head table” that contains the address of the latest occurrence of that string in the history table, and also points to a “previous table” where a chain of previous occurrences of the same string exist in the history window. A search engine uses the match addresses as anchor points to begin byte-by-byte searches against the current 3-bytes, plus as many that match after that. The linked list of anchor points is processed until the longest match, or a match that is “good enough,” is found. The history is a sliding window and, when it is slid, all the hash chains must be adapted in order to have relevance to the window.
Processing speed is highly data pattern dependent, as quality matches may be found nearby or a very long way from the current match. The result of searching is a stream of literals (single byte characters where there are no previous matches within the history window), and length distance pairs (pointers back to a matching string and its length). The Literal/Length/Distance stream is fed into the Huffman encoder which codes the data in a form dependent on how often various sequences are seen. The most frequent occurrences are represented by the shortest codes. Window sliding, hash chain updates, searching, and Huffman encoding are compute-intensive operations. Hardware pipelining at a macro and micro level is essential and complex, but well worth the effort. Hardware implementations of Gzip can achieve throughputs 10 times greater than Gzip running on a microprocessor.
LZRW3 is another variant of run length encoding. In many regards it is a subset of Gzip. The sliding history table is absent. LZRW3 deals with fixed size history blocks up to 32KB. Huffman encoding is also absent, as is a pass-through mode. The remaining LZ technique varies from Gzip. There is only a single level of search performed. In other words, the first match encountered from the current position is the match chosen, regardless of the match length or quality. Rather than sending literals, back pointers and lengths in the compressed output, literals, indexes into a hash table, and lengths represent the output. The approach simplifies compression but complicates decompression as the hash table is not present in the compressed stream and thus must be developed on the fly by the decompressor. The benefits of LZRW3 over Gzip are that it is an easier and smaller hardware implementation of higher throughput, and it works on the fixed sized blocks characteristic of disk mapping. The downsides are that the compression quality is not as high as Gzip, and it is not common to see older archives written in the LZRW3 format. Additionally the LZRW algorithms are covered under several patents noted below.
|
1 |
|
|
2 |
|
|
3 |
|
|
4 |
In conclusion, the highest performance encryption algorithm which also lends itself to hardware acceleration, with performance headroom for the future, is AES-XTS. Gzip compression, with on the fly configurability and its innate pass through modes for incompressible data, is highly appropriate for storage when these modes are reflected in hardware. The solution is most powerful when it is combined side-by-side in the controller. Solutions from companies such as CebaTech Corporation offer high performance and configurable hardware implementations, combining gzip compression and AES-XTS encryption with data integrity checking. Such solutions meet the required performance of current storage technology and provide a growth path for continued performance escalation.
Chad Spackman is CTO at CebaTech.

