Data Defined Storage: Uncover the Value from Your Global Data Store

by Shahbaz Ali

Organizations today are grappling with unyielding data growth up to and beyond the petabyte level. Massive growth magnifies data management challenges, such as the overheads associated with storage acquisition and operation, as well as exacerbated data protection, governance and security concerns due to regulatory issues and data mobility. Modern businesses must find solutions that provide agility to both enterprise network users and to a geographically dispersed workforce. The capability to leverage big data analytics to gain value from their vast data stores is critical for organizations to gain a competitive edge.

NAS solutions do not scale to the levels required in many of today’s IT environments, which are already dealing with petabytes of data. When you run out of space on a NAS, you can purchase more NAS devices, but this becomes costly and creates data silos. Data silos are extremely inefficient: they regularly have unused capacity, and they prevent organizations from understanding and appropriately leveraging the data they possess, except through the individual application that created the data. Users of file-based storage systems access data using standard network protocols such as NFS or SMB/CIFS, which makes data access difficult for geographically dispersed or mobile users.

What Options Do I Have?
There are a number of choices for storage architects to select from when building solutions that will scale out to petabyte levels, as well as meet the needs of local and mobile users. They include object storage (OBS), software defined storage (SDS) and data defined storage (DDS).

Object Storage
OBS platforms were specifically designed to address the scalability problem of today’s NAS and file server systems. Unlike traditional approaches, OBS does not use a file system hierarchy to store data. Data is stored as ‘objects’ and every object is assigned its own unique identifier. Users retrieve the stored data object using a unique “claim check” code; its actual location on physical media is abstracted within the pool of storage. This architecture allows for virtually unlimited scalability of the virtual storage pool.
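The “claim check” model described above can be illustrated with a minimal Python sketch. It is a toy, in-memory stand-in and not any vendor's implementation; the class name ObjectStore and its put and get methods are hypothetical names chosen for this example.

```python
import uuid
from typing import Dict, Optional, Tuple


class ObjectStore:
    """Toy flat-namespace object store (hypothetical): no directories, only IDs."""

    def __init__(self) -> None:
        # object ID -> (data, metadata); the caller never sees a location
        self._objects: Dict[str, Tuple[bytes, dict]] = {}

    def put(self, data: bytes, metadata: Optional[dict] = None) -> str:
        """Store an object and return its unique identifier (the claim check)."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata or {}))
        return object_id

    def get(self, object_id: str) -> bytes:
        """Retrieve an object by identifier alone."""
        return self._objects[object_id][0]


store = ObjectStore()
claim_check = store.put(b"quarterly report", {"owner": "finance"})
retrieved = store.get(claim_check)
```

Because callers hold only the identifier, the store is free to place or move the bytes anywhere in the pool, which is what makes this style of architecture straightforward to scale out.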

The use of objects removes standard network file-sharing protocol access. Users generally access OBS through applications that typically use a REST API (an HTTP-based interface well suited to online applications). This makes OBS ideal for online and cloud environments. However, to ingest existing file data and support data access for file-based workflows, a third-party file gateway server must be added, with file protocols on one side and object API commands on the other. These gateways often eliminate many of the advantages of using OBS, including scalability, parallel object access, architectural flexibility, and improved data availability.

Software Defined Storage
Over the last few years, the term ‘software defined storage’ has increased in popularity. A rising number of vendors promote their products as SDS; however, the term is currently interpreted in a variety of ways. In general, SDS is the abstraction of storage services from the physical storage hardware. The software abstracts hardware resources, pools them into aggregated capacity and automates their distribution, as needed, to applications.
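As a rough illustration of that pooling idea, the Python sketch below aggregates the free capacity of several devices and hands it out on request. The StoragePool class, the device names, and the "largest free device first" placement rule are all assumptions made for this example, not a description of any particular SDS product.

```python
from typing import Dict


class StoragePool:
    """Toy capacity pool (hypothetical): hides individual devices from callers."""

    def __init__(self, devices: Dict[str, int]) -> None:
        # device name -> free capacity in GB
        self.free = dict(devices)

    @property
    def capacity_free(self) -> int:
        """Total free capacity across every device in the pool."""
        return sum(self.free.values())

    def allocate(self, size_gb: int) -> Dict[str, int]:
        """Carve size_gb out of the pool and return the per-device placement."""
        if size_gb > self.capacity_free:
            raise ValueError("pool has insufficient free capacity")
        placement: Dict[str, int] = {}
        remaining = size_gb
        # Assumed policy: fill from the device with the most free space first.
        for name in sorted(self.free, key=self.free.get, reverse=True):
            if remaining == 0:
                break
            take = min(self.free[name], remaining)
            if take:
                placement[name] = take
                self.free[name] -= take
                remaining -= take
        return placement


pool = StoragePool({"array_a": 100, "array_b": 50})
placement = pool.allocate(120)
```

The application asks only for 120 GB; the fact that the request spans two physical devices is a detail the software layer absorbs.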

SDS environments enable virtualized storage pools to manage siloed data across geographic sites. They provide policy-based data management for storage optimization, with options such as deduplication, replication, thin provisioning, snapshots and backup, which reduces the cost of storage and storage administration. To a great extent, storage has always been defined by software. The difference is that the software has typically been embedded in proprietary hardware platforms to create a storage appliance, whereas with SDS it runs on commodity hardware through storage virtualization software, which reduces TCO and enhances infrastructure flexibility.

The Issues
OBS and SDS technologies offer a good foundation for addressing the scalability needs of today’s storage environments. However, they are only component technologies that contribute towards a total solution. What if the same software also provided scalable file sharing, cloud-based access, comprehensive information governance, and the ability to gain business value from the ‘data about the data’ through metadata, search, e-Discovery and analytics?

Data Defined Storage
DDS goes beyond the basic technology components needed for petabyte level scale out storage and enables organizations to gain value from global data stores.

DDS supports a REST API for data management, along with S3-, CDMI- and OpenStack-compatible APIs for cloud provisioning and integration with existing applications. Data can also be accessed using standard network protocols such as SMB (CIFS) and NFS, as well as over HTTPS and FTP. This multi-protocol access breaks down barriers for the growing mobile workforce, facilitating the journey to the cloud and adding significant value and flexibility for a geographically dispersed organization.

DDS uses a data-centric approach, building on the benefits of both OBS and SDS technologies. Those benefits, however, correspond only to the first core innovation pillar of DDS: Media Independent Data Storage. Media Independent Data Storage enables a media-agnostic infrastructure that can utilize any type of storage, including low-cost commodity storage, to scale out to petabyte-level capacities. DDS unifies all data repositories and exposes globally distributed data stores through a global namespace, eliminating data silos and improving storage utilization. To further reduce TCO, it provides intelligent pool-to-pool tiering based on the value of data over its life cycle, as well as distributed object deduplication and compression, which reduce capacity requirements within virtualized storage pools.
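To make the capacity savings from object deduplication and compression concrete, here is a minimal Python sketch of one common way to do it: whole-object deduplication keyed by a content hash, with zlib compression of each unique object. This is an assumed technique chosen for illustration, and the DedupStore class is hypothetical; the article does not specify how DDS implements either feature.

```python
import hashlib
import zlib
from typing import Dict


class DedupStore:
    """Toy store (hypothetical): identical content is kept once, compressed."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}  # SHA-256 digest -> compressed bytes
        self._names: Dict[str, str] = {}    # logical name -> digest

    def put(self, name: str, data: bytes) -> str:
        """Store data under a logical name; reuse the blob if content is known."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = zlib.compress(data)
        self._names[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        """Return the original, uncompressed bytes for a logical name."""
        return zlib.decompress(self._blobs[self._names[name]])

    def unique_objects(self) -> int:
        """Number of distinct blobs physically held."""
        return len(self._blobs)


dedup = DedupStore()
payload = b"annual report " * 1000
dedup.put("london/report.doc", payload)
dedup.put("tokyo/report-copy.doc", payload)
```

Two logical names now point at a single stored blob, so the second copy consumes no additional capacity beyond its name entry.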

In addition to Media Independent Data Storage, DDS includes two other key innovation pillars: Data Security and Identity Management, and Distributed Metadata Repository.

Data Security and Identity Management delivers end-to-end information governance, protection, retention management, security and mobility across storage, servers, and smart devices to meet regulatory compliance mandates and to reduce business risk.

Distributed Metadata Repository captures the value of data across distributed data stores by collecting all basic and custom metadata, and by conducting full-text indexing and filtering of all standard and industry-specific files.
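The combination of metadata collection and full-text indexing can be sketched in Python as a small inverted index with metadata filters. This is a single-process toy under stated assumptions: the MetadataIndex class, its add and search methods, and the sample fields are hypothetical, and a real repository would be distributed and far more elaborate.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MetadataIndex:
    """Toy index (hypothetical): full-text terms plus per-object metadata."""

    def __init__(self) -> None:
        self._metadata: Dict[str, dict] = {}
        self._terms: Dict[str, Set[str]] = defaultdict(set)  # term -> object IDs

    def add(self, object_id: str, text: str, metadata: dict) -> None:
        """Record an object's metadata and index every term in its text."""
        self._metadata[object_id] = dict(metadata)
        for term in tokenize(text):
            self._terms[term].add(object_id)

    def search(self, query: str, **filters: str) -> List[str]:
        """Return IDs containing every query term and matching every filter."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self._terms.get(t, set()) for t in terms))
        return sorted(
            oid for oid in hits
            if all(self._metadata[oid].get(k) == v for k, v in filters.items())
        )


index = MetadataIndex()
index.add("obj1", "Contract renewal for Acme", {"department": "legal"})
index.add("obj2", "Acme invoice", {"department": "finance"})
all_acme = index.search("acme")
legal_acme = index.search("acme", department="legal")
```

Here a search for "acme" matches both objects, while adding the department filter narrows the result to the legal contract, which is the kind of query an enterprise search or e-Discovery tool would issue.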

The Distributed Metadata Repository is harnessed to deliver a powerful enterprise search and e-Discovery capability, which improves search and discovery accuracy and retrieval times. It also gives organizations a simple way to leverage analytics through integration with big data analytics tools, such as Hadoop, to tap into the full value of their vast stores of unstructured data. Unlike traditional storage platforms, which require data to be migrated from an online pool to an analytics pool, the DDS solution exposes content within the repository throughout the grid. This allows data-in-place analytics, giving organizations real-time insight that increases business agility and improves competitive advantage.

DDS is a fully integrated data-centric storage solution that breaks down the barriers that previously encumbered traditional data storage and allows organizations to effortlessly meet scalability and performance needs, today and in the future, while significantly reducing corporate risk and monetizing data assets through enhanced access, search and data analytics. Organizations should consider DDS for delivering the highest possible value from global data stores.

Shahbaz Ali is the president and Chief Executive Officer of Tarmin.

WWPI – Covering the best in IT since 1980