AWS introduces Amazon Redshift Spectrum to enable users to run analytic queries in Amazon S3



Amazon Web Services Inc. (AWS), an Amazon.com company, on Wednesday announced Amazon Redshift Spectrum, a new feature that allows Amazon Redshift customers to run SQL queries against exabytes of their data in Amazon Simple Storage Service (Amazon S3).

With Redshift Spectrum, customers can extend the analytic power of Amazon Redshift beyond data stored on local disks in their data warehouse to query vast amounts of unstructured data in their Amazon S3 “data lake” – without having to load or transform any data. Redshift Spectrum applies sophisticated query optimization, scaling processing across thousands of nodes so results are fast – even with large data sets and complex queries.

Amazon Redshift allows customers to perform complex queries on petabytes of structured data stored on high-performance local disks and get superfast performance – all for a tenth of the cost of traditional data warehouses. However, as the cost of data storage has continued to drop, customers are increasingly storing vast amounts of data in Amazon S3 “data lakes,” including unstructured data that may never make it into a data warehouse.

Now, with Redshift Spectrum, analyzing all of this data is as easy as running a standard Amazon Redshift SQL query. Redshift Spectrum directly queries data in Amazon S3, with no loading or transformation required, using the open data formats customers already use, including CSV, TSV, Parquet, Sequence, and RCFile. Because Redshift Spectrum supports the same SQL syntax as Amazon Redshift, customers can run sophisticated queries using the same Business Intelligence (BI) tools they use today.

Users can also run queries that span both the frequently accessed data stored locally in Amazon Redshift and their full data sets stored cost-effectively in Amazon S3. Redshift Spectrum automatically scales query compute capacity based on the data being retrieved, so queries against Amazon S3 data run fast, whether processing just a few terabytes, petabytes, or even exabytes.
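As a rough illustration of a query that spans both stores, the sketch below assembles SQL that joins a table held locally in Amazon Redshift with an external table backed by files in Amazon S3. The table names (public.customers, spectrum.clickstream), the join column, and the helper function itself are hypothetical and not part of the announcement. The sketch also assumes the external schema and table have already been registered in the cluster.

```python
def build_spanning_query(local_table: str, external_table: str, join_key: str) -> str:
    """Return SQL text that joins a local Redshift table with an external S3-backed table.

    All identifiers are interpolated directly into the SQL text, so only
    trusted, hard-coded names should be passed in.
    """
    return (
        f"SELECT l.{join_key}, COUNT(*) AS event_count\n"
        f"FROM {local_table} AS l\n"
        f"JOIN {external_table} AS e ON l.{join_key} = e.{join_key}\n"
        f"GROUP BY l.{join_key};"
    )


# Hypothetical names: a local customers table and an external clickstream table.
query = build_spanning_query("public.customers", "spectrum.clickstream", "customer_id")
print(query)
```

The resulting text would be submitted through whatever SQL client or BI tool already connects to the cluster. From the query's point of view the external table is referenced like any other table.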

“Customers such as Amgen, Boingo Wireless, Electronic Arts, Hearst, Lyft, Nasdaq, Scholastic, TripAdvisor, and Yahoo! are migrating to Amazon Redshift in droves because it leverages the scale of AWS to analyze petabytes of data with ten times the performance at one-tenth the cost of old guard data warehouses. Many of these customers have asked us to extend the speed and flexibility of Amazon Redshift beyond the data warehouse to analyze all of the data they have in Amazon S3,” said Raju Gulabani, Vice President, Databases, Analytics, and AI, AWS. “Redshift Spectrum does just this, offering the best of both worlds by making it incredibly easy to query exabytes of data in Amazon S3 directly from Amazon Redshift. We’re excited to now make exabyte-scale analytics fast, simple and accessible to companies of all sizes.”

Customers can get started with Redshift Spectrum from the AWS Management Console. Amazon Redshift Spectrum is available in the US East (N. Virginia), US East (Ohio), and US West (Oregon) Regions and will expand to additional Regions in the coming months.

Amazon Redshift also now supports encrypting unloaded data using Amazon S3 server-side encryption with AWS KMS keys. The UNLOAD command, which writes the results of a query to one or more files on Amazon S3, can now encrypt those files using an AWS KMS key.

Users can let Amazon Redshift automatically encrypt data files using Amazon S3 server-side encryption, or they can specify a symmetric encryption key that they manage themselves. With this release, customers can also use Amazon S3 server-side encryption with a key managed by AWS KMS. In addition, the COPY command loads Amazon S3 server-side encrypted data files without requiring users to provide the key.
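To make the UNLOAD option concrete, here is a minimal sketch that assembles an UNLOAD statement using the KMS_KEY_ID and ENCRYPTED parameters. The helper function, bucket path, IAM role ARN, and key ID are all placeholders invented for illustration. Authorizing with IAM_ROLE is also an assumption, since the announcement does not say how the cluster is authorized to write to Amazon S3.

```python
def build_unload_statement(select_sql: str, s3_prefix: str,
                           iam_role_arn: str, kms_key_id: str) -> str:
    """Return an UNLOAD statement that encrypts its output files with an AWS KMS key."""
    # The query is passed to UNLOAD as a quoted string, so any single quotes
    # inside it are doubled here.
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"KMS_KEY_ID '{kms_key_id}'\n"
        f"ENCRYPTED;"
    )


# Placeholder values only; substitute a real bucket, role, and key ID.
statement = build_unload_statement(
    "SELECT * FROM sales WHERE region = 'EU'",
    "s3://example-bucket/unload/sales_",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    "hypothetical-key-id",
)
print(statement)
```

Reading the files back with COPY should not need the key to be supplied, as the article notes for server-side encrypted data files.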

WWPI – Covering the best in IT since 1980