Amazon Redshift improves Workload Management console experience

Amazon Redshift Workload Management (WLM) now allows users to flexibly manage priorities within workloads so that short, fast-running queries do not get stuck in queues behind long-running queries.

Amazon Web Services (AWS) is announcing an improved WLM experience in the Amazon Redshift console. New features such as in-line validations and simpler error messages make it easier to create WLM queues and manage workloads.

When users run queries in Amazon Redshift, the queries are routed to query queues. Each query queue contains a number of query slots. Each queue is allocated a portion of the cluster’s available memory. A queue’s memory is divided among the queue’s query slots. Users can configure WLM properties for each query queue to specify the way that memory is allocated among slots, how queries can be routed to specific queues at run time, and when to cancel long-running queries.
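The queue and slot configuration described above can be inspected from a system table. A minimal sketch, assuming a current Redshift release in which user-defined queues map to service classes greater than 5:

```sql
-- Inspect WLM queue (service class) configuration: concurrency slots,
-- per-slot working memory, and the queue's timeout, if any.
select service_class, num_query_tasks, query_working_mem, max_execution_time
from stv_wlm_service_class_config
where service_class > 5;
```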

Clients can also use the wlm_query_slot_count parameter, which is separate from the WLM properties, to temporarily enable queries to use more memory by allocating multiple slots.
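For example, a memory-intensive maintenance operation can temporarily claim several slots in its queue. A minimal sketch, where the table name is hypothetical:

```sql
-- Temporarily claim 3 of the queue's slots for this session, giving the
-- next statement roughly three times the per-slot memory allocation.
set wlm_query_slot_count to 3;
vacuum sales;  -- 'sales' is a hypothetical table name
-- Return the session to the default of one slot per query.
set wlm_query_slot_count to 1;
```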

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze data using existing business intelligence tools. Users can start small for $0.25 per hour with no commitments and scale to petabytes for $1,000 per terabyte per year, less than a tenth the cost of traditional solutions. Customers typically see three times compression, reducing their costs to $333 per uncompressed terabyte per year.

Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and by parallelizing queries across multiple nodes. Data load speed scales linearly with cluster size, with integrations to Amazon S3, Amazon DynamoDB, Amazon Elastic MapReduce, Amazon Kinesis, or any SSH-enabled host.

Users pay only for what they use, and can support an unlimited number of users running unlimited analytics on the data for $1,000 per terabyte per year, one tenth the cost of traditional data warehouse solutions.

Amazon Redshift automates common administrative tasks to manage, monitor, and scale the data warehouse. By handling these time-consuming, labor-intensive tasks, Amazon Redshift frees users to focus on their data and business.

Amazon Redshift uses a variety of innovations to obtain very high query performance on datasets ranging in size from a hundred gigabytes to a petabyte or more. It uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries. Amazon Redshift has a massively parallel processing (MPP) data warehouse architecture, parallelizing and distributing SQL operations to take advantage of all available resources.

The underlying hardware is designed for high performance data processing, using local attached storage to maximize throughput between the CPUs and drives, and a 10GigE mesh network to maximize throughput between nodes.

Amazon Redshift automatically and continuously backs up new data to Amazon S3. It stores snapshots for a user-defined period from one up to 35 days. Users can take their own snapshots at any time, and they are retained until they are explicitly deleted. Amazon Redshift can also asynchronously replicate snapshots to S3 in another region for disaster recovery. Once a cluster is deleted, system snapshots are removed, but user snapshots are available until they are explicitly deleted.

With a couple of parameter settings, users can set up Amazon Redshift to use SSL to secure data in transit and hardware-accelerated AES-256 encryption for data at rest. When encryption of data at rest is enabled, all data written to disk is encrypted, as are any backups. By default, Amazon Redshift takes care of key management, but users can choose to manage their keys using their own hardware security modules (HSMs), AWS CloudHSM, or AWS Key Management Service.

Amazon Redshift enables users to configure firewall rules to control network access to the data warehouse cluster. Users can also run Amazon Redshift inside Amazon VPC to isolate the data warehouse cluster in their own virtual network and connect it to existing IT infrastructure using an industry-standard encrypted IPsec VPN.

Amazon Redshift integrates with AWS CloudTrail to enable auditing of all Redshift API calls. Amazon Redshift also logs all SQL operations, including connection attempts, queries, and changes to the database. Users can access these logs using SQL queries against system tables or choose to have them downloaded to a secure location on Amazon S3. Amazon Redshift is compliant with SOC1, SOC2, SOC3, and PCI DSS Level 1 requirements.
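For instance, recent connection attempts and executed queries can be pulled from the system log tables with ordinary SQL. A minimal sketch:

```sql
-- Review recent connection attempts (authentication audit trail).
select recordtime, username, event, remotehost
from stl_connection_log
order by recordtime desc
limit 20;

-- Review recently executed queries and their text.
select starttime, querytxt
from stl_query
order by starttime desc
limit 20;
```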

When multiple sessions or users are running queries at the same time, some queries might consume cluster resources for long periods of time and affect the performance of other queries. In this situation, short-running queries might have to wait in a queue for a long-running query to complete.

Users can improve system performance and their experience by modifying WLM configuration to create separate queues for the long-running queries and the short-running queries. At run time, users can route queries to these queues according to user groups or query groups.
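Run-time routing by query group can be done by tagging a session with a label that matches one of the configured queues. A minimal sketch, where 'short' is a hypothetical queue label and 'event' a hypothetical table:

```sql
-- Route subsequent queries in this session to the queue whose
-- query group list includes 'short' (a hypothetical label).
set query_group to 'short';
select count(*) from event;  -- runs in the short-query queue
reset query_group;           -- fall back to default routing
```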

Clients can also configure up to eight query queues and set the number of queries that can run concurrently in each of those queues, up to a maximum concurrency level of 50 across all of the queues. Users can set up rules to route queries to particular queues based on the user running the query or labels that are specified, and can configure the amount of memory allocated to each queue so that large queries run in queues with more memory than other queues. Users can also configure the WLM timeout property to limit long-running queries.
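Such a setup can be expressed in the cluster's wlm_json_configuration parameter. A minimal sketch with assumed group names, concurrency values, and memory splits; the final entry with no group acts as the default queue:

```json
[
  {
    "query_group": ["short"],
    "query_concurrency": 10,
    "memory_percent_to_use": 30,
    "max_execution_time": 60000
  },
  {
    "user_group": ["etl_users"],
    "query_concurrency": 5,
    "memory_percent_to_use": 50
  },
  {
    "query_concurrency": 5
  }
]
```

Here max_execution_time is the WLM timeout in milliseconds, so queries in the hypothetical "short" queue would be canceled after 60 seconds.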


WWPI – Covering the best in IT since 1980