Amazon QuickSight now supports Amazon EMR with new Presto and Apache Spark Connectors



Amazon Web Services (AWS) announced last week that it has added two new connectors to Amazon QuickSight, one for Presto and one for Apache Spark. Presto is an open-source, distributed SQL query engine for running interactive analytic queries against datasets ranging from gigabytes to petabytes. Like Presto, Apache Spark is an open-source, distributed processing system commonly used for big data workloads.

Amazon QuickSight customers can now connect to Presto and Spark (with LDAP authentication enabled) running on Amazon EMR 5.5.0 or above, or on self-hosted clusters on EC2, and analyze their big data at interactive speed. Customers can query the Presto and Spark engines directly, or ingest their data into SPICE for faster visualization.

Presto can run on multiple data sources, including Amazon S3. Its execution framework is fundamentally different from that of Hive/MapReduce. Presto has a custom query and execution engine where the stages of execution are pipelined, similar to a directed acyclic graph (DAG), and all processing occurs in memory to reduce disk I/O. This pipelined execution model can run multiple stages in parallel and streams data from one stage to another as the data becomes available. This reduces end-to-end latency and makes Presto ideal for ad hoc data exploration over large data sets.
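The streaming behavior described above can be illustrated with a small Python sketch. This is not Presto's implementation: the stage names (`scan`, `keep_even`, `total`) and the `events` log are hypothetical, and Python generators run in a single thread, so the sketch shows only how rows flow from one stage to the next as they become available, not the parallel execution of stages.

```python
from typing import Iterable, Iterator, List

# Hypothetical event log recording the order in which stages touch each row.
events: List[str] = []


def scan(values: Iterable[int]) -> Iterator[int]:
    """First stage: emit rows one at a time instead of materializing them."""
    for value in values:
        events.append(f"scan {value}")
        yield value


def keep_even(values: Iterable[int]) -> Iterator[int]:
    """Middle stage: pass matching rows downstream as soon as they arrive."""
    for value in values:
        if value % 2 == 0:
            events.append(f"filter {value}")
            yield value


def total(values: Iterable[int]) -> int:
    """Final stage: consume rows as they stream in and keep a running sum."""
    result = 0
    for value in values:
        events.append(f"sum {value}")
        result += value
    return result


# Stages are chained; no stage waits for the previous one to finish entirely.
answer = total(keep_even(scan([1, 2, 3, 4])))
```

In the `events` log, the final stage processes the first matching row before the scan stage has read the rest of its input. A stage-by-stage model that writes intermediate results out in full would instead finish each stage before starting the next.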

Users need to run Presto version 0.167 at a minimum, which is the first release that supports LDAP authentication. LDAP authentication is a requirement for the Presto and Spark connectors, and QuickSight refuses to connect if LDAP is not configured on the cluster.
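A cluster can be checked against these requirements before a connection is attempted. The sketch below is only an illustration: the function names are hypothetical, the version strings are assumed to be purely numeric and dot-separated, and it does not query a real cluster or call any QuickSight API.

```python
from typing import Tuple

# Minimums taken from the article: Presto 0.167 and Amazon EMR 5.5.0.
MIN_PRESTO: Tuple[int, ...] = (0, 167)
MIN_EMR: Tuple[int, ...] = (5, 5, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '5.10.0' into (5, 10, 0) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Compare numerically, so '5.10.0' correctly ranks above '5.5.0'."""
    return parse_version(version) >= minimum


def presto_connector_ready(presto_version: str, ldap_enabled: bool) -> bool:
    """Hypothetical pre-flight check: version is new enough and LDAP is on."""
    return ldap_enabled and meets_minimum(presto_version, MIN_PRESTO)
```

Comparing parsed tuples rather than raw strings avoids the common mistake of ranking "5.10.0" below "5.5.0".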

QuickSight makes it easy to create visualizations and analyze data with AutoGraph, a feature that automatically selects the best visualization based on the selected fields. To create a visualization, users select fields in the left panel. For example, with data on connections to CloudFront, selecting the OS field shows the number of connections by OS type. Selecting the bytes field as well shows total bytes transferred by OS instead of the connection count.
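The two views in that example correspond to two different aggregations over the same rows: a count per OS and a sum of bytes per OS. The sketch below shows the difference in plain Python. The sample rows and the field names `os` and `bytes` are invented for illustration, and the code does not reflect how QuickSight computes its visuals.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Invented sample rows standing in for connection records.
ROWS: List[Dict[str, object]] = [
    {"os": "Windows", "bytes": 500},
    {"os": "Linux", "bytes": 1200},
    {"os": "Windows", "bytes": 300},
    {"os": "MacOS", "bytes": 700},
    {"os": "Linux", "bytes": 100},
    {"os": "Windows", "bytes": 50},
]


def connections_by_os(rows: List[Dict[str, object]]) -> List[Tuple[str, int]]:
    """Count rows per OS, most frequent first."""
    return Counter(str(row["os"]) for row in rows).most_common()


def bytes_by_os(rows: List[Dict[str, object]]) -> List[Tuple[str, int]]:
    """Sum the bytes field per OS, largest total first."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[str(row["os"])] += int(row["bytes"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


counts = connections_by_os(ROWS)
volume = bytes_by_os(ROWS)
```

With this sample data the two rankings differ: Windows has the most connections, while Linux transfers the most bytes.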

Last month, AWS announced support for AWS CloudTrail in Amazon QuickSight, which allows logging of QuickSight events across an AWS account. Whether in an enterprise setting or a small team, this integration allows QuickSight administrators to accurately answer questions such as who last changed an analysis or who has connected to sensitive data. With CloudTrail, administrators have better governance, auditing, and risk management of their QuickSight usage.

Amazon QuickSight helps to democratize BI, with the goal of making it easier and cheaper to roll out advanced business analytics capabilities to everyone in an organization. Overall, this enables a better understanding of the business and allows faster data-driven decisions. In the past, the ability to share data presented an administrative challenge: knowing who has access to what data. Solving this problem ensures compliance with policies, and also gives businesses an opportunity to see how employees use data to drive crucial decisions.

Amazon QuickSight also allows users to create key performance indicator (KPI) charts, define custom ranges when importing Microsoft Excel spreadsheets, export data to comma-separated values (CSV) format, and create aggregate filters for SPICE datasets. In the Enterprise Edition, AWS added an option for customers to connect to their on-premises Active Directory using AD Connector.

Amazon QuickSight provides fast, easy-to-use, cloud-powered business intelligence at 1/10th the cost of traditional BI solutions. QuickSight uses a new, super-fast, parallel, in-memory calculation engine (“SPICE”) to perform advanced calculations and render visualizations rapidly.

QuickSight integrates automatically with AWS data services, enables organizations to scale to hundreds of thousands of users, and delivers fast, responsive query performance to them. Users can connect QuickSight to AWS data services, including Amazon Redshift, Amazon RDS, Amazon Aurora, Amazon S3, and Amazon Athena. They can also upload CSV, TSV, and spreadsheet files, or connect to third-party data sources such as Salesforce.

 


WWPI – Covering the best in IT since 1980