Microsoft adds security, performance features to Azure HDInsight to make Hadoop enterprise-ready for the cloud



Microsoft announced Thursday new capabilities in Azure HDInsight, its managed Hadoop and Spark cloud service, aimed at making Hadoop enterprise-ready in the cloud and easier to use. The additions include what the company describes as the highest levels of security of any cloud Hadoop solution, big data query speeds that approach data warehousing performance, and new notebook experiences for data scientists, all built on the latest Hortonworks Data Platform 2.5 and Spark 2.0.

Spark 2.0 is a major release that overhauls the core query engine with “Project Tungsten,” which upgrades Spark with capabilities of a modern compiler to perform cache-efficient vectorized computations. This has enabled up to 10x faster performance with Spark 2.0 on an already-fast platform.

In addition to faster performance, Spark 2.0 offers broader support for SQL syntax, an improved streaming engine that makes it easier to build real-time solutions, improvements to the machine learning pipelines, and more algorithms supported in SparkR. Finally, in response to customer demand, Microsoft and Hortonworks included more than 100 fixes for Spark 2.0 to improve its stability for production deployments.

With the latest release of Apache HBase for HDInsight, Microsoft is also introducing a Spark-HBase connector. It gives users the performance and power of Spark SQL for querying HBase and allows them to perform advanced analytics on top of all the data held in the NoSQL database.

Both the latest Hortonworks Data Platform 2.5 and Spark 2.0 are slated to become available in Azure HDInsight at a later date. Hive with LLAP is a new cluster type available as a public preview.

To support the adoption of Hadoop in the cloud, Microsoft recognizes that enterprises need assurance that the solution will help protect sensitive corporate data and intellectual property. The company says the new security features of Azure HDInsight give customers the highest levels of security for authentication, authorization, auditing and encryption available in the cloud for Hadoop.

The Azure HDInsight big data service can integrate with Azure Active Directory and Azure Active Directory Domain Services for enterprise-grade authentication and identity management. Azure Active Directory currently supports 1.3 billion daily authentications across 600 million user accounts. The integration is set up with a few clicks, making it easy to secure Hadoop clusters and to leverage an existing on-premises Active Directory deployment, while building sophisticated access control policies around users or security groups, supported by features such as multifactor authentication.

Azure HDInsight is a managed cloud Hadoop service that includes Apache Ranger, which provides a central policy and management portal where administrators can author and maintain fine-grained access control policies over Hadoop data access, components and services. In addition, users can now analyze detailed audit records in the familiar Apache Ranger user interface.

Microsoft has been involved in making Hive run faster through its contributions to Project Stinger and Tez, which sped up Hive query performance by as much as 100 times. The company's cloud Hadoop solution now delivers LLAP (Live Long and Process) from the Stinger.next initiative, which promises sub-second querying on big data, up to 25 times faster than existing Hive.

LLAP keeps compressed data in memory while retaining the ability to scale elastically within a Hadoop cluster. It also brings many enhancements to the Hive execution engine, such as smarter map joins, better MapJoin vectorization, a fully vectorized pipeline, and a smarter cost-based optimizer.

In addition to these LLAP enhancements, the latest version of Hive also has faster type conversions, dynamic partitioning optimizations and vectorization support for text files. Collectively, these enhancements have brought a speed improvement of up to 25x when comparing LLAP to Hive on Tez, opening up new scenarios to do interactive BI and reporting on top of big data.

In addition, Microsoft has partnered with Simba to deliver an ODBC driver for Azure HDInsight that can be used with BI tools such as Power BI, Tableau and QlikView. Together, these allow business analysts to gain insights from big data using their tool of choice.

WWPI – Covering the best in IT since 1980