New BlueData EPIC 3.0 provides large-scale deployments of big data analytics and data science with Docker containers



BlueData, provider of a Big-Data-as-a-Service (BDaaS) software platform, on Tuesday announced BlueData EPIC version 3.0, which provides increased scalability, enhanced networking, significant security and performance optimizations, and new functionality for distributed data science operations. With BlueData EPIC 3.0, enterprises can deploy large-scale production environments for big data analytics and data science running in Docker containers, whether on-premises, in the public cloud, or in a hybrid architecture.

BlueData EPIC provides cloud-like agility and efficiency for on-premises big data deployments with virtual clusters running on embedded and fully-managed Docker containers. The EPIC platform in effect uses Docker containers as lightweight virtual machines, so until now each container would typically need its own network IP address to communicate with the external world and to interoperate with other containers in the virtual cluster.

In this mode, a single physical server can run dozens of lightweight containers and achieve higher resource utilization to improve efficiency. While this is attractive from a TCO perspective, it also adds an operational burden: the routable IP addresses for those containers must be obtained from the networking team in the enterprise IT department. As a result, some large enterprise customers experienced delays in securing the required set of routable IP addresses for their containerized Big Data deployments.

To address this challenge, BlueData announced new functionality called the BlueData EPIC Gateway: an optional software feature that eliminates the need for a routable IP address for each container. By removing this requirement, the company now provides greater flexibility to configure the container network for customers’ big data deployments. Without this network address limitation, virtual clusters on the BlueData EPIC platform can scale to hundreds of virtual nodes and thousands of containers without impacting the performance or user experience for analysts and data science teams.

Enterprise deployments of the BlueData EPIC software platform have expanded significantly over the past year, driven by increased adoption of Docker containers and cloud computing for Big Data. Many of these enterprises want their Big Data workloads running in a hybrid model that spans both on-premises and public cloud infrastructure, whether for initial dev/test and sandbox environments or large-scale production deployments. These organizations need a solution that can scale to thousands of containers, with agility and flexibility for distributed data science operations in a multi-tenant architecture. And for Big Data workloads deployed on-premises, they want minimal to no changes to their existing network and security infrastructure.

The new EPIC release can now scale to hundreds of virtual nodes and thousands of on-premises containers by supporting non-routable, private IP ranges that can be accessed via a set of BlueData EPIC gateway hosts. This greatly simplifies the networking infrastructure for containerized Big Data applications in enterprise-wide production deployments. And with EPIC 3.0, BlueData adds fine-grained monitoring for CPU, memory, and other key metrics with a pluggable framework based on Elasticsearch, Metricbeat, and Kibana.
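As a rough illustration of the general idea (not BlueData's implementation, which the announcement does not describe), the sketch below gives containers addresses from a private, non-routable range and exposes each one through a port on a gateway host. The function name, the gateway hostname, the port numbers, and the CIDR block are all hypothetical.

```python
import ipaddress


def build_gateway_map(private_cidr, gateway_host, n_containers,
                      service_port=22, first_port=10000):
    """Map (gateway host, port) pairs to containers on a private network.

    All names and defaults here are illustrative assumptions, not part of
    any BlueData API.
    """
    network = ipaddress.ip_network(private_cidr)
    # Only private (non-routable) ranges make sense behind a gateway.
    if not network.is_private:
        raise ValueError(f"{private_cidr} is not a private address range")

    mapping = {}
    # Pair each container index with the next usable host address.
    for offset, address in zip(range(n_containers), network.hosts()):
        mapping[(gateway_host, first_port + offset)] = (str(address), service_port)

    if len(mapping) < n_containers:
        raise ValueError("address range too small for requested containers")
    return mapping


if __name__ == "__main__":
    table = build_gateway_map("10.1.0.0/16", "gateway.example.com", 3)
    for (host, port), (addr, svc) in table.items():
        print(f"{host}:{port} -> {addr}:{svc}")
```

In a layout like this, only the gateway hosts need addresses that the rest of the enterprise network can reach; the containers themselves stay on the private range.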

The new 3.0 version extends BlueData EPIC’s differentiation as the only unified Big-Data-as-a-Service solution for on-premises, public cloud, and hybrid deployments. For example, this new release delivers enhanced support for highly secure and highly available multi-tenant environments on Amazon Web Services (AWS): tenants and instances can now be isolated across different Amazon subnets, security groups, regions, and virtual private cloud (VPC) networks.

This release introduces several new optimizations that enable separation of compute and storage for containerized Big Data infrastructure, while ensuring comparable performance to bare-metal deployments. BlueData EPIC 3.0 also provides the option to utilize the same Kerberos principal for end-to-end security — from a Hadoop compute cluster and its associated services to a remote Hadoop Distributed File System (HDFS) — while supporting “mix and match” of different Hadoop versions for compute and storage.

BlueData EPIC 3.0 introduces a new user interface and streamlined user experience for data science teams – including one-click launch of containerized data science environments using pre-defined best practice templates. Building upon functionality released earlier this year, it helps automate the end-to-end lifecycle of data science operations. It includes new pre-integrated application images for Spark 2.x with Python (e.g. with Jupyter Notebook and PySpark) and R (e.g. with RStudio and sparklyr), and provides flexibility to deploy Spark either in standalone mode, with YARN, or now with Apache Mesos for resource management.

BlueData also introduced a new Action Script feature, so that users and tenant administrators can modify specific cluster parameters, review logs, and install required packages after a virtual cluster is launched. This new feature simplifies operational tasks – like adding a custom RPM package to a virtual cluster – and eliminates the need to build an entirely new Docker-based application image or launch a new cluster for these relatively minor changes. Action Scripts can also be used for advanced customer operations to automate certain routine tasks.

“Our customers now include some of the world’s largest companies, with some of the biggest Big Data deployments in the world. I’m very pleased to see our product roadmap being shaped by increased usage and expansion of production environments within these large enterprises,” said Kumar Sreekanti, CEO and co-founder of BlueData. “EPIC 3.0 is one of our most significant software releases so far. It delivers the enterprise-class scalability, security, and performance that these customers need — and solidifies EPIC’s position as the leading software platform for Big-Data-as-a-Service.”

 


WWPI – Covering the best in IT since 1980