Hortonworks DataFlow 3.0 advances development of streaming analytics applications

Hortonworks Inc., a vendor of open and connected data platforms, announced Monday the general availability of Hortonworks DataFlow (HDF) 3.0, the next generation of its open source data-in-motion platform. HDF enables customers to collect, curate, analyze and act on all data in real time, across the data center and cloud. The growth of the Internet of Things brings a flood of new data from mobile devices, wearable technology and sensors, which enterprises can use to uncover actionable intelligence in real time.

Gartner estimates that, “By 2020, 70 percent of organizations will adopt data streaming to enable real-time analytics,” and adoption of HDF has accelerated significantly year over year. HDF is the industry’s first open source platform on which enterprises can quickly build streaming applications for real-time analytics.

HDF 3.0 introduces Streaming Analytics Manager (SAM), which allows application developers, business analysts and administrators to build streaming applications without writing a single line of code, greatly simplifying the process and speeding an application’s time to market. With a simple drag-and-drop interface, SAM makes it easy to design, develop, test, deploy and maintain streaming applications on HDF.
A new shared repository of schemas allows applications to flexibly interact with each other across multiple streaming engines including Apache Kafka, Apache Storm and Apache NiFi. Customers benefit from end-to-end data governance and increased operational efficiency.

“Building streaming applications is complex and time-consuming. There is tremendous market demand for an easier way to build next generation streaming analytics applications,” said Jim Curtis, senior analyst at 451 Research. “With the new Streaming Analytics Manager and Schema Registry features in HDF, Hortonworks is correct to focus its efforts on addressing these challenges which can further facilitate an enterprise’s efforts to glean value from streaming data.”
HDF 3.0 will also be newly available for IBM Power Systems to support a broad range of streaming analytics applications on servers designed for data-intensive workloads such as big data and cognitive analytics. The combination of HDF and Power Systems delivers high performance and efficiency for streaming analytics and makes it easier to manage data-in-motion workloads.

HDF 3.0 builds analytics applications with a drag-and-drop visual paradigm and drop-down analytics functions; provides rich visual dashboards backed by an analytics engine powered by Druid; and ships with prebuilt monitoring dashboards of system metrics. It also manages data flows with a schema repository that eliminates the need to code and attach a schema to every piece of data, reducing operational overhead; allows data consumers and producers to evolve at different rates through schema version management; and stores schemas for any type of entity or data store, not just Kafka.

“IBM is thrilled to expand our collaboration with Hortonworks to help clients accelerate data analytics for cognitive applications,” said Tim Vincent, vice president and IBM Fellow, Cognitive Systems Software. “HDF on Power Systems brings industry-leading system performance to the edge of the data platform to fuel our clients’ competitive advantage.”

With this release, Hortonworks has introduced two open source product modules – Streaming Analytics Manager and Schema Registry. Both modules enable users to design, develop, test, deploy and maintain streaming analytics applications with minimal specialized skills and training.

To realize the full potential of modern data applications, organizations need to capture both rich, historical insights from data at rest, and perishable insights from data in motion.  Currently, flow management tools are available to help gather, route, filter and transform data from any source.  But companies have lacked equivalent tools for building the analytics apps needed to extract insight from streaming data. Hortonworks has addressed this need with the release of Streaming Analytics Manager and Schema Registry.

Using Streaming Analytics Manager, users can write complex streaming analytics apps without writing a single line of code. Eliminating the need for specialized skillsets, Streaming Analytics Manager provides a graphic programming paradigm with a drag-and-drop interface to build streaming apps for event correlation, context enrichment, and complex pattern matching.  In addition, analytical aggregations with automated alerts become available when insights are discovered.
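To make these ideas concrete, the sketch below shows in plain Python the kind of logic a SAM app assembles visually without code: context enrichment from a reference table, followed by a windowed aggregation that raises an alert. The names, fields and thresholds here are hypothetical illustrations, not SAM APIs.

```python
# Illustrative sketch of a streaming pipeline: enrichment + alerting.
# All identifiers and thresholds are invented for illustration.
from collections import defaultdict, deque

DRIVER_INFO = {101: "certified", 102: "probationary"}  # enrichment lookup

def enrich(events):
    """Attach reference data (driver status) to each raw event."""
    for event in events:
        event["driver_status"] = DRIVER_INFO.get(event["driver_id"], "unknown")
        yield event

def alert_on_speeding(events, window=3, threshold=80):
    """Yield an alert when a driver's average speed over the last
    `window` events exceeds `threshold` mph."""
    history = defaultdict(lambda: deque(maxlen=window))
    for event in events:
        speeds = history[event["driver_id"]]
        speeds.append(event["speed_mph"])
        if len(speeds) == window and sum(speeds) / window > threshold:
            yield {"driver_id": event["driver_id"],
                   "avg_speed": sum(speeds) / window,
                   "driver_status": event["driver_status"]}

stream = [
    {"driver_id": 101, "speed_mph": 85},
    {"driver_id": 101, "speed_mph": 90},
    {"driver_id": 101, "speed_mph": 88},
]
for alert in alert_on_speeding(enrich(iter(stream))):
    print(alert)  # average of 85, 90, 88 exceeds 80, so one alert fires
```

In SAM, each of these stages would be a box on the canvas (a lookup enrichment, a windowed aggregate, a rule-based notifier) rather than handwritten functions.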

Using this solution, Hortonworks customers get the same experience for building streaming analytics applications that they already enjoy for building flow management applications. Furthermore, Streaming Analytics Manager brings these apps to market considerably faster, at a lower cost, accelerating time to value and strategic impact. Streaming Analytics Manager also provides tools to meet the needs of three big data personas: developers, business analysts and IT operations teams.

Schema Registry improves end-to-end data governance and operational efficiency by providing a centralized registry, supporting version management and enabling schema validation.

The module offers three benefits. The first is a centralized registry: a shared repository of schemas eliminates the need to attach a schema to every piece of data, and applications can flexibly interact with each other, saving or retrieving schemas for the data they need to access. Fully integrated with the flow management component of HDF, including Apache NiFi, Schema Registry allows schemas created in Apache NiFi to be easily managed and reused by the entire platform.

The second is version management, which defines relationships between schemas and enables schemas to be shared between HDF components and applications. Schema Registry supports schema evolution, so a consumer and producer can use different schema versions yet still read all the information shared between them. The third is schema validation: Schema Registry ensures data quality by enabling generic format conversion and generic routing.
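As a rough illustration of these concepts, the standard-library Python sketch below models a versioned schema repository plus Avro-style schema resolution, where a record written with an older schema is read with a newer one by filling added fields from their defaults. The class and function names are invented for illustration and do not reflect the actual HDF Schema Registry API.

```python
# Conceptual sketch only: an in-memory registry with version management,
# not the HDF Schema Registry client or its REST interface.

class SchemaRegistry:
    """Maps each schema name to an ordered list of versions, so producers
    and consumers can look up any version by number instead of shipping
    the schema inline with every piece of data."""

    def __init__(self):
        self._schemas = {}  # name -> list of schema dicts

    def register(self, name, schema):
        versions = self._schemas.setdefault(name, [])
        versions.append(schema)
        return len(versions)  # version numbers start at 1

    def get(self, name, version=None):
        versions = self._schemas[name]
        return versions[-1] if version is None else versions[version - 1]


def resolve(record, reader_schema):
    """Avro-style resolution: read a record written with an older schema
    using a newer one, filling newly added fields from their defaults."""
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return out


registry = SchemaRegistry()
v1 = registry.register("truck-event", {"fields": [{"name": "truck_id"}]})
v2 = registry.register("truck-event", {"fields": [
    {"name": "truck_id"},
    {"name": "speed_mph", "default": 0},  # new field with a default
]})

old_record = {"truck_id": 17}  # written by a producer still on version 1
evolved = resolve(old_record, registry.get("truck-event"))
print(evolved)  # {'truck_id': 17, 'speed_mph': 0}
```

The default-filling step is what lets consumers and producers evolve at different rates: a consumer on the newer schema still reads every record the older producer wrote.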


WWPI – Covering the best in IT since 1980