Talend launches Apache Beam-powered solution for self-service, big data preparation
Cloud and big data integration software company Talend announced on Monday its first Apache Beam-powered solution for self-service big data preparation. Now a top-level Apache project, Apache Beam is a unified programming model for defining both batch and streaming data processing pipelines that can be executed on a variety of runtime platforms.
Talend Data Preparation is a self-service solution that lets more employees access, cleanse and analyze large data sets. The combination of Talend Data Preparation and Apache Beam is designed to help companies shorten time to insight by enabling more users to build data projects that can run anywhere and take advantage of the latest processing innovations.
Talend Data Preparation powered by Apache Beam was first introduced in January as part of the Winter ‘17 release of Talend’s integration platform, and it signals Talend’s continued commitment to this data processing technology. The Talend Data Fabric empowers IT to give business users access to corporate data lakes so they can speed up data preparation and cleansing.
Data Preparation enables anyone to access and cleanse data using browser-based, point-and-click tools. Smart guides and visual tools help users understand data attributes and quality status. Results can be exported to desktop tools such as Tableau and Excel, enabling immediate decision-making.
It also empowers decision makers without putting data at risk or undermining compliance. Talend Data Preparation ensures that role-based access and masking rules control how data is used. With one click, users can embed data preparation recipes back into batch, bulk, and master data management integration scenarios. Talend draws on its 900 connectors for data sources and targets to deliver data preparation at enterprise scale.
Talend has been collaborating on the development of Apache Beam with Google and others since 2015, having made several contributions to the Beam community over the last two years. Moving forward, Apache Beam will become a key element of the Talend Data Fabric integration stack.
Talend’s data preparation capabilities let customers provide access to any data source, whether it is housed in Hadoop, the cloud or traditional databases, and share it across users and groups to encourage collaboration. The tool also uses a preconfigured data dictionary to automatically recognize the meaning of raw data from the data lake. Users can extend the dictionary with their own vocabulary, such as product codes or names, and crowdsource new data definitions from open data and/or the Talend Community.
Data analysts can transform and cleanse data from the data lake, while data stewards manage exceptions and enforce data policies; this division of work accelerates time to insight. Datasets and preparations can be shared and automated to create a single source of trusted data.
“Modern businesses need better access to clean, actionable data in order to support real-time insight across their organization,” said Laurent Bride, chief technology officer, Talend. “However, given the current rate of technology innovation, IT leaders often worry that the investments made today too quickly become obsolete and an obstacle to advancement tomorrow. We believe Apache Beam represents the future because it mitigates the need to re-write applications as new innovations are introduced, systems are moved to the cloud, or integration styles need to be alternated. Talend’s use of Beam for Data Preparation will eventually allow customers to build their preparations once and run them anywhere, which is the ultimate in data agility.”