Splice Machine’s OLAP engine adds columnar storage, in-memory caching to its data platform



Splice Machine, vendor of an open-source SQL RDBMS powered by Apache Hadoop and Apache Spark, announced Tuesday at AWS re:Invent 2016 the release of version 2.5 of its data platform for applications. The new version strengthens the platform's ability to concurrently run enterprise-scale transactional and analytical workloads, an approach frequently referred to as HTAP (Hybrid Transactional and Analytical Processing).

With Splice Machine’s hybrid architecture, companies can reduce operational complexity, eliminate the need for special coding skills, power concurrent applications, and support machine learning.

Users can avoid managing separate systems, tuning them individually for performance, and writing low-level code and batch programs to keep them in sync. Developers can use a single industry-standard SQL and JDBC/ODBC interface to work with the system.

The ACID transaction implementation is designed for both analytical and operational workloads. It supports high concurrency, even with thousands of users or devices updating the system at the same time. Its multi-version concurrency control (MVCC), using snapshot isolation, can handle fine-grained updates without locking reads.
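To illustrate what MVCC with snapshot isolation means in practice, here is a minimal Python sketch. It is not Splice Machine's implementation. The class names (`MVCCStore`, `Transaction`) and the first-committer-wins conflict rule are assumptions chosen for illustration. Each transaction reads from the snapshot that existed when it began, so a concurrent writer never blocks a reader.

```python
import itertools


class MVCCStore:
    """Toy multi-version store (hypothetical, for illustration only)."""

    def __init__(self):
        self._versions = {}               # key -> [(commit_ts, value)], ascending
        self._clock = itertools.count(1)  # source of commit timestamps
        self._last_commit = 0

    def begin(self):
        # A new transaction sees everything committed so far, and nothing later.
        return Transaction(self, self._last_commit)

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def commit(self, txn):
        # First committer wins: abort if another transaction committed a
        # newer version of any key this transaction wrote.
        for key in txn.writes:
            versions = self._versions.get(key, [])
            if versions and versions[-1][0] > txn.snapshot_ts:
                raise RuntimeError(f"write-write conflict on {key!r}")
        commit_ts = next(self._clock)
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((commit_ts, value))
        self._last_commit = commit_ts
        return commit_ts


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}                  # buffered until commit

    def get(self, key):
        if key in self.writes:            # read your own writes
            return self.writes[key]
        return self.store.read(key, self.snapshot_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store.commit(self)
```

In this sketch, a reader that began before a writer committed keeps seeing the old value, while any transaction started afterwards sees the new one. No read ever takes a lock.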

Modern applications adapt over time by continuously transforming operational data into aggregated features that train statistical machine learning models and deploy those models in real-time decision systems. Splice Machine enables the feature engineering, model selection, and deployment process to take place on one platform without significant data movement.

Version 2.5 of Splice Machine delivers Columnar External Tables that enable hybrid columnar and row-based querying. Columnar external tables can be created in Apache Parquet, Apache ORC or text formats. Columnar storage speeds up large table scans, large joins, aggregations and groupings, while the native row-based storage is used for write-optimized ingestion, single-record lookups and updates, and short scans.
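The trade-off between the two layouts can be shown with a small Python sketch. It uses plain in-memory structures rather than Parquet or ORC files, and the order data and function names are hypothetical. A row layout keeps each record together, which suits single-record lookups. A columnar layout keeps each column together, so an aggregation touches only the columns it needs.

```python
# Hypothetical order data, used only for illustration.
ROWS = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 30.0},
]


def build_row_store(rows):
    """Row layout: one complete record per key, good for point lookups."""
    return {row["order_id"]: row for row in rows}


def build_column_store(rows):
    """Columnar layout: one list per column, good for scans and aggregates."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def lookup_order(row_store, order_id):
    # A single-record lookup reads one record and nothing else.
    return row_store[order_id]


def total_by_customer(column_store):
    # A grouping reads only the two columns involved, never "order_id".
    totals = {}
    for customer, amount in zip(column_store["customer"], column_store["amount"]):
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals
```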

It also includes In-Memory Caching via Pinning, which gives users the ability to move tables and columnar data files into memory for fast data access. Pinning avoids repeated table scans or writes to high-latency file systems such as Amazon S3. The capability allows data to be kept on very inexpensive storage while still delivering in-memory performance when applications require it.

Its Statistics via Sketching feature addresses a long-standing problem: cost-based optimizers are only as good as their statistics, but most statistics are poor because computing them is expensive. Splice Machine utilizes the sketching library created by Yahoo! to provide very fast approximate analysis of Big Data statistics with bounded errors. With sketches and histograms, the Splice Machine cost-based optimizer can choose indexes, join orders, and join algorithms much more accurately.
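For readers unfamiliar with sketching, the idea can be illustrated with a K-Minimum-Values (KMV) distinct-count sketch in Python. This is a generic textbook technique, not the Yahoo! library's API and not Splice Machine's code. The class name `KMVSketch` and the default `k` are assumptions. The sketch keeps only the `k` smallest hash values it has seen, yet it can estimate the number of distinct values in a column of any size with a bounded relative error.

```python
import hashlib
import heapq


class KMVSketch:
    """K-Minimum-Values distinct-count sketch (illustrative, hypothetical)."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # negated hashes: a max-heap of the k smallest
        self._members = set()  # hashes currently retained, to skip duplicates

    @staticmethod
    def _hash(value):
        # Map any value to a pseudo-uniform number in [0, 1).
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, value):
        h = self._hash(value)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest retained hash with this smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        # With fewer than k distinct hashes, the count is exact.
        if len(self._heap) < self.k:
            return float(len(self._heap))
        # Otherwise: if the k-th smallest of n uniform hashes is x,
        # then n is approximately (k - 1) / x.
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest
```

Memory use is fixed by `k` regardless of table size, and the relative error shrinks roughly as one over the square root of `k`. That is the kind of bounded-error trade-off that makes statistics cheap enough to collect for an optimizer.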

The offering also delivers cost-optimized storage for AWS users. Data can be stored locally in ephemeral storage or on EBS, S3 and EFS. Depending on the workload and the longevity of the data, different data sets can be placed in storage systems with different price/performance characteristics.

“The new capabilities further emphasize the benefits of Splice Machine’s hybrid architecture,” said Monte Zweben, co-founder and CEO of Splice Machine. “For modern applications that need to combine fast data ingestion, web-scale transactional and analytical workloads, and continuous machine learning, one storage model does not fit all. The Splice Machine SQL RDBMS tightly integrates multiple compute engines, with in-memory and persistent storage in both row-based and columnar formats. The cost-based optimizer uses new advanced statistics to find the optimal execution strategy across all these resources for OLTP and OLAP workloads.”

 


WWPI – Covering the best in IT since 1980