Apache ShardingSphere 5.0.0: Kernel Optimizations

Having undergone almost two years of development and optimization, the Apache ShardingSphere 5.0.0 GA version was finally released. 

Compared with the 4.1.1 GA version, the new version's kernel is greatly improved: *first, the kernel optimizations are subject to the pluggable architecture allowing users to combine kernel features as they wish; secondly, aiming to improve SQL distributed query capabilities, Federation Execution Engine is created to satisfy users' needs in complex business scenarios. We also made optimizations at the API level to help users reduce costs.

After reading this article, you will understand some kernel-related changes we made and the feature differences between the new Apache ShardingSphere version and the previous one. You will also learn more about these optimizations and study how to use the 5.0.0 version in a practical scenario case that integrates data sharding, read/write splitting, as well as encryption and decryption.

Pluggable Architecture Kernel

Database Plus is known as the North-Star of Apache ShardingSphere's 5.0.0 GA version. Our mission is to build a criterion and ecosystem above multi-model databases, and provide users with additional functions. The three characteristics of Database Plus are called Link, Enhance, and Pluggable. 

Precisely, by connecting to heterogeneous databases, Apache ShardingSphere can provide multi-model databases with management services and enhanced features including but not limited to data sharding, data encryption & decryption, and distributed transactions. With its pluggable platform, the enhanced functions can be expanded indefinitely, or to put it another way, users can flexibly extend the features as they need. 

The emergence of the Database Plus concept indicates that ShardingSphere has evolved from a middleware into an innovative distributed database ecosystem. Since we set Database Plus as our new direction and there are many extension points in our pluggable system, having a pluggable kernel was a given. The figure below illustrates the new pluggable Kernel: 

All of Apache ShardingSphere Kernel' processes (i.e. metadata loader, SQL parser, SQL router, SQL rewriter, and SQL executor & result merger) provide extension points built on which Apache ShardingSphere implements default features such as data sharding, read/write splitting, encryption & decryption, shadow database stress testing, and high availability.

We can divide the extension points into two categories: feature-based extension points and technology-based extension points. 

Among the processes of Apache ShardingSphere Kernel, technical extension points include the extension points of the SQL Parser Engine and SQL Executor engine, while function extension points are the extension points of the Metadata Loading, SQL Router engine, SQL Rewriter Engine, and SQL Executor & Result Merger Engine.

The extension points of the SQL Parser Engine are SQL AST Analysis and SQL Tree Traversal. The SQL Parser Engine of Apache ShardingSphere built on these two extension points can parse and traverse many database dialects such as MySQL, PostgreSQL, Oracle, SQLServer, openGauss, and SQL92 by default. Users can also write code to parse more database dialects not currently supported by Apache ShardingSphere SQL Parser, or develop SQL Audit and other new features.

How about the SQL Execution Engine extension points? Its extension depends on different execution methods. Currently, Apache ShardingSphere's SQL Executor has a single-threaded execution engine and a multi-threaded execution engine. The single-threaded execution engine is used to execute transaction statements, while the multi-threaded one applies to scenarios that do not include transactions to improve SQL execution performance. In the future, we will provide more execution engines such as MPP Execution Engine that meets the requirements for SQL execution in distribution scenarios. 

Apache ShardingSphere provides function extension points for data sharding, read-write splitting, encryption & decryption, shadow database stress testing, and high availability. These features implement all or part of the function extension points to meet their needs. Meanwhile, within each of them, internal sub-level function extension points such as Sharding Strategy, Distributed ID Generator, and Load Balancing Algorithm are also provided. The following extension points are implemented in the Apache ShardingSphere kernel functions:

  • Data Sharding implements all the extension points of the metadata loader, SQL router, SQL rewriter and result merger. For the data sharding function, extension points such as sharding algorithm and distributed ID are provided.

  • Read/write SplittingK implements the function extension point SQL Router and for the function, the Load Balancing Algorithm extension point is provided.

  • Encryption & Decryption implements metadata Loader, SQL Rewriter, and Result Merger. Inside, the Encrypt and Decrypt Algorithm extension point is provided.

  • Shadow Database Stress Testing implements the extension point SQL Router. The sub-level extension point Shadow Algorithm is provided.

  • High Availability also implements the SQL Router extension point. 

Given the extension points, Apache ShardingSphere functions are truly scalable.  Multi-tenancy, SQL Audit, and other new features will be seamlessly added to the Apache ShardingSphere ecosystem via these extension points.

Additionally, a user can also leverage these extension points to develop custom features when development needs to quickly deploy a distributed database system. For a detailed description of the pluggable architecture's extension points, please refer to the developer manual.

We compare the 5.0.0 GA version's pluggable kernel with that of the 4.1.1 GA version and find some major differences (as shown in the table below):

First, the two versions have different product positioning. Version 5.0.0 GA is the milestone in Apache ShardingSphere's evolution from a database sharding middleware into a distributed database ecosystem where features can be easily integrated into the pluggable architecture. 

Second, the 4.1.1 GA version only supports basic functions, while the 5.0.0 GA version cares about the infrastructure and feature best practices. Users are even allowed to abandon some features and develop their custom functions on the kernel infrastructure. In terms of coupling, the kernel functions in the 5.0.0 GA version are isolated from each other, so they cannot perceive the existence of another feature, ensuring kernel stability to the greatest extent. Lastly, considering function combination, the 5.0.0 version places all features (e.g. data sharding, read/write splitting, shadow database stress testing, encryption & decryption, and high availability) at the same level, so users can combine features as they prefer. The 4.1.1 GA version imposes data sharding on other functions.

In summary, the enhanced 5.0.0 GA version's pluggable kernel allows users to freely combine functions to satisfy their business needs just as if they were building blocks. However, at the same time, adopting the new pluggable architecture also changes the ways we use kernel functions. In this article, we'd like to showcase some practical examples, and tell you in detail how to combine these functions in the 5.0.0 GA version.

Open Source Project Links:

ShardingSphere Twitter:https://twitter.com/ShardingSphere