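This article does not cover the framework's internals or source code; it only walks through usage examples.
Data sharding:
Table sharding helps a review application manage its ever-growing reviews table more efficiently, improving performance and scalability while also making backup and maintenance tasks easier to manage.
Apache ShardingSphere comes in two forms: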
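- ShardingSphere-JDBC is a lightweight Java framework that provides additional services at the JDBC layer.
- ShardingSphere-Proxy is a transparent database proxy: a standalone database server that wraps the database binary protocol so that applications written in any language can connect.
This article focuses on data sharding with ShardingSphere-JDBC.
Dependencies: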
org.apache.shardingsphere:shardingsphere-jdbc-core:5.3.2
org.apache.shardingsphere:shardingsphere-cluster-mode-core:5.3.2
org.apache.shardingsphere:shardingsphere-cluster-mode-repository-zookeeper:5.3.2
org.apache.shardingsphere:shardingsphere-cluster-mode-repository-api:5.3.2
ShardingSphere 5.x is recommended, preferably without the Spring Boot starter, which gives more flexibility.
Configuration:
ShardingSphere-JDBC can be configured in two main ways: YAML or Java. This article uses YAML.
application.yaml
spring:
  datasource:
    driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
    url: jdbc:shardingsphere:classpath:sharding.yaml
    tomcat:
      validation-query: "SELECT 1"
      test-while-idle: true
  jpa:
    properties:
      hibernate:
        dialect: org.hibernate.dialect.MySQL8Dialect
    open-in-view: false
    hibernate:
      ddl-auto: none
The data source should use ShardingSphereDriver as its driver, and its url should point to the file below.
sharding.yaml
dataSources:
  master:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1
mode:
  type: Standalone
  repository:
    type: JDBC
rules:
  - !SHARDING
    tables:
      reviews:
        actualDataNodes: master.reviews_$->{0..1}
        tableStrategy:
          standard:
            shardingColumn: course_id
            shardingAlgorithmName: inline
    shardingAlgorithms:
      inline:
        type: INLINE
        props:
          algorithm-expression: reviews_$->{course_id % 2}
          allow-range-query-with-inline-sharding: true
props:
  proxy-hint-enabled: true
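  sql-show: true
Now let's go through the important properties in this configuration:
- dataSources.master: the definition of our primary data source.
- mode: either Standalone with the JDBC repository type, or Cluster with the ZooKeeper repository type (recommended for production); it controls how configuration is persisted.
- rules: this is where the various ShardingSphere features are enabled, for example !SHARDING.
- tables.reviews: describes the actual tables using inline expression syntax; here we have two tables, reviews_0 and reviews_1, sharded by the course_id column.
- shardingAlgorithms: describes the inline sharding algorithm with a Groovy expression, which splits the reviews table into two tables according to the course_id column.
- props: here we enable interception and logging of SQL queries (so p6spy can be disabled or commented out).
Important: before starting the application, the shards we defined must already exist, so I created two tables in my database: reviews_0 and reviews_1 (init.sql).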
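To make the routing rule concrete, here is a minimal plain-Java sketch of what the expression reviews_$->{course_id % 2} computes. It does not use any ShardingSphere classes; the class name InlineShardingSketch and the method route are hypothetical names chosen for illustration, and the sketch assumes non-negative course ids.

```java
import java.util.List;

// Plain-Java illustration of the routing rule reviews_$->{course_id % 2}.
// InlineShardingSketch and route are hypothetical names, not ShardingSphere APIs.
final class InlineShardingSketch {

    // Physical tables declared by actualDataNodes: master.reviews_$->{0..1}
    static final List<String> ACTUAL_TABLES = List.of("reviews_0", "reviews_1");

    // Mirrors the algorithm-expression: the table index is course_id modulo the table count.
    // Assumes courseId is non-negative, as in the examples of this article.
    static String route(long courseId) {
        int index = (int) (courseId % ACTUAL_TABLES.size());
        return ACTUAL_TABLES.get(index);
    }

    public static void main(String[] args) {
        System.out.println(route(123)); // reviews_1
        System.out.println(route(124)); // reviews_0
    }
}
```

Odd course ids land in reviews_1 and even ones in reviews_0, which is the behavior to look for in the SQL logs.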
Debugging requests:
Now we are ready to start the application and send a few requests:
POST http://localhost:8070/api/v1/reviews/
Content-Type: application/json
{
  "text": "This is a great course!",
  "author": "John Doe",
  "authorTelephone": "555-1234",
  "authorEmail": "johndoe@example.com",
  "invoiceCode": "ABC123",
  "courseId": 123
}
The following log line appears:
INFO 35412 --- [nio-8070-exec-2] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_1 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 15:42:01.8069745, 2023-04-17 15:42:01.8069745, John Doe, johndoe@example.com, 555-1234, 123, ABC123, This is a great course!, 4]
If we send another request with a different payload:
INFO 35412 --- [nio-8070-exec-8] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_1 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 15:43:47.3267788, 2023-04-17 15:43:47.3267788, Mike Scott, mikescott@example.com, 555-1234, 123, ABC123, This is an amazing course!, 5]
Now we can query reviews by course_id:
GET http://localhost:8070/api/v1/reviews/filter?courseId=123
GET http://localhost:8070/api/v1/reviews/filter?courseId=124
and observe in the logs how queries are routed between our two tables.
INFO 35412 --- [nio-8070-exec-9] ShardingSphere-SQL: Actual SQL: master ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_1 review0_ where review0_.course_id=? ::: [123]
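INFO 35412 --- [nio-8070-exec-5] ShardingSphere-SQL: Actual SQL: master ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]
The first select targets the reviews_1 table and the second targets reviews_0: sharding in action.
Advanced sharding:
By default, the sharding strategy applies the algorithm-expression rule to the shardingColumn from the configuration file. For more customized scenarios, you can implement the sharding logic yourself. Note that the property keys and the PreciseShardingAlgorithm / RangeShardingAlgorithm interfaces shown below come from the older pre-5.x API; in 5.x both interfaces are merged into StandardShardingAlgorithm.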
sharding.jdbc.config.sharding.tables.reviews.actual-data-nodes= master.reviews_$->{0..1}
sharding.jdbc.config.sharding.tables.reviews.table-strategy.standard.sharding-column=course_id
sharding.jdbc.config.sharding.tables.reviews.table-strategy.standard.precise-algorithm-class-name=com.demo.shardingjdbc.PreciseShardingDBAlgorithm
sharding.jdbc.config.sharding.tables.reviews.table-strategy.standard.range-algorithm-class-name=com.demo.shardingjdbc.RangeShardingDBAlgorithm
Custom precise-matching strategy implementation:
It mainly handles = and IN conditions in the WHERE clause.
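public class PreciseShardingDBAlgorithm implements PreciseShardingAlgorithm<Long> {
   @Override
   public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Long> preciseShardingValue) {
        // Example rule (same as the inline expression): pick the table whose suffix is course_id % 2.
        // Long assumes course_id is mapped as a Long; a String sharding value cannot be used with %.
        String suffix = "_" + (preciseShardingValue.getValue() % 2);
        return availableTargetNames.stream().filter(name -> name.endsWith(suffix)).findFirst()
                .orElseThrow(() -> new IllegalArgumentException("No table matches suffix " + suffix));
    }
}
Custom range-matching strategy implementation: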
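public class RangeShardingDBAlgorithm implements RangeShardingAlgorithm<Long> {
   @Override
   public Collection<String> doSharding(Collection<String> availableTargetNames, RangeShardingValue<Long> rangeShardingValue) {
        // A range of course_id values can fall into every modulo shard, so route to all candidate tables.
        return availableTargetNames;
    }
}
Read/write splitting: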
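Now let's imagine another problem: the review application may come under heavy load at peak times, leading to slower response times and a worse user experience. To address this, we can implement read/write splitting to balance the load and improve performance.
ShardingSphere provides a read/write splitting solution. It directs read queries to replica databases and write queries to the primary database, so that reads do not interfere with writes and database performance is optimized.
Before configuring read/write splitting, the database architecture must first be changed to a primary/replica (master/slave) setup.
Read/write splitting data source configuration: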
dataSources:
  master:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1
  slave0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:49922/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1
  slave1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:49923/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1
Read/write splitting rule:
- !READWRITE_SPLITTING
  dataSources:
    readwrite_ds:
      staticStrategy:
        writeDataSourceName: master
        readDataSourceNames:
          - slave0
          - slave1
      loadBalancerName: readwrite-load-balancer
  loadBalancers:
    readwrite-load-balancer:
      type: ROUND_ROBIN
We set the write data source name to master and point the read data sources at our replicas, slave0 and slave1; we chose a round-robin load balancing algorithm.
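The routing decision described by this rule can be sketched in plain Java. This is a simplified illustration under stated assumptions, not ShardingSphere's implementation: the class ReadWriteRouterSketch and its route method are hypothetical, and a statement is treated as a read only if it starts with "select".

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of read/write splitting with round-robin over replicas.
final class ReadWriteRouterSketch {
    private final String writeDataSource;
    private final List<String> readDataSources;
    // Counts routed reads so that replicas are picked in turn.
    private final AtomicInteger readCounter = new AtomicInteger();

    ReadWriteRouterSketch(String writeDataSource, List<String> readDataSources) {
        this.writeDataSource = writeDataSource;
        this.readDataSources = List.copyOf(readDataSources);
    }

    // Simplification: anything that is not a SELECT goes to the write data source.
    String route(String sql) {
        boolean isRead = sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
        if (!isRead) {
            return writeDataSource;
        }
        int index = Math.floorMod(readCounter.getAndIncrement(), readDataSources.size());
        return readDataSources.get(index);
    }

    public static void main(String[] args) {
        ReadWriteRouterSketch router =
                new ReadWriteRouterSketch("master", List.of("slave0", "slave1"));
        System.out.println(router.route("insert into reviews_0 (id) values (7)"));          // master
        System.out.println(router.route("select * from reviews_0 where course_id = 124")); // slave0
        System.out.println(router.route("select * from reviews_0 where course_id = 124")); // slave1
    }
}
```

Consecutive reads alternate between slave0 and slave1, while every write goes to master.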
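Important: the last change concerns the sharding rule, which knows nothing about the newly configured read/write splitting rule and still points directly at master:
Sharding data source change (the reviews table entry in sharding.yaml):
        actualDataNodes: readwrite_ds.reviews_$->{0..1}
We can now start the application, send the same POST request, and watch the logs: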
INFO 22860 --- [nio-8070-exec-1] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_0 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 16:12:07.25473, 2023-04-17 16:12:07.25473, Mike Scott, mikescott@example.com, 555-1234, 124, ABC123, This is an amazing course!, 7]
Sharding still works here, and the query runs against the master data source (the write data source). But if we run a few GET requests, we observe the following:
INFO 22860 --- [nio-8070-exec-2] ShardingSphere-SQL: Actual SQL: slave0 ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]
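INFO 22860 --- [nio-8070-exec-4] ShardingSphere-SQL: Actual SQL: slave1 ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]
Here you can see read/write splitting in action: write queries go to the master data source, while read queries go to its replicas (slave0 and slave1), and the sharding rule is still applied correctly.
Data masking:
Here is another hypothetical problem for our application. Suppose that, because of data privacy regulations, sensitive information such as customer emails, phone numbers, and invoice codes must be accessible to some users or applications while staying hidden from everyone else.
To solve this, we can apply data masking, which masks sensitive data either when results are mapped or at the SQL level. ShardingSphere covers this with another feature that is easy to enable: data masking.
Configuration change: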
- !MASK
  tables:
    reviews:
      columns:
        invoice_code:
          maskAlgorithm: md5_mask
        author_email:
          maskAlgorithm: mask_before_special_chars_mask
        author_telephone:
          maskAlgorithm: keep_first_n_last_m_mask
  maskAlgorithms:
    md5_mask:
      type: MD5
    mask_before_special_chars_mask:
      type: MASK_BEFORE_SPECIAL_CHARS
      props:
        special-chars: '@'
        replace-char: '*'
    keep_first_n_last_m_mask:
      type: KEEP_FIRST_N_LAST_M
      props:
        first-n: 3
        last-m: 2
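        replace-char: '*'
Let's see what we have here:
- tables.reviews: we assign a mask algorithm to each of the three columns mentioned above.
- maskAlgorithms.md5_mask: we specify the MD5 algorithm type for invoice_code.
- maskAlgorithms.mask_before_special_chars_mask: we configure the MASK_BEFORE_SPECIAL_CHARS algorithm for the author_email column, which means every character before the @ symbol is replaced with *.
- maskAlgorithms.keep_first_n_last_m_mask: we configure the KEEP_FIRST_N_LAST_M algorithm for the author_telephone column, which means only the first 3 and last 2 characters of the phone number are kept; everything in between is masked with *.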
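The effect of the three algorithms can be approximated in plain Java. The sketch below is only an illustration of the transformations described in the list above, not ShardingSphere's code: MaskSketch and its method names are hypothetical, and the handling of edge cases (values that are too short, or that lack the special character) is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that imitates the three configured mask types.
final class MaskSketch {

    // KEEP_FIRST_N_LAST_M: keep the first n and last m characters, replace the rest.
    // Assumption: values too short to have a middle part are returned unchanged.
    static String keepFirstNLastM(String value, int firstN, int lastM, char replaceChar) {
        if (value.length() <= firstN + lastM) {
            return value;
        }
        StringBuilder masked = new StringBuilder(value);
        for (int i = firstN; i < value.length() - lastM; i++) {
            masked.setCharAt(i, replaceChar);
        }
        return masked.toString();
    }

    // MASK_BEFORE_SPECIAL_CHARS: replace every character before the first occurrence of specialChars.
    // Assumption: values without the special characters are returned unchanged.
    static String maskBeforeSpecialChars(String value, String specialChars, char replaceChar) {
        int index = value.indexOf(specialChars);
        if (index < 0) {
            return value;
        }
        return String.valueOf(replaceChar).repeat(index) + value.substring(index);
    }

    // MD5: hex-encoded MD5 digest of the value (no salt in this sketch).
    static String md5Hex(String value) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(keepFirstNLastM("555-1234", 3, 2, '*'));                  // 555***34
        System.out.println(maskBeforeSpecialChars("mikescott@example.com", "@", '*')); // *********@example.com
        System.out.println(md5Hex("ABC123"));                                        // 32 hex characters
    }
}
```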
We start the application and send the same POST request:
INFO 35296 --- [nio-8070-exec-1] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_0 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 16:26:51.8188306, 2023-04-17 16:26:51.8188306, Mike Scott, mikescott@example.com, 555-1234, 124, ABC123, This is an amazing course!, 9]
Querying the reviews afterwards returns:
[
  {
    "text": "This is an amazing course!",
    "author": "Mike Scott",
    "authorTelephone": "555***34",
    "authorEmail": "*********@example.com",
    "invoiceCode": "bbf2dead374654cbb32a917afd236656",
    "courseId": 124,
    "id": 9,
    "lastModifiedAt": "2023-04-17T15:44:43"
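  }
]
The data in the database is unchanged, but when it is queried and returned, the phone number, email, and invoice code are masked by the algorithms defined in the data masking rule.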