1. Concepts
Architecture:
- Client/Server
- Ensemble (the cluster, i.e. the group of ZK servers); at least 3 nodes for a fault-tolerant quorum
- ZK Leader
- ZK Follower
ZooKeeper data model:
- znode: stores data; a znode is persistent (the default), ephemeral, or sequential
- stat (per-znode metadata)
Session:
- A client is assigned a session ID when it connects to a server
- The client sends heartbeats to keep the session alive
- When the session ends, ephemeral znodes created in that session are deleted as well
Watches:
- Let a client watch a znode, mainly to receive a notification when that znode changes in the ZooKeeper ensemble (a sketch combining sessions, ephemeral znodes and watches follows this list)
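A minimal sketch tying these concepts together, using the go-zookeeper client that the SDK example further down also uses; the server address 192.168.56.4:2181 is an assumption borrowed from that example:
package main

import (
    "fmt"
    "time"

    "github.com/go-zookeeper/zk"
)

// Session 1 creates an ephemeral znode; session 2 watches it and is notified
// when session 1 ends and the znode is deleted.
func main() {
    conn1, _, err := zk.Connect([]string{"192.168.56.4:2181"}, 5*time.Second)
    if err != nil {
        panic(err)
    }
    if _, err = conn1.Create("/ephemeral-demo", []byte("tied-to-session"),
        zk.FlagEphemeral, zk.WorldACL(zk.PermAll)); err != nil {
        panic(err)
    }

    conn2, _, err := zk.Connect([]string{"192.168.56.4:2181"}, 5*time.Second)
    if err != nil {
        panic(err)
    }
    defer conn2.Close()

    // One-shot watch on the znode from the second session.
    exists, _, events, err := conn2.ExistsW("/ephemeral-demo")
    if err != nil {
        panic(err)
    }
    fmt.Println("exists while session 1 is alive:", exists) // true

    // Ending session 1 deletes the ephemeral znode, which fires the watch.
    conn1.Close()
    e := <-events
    fmt.Println("watch event:", e.Type.String(), "on", e.Path) // EventNodeDeleted
}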
2. Deployment
2.1. Single-node deployment
Docker image: https://github.com/31z4/zookeeper-docker/tree/master/3.8.0
Start command:
# Mainly persist three things: conf, data and datalog
docker run -d --name=zookeeper -v /opt/zookeeper/data/:/data/ -v /opt/zookeeper/datalog/:/datalog/ -v /opt/zookeeper/conf/:/conf/ -p 2181:2181 zookeeper:3.8.0
Default configuration (written by docker-entrypoint.sh):
# cat /opt/zookeeper/conf/zoo.cfg
# Directory for snapshots of the in-memory data tree
dataDir=/data
# Directory for the transaction (write-ahead) logs
dataLogDir=/datalog
# Basic time unit used by ZK, in milliseconds; used as the heartbeat interval. The minimum session timeout is twice the tickTime.
tickTime=2000 
# Number of ticks (tickTime) a follower may take to connect and sync to the leader during cluster initialization
initLimit=5
# Number of ticks a follower may fall out of sync with the leader before being dropped
syncLimit=2
# Auto-purge of old snapshots and transaction logs: snapRetainCount limits by the number of snapshots kept, purgeInterval sets the purge interval in hours (0 disables auto-purge)
# Might auto-purging affect performance?
autopurge.snapRetainCount=3
autopurge.purgeInterval=0
# Limit on the number of concurrent connections a single client (identified by IP) may open
maxClientCnxns=60
standaloneEnabled=true
admin.enableServer=true
# 2888: port for communication between peers
# 3888: port used for leader election
# 2181: port for client connections
server.1=localhost:2888:3888;2181
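To make the relationships concrete, the derived timeouts for the configuration above (the constants simply mirror zoo.cfg; 20 * tickTime is ZooKeeper's built-in default for the maximum session timeout):
package main

import "fmt"

// Values copied from the zoo.cfg above; all results are in milliseconds.
const (
    tickTime  = 2000
    initLimit = 5
    syncLimit = 2
)

func main() {
    fmt.Println("follower connect/sync window at startup:", initLimit*tickTime, "ms")  // 10000
    fmt.Println("max follower lag behind the leader:", syncLimit*tickTime, "ms")       // 4000
    fmt.Println("minimum session timeout (2 * tickTime):", 2*tickTime, "ms")           // 4000
    fmt.Println("default maximum session timeout (20 * tickTime):", 20*tickTime, "ms") // 40000
}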
After startup, the data directory contains a myid file (the id of this ZK node) and the data files (version-2/snapshot.0).
Connect to ZK with the CLI
docker exec -it zookeeper /apache-zookeeper-3.8.0-bin/bin/zkCli.sh -server 127.0.0.1:2181
Connecting to localhost:2181
...
# Some basic operations
ls /
create /zk_test my_data  # create a znode; -s for a sequential znode, -e for an ephemeral znode
get /zk_test 
set /zk_test junk
stat /zk_test  # show the znode's status
delete /zk_test
Simple create/read/update operations plus a Watch using the Go SDK
package main
import (
    "fmt"
    "github.com/go-zookeeper/zk"
    "time"
)
// https://pkg.go.dev/github.com/go-zookeeper/zk#Conn.Create
func create(c *zk.Conn, path string, data []byte) {
    // flags:
    // 0: persistent (the default)
    // zk.FlagEphemeral: ephemeral
    // zk.FlagSequence: sequential
    // 3: ephemeral and sequential
    var flags int32 = 0
    
    path, err := c.Create(path, data, flags, zk.WorldACL(zk.PermAll))
    if err != nil {
        fmt.Println("create failed:", err)
        return
    }
    fmt.Println("created:", path)
}
func get(c *zk.Conn, path string) {
    
    data, state, err := c.Get(path)
    if err != nil {
        fmt.Println("get failed:", err)
        return
    }
    fmt.Println("data", string(data))
    fmt.Println("state", state)
    
}
func update(c *zk.Conn, path string, data []byte) {
    // Fetch the current stat first; Set requires the expected version (optimistic concurrency).
    _, stat, err := c.Get(path)
    if err != nil {
        fmt.Println("get before update failed:", err)
        return
    }
    _, err = c.Set(path, data, stat.Version)
    if err != nil {
        fmt.Println("update failed:", err)
        return
    }
    fmt.Println("updated")
}
func main() {
    conn, _, err := zk.Connect([]string{"192.168.56.4"}, time.Second)
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    
    path := "/test-path"
    data := []byte("test-data")
    
    create(conn, path, data)
    get(conn, path)
    
    newData := []byte("new-data")
    update(conn, path, newData)
    get(conn, path)
    
    exists, state, eventChannel, err := conn.ExistsW("/test-path")
    if err != nil {
        panic(err)
    }
    // A change to the watched path triggers a notification (the watch fires only once).
    // Re-registering the watch in a loop is sketched after this example.
    go func() {
        e := <-eventChannel
        fmt.Println("========================")
        fmt.Println("path:", e.Path)
        fmt.Println("type:", e.Type.String())
        fmt.Println("state:", e.State.String())
        fmt.Println("========================")
    }()
    fmt.Println("exists", exists)
    fmt.Println("state", state)
    time.Sleep(100 * time.Second)
}
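A watch fires only once, so it has to be re-registered after each event. A minimal sketch of the loop mentioned in the comment above; watchLoop is a hypothetical helper meant to be added to the same file (it reuses its fmt and zk imports):
// watchLoop re-registers an exists-watch on path every time it fires.
func watchLoop(c *zk.Conn, path string) {
    for {
        exists, _, ch, err := c.ExistsW(path)
        if err != nil {
            fmt.Println("watch failed:", err)
            return
        }
        fmt.Println("watching", path, "exists:", exists)

        // Block until the one-shot watch fires, then loop around to set it again.
        e := <-ch
        fmt.Println("event:", e.Type.String(), e.State.String(), e.Path)
    }
}
Calling go watchLoop(conn, path) in main instead of the one-shot goroutine keeps delivering change notifications.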
2.2. Deploying a ZK cluster on Kubernetes
A ZK cluster can be deployed either with a StatefulSet or with a ZK operator.
The StatefulSet approach is fairly simple; as long as it meets the requirements, there is no need to bring in a ZK operator.
The current setup uses a StatefulSet, with this startup script baked into the image: https://github.com/kow3ns/kubernetes-zookeeper/blob/master/docker/scripts/start-zookeeper
The key part of this startup script is that it derives each ZK node's myid from its hostname, which is what lets a StatefulSet run a ZK ensemble (the core idea is sketched below); it also turns the passed-in parameters into the configuration file. The image is based on Ubuntu 14; it can be replaced with our unified base image, and later version upgrades or new configuration options can be handled by changing the Dockerfile and the startup script.
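The script itself is bash; its core myid logic looks roughly like this, sketched in Go only for illustration (StatefulSet pod hostnames of the form zk-<ordinal> are assumed):
package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    // StatefulSet pods get hostnames of the form <name>-<ordinal>, e.g. "zk-1".
    host, _ := os.Hostname()
    idx := strings.LastIndex(host, "-")
    ord, err := strconv.Atoi(host[idx+1:])
    if err != nil {
        panic("hostname does not end with an ordinal: " + host)
    }
    myid := ord + 1 // ZK server ids start at 1, pod ordinals at 0
    fmt.Println("myid:", myid)
    // The real script writes this value to <data_dir>/myid and generates zoo.cfg
    // with one server.N=<pod>.<headless-service>... line per replica.
}
The generated zoo.cfg on one of the pods looks like this: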
zookeeper@zk-0:/var/lib$ cat /opt/zookeeper/conf/zoo.cfg 
#This file was autogenerated DO NOT EDIT
clientPort=2181
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/data/log
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
autopurge.snapRetainCount=3
autopurge.purgeInteval=12
server.1=zk-0.zk-hs.default.svc.cluster.local:2888:3888
server.2=zk-1.zk-hs.default.svc.cluster.local:2888:3888
server.3=zk-2.zk-hs.default.svc.cluster.local:2888:3888
The main points for a highly available ZK cluster on k8s:
- A 3-replica StatefulSet (at least 3 nodes, and an odd number of replicas)
- Mount volumes, mainly for data and datalog
- Use podAntiAffinity so that pods land on different nodes, ideally in different AZs
- Configure a PDB so that a majority of nodes stays alive, e.g. with 3 nodes at least 2 must stay up
- A headless Service for traffic between ZK peers
- A ClusterIP Service for clients to reach the ZK cluster
- Tuning of the individual configuration options is left for later
# Headless Service for communication between ZK peers, letting peers discover each other via DNS
apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  labels:
    app: zk
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
---
# ClusterIP Service for client access to the ZK cluster
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  labels:
    app: zk
spec:
  ports:
  - port: 2181
    name: client
  selector:
    app: zk
---
# PDB: during voluntary disruptions, at most 1 ZK node may be unavailable
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zk
spec:
  selector:
    matchLabels:
      app: zk
  serviceName: zk-hs
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: OrderedReady
  template:
    metadata:
      labels:
        app: zk
    spec:
      # affinity:
      #   podAntiAffinity:
      #     requiredDuringSchedulingIgnoredDuringExecution:
      #       - labelSelector:
      #           matchExpressions:
      #             - key: "app"
      #               operator: In
      #               values:
      #               - zk
      #         topologyKey: "kubernetes.io/hostname"
      containers:
      - name: zk
        imagePullPolicy: IfNotPresent
        image: "guglecontainers/kubernetes-zookeeper:1.0-3.4.10"
        resources:
          limits:
            memory: "512M"
            cpu: 0.5
        #   requests:
        #     memory: "1Gi"
        #     cpu: "0.5"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        command:
        - sh
        - -c
        - "start-zookeeper \
          --servers=3 \
          --data_dir=/var/lib/zookeeper/data \
          --data_log_dir=/var/lib/zookeeper/data/log \
          --conf_dir=/opt/zookeeper/conf \
          --client_port=2181 \
          --election_port=3888 \
          --server_port=2888 \
          --tick_time=2000 \
          --init_limit=10 \
          --sync_limit=5 \
          --max_client_cnxns=60 \
          --snap_retain_count=3 \
          --purge_interval=12 \
          --max_session_timeout=40000 \
          --min_session_timeout=4000 \
          --log_level=INFO"
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
      volumes:
      - name: "datadir"
        emptyDir: {}
  # For persistent storage, use volumeClaimTemplates instead of the emptyDir above:
  # volumeClaimTemplates:
  # - metadata:
  #     name: datadir
  #   spec:
  #     accessModes: [ "ReadWriteOnce" ]
  #     resources:
  #       requests:
  #         storage: 10Gi

Refs
- https://zookeeper.apache.org/doc/r3.8.0/index.html
- http://www.dba.cn/book/zookeeper/ZOOKEEPERZhongWenShouCe/ZOOKEEPERGaiShu.html
- https://hub.docker.com/_/zookeeper
- https://kubernetes.io/zh/docs/tutorials/stateful-application/zookeeper/