Hands-On: Building a Three-Node Hadoop 3.3.4 Cluster on the Chinese-Developed OpenEuler 24.03 (with a One-Click Management Script)
# Building a Highly Available Hadoop 3.3.4 Cluster on OpenEuler 24.03: Automated Deployment and Intelligent Operations

What happens when an enterprise-grade big-data platform meets a domestically developed operating system? OpenEuler, a leading Chinese Linux distribution, has made real strides in stability and security with its 24.03 LTS release, making it a solid foundation for self-controlled big-data infrastructure. This article walks through deploying a production-grade Hadoop 3.3.4 cluster on OpenEuler 24.03 and shows how a set of automation scripts turns manual administration into repeatable, scripted management.

## 1. Environment Planning and System Tuning

### 1.1 Hardware Sizing

The first step in building a highly available Hadoop cluster is to plan hardware sensibly. We recommend the following configuration:

| Node type | CPU cores | Memory | Disks | Network |
| --- | --- | --- | --- | --- |
| Master | 8 | 32 GB | 500 GB SSD system disk + 2 × 2 TB HDFS data disks | 10 Gbps |
| Worker | 16 | 64 GB | 500 GB SSD system disk + 4 × 4 TB HDFS data disks | 10 Gbps |

> Tip: on OpenEuler, format the data disks with XFS; it outperforms ext4 on large-file workloads.

### 1.2 OpenEuler System Tuning

OpenEuler 24.03 is designed for enterprise workloads, but it still benefits from targeted tuning:

```bash
# Disable services Hadoop does not need (node1 shown; repeat on every node)
sudo systemctl disable firewalld
sudo systemctl stop firewalld
sudo systemctl disable NetworkManager-wait-online.service

# Tune kernel parameters
echo "vm.swappiness = 10" | sudo tee -a /etc/sysctl.conf
echo "net.ipv6.conf.all.disable_ipv6 = 1" | sudo tee -a /etc/sysctl.conf
echo "net.core.somaxconn = 32768" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Raise resource limits for the hadoop user
echo "hadoop - nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "hadoop - nproc 32768" | sudo tee -a /etc/security/limits.conf
```

### 1.3 Cluster Topology and Service Placement

We use a three-node layout that provides high availability and balanced load:

- node1 (10.90.100.101): NameNode (Active), JournalNode, ZKFC, DataNode, NodeManager
- node2 (10.90.100.102): ResourceManager, NameNode (Standby), JournalNode, ZKFC, DataNode, NodeManager
- node3 (10.90.100.103): JournalNode, DataNode, NodeManager, HistoryServer

This design keeps the critical services (NameNode, ResourceManager) highly available, while the JournalNodes replicate NameNode metadata so there is no single point of failure.
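One caveat about the tuning commands above: `tee -a` is not idempotent, so re-running the script appends duplicate lines to `/etc/sysctl.conf`. The sketch below shows one way to make the appends safe to repeat; the `append_once` helper is a name introduced here for illustration, not part of the original scripts.

```sh
#!/bin/sh
# append_once FILE LINE: append LINE to FILE only if that exact line is
# not already present, so re-running a tuning script stays idempotent.
# (Helper introduced for illustration; not from the original article.)
append_once() {
    file=$1
    line=$2
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a temp file instead of /etc/sysctl.conf
conf=$(mktemp)
append_once "$conf" "vm.swappiness = 10"
append_once "$conf" "vm.swappiness = 10"      # no-op: line already there
append_once "$conf" "net.core.somaxconn = 32768"
cat "$conf"                                   # two lines, no duplicate
rm -f "$conf"
```

With this helper in place, the whole tuning step can be re-run after an OS update without polluting the config files.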
## 2. Automated Base Environment Setup

### 2.1 Host Initialization Script

Manual host configuration is slow and error-prone, so we automate it with `init_env.sh`:

```bash
#!/bin/bash
# Cluster node IPs keyed by hostname
declare -A NODES=(
    [node1]=10.90.100.101
    [node2]=10.90.100.102
    [node3]=10.90.100.103
)

# Configure the base environment for one node
function init_node {
    # Hostname and hosts entry
    hostnamectl set-hostname "$1"
    echo "$2 $1" >> /etc/hosts

    # Static IP configuration
    cat > /etc/sysconfig/network-scripts/ifcfg-ens33 <<EOF
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=$2
NETMASK=255.255.255.0
GATEWAY=10.90.100.1
DNS1=114.114.114.114
EOF

    # Dedicated hadoop user with passwordless sudo
    useradd -m -s /bin/bash hadoop
    echo "hadoop ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

    # Base tooling
    dnf install -y rsync net-tools lrzsz telnet vim
}

# Main logic: configure whichever node the script runs on
current_node=$(hostname -s)
init_node "$current_node" "${NODES[$current_node]}"
systemctl restart network
```

### 2.2 Passwordless SSH at Scale

Running `ssh-copy-id` by hand against every node gets tedious as the cluster grows, so we script the key distribution instead:

```bash
# Generate a key pair on node1
su - hadoop
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Key-distribution script
cat > ~/distribute_key.sh <<'EOF'
#!/bin/bash
for node in node1 node2 node3; do
    ssh "$node" "mkdir -p ~/.ssh && chmod 700 ~/.ssh"
    scp ~/.ssh/id_rsa.pub "hadoop@$node:~/.ssh/authorized_keys"
    ssh "$node" "chmod 600 ~/.ssh/authorized_keys"
done
EOF

# Run it
chmod +x ~/distribute_key.sh
./distribute_key.sh
```

### 2.3 Enhanced File Distribution

Our rsync-based distribution script `xsync` adds resumable and incremental transfers:

```bash
#!/bin/bash
# Argument check
if [ $# -lt 1 ]; then
    echo "Usage: $0 <file_or_dir> [nodes]"
    exit 1
fi

# Default node list
NODES=${2:-"node1 node2 node3"}

# Absolute path of the target
FILE=$(readlink -f "$1")
DIR=$(dirname "$FILE")
BASE=$(basename "$FILE")

# Distribute to all nodes in parallel
for host in $NODES; do
    echo "Syncing to $host"
    rsync -avz --partial --progress --delete \
        -e "ssh -o StrictHostKeyChecking=no" \
        "$FILE" "hadoop@$host:$DIR/" &
done
wait
echo "All files synchronized successfully"
```
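SSH and rsync steps like the ones above can fail transiently (a node still booting, a flaky link). A small retry wrapper hardens them; this is a sketch, `retry` is a name introduced here rather than part of the original scripts, and it assumes the wrapped command is safe to re-run.

```sh
#!/bin/sh
# retry MAX CMD...: run CMD, retrying up to MAX attempts with a growing
# delay between tries; returns non-zero if every attempt fails.
# (Helper introduced for illustration; assumes CMD is idempotent.)
retry() {
    max=$1; shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "retry: giving up after $max attempts: $*" >&2
            return 1
        fi
        sleep "$attempt"                 # linear backoff: 1s, 2s, ...
        attempt=$((attempt + 1))
    done
}

# Example: wrap the rsync call inside xsync's loop
# retry 3 rsync -avz "$FILE" "hadoop@$host:$DIR/"
retry 3 true && echo "succeeded"
```

Because rsync with `--partial` resumes where it left off, wrapping it in `retry` is safe and turns a flaky overnight sync into a self-healing one.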
## 3. Advanced Hadoop Deployment

### 3.1 Automated JDK and Hadoop Installation

We build Hadoop from source so that its native libraries are compiled against OpenEuler:

```bash
# Automated JDK 11 installation
JDK_URL=https://repo.huaweicloud.com/openjdk/11.0.2/openjdk-11.0.2_linux-x64_bin.tar.gz
wget -O /tmp/jdk11.tar.gz "$JDK_URL"
tar -xzf /tmp/jdk11.tar.gz -C /opt
ln -s /opt/jdk-11.0.2 /opt/jdk

# Environment variables
cat >> /etc/profile <<EOF
export JAVA_HOME=/opt/jdk
export PATH=\$JAVA_HOME/bin:\$PATH
EOF
source /etc/profile

# Build Hadoop from source, targeting OpenEuler
HADOOP_SRC_URL=https://archive.apache.org/dist/hadoop/core/hadoop-3.3.4/hadoop-3.3.4-src.tar.gz
wget -O /tmp/hadoop-src.tar.gz "$HADOOP_SRC_URL"
tar -xzf /tmp/hadoop-src.tar.gz -C /opt
cd /opt/hadoop-3.3.4-src

# Build dependencies (maven added here: the mvn step below requires it)
dnf install -y gcc-c++ cmake autoconf automake libtool zlib-devel openssl-devel maven

# Compile
mvn package -Pdist,native -DskipTests -Dtar
```

### 3.2 High-Availability Configuration Templates

HA configuration is the heart of a production deployment. Verified key settings for `core-site.xml`:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node1:2181,node2:2181,node3:2181</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
</configuration>
```

HA settings for `hdfs-site.xml`:

```xml
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>node1:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>node1:9870</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
```

### 3.3 Resource Management Tuning

YARN resources must be sized against the actual hardware; the values below assume the 16-core / 64 GB Worker nodes from section 1.1:

```xml
<!-- yarn-site.xml tuning -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>57344</value> <!-- 56 GB of 64 GB -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>57344</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>14</value> <!-- 16 cores minus 2 reserved for the system -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>14</value>
</property>
```
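The 57344 MB and 14-vcore figures come from simple headroom arithmetic. The sketch below derives them from a node's total resources; `yarn_sizing` and the fixed 8 GB / 2-core reserve are illustrative assumptions chosen to reproduce the numbers used in this article.

```sh
#!/bin/sh
# yarn_sizing TOTAL_MEM_GB TOTAL_CORES: print the yarn-site values used
# in the tuning section, reserving 8 GB and 2 cores for the OS and the
# HDFS daemons. (Helper and reserve sizes are illustrative assumptions.)
yarn_sizing() {
    total_mem_gb=$1
    total_cores=$2
    mem_mb=$(( (total_mem_gb - 8) * 1024 ))   # headroom: 8 GB
    vcores=$(( total_cores - 2 ))             # headroom: 2 cores
    echo "yarn.nodemanager.resource.memory-mb=$mem_mb"
    echo "yarn.nodemanager.resource.cpu-vcores=$vcores"
}

yarn_sizing 64 16
# prints:
# yarn.nodemanager.resource.memory-mb=57344
# yarn.nodemanager.resource.cpu-vcores=14
```

Running the same function with a Master node's 32 GB / 8 cores gives that tier's values too, so one helper keeps heterogeneous nodes consistent.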
One more `yarn-site.xml` property disables YARN's virtual-memory check, which can otherwise kill healthy containers on some JVMs:

```xml
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```

## 4. Intelligent Operations and Monitoring

### 4.1 All-in-One Cluster Management Script

Our `hdp_manager.sh` script bundles cluster control, log collection, and health checks:

```bash
#!/bin/bash
# Color codes
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # No Color

# Cluster nodes
NODES=(node1 node2 node3)
MASTER_NODE=node1
RESOURCE_NODE=node2

case "$1" in
start)
    echo -e "${GREEN}Starting Hadoop Cluster...${NC}"
    # Start the ZooKeeper ensemble
    for node in "${NODES[@]}"; do
        ssh "$node" "/opt/zookeeper/bin/zkServer.sh start"
    done
    # Start the JournalNodes
    for node in "${NODES[@]}"; do
        ssh "$node" "/opt/hadoop/bin/hdfs --daemon start journalnode"
    done
    # Initialize ZKFC state in ZooKeeper (needed only on the very first
    # start; remove this line once the cluster has been initialized)
    ssh "$MASTER_NODE" "/opt/hadoop/bin/hdfs zkfc -formatZK"
    ssh "$MASTER_NODE" "/opt/hadoop/sbin/start-dfs.sh"
    ssh "$RESOURCE_NODE" "/opt/hadoop/sbin/start-yarn.sh"
    ;;
stop)
    echo -e "${RED}Stopping Hadoop Cluster...${NC}"
    ssh "$RESOURCE_NODE" "/opt/hadoop/sbin/stop-yarn.sh"
    ssh "$MASTER_NODE" "/opt/hadoop/sbin/stop-dfs.sh"
    # Stop the JournalNodes
    for node in "${NODES[@]}"; do
        ssh "$node" "/opt/hadoop/bin/hdfs --daemon stop journalnode"
    done
    # Stop ZooKeeper
    for node in "${NODES[@]}"; do
        ssh "$node" "/opt/zookeeper/bin/zkServer.sh stop"
    done
    ;;
status)
    echo -e "${GREEN}Cluster Status:${NC}"
    # ZooKeeper status
    for node in "${NODES[@]}"; do
        echo -n "Zookeeper@$node: "
        ssh "$node" "/opt/zookeeper/bin/zkServer.sh status" | grep Mode
    done
    # HDFS daemons
    echo -e "\nHDFS Services:"
    for node in "${NODES[@]}"; do
        echo "$node:"
        ssh "$node" jps | grep -E "NameNode|DataNode|JournalNode|DFSZKFailoverController"
    done
    # YARN daemons
    echo -e "\nYARN Services:"
    for node in "${NODES[@]}"; do
        echo "$node:"
        ssh "$node" jps | grep -E "ResourceManager|NodeManager"
    done
    ;;
*)
    echo "Usage: $0 {start|stop|status}"
    exit 1
    ;;
esac
```

### 4.2 Real-Time Monitoring and Alerting

For monitoring we integrate Prometheus and Grafana. First, install Node Exporter on every node:

```bash
# Install Node Exporter on all nodes
for node in node1 node2 node3; do
    ssh "$node" "wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz"
    ssh "$node" "tar xzf node_exporter-*.tar.gz && cd node_exporter-* && nohup ./node_exporter > node_exporter.log 2>&1 &"
done
```
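The status branch of `hdp_manager.sh` only prints whatever `jps` happens to match; it never flags daemons that are absent. A small checker makes gaps explicit; this is a sketch, and `missing_daemons` is a name introduced here rather than part of the management script.

```sh
#!/bin/sh
# missing_daemons JPS_OUTPUT NAME...: print a MISSING line for each
# expected daemon name not found in the captured jps output.
# (Helper introduced for illustration; not part of hdp_manager.sh.)
missing_daemons() {
    out=$1; shift
    for daemon in "$@"; do
        printf '%s\n' "$out" | grep -qw "$daemon" || echo "MISSING: $daemon"
    done
}

# Canned jps output: JournalNode is absent
sample="1234 NameNode
2345 DataNode
3456 Jps"
missing_daemons "$sample" NameNode DataNode JournalNode
# prints: MISSING: JournalNode
```

In the status branch, `ssh "$node" jps` output could be captured once per node and passed to this function with the daemon list that node is supposed to run.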
Prometheus is then pointed at the exporters and the Hadoop web endpoints:

```bash
# Prometheus scrape configuration
cat > prometheus.yml <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: hadoop
    static_configs:
      - targets: ['node1:9100', 'node2:9100', 'node3:9100']
  - job_name: hdfs
    static_configs:
      - targets: ['node1:9870', 'node2:9870']
  - job_name: yarn
    static_configs:
      - targets: ['node2:8088']
EOF
```

### 4.3 Log Analysis

A quick ELK deployment for collecting Hadoop logs, placed on the log-analysis node (node3):

```bash
# Deploy Elasticsearch on node3
sudo dnf install -y java-11-openjdk
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.0-linux-x86_64.tar.gz
tar xzf elasticsearch-*.tar.gz
cd elasticsearch-*/bin
./elasticsearch -d

# Install Logstash
wget https://artifacts.elastic.co/downloads/logstash/logstash-7.17.0-linux-x86_64.tar.gz
tar xzf logstash-*.tar.gz

# Hadoop log-collection pipeline
cat > logstash-hadoop.conf <<'EOF'
input {
  file {
    path => "/opt/hadoop/logs/*.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{DATA:component}: %{GREEDYDATA:message}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "hadoop-logs-%{+YYYY.MM.dd}"
  }
}
EOF

# Start Logstash
nohup ./logstash -f logstash-hadoop.conf &
```
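Before the full ELK pipeline is up, a one-liner can already summarize log volume by level, which is often enough to spot a sick daemon. The sketch below assumes Hadoop's default log4j layout (`timestamp LEVEL class: message`); the `count_levels` helper is a name introduced here for illustration.

```sh
#!/bin/sh
# count_levels: tally INFO/WARN/ERROR/FATAL occurrences in log lines read
# from files given as arguments, or from stdin when none are given.
# (Helper introduced for illustration; assumes the default log4j layout.)
count_levels() {
    awk '{ for (i = 1; i <= NF; i++)
               if ($i ~ /^(INFO|WARN|ERROR|FATAL)$/) { n[$i]++; break } }
         END { for (level in n) print level, n[level] }' "$@" | sort
}

# Example run on canned log lines:
count_levels <<'LOGS'
2024-05-01 10:00:00,123 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: started
2024-05-01 10:00:01,456 WARN org.apache.hadoop.hdfs.DFSClient: slow read
2024-05-01 10:00:02,789 ERROR org.apache.hadoop.hdfs.DFSClient: read failed
2024-05-01 10:00:03,000 INFO org.apache.hadoop.ipc.Server: listening on 8020
LOGS
# prints:
# ERROR 1
# INFO 2
# WARN 1
```

In practice you would run `count_levels /opt/hadoop/logs/*.log` on each node, or pipe `ssh "$node" 'cat /opt/hadoop/logs/*.log'` into it from the management host.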
This article was contributed by an internet user and represents the author's own views, not those of this site. Original source: http://www.coloradmin.cn/o/2439033.html