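My Alibaba Cloud server could not be reached over SSH, and the website was also misbehaving. After I logged in to the Alibaba Cloud console, it reported that the system had experienced a kernel panic, an OOM exception, an internal crash, or performance jitter. I opened a ticket with Alibaba Cloud support, who said kdump needed to be installed and enabled, so I started learning about kdump.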
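What kdump is:
 When the system crashes, kdump uses kexec to boot into a second kernel, usually called the capture kernel, which starts with a small amount of memory and captures the dump image. The first kernel reserves part of memory for the second kernel to boot into. Because kdump boots the capture kernel through kexec and bypasses the BIOS, the first kernel's memory is left intact. This is the essence of kernel crash dumping.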
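The memory the first kernel reserves is visible in /proc/iomem as a "Crash kernel" entry with an inclusive hex address range. The sketch below turns that entry into a size in MiB; crash_region_mib is a hypothetical helper name, not part of kexec-tools.

```shell
#!/usr/bin/env bash
# crash_region_mib: read /proc/iomem-style text on stdin and print the size
# (in MiB) of the first "Crash kernel" region. Prints 0 and fails if absent.
crash_region_mib() {
  local line range start end
  line=$(grep -m1 'Crash kernel' || true)   # e.g. "  2b000000-350fffff : Crash kernel"
  [ -n "$line" ] || { echo 0; return 1; }
  range=${line%% :*}                        # keep "  2b000000-350fffff"
  range=${range//[[:space:]]/}              # drop the indentation
  start=${range%-*}
  end=${range#*-}
  # The range is inclusive, hence the +1.
  echo $(( (0x$end - 0x$start + 1) / 1024 / 1024 ))
}

# Usage (run as root; newer kernels show zeroed addresses to non-root users):
#   crash_region_mib < /proc/iomem
```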
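Prerequisites for kdump to work:
 1. The kdump service is enabled on the system
 2. A suitable amount of crash memory is reserved in the boot configuration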
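1. CentOS 7: check the kdump status with:
systemctl status kdump.service
kdump is installed on CentOS 7 by default. If it is missing, install it together with the dump-analysis tools (kernel-debuginfo provides the vmlinux with debug symbols, and crash is the analyzer):
yum install kernel-debuginfo kexec-tools crash
or, to install only kdump itself:
yum install kexec-tools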
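2. Set how much memory crashkernel reserves by editing /etc/default/grub.
In the GRUB_CMDLINE_LINUX entry, change the crashkernel value. The default is auto, but the value should be chosen according to the server's memory size. With crashkernel=auto, a system with <= 8 GB of memory reserves nothing for the kdump kernel (equivalent to disabling kdump); a system with > 8 GB and <= 16 GB reserves 256M (equivalent to crashkernel=256M); and a system with > 16 GB reserves 512M (equivalent to crashkernel=512M).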
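Both parts of this step can be sketched in bash. auto_reserve and set_crashkernel are hypothetical helper names. auto_reserve only encodes the thresholds quoted above, so check the actual reservation on your own kernel. set_crashkernel assumes a crashkernel= entry already exists on the GRUB_CMDLINE_LINUX line, as it usually does on a stock CentOS 7 install.

```shell
#!/usr/bin/env bash
# auto_reserve: reservation made by crashkernel=auto per the thresholds above.
# Argument: total RAM in whole GB. Prints "0" when nothing is reserved.
auto_reserve() {
  local mem_gb=$1
  if [ "$mem_gb" -le 8 ]; then
    echo 0
  elif [ "$mem_gb" -le 16 ]; then
    echo 256M
  else
    echo 512M
  fi
}

# set_crashkernel: replace crashkernel=<value> on the GRUB_CMDLINE_LINUX line.
# Arguments: grub defaults file, new size (e.g. 256M).
set_crashkernel() {
  local file=$1 size=$2
  cp -p "$file" "$file.bak"   # keep a backup before editing in place
  # [^ \"]* stops at the next space or at the closing double quote.
  sed -i -E "/^GRUB_CMDLINE_LINUX=/ s/crashkernel=[^ \"]*/crashkernel=${size}/" "$file"
}

# Usage (as root):
#   set_crashkernel /etc/default/grub 256M
#   grep '^GRUB_CMDLINE_LINUX=' /etc/default/grub
```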
3. Regenerate the GRUB configuration file; the change takes effect only after a reboot:
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
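After the reboot, it is worth confirming that the running kernel actually received the new value. The sketch below extracts the parameter from a kernel command line such as the contents of /proc/cmdline; crashkernel_value is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# crashkernel_value: print the value of crashkernel= from a kernel command line.
# Returns 1 if the parameter is not present.
crashkernel_value() {
  local arg
  local -a args
  read -r -a args <<< "$1"   # split on whitespace without glob expansion
  for arg in "${args[@]}"; do
    case $arg in
      crashkernel=*) echo "${arg#crashkernel=}"; return 0 ;;
    esac
  done
  return 1
}

# Usage:
#   crashkernel_value "$(cat /proc/cmdline)" || echo "crashkernel is not set"
```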
4. Start the kdump service and enable it at boot:
systemctl start kdump.service   # start kdump now
systemctl enable kdump.service  # start kdump at every boot
5. Check whether the kdump service is running:
systemctl status kdump.service
or:
systemctl is-active kdump.service
If the status output shows Starting kdump: [OK] (and is-active prints active), kdump has started.
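As an extra check beyond systemctl, the kernel itself reports whether a capture kernel is loaded through /sys/kernel/kexec_crash_loaded ("1" means loaded). The sketch below combines both checks; crash_kernel_loaded is a hypothetical helper, and its file path is a parameter only so it can be tested against a fake file.

```shell
#!/usr/bin/env bash
# crash_kernel_loaded: succeed if the capture kernel is loaded.
# Optional argument: path to read instead of /sys/kernel/kexec_crash_loaded.
crash_kernel_loaded() {
  local f=${1:-/sys/kernel/kexec_crash_loaded}
  [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

# Usage:
#   if systemctl is-active --quiet kdump.service && crash_kernel_loaded; then
#     echo "kdump ready"
#   else
#     echo "kdump NOT ready" >&2
#   fi
```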
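6. Manually trigger a crash dump to test the setup (the first command enables SysRq, the second uses it to crash the running kernel immediately):
echo 1 > /proc/sys/kernel/sysrq; echo c > /proc/sysrq-trigger
If everything works, the system reboots automatically, and after the reboot a vmcore dump file appears in a timestamped subdirectory under /var/crash/.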
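Because kdump writes each dump into its own timestamped subdirectory of /var/crash/, several test crashes leave several vmcore files behind. The sketch below picks the newest one by modification time; latest_vmcore is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# latest_vmcore: print the newest vmcore under a crash directory
# (default /var/crash). Prints nothing if there is no dump yet.
latest_vmcore() {
  local dir=${1:-/var/crash}
  # ls -t sorts newest first; the error from an unmatched glob is discarded.
  ls -1t "$dir"/*/vmcore 2>/dev/null | head -n 1
}

# Usage:
#   cd "$(dirname "$(latest_vmcore)")" && ls
```

Parsing ls output is acceptable here because the kdump directory names contain no spaces or newlines.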
Open the dump with crash for analysis (run from the dump's directory under /var/crash/):
# crash vmcore /usr/lib/debug/lib/modules/3.10.0-957.1.3.el7.x86_64/vmlinux
  
  
 crash 7.2.3-8.el7
 Copyright (C) 2002-2017  Red Hat, Inc.
 Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
 Copyright (C) 1999-2006  Hewlett-Packard Co
 Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
 Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
 Copyright (C) 2005, 2011  NEC Corporation
 Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
 Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
 This program is free software, covered by the GNU General Public License,
 and you are welcome to change it and/or distribute copies of it under
 certain conditions.  Enter "help copying" to see the conditions.
 This program has absolutely no warranty.  Enter "help warranty" for details.
 GNU gdb (GDB) 7.6
 Copyright (C) 2013 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
 and "show warranty" for details.
 This GDB was configured as "x86_64-unknown-linux-gnu"...
  
  
 WARNING: kernel relocated [126MB]: patching 85619 gdb minimal_symbol values
  
  
       KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.1.3.el7.x86_64/vmlinux
     DUMPFILE: vmcore  [PARTIAL DUMP]
         CPUS: 4
         DATE: Fri Jun 18 05:32:32 2021
       UPTIME: 00:47:57
 LOAD AVERAGE: 0.00, 0.01, 0.05
        TASKS: 413
     NODENAME: localhost.localdomain
      RELEASE: 3.10.0-957.1.3.el7.x86_64
      VERSION: #1 SMP Thu Nov 29 14:49:43 UTC 2018
      MACHINE: x86_64  (3799 Mhz)
       MEMORY: 2 GB
        PANIC: "SysRq : Trigger a crash"
          PID: 12653
      COMMAND: "bash"
         TASK: ffffa1071aca8000  [THREAD_INFO: ffffa1074b32c000]
          CPU: 3
        STATE: TASK_RUNNING (SYSRQ)
  
  
 crash>
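The vmlinux passed to crash has to come from the kernel-debuginfo package of exactly the kernel that crashed; in the output above, the KERNEL path and the RELEASE line agree. The sketch below builds that path and warns when the debuginfo file is missing; vmlinux_for is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# vmlinux_for: print the debug vmlinux path for a kernel release.
# Returns 1 (with a hint on stderr) if the file is not installed.
vmlinux_for() {
  local path="/usr/lib/debug/lib/modules/$1/vmlinux"
  echo "$path"
  if [ ! -r "$path" ]; then
    echo "missing $path; install kernel-debuginfo-$1" >&2
    return 1
  fi
}

# Usage, assuming the machine rebooted into the same kernel that panicked:
#   vmlinux=$(vmlinux_for "$(uname -r)") && crash /var/crash/<dump-dir>/vmcore "$vmlinux"
```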