Lework Study hard, improve every day.

Installing a Kubernetes v1.20.5 HA Cluster on Debian 10 with kubeadm

2021-04-03
lework

This walkthrough deploys a highly available Kubernetes cluster on Debian 10 with kubeadm, entirely by hand. For HA, each worker node runs a local proxy in front of the API servers.

k8s-node-ha

Environment

OS            IP Address       Docker    Kernel            Hostname           CPU   Memory   Role
Debian 10.9   192.168.77.140   20.10.5   4.19.0-16-amd64   k8s-master-node1   2C    4G       master
Debian 10.9   192.168.77.141   20.10.5   4.19.0-16-amd64   k8s-master-node2   2C    4G       master
Debian 10.9   192.168.77.142   20.10.5   4.19.0-16-amd64   k8s-master-node3   2C    4G       master
Debian 10.9   192.168.77.143   20.10.5   4.19.0-16-amd64   k8s-worker-node1   2C    4G       worker
Debian 10.9   192.168.77.144   20.10.5   4.19.0-16-amd64   k8s-worker-node2   2C    4G       worker
Debian 10.9   192.168.77.145   20.10.5   4.19.0-16-amd64   k8s-worker-node3   2C    4G       worker

Versions

kubeadm: v1.20.5

Kubernetes: v1.20.5

etcd: v3.4.13

Docker CE: 20.10.5

Flannel : v0.13.0

Network

  • Pod CIDR: 10.244.0.0/16
  • Service CIDR: 10.96.0.0/16
  • Service DNS IP: 10.96.0.10
  • DNS domain: cluster.local
  • Kubernetes API: apiserver.cluster.local:6443

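Initialize all nodes

Run the following on every node in the cluster. Note: some of these steps are over-tuned; pick the ones that fit your environment.

APT configuration

Mirror sources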

mv  /etc/apt/sources.list{,.bak}
cat > /etc/apt/sources.list <<EOF
deb http://mirrors.aliyun.com/debian/ buster main contrib non-free
deb-src http://mirrors.aliyun.com/debian/ buster main contrib non-free

deb http://mirrors.aliyun.com/debian/ buster-updates main contrib non-free
deb-src http://mirrors.aliyun.com/debian/ buster-updates main contrib non-free

deb http://mirrors.aliyun.com/debian-security/ buster/updates main contrib non-free
deb-src http://mirrors.aliyun.com/debian-security/ buster/updates main contrib non-free
EOF
apt-get update

Prevent services from starting automatically on install

echo -e '#!/bin/sh\nexit 101' | install -m 755 /dev/stdin /usr/sbin/policy-rc.d

Disable automatic package updates

systemctl mask apt-daily.service apt-daily-upgrade.service
systemctl stop apt-daily.timer apt-daily-upgrade.timer
systemctl disable apt-daily.timer apt-daily-upgrade.timer
systemctl kill --kill-who=all apt-daily.service

cat > /etc/apt/apt.conf.d/10cloudinit-disable << __EOF
APT::Periodic::Enable "0";
// undo what's in 20auto-upgrade
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";
__EOF

Disable the firewall

systemctl stop firewalld && systemctl disable firewalld
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

Disable SELinux

setenforce 0
sed -i "s#=enforcing#=disabled#g" /etc/selinux/config

Disable swap

swapoff -a && sysctl -w vm.swappiness=0
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
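Whether swap is really off can be read from /proc/swaps, which lists one active swap device per line after a header. The following is a minimal sketch of such a check; the function name active_swap_count is hypothetical and not part of the original procedure.

```shell
# Count active swap devices in a /proc/swaps-style file.
# The first line is a header, so it is skipped.
active_swap_count() {
  local swaps_file="${1:-/proc/swaps}"
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$swaps_file"
}

# Example: warn if any swap device is still active on this host.
if [ -r /proc/swaps ] && [ "$(active_swap_count)" -gt 0 ]; then
  echo "swap is still enabled" >&2
fi
```

A result of 0 after swapoff -a, and again after a reboot, suggests that both the runtime change and the /etc/fstab edit took effect.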

Resource limits

[ ! -f /etc/security/limits.conf_bak ] && cp /etc/security/limits.conf{,_bak}
cat << EOF >> /etc/security/limits.conf
root soft nofile 655360
root hard nofile 655360
root soft nproc 655360
root hard nproc 655360
root soft core unlimited
root hard core unlimited

* soft nofile 655360
* hard nofile 655360
* soft nproc 655360
* hard nproc 655360
* soft core unlimited
* hard core unlimited
EOF

[ ! -f /etc/systemd/system.conf_bak ] && cp /etc/systemd/system.conf{,_bak}
cat << EOF >> /etc/systemd/system.conf
DefaultLimitCORE=infinity
DefaultLimitNOFILE=655360
DefaultLimitNPROC=655360
EOF

Kernel parameters

cat << EOF >  /etc/sysctl.d/99-kube.conf
# https://www.kernel.org/doc/Documentation/sysctl/
#############################################################################################
# Virtual memory
#############################################################################################

# Default: 60
# 0 - avoid swapping for as long as possible.
# 1 - swap only when needed to avoid running out of memory.
vm.swappiness = 0

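# Memory overcommit policy
# 0 - heuristic: the kernel checks whether enough memory is available; the request succeeds if so, otherwise it fails and an error is returned to the process.
# 1 - always allow the allocation, regardless of the current memory state.
# 2 - strict accounting: refuse allocations beyond swap plus a configurable share of physical memory (vm.overcommit_ratio).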
vm.overcommit_memory=1

# Behaviour on OOM
# 0 - the kernel runs the OOM killer and kills the process using the most memory; 1 - the kernel panics instead.
vm.panic_on_oom=0

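# vm.dirty_background_ratio controls when the kernel starts flushing dirty pages to disk in the background.
# Default value is 10.
# The value is a percentage of total system memory; 5 is appropriate in many cases.
# Do not set it to zero.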
vm.dirty_background_ratio = 5

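# Share of memory that may hold dirty pages before the kernel forces synchronous writes to flush them to disk.
# Raising vm.dirty_ratio above its default (also a percentage of system memory) lets more dirty pages accumulate.
# A vm.dirty_ratio between 60 and 80 is suggested.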
vm.dirty_ratio = 60

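# vm.max_map_count limits the number of memory map areas a process may have.
# The minimum for the mmap limit (vm.max_map_count) is the open-file ulimit (cat /proc/sys/fs/file-max).
# map_count should be roughly 1 per 128KB of system memory, so max_map_count is 262144 on a 32GB system.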
# Default: 65530
vm.max_map_count = 2097152

#############################################################################################
# File system
#############################################################################################

fs.may_detach_mounts = 1

# Raise the file handle and inode cache limits, and restrict core dumps.
fs.file-max = 2097152
fs.nr_open = 2097152
fs.suid_dumpable = 0

# inotify limits
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=524288
fs.inotify.max_queued_events=16384

#############################################################################################
# Network
#############################################################################################

# Default amount of memory allocated to each socket's send and receive buffers.
net.core.wmem_default = 25165824
net.core.rmem_default = 25165824

# Maximum amount of memory allocated to each socket's send and receive buffers.
net.core.wmem_max = 25165824
net.core.rmem_max = 25165824

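# In addition to the socket settings, the send and receive buffer sizes of TCP sockets
# must be set separately with net.ipv4.tcp_wmem and net.ipv4.tcp_rmem.
# Each takes three space-separated integers: the minimum, default, and maximum size.
# The maximum cannot exceed the values set for all sockets with net.core.wmem_max and net.core.rmem_max.
# A reasonable setting is a 4KiB minimum, 64KiB default, and 2MiB maximum buffer.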
net.ipv4.tcp_wmem = 20480 12582912 25165824
net.ipv4.tcp_rmem = 20480 12582912 25165824

# Increase the total buffer space that can be allocated
# Measured in pages (4096 bytes)
net.ipv4.tcp_mem = 65536 25165824 262144
net.ipv4.udp_mem = 65536 25165824 262144

# Minimum amount of memory allocated to each socket's send and receive buffers.
net.ipv4.udp_wmem_min = 16384
net.ipv4.udp_rmem_min = 16384

# Enable TCP window scaling so clients can transfer data more efficiently and the proxy side can buffer it.
net.ipv4.tcp_window_scaling = 1

# Increase the number of connections that can be accepted simultaneously.
net.ipv4.tcp_max_syn_backlog = 10240

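# Raising net.core.netdev_max_backlog above its default of 1000
# helps with bursts of network traffic, especially on multi-gigabit links,
# by letting more packets queue up for the kernel to process.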
net.core.netdev_max_backlog = 65536

# Increase the maximum size of the option memory buffer
net.core.optmem_max = 25165824

# Number of SYN-ACK retries for passive TCP connections.
net.ipv4.tcp_synack_retries = 2

# Allowed local port range.
net.ipv4.ip_local_port_range = 2048 65535

# Protect against TIME-WAIT assassination (RFC 1337)
# Default: net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_rfc1337 = 1

# Lower tcp_fin_timeout from its default
net.ipv4.tcp_fin_timeout = 15

# Maximum number of backlogged sockets.
# Default is 128.
net.core.somaxconn = 32768

# Enable syncookies for protection against SYN flood attacks.
net.ipv4.tcp_syncookies = 1

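# Protect against Smurf attacks:
# forged ICMP packets are sent to a network's broadcast address with the victim's address as the source,
# so every host that receives them replies to the victim, flooding it with thousands of packets for a while.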
net.ipv4.icmp_echo_ignore_broadcasts = 1

# Enable protection against bogus ICMP error messages
net.ipv4.icmp_ignore_bogus_error_responses = 1

# Enable window auto-scaling.
# If the latency justifies it, this lets TCP buffers grow beyond their usual 64K maximum.
net.ipv4.tcp_window_scaling = 1

# Log spoofed, source-routed, and redirect packets
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1

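# Maximum number of TCP sockets not attached to any user file handle that the kernel keeps. Beyond this number,
# orphaned connections are reset immediately and a warning is printed.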
# Default: net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_max_orphans = 65536

# Do not cache metrics when a connection closes
net.ipv4.tcp_no_metrics_save = 1

# Enable timestamps as defined in RFC1323:
# Default: net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_timestamps = 1

# Enable selective acknowledgements.
# Default: net.ipv4.tcp_sack = 1
net.ipv4.tcp_sack = 1

# Increase the tcp-time-wait bucket pool size to protect against simple DoS attacks.
# net.ipv4.tcp_tw_recycle was removed in Linux 4.12; use net.ipv4.tcp_tw_reuse instead.
net.ipv4.tcp_max_tw_buckets = 14400
net.ipv4.tcp_tw_reuse = 1

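# accept_source_route makes an interface accept packets with the strict source route (SSR) or loose source route (LSR) option set.
# The settings below drop packets that have the SSR or LSR option set.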
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0

# Enable reverse path filtering
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

# Do not accept ICMP redirects
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0

# Do not send any IPv4 ICMP redirect packets.
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0

# Enable IP forwarding.
net.ipv4.ip_forward = 1

# Disable IPv6
net.ipv6.conf.lo.disable_ipv6=1
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1

# Pass bridged traffic through iptables, ip6tables, and arptables (requires the br_netfilter module)
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1

# ARP cache
# Minimum number of entries kept in the ARP cache; the garbage collector does not run below this number. Default: 128
net.ipv4.neigh.default.gc_thresh1=2048
# Soft limit on the number of entries in the ARP cache. The garbage collector lets the count exceed this for 5 seconds before collecting. Default: 512
net.ipv4.neigh.default.gc_thresh2=4096
# Hard limit on the number of entries in the ARP cache; the garbage collector runs immediately once the count exceeds it. Default: 1024
net.ipv4.neigh.default.gc_thresh3=8192

# Keepalive
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10

# conntrack table
net.nf_conntrack_max=1048576
net.netfilter.nf_conntrack_max=1048576
net.netfilter.nf_conntrack_buckets=262144
net.netfilter.nf_conntrack_tcp_timeout_fin_wait=30
net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
net.netfilter.nf_conntrack_tcp_timeout_close_wait=15
net.netfilter.nf_conntrack_tcp_timeout_established=300

#############################################################################################
# Kernel
#############################################################################################

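# Address space layout randomization (ASLR) is a memory protection mechanism that guards against buffer overflow attacks.
# It helps keep the memory addresses associated with running processes unpredictable,
# so flaws or vulnerabilities in those processes become harder to exploit.
# Accepted values: 0 = off, 1 = conservative randomization, 2 = full randomization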
kernel.randomize_va_space = 2

# Raise the PID limit
kernel.pid_max = 65536
kernel.threads-max=30938

# coredump
kernel.core_pattern=core

# Whether to panic automatically when a soft lockup is detected. Default: 0
kernel.softlockup_all_cpu_backtrace=1
kernel.softlockup_panic=1
EOF
sysctl --system
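Some of the keys above only exist once their kernel module is loaded: the net.bridge.* keys need br_netfilter and the nf_conntrack keys need nf_conntrack, so sysctl --system may report errors for them until those modules are present. The sketch below compares a sysctl.d file with the live values to spot keys that did not take effect; the helper names sysctl_conf_pairs and sysctl_check are hypothetical.

```shell
# Print "key value" pairs from a sysctl.d-style file, ignoring comments and blank lines.
sysctl_conf_pairs() {
  sed -e 's/[[:space:]]*#.*$//' -e '/^[[:space:]]*$/d' "$1" |
    awk -F= '{
      gsub(/^[ \t]+|[ \t]+$/, "", $1)
      gsub(/^[ \t]+|[ \t]+$/, "", $2)
      print $1, $2
    }'
}

# Report every key whose live value differs from the file,
# or that the running kernel does not expose at all.
sysctl_check() {
  sysctl_conf_pairs "$1" | while read -r key want; do
    have=$(sysctl -n "$key" 2>/dev/null | tr -s '\t ' ' ')
    [ "$have" = "$want" ] ||
      echo "MISMATCH $key: want='$want' have='${have:-<unavailable>}'"
  done
}

# Usage (as root, after "sysctl --system"):
#   sysctl_check /etc/sysctl.d/99-kube.conf
```

If only the bridge and conntrack keys are reported, loading the modules with modprobe and running sysctl --system again is usually enough.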

History format and PS1

cat << EOF >> /etc/bash.bashrc
# history actions record,include action time, user, login ip
HISTFILESIZE=5000
HISTSIZE=5000
USER_IP=\$(who -u am i 2>/dev/null | awk '{print \$NF}' | sed -e 's/[()]//g')
if [ -z \$USER_IP ]
then
  USER_IP=\$(hostname -i)
fi
HISTTIMEFORMAT="%Y-%m-%d %H:%M:%S \$USER_IP:\$(whoami) "
export HISTFILESIZE HISTSIZE HISTTIMEFORMAT

# PS1
PS1='\[\033[0m\]\[\033[1;36m\][\u\[\033[0m\]@\[\033[1;32m\]\h\[\033[0m\] \[\033[1;31m\]\w\[\033[0m\]\[\033[1;36m\]]\[\033[33;1m\]\\$ \[\033[0m\]'
EOF

journald logging

mkdir -p /var/log/journal /etc/systemd/journald.conf.d
cat << EOF > /etc/systemd/journald.conf.d/99-prophet.conf
[Journal]
# Persist logs to disk
Storage=persistent
# Compress archived logs
Compress=yes
SyncIntervalSec=5m
RateLimitInterval=30s
RateLimitBurst=1000
# Maximum disk usage: 10G
SystemMaxUse=10G
# Maximum size of a single journal file: 200M
SystemMaxFileSize=200M
# Keep logs for 3 weeks
MaxRetentionSec=3week
# Do not forward logs to syslog
ForwardToSyslog=no
EOF

SSH login banner

cat << EOF > /etc/profile.d/zz-ssh-login-info.sh
#!/bin/sh
#
# @Time    : 2020-02-04
# @Author  : lework
# @Desc    : ssh login banner

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
shopt -q login_shell && : || return 0
echo -e "\033[0;32m
 ██╗  ██╗ █████╗ ███████╗
 ██║ ██╔╝██╔══██╗██╔════╝
 █████╔╝ ╚█████╔╝███████╗
 ██╔═██╗ ██╔══██╗╚════██║
 ██║  ██╗╚█████╔╝███████║
 ╚═╝  ╚═╝ ╚════╝ ╚══════ by lework\033[0m"

# os
upSeconds="\$(cut -d. -f1 /proc/uptime)"
secs=\$((\${upSeconds}%60))
mins=\$((\${upSeconds}/60%60))
hours=\$((\${upSeconds}/3600%24))
days=\$((\${upSeconds}/86400))
UPTIME_INFO=\$(printf "%d days, %02dh %02dm %02ds" "\$days" "\$hours" "\$mins" "\$secs")

if [ -f /etc/redhat-release ] ; then
    PRETTY_NAME=\$(< /etc/redhat-release)

elif [ -f /etc/debian_version ]; then
   DIST_VER=\$(</etc/debian_version)
   PRETTY_NAME="\$(grep PRETTY_NAME /etc/os-release | sed -e 's/PRETTY_NAME=//g' -e  's/"//g') (\$DIST_VER)"

else
    PRETTY_NAME=\$(cat /etc/*-release | grep "PRETTY_NAME" | sed -e 's/PRETTY_NAME=//g' -e 's/"//g')
fi

if [[ -d "/system/app/" && -d "/system/priv-app" ]]; then
    model="\$(getprop ro.product.brand) \$(getprop ro.product.model)"

elif [[ -f /sys/devices/virtual/dmi/id/product_name ||
        -f /sys/devices/virtual/dmi/id/product_version ]]; then
    model="\$(< /sys/devices/virtual/dmi/id/product_name)"
    model+=" \$(< /sys/devices/virtual/dmi/id/product_version)"

elif [[ -f /sys/firmware/devicetree/base/model ]]; then
    model="\$(< /sys/firmware/devicetree/base/model)"

elif [[ -f /tmp/sysinfo/model ]]; then
    model="\$(< /tmp/sysinfo/model)"
fi

MODEL_INFO=\${model}
KERNEL=\$(uname -srmo)
USER_NUM=\$(who -u | wc -l)
RUNNING=\$(ps ax | wc -l | tr -d " ")

# disk
totaldisk=\$(df -h -x devtmpfs -x tmpfs -x debugfs -x aufs -x overlay --total 2>/dev/null | tail -1)
disktotal=\$(awk '{print \$2}' <<< "\${totaldisk}")
diskused=\$(awk '{print \$3}' <<< "\${totaldisk}")
diskusedper=\$(awk '{print \$5}' <<< "\${totaldisk}")
DISK_INFO="\033[0;33m\${diskused}\033[0m of \033[1;34m\${disktotal}\033[0m disk space used (\033[0;33m\${diskusedper}\033[0m)"

# cpu
cpu=\$(awk -F':' '/^model name/ {print \$2}' /proc/cpuinfo | uniq | sed -e 's/^[ \t]*//')
cpun=\$(grep -c '^processor' /proc/cpuinfo)
cpuc=\$(grep '^cpu cores' /proc/cpuinfo | tail -1 | awk '{print \$4}')
cpup=\$(grep '^physical id' /proc/cpuinfo | wc -l)
CPU_INFO="\${cpu} \${cpup}P \${cpuc}C \${cpun}L"

# get the load averages
read one five fifteen rest < /proc/loadavg
LOADAVG_INFO="\033[0;33m\${one}\033[0m / \${five} / \${fifteen} with \033[1;34m\$(( cpun*cpuc ))\033[0m core(s) at \033[1;34m\$(grep '^cpu MHz' /proc/cpuinfo | tail -1 | awk '{print \$4}')\033[0m MHz"

# mem
MEM_INFO="\$(cat /proc/meminfo | awk '/MemTotal:/{total=\$2/1024/1024;next} /MemAvailable:/{use=total-\$2/1024/1024; printf("\033[0;33m%.2fGiB\033[0m of \033[1;34m%.2fGiB\033[0m RAM used (\033[0;33m%.2f%%\033[0m)",use,total,(use/total)*100);}')"

# network
# extranet_ip=" and \$(curl -s ip.cip.cc)"
IP_INFO="\$(ip a | grep glo | awk '{print \$2}' | head -1 | cut -f1 -d/)\${extranet_ip:-}"

# Container info
CONTAINER_INFO="\$(sudo /usr/bin/crictl ps -a -o yaml 2> /dev/null | awk '/^  state: /{gsub("CONTAINER_", "", \$NF) ++S[\$NF]}END{for(m in S) printf "%s%s:%s ",substr(m,1,1),tolower(substr(m,2)),S[m]}')Images:\$(sudo /usr/bin/crictl images -q 2> /dev/null | wc -l)"

# info
echo -e "
 Information as of: \033[1;34m\$(date +"%Y-%m-%d %T")\033[0m
 
 \033[0;1;31mProduct\033[0m............: \${MODEL_INFO}
 \033[0;1;31mOS\033[0m.................: \${PRETTY_NAME}
 \033[0;1;31mKernel\033[0m.............: \${KERNEL}
 \033[0;1;31mCPU\033[0m................: \${CPU_INFO}

 \033[0;1;31mHostname\033[0m...........: \033[1;34m\$(hostname)\033[0m
 \033[0;1;31mIP Addresses\033[0m.......: \033[1;34m\${IP_INFO}\033[0m

 \033[0;1;31mUptime\033[0m.............: \033[0;33m\${UPTIME_INFO}\033[0m
 \033[0;1;31mMemory\033[0m.............: \${MEM_INFO}
 \033[0;1;31mLoad Averages\033[0m......: \${LOADAVG_INFO}
 \033[0;1;31mDisk Usage\033[0m.........: \${DISK_INFO} 

 \033[0;1;31mUsers online\033[0m.......: \033[1;34m\${USER_NUM}\033[0m
 \033[0;1;31mRunning Processes\033[0m..: \033[1;34m\${RUNNING}\033[0m
 \033[0;1;31mContainer Info\033[0m.....: \${CONTAINER_INFO}
"
EOF

chmod +x /etc/profile.d/zz-ssh-login-info.sh
echo 'ALL ALL=NOPASSWD: /usr/bin/crictl ps -a -o yaml, /usr/bin/crictl images -q' > /etc/sudoers.d/crictl

Time synchronization

ntpd --version > /dev/null 2>&1 && apt-get remove -y ntp
apt-get install -y chrony
[ ! -f /etc/chrony/chrony.conf_bak ] && cp /etc/chrony/chrony.conf{,_bak} # back up the default configuration
cat << EOF > /etc/chrony/chrony.conf
server ntp.aliyun.com iburst
server cn.ntp.org.cn iburst
server ntp.shu.edu.cn iburst
server 0.cn.pool.ntp.org iburst
server 1.cn.pool.ntp.org iburst
server 2.cn.pool.ntp.org iburst
server 3.cn.pool.ntp.org iburst

driftfile /var/lib/chrony/drift
makestep 1.0 3
logdir /var/log/chrony
EOF

timedatectl set-timezone Asia/Shanghai
chronyd -q -t 1 'server cn.pool.ntp.org iburst maxsamples 1'
systemctl enable chronyd
systemctl start chronyd
chronyc sources -v
chronyc sourcestats

Enable IPVS

apt-get install -y ipvsadm ipset sysstat conntrack libseccomp2

Load the IPVS kernel modules at boot

:> /etc/modules-load.d/ipvs.conf
module=(
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
br_netfilter
)
for kernel_module in ${module[@]};do
    /sbin/modinfo -F filename $kernel_module |& grep -qv ERROR && echo $kernel_module >> /etc/modules-load.d/ipvs.conf || :
done
# systemctl enable --now systemd-modules-load.service

ipvsadm --clear
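The file under /etc/modules-load.d is read at boot, so the modules are not necessarily loaded yet in the current session. The following sketch lists which of the wanted modules are absent from a /proc/modules-style listing; the function name missing_modules is hypothetical, and modules compiled into the kernel never appear in /proc/modules, so they would be reported as missing too.

```shell
# Print each wanted module that is absent from a /proc/modules-style listing.
# Usage: missing_modules <listing-file> <module>...
missing_modules() {
  local modules_file="$1"
  shift
  local m
  for m in "$@"; do
    # Each line of the listing starts with the module name followed by a space.
    grep -q "^${m} " "$modules_file" || echo "$m"
  done
}

# Usage (as root), loading whatever is still missing without a reboot:
#   for m in $(missing_modules /proc/modules ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack br_netfilter); do
#     modprobe "$m"
#   done
```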

System auditing

apt-get install -y auditd audispd-plugins
cat << EOF > /etc/audit/rules.d/audit.rules

# Remove any existing rules
-D

# Buffer Size
-b 8192

# Failure Mode
-f 1

# Ignore errors
-i 

# docker
-w /usr/bin/dockerd -k docker
-w /var/lib/docker -k docker
-w /etc/docker -k docker
-w /usr/lib/systemd/system/docker.service -k docker
-w /etc/systemd/system/docker.service -k docker
-w /usr/lib/systemd/system/docker.socket -k docker
-w /etc/default/docker -k docker
-w /etc/sysconfig/docker -k docker
-w /etc/docker/daemon.json -k docker

# containerd
-w /usr/bin/containerd -k containerd
-w /var/lib/containerd -k containerd
-w /usr/lib/systemd/system/containerd.service -k containerd
-w /etc/containerd/config.toml -k containerd

# runc 
-w /usr/bin/runc -k runc

# kube
-w /usr/bin/kubeadm -k kubeadm
-w /usr/bin/kubelet -k kubelet
-w /usr/bin/kubectl -k kubectl
-w /var/lib/kubelet -k kubelet
-w /etc/kubernetes -k kubernetes
EOF
chmod 600 /etc/audit/rules.d/audit.rules
sed -i 's#max_log_file =.*#max_log_file = 80#g' /etc/audit/auditd.conf 

systemctl stop auditd && systemctl start auditd
systemctl enable auditd

DNS options

grep single-request-reopen /etc/resolv.conf || sed -i '1ioptions timeout:2 attempts:3 rotate single-request-reopen' /etc/resolv.conf

Upgrade the kernel

Note: this step is optional.

Debian 10 ships with the hugetlb cgroup disabled by default. Updating the kernel enables it.

root@debian:~# cat /etc/debian_version
10.9
root@debian:~# uname -r
4.19.0-16-amd64
root@debian:~# grep HUGETLB /boot/config-$(uname -r)    
# CONFIG_CGROUP_HUGETLB is not set
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

Update the kernel

# Add the backports source
$ echo "deb http://mirrors.aliyun.com/debian buster-backports main" | sudo tee /etc/apt/sources.list.d/backports.list

# Refresh the package lists
$ sudo apt update

# Install the Linux kernel image
$ sudo apt -t buster-backports  install linux-image-amd64

# Install the Linux kernel headers (optional)
$ sudo apt -t buster-backports install linux-headers-amd64

After rebooting, check the hugetlb cgroup again

root@debian:~# cat /etc/debian_version 
10.9
root@debian:~# uname -r
5.10.0-0.bpo.4-amd64
root@debian:~# grep HUGETLB /boot/config-$(uname -r)                   
CONFIG_CGROUP_HUGETLB=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
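The manual grep above can be wrapped in a small check that scripts can branch on. This is a minimal sketch; the function name hugetlb_cgroup_state is hypothetical, and it assumes the kernel config is available under /boot as in the listings above.

```shell
# Print "enabled" if the given kernel config turns on the hugetlb cgroup, else "disabled".
# Defaults to the config of the running kernel.
hugetlb_cgroup_state() {
  local config_file="${1:-/boot/config-$(uname -r)}"
  if grep -q '^CONFIG_CGROUP_HUGETLB=y' "$config_file" 2>/dev/null; then
    echo enabled
  else
    echo disabled
  fi
}

# Usage:
#   [ "$(hugetlb_cgroup_state)" = enabled ] || echo "consider the backports kernel" >&2
```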

Install docker-ce

apt-get install -y apt-transport-https ca-certificates curl gnupg2 lsb-release bash-completion
    
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/debian/gpg | sudo apt-key add -
echo "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/debian $(lsb_release -cs)   stable" > /etc/apt/sources.list.d/docker-ce.list
sudo apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io
apt-mark hold docker-ce docker-ce-cli containerd.io

cp /usr/share/bash-completion/completions/docker /etc/bash_completion.d/
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "data-root": "/var/lib/docker",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "200m",
    "max-file": "5"
  },
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 655360,
      "Soft": 655360
    },
    "nproc": {
      "Name": "nproc",
      "Hard": 655360,
      "Soft": 655360
    }
  },
  "live-restore": true,
  "oom-score-adjust": -1000,
  "max-concurrent-downloads": 10,
  "max-concurrent-uploads": 10,
  "storage-driver": "overlay2",
  "storage-opts": ["overlay2.override_kernel_check=true"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": [
    "https://yssx4sxy.mirror.aliyuncs.com/"
  ]
}
EOF

systemctl enable --now docker

sed -i 's|#oom_score = 0|oom_score = -999|' /etc/containerd/config.toml
cat << EOF > /etc/crictl.yaml
runtime-endpoint: unix:///var/run/dockershim.sock
image-endpoint: unix:///var/run/dockershim.sock
timeout: 2
debug: false
pull-image-on-create: true
disable-pull-on-run: false
EOF

systemctl enable --now containerd
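The kubelet configuration later in this article sets cgroupDriver: systemd, so Docker has to use the same cgroup driver. The sketch below reads the driver from daemon.json so the two can be compared; the function name docker_cgroup_driver is hypothetical.

```shell
# Print the cgroup driver configured through "exec-opts" in a Docker daemon.json.
# Prints nothing if no native.cgroupdriver option is present.
docker_cgroup_driver() {
  local daemon_json="${1:-/etc/docker/daemon.json}"
  grep -o 'native\.cgroupdriver=[a-z]*' "$daemon_json" | head -n 1 | cut -d= -f2
}

# Usage:
#   docker_cgroup_driver                      # value from the config file
#   docker info --format '{{.CgroupDriver}}'  # value the running daemon actually uses
```

If the two values differ, the daemon was most likely not restarted after daemon.json changed.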

Hostname configuration

hostnamectl set-hostname k8s-master-node1

Set this to match each node's name.

Hostname resolution for the nodes

cat << EOF >> /etc/hosts
192.168.77.140 k8s-master-node1
192.168.77.141 k8s-master-node2
192.168.77.142 k8s-master-node3
192.168.77.143 k8s-worker-node1
192.168.77.144 k8s-worker-node2
192.168.77.145 k8s-worker-node3
EOF
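Appending with >> adds duplicate lines if the step is run more than once. An idempotent variant could look like the sketch below; the function name ensure_host_entry is hypothetical.

```shell
# Append "<ip> <name>" to a hosts file unless <name> is already mapped there.
# Usage: ensure_host_entry <ip> <name> [hosts-file]
ensure_host_entry() {
  local ip="$1" name="$2" hosts_file="${3:-/etc/hosts}"
  # Look for the name in any non-comment line, in any column after the address.
  if awk -v n="$name" '
       $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$hosts_file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Usage (as root):
#   ensure_host_entry 192.168.77.140 k8s-master-node1
```

The function leaves an existing mapping untouched even if it points at a different address, so changing an address still needs an explicit edit.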

Deploy the master nodes

Install the kube components

Install kubeadm, kubelet, and kubectl on all k8s-master nodes.

echo 'deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main' > /etc/apt/sources.list.d/kubernetes.list
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
apt-get update

export KUBE_VERSION="1.20.5"
apt-get install -y kubeadm=$KUBE_VERSION-00 kubelet=$KUBE_VERSION-00 kubectl=$KUBE_VERSION-00
sudo apt-mark hold kubelet kubeadm kubectl

[ -d /etc/bash_completion.d ] && \
    { kubectl completion bash > /etc/bash_completion.d/kubectl; \
      kubeadm completion bash > /etc/bash_completion.d/kubeadm; }
      
[ ! -d /usr/lib/systemd/system/kubelet.service.d ] && mkdir -p /usr/lib/systemd/system/kubelet.service.d
cat << EOF > /usr/lib/systemd/system/kubelet.service.d/11-cgroup.conf
[Service]
CPUAccounting=true
MemoryAccounting=true
BlockIOAccounting=true
ExecStartPre=/usr/bin/bash -c '/usr/bin/mkdir -p /sys/fs/cgroup/{cpuset,memory,systemd,pids,"cpu,cpuacct"}/{system,kube,kubepods}.slice'
Slice=kube.slice
EOF
systemctl daemon-reload
 
systemctl enable kubelet.service

Run the following steps on k8s-master-node1

Set up apiserver name resolution

echo '127.0.0.1 apiserver.cluster.local' >> /etc/hosts

Configure kubeadm

Create kubeadm-config.yaml

[ ! -d /etc/kubernetes ] && mkdir -p /etc/kubernetes

cat <<EOF > /etc/kubernetes/kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  # CRI settings
  criSocket: /var/run/dockershim.sock
  kubeletExtraArgs:
    runtime-cgroups: /system.slice/docker.service

---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
ipvs:
  # IPVS settings
  minSyncPeriod: 5s
  syncPeriod: 5s
  scheduler: wrr

---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 200
cgroupDriver: systemd
runtimeRequestTimeout: 5m
# Lets the kubelet start even when swap is enabled
failSwapOn: false
nodeStatusUpdateFrequency: 5s
rotateCertificates: true
imageGCLowThresholdPercent: 70
imageGCHighThresholdPercent: 80
# Soft eviction thresholds
evictionSoft:
  imagefs.available: 15%
  memory.available: 512Mi
  nodefs.available: 15%
  nodefs.inodesFree: 10%
# How long a soft threshold must be exceeded before eviction starts
evictionSoftGracePeriod:
  imagefs.available: 3m
  memory.available: 1m
  nodefs.available: 3m
  nodefs.inodesFree: 1m
# Hard eviction thresholds
evictionHard:
  imagefs.available: 10%
  memory.available: 256Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
evictionMaxPodGracePeriod: 30
# Resources reserved on the node
kubeReserved:
  cpu: 200m
  ephemeral-storage: 1Gi
systemReserved:
  cpu: 300m
  ephemeral-storage: 1Gi
kubeReservedCgroup: /kube.slice
systemReservedCgroup: /system.slice
enforceNodeAllocatable: 
- pods
- kube-reserved
- system-reserved

---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: 1.20.5
controlPlaneEndpoint: apiserver.cluster.local:6443
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/16
imageRepository: registry.cn-hangzhou.aliyuncs.com/kainstall
apiServer:
  certSANs:
  - 127.0.0.1
  - apiserver.cluster.local
  - 192.168.77.140
  - 192.168.77.141
  - 192.168.77.142
  extraArgs:
    event-ttl: 720h
    service-node-port-range: 30000-50000
    # Audit log settings
    audit-log-maxage: '20'
    audit-log-maxbackup: '10'
    audit-log-maxsize: '100'
    audit-log-path: /var/log/kube-audit/audit.log
    audit-policy-file: /etc/kubernetes/audit-policy.yaml
  extraVolumes:
  - name: audit-config
    hostPath: /etc/kubernetes/audit-policy.yaml
    mountPath: /etc/kubernetes/audit-policy.yaml
    readOnly: true
    pathType: File
  - name: audit-log
    hostPath: /var/log/kube-audit
    mountPath: /var/log/kube-audit
    pathType: DirectoryOrCreate
  - name: localtime
    hostPath: /etc/localtime
    mountPath: /etc/localtime
    readOnly: true
    pathType: File
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
    node-cidr-mask-size: '24'
    deployment-controller-sync-period: '10s'
    pod-eviction-timeout: 2m
    terminated-pod-gc-threshold: '30'
    experimental-cluster-signing-duration: 87600h
    feature-gates: RotateKubeletServerCertificate=true
    horizontal-pod-autoscaler-use-rest-clients: 'true'
    horizontal-pod-autoscaler-sync-period: 10s
    node-monitor-grace-period: 10s
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
    pathType: File
scheduler:
  extraArgs:
    bind-address: 0.0.0.0
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
    pathType: File
EOF

The above is the kubeadm configuration.
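The file holds four YAML documents, and a missing --- separator or a misspelled kind is easy to overlook. A quick sanity check, sketched below with the hypothetical helper config_kinds, lists the top-level kind of every document in the file.

```shell
# Print the top-level "kind" of each YAML document in a multi-document file,
# in file order. Indented (nested) "kind:" keys are ignored.
config_kinds() {
  awk '/^kind:/ { print $2 }' "$1"
}

# Usage:
#   config_kinds /etc/kubernetes/kubeadm-config.yaml
# For this article's file the expected kinds are InitConfiguration,
# KubeProxyConfiguration, KubeletConfiguration and ClusterConfiguration.
#
# Optional pre-checks with kubeadm itself before the real init:
#   kubeadm config images pull --config /etc/kubernetes/kubeadm-config.yaml
#   kubeadm init --config /etc/kubernetes/kubeadm-config.yaml --dry-run
```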

Configure auditing

cat << EOF > /etc/kubernetes/audit-policy.yaml
# Log all requests at the Metadata level.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
EOF

Initialize the node

Initialize the control plane with kubeadm

kubeadm init --config=/etc/kubernetes/kubeadm-config.yaml --upload-certs

…
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join apiserver.cluster.local:6443 --token nj9wia.n8pmj18em5cfzg7e \
    --discovery-token-ca-cert-hash sha256:64ddf85513a7d0321cc19bad82209c83783276daa4cd8ebcb114f63804e649be \
    --control-plane --certificate-key 7bd6f9874e65df89af33f8f235494ec2a3544e58be89f96b5118be9e8bbc7a15

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join apiserver.cluster.local:6443 --token nj9wia.n8pmj18em5cfzg7e \
    --discovery-token-ca-cert-hash sha256:64ddf85513a7d0321cc19bad82209c83783276daa4cd8ebcb114f63804e649be 

Save the join commands; they are needed later when the other nodes join.
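If the output of kubeadm init was saved to a file, the individual values can be pulled out of it later instead of being copied by hand. The sketch below does that with plain text tools; the function name join_flag_value and the idea of a saved log file are assumptions, not part of the original procedure.

```shell
# Print the value that follows a flag (for example --token) in saved "kubeadm join" text.
# Line continuations ("\" at end of line) are flattened first.
# Usage: join_flag_value <flag> <file>
join_flag_value() {
  local flag="$1" file="$2"
  tr -s ' \\\n' ' ' < "$file" |
    grep -o -- "${flag} [^ ]*" | head -n 1 | cut -d' ' -f2
}

# Usage, assuming the init output was saved to a file named kubeadm-init.log (hypothetical):
#   join_flag_value --token kubeadm-init.log
#   join_flag_value --discovery-token-ca-cert-hash kubeadm-init.log
#
# Bootstrap tokens expire after 24 hours by default; a fresh worker join command can be printed with:
#   kubeadm token create --print-join-command
```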

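Note: if an error about hugetlb not being found appears here, it is because Debian 10 disables the hugetlb cgroup by default. Updating the kernel enables it.

Use netstat -ntlp to check that the services are listening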

[root@k8s-master-node1 /home/test]# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 192.168.77.140:2379     0.0.0.0:*               LISTEN      7336/etcd           
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN      7336/etcd           
tcp        0      0 192.168.77.140:2380     0.0.0.0:*               LISTEN      7336/etcd           
tcp        0      0 127.0.0.1:2381          0.0.0.0:*               LISTEN      7336/etcd           
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      483/sshd            
tcp        0      0 127.0.0.1:29159         0.0.0.0:*               LISTEN      7621/kubelet        
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      7621/kubelet        
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      8042/kube-proxy     
tcp6       0      0 :::6443                 :::*                    LISTEN      7511/kube-apiserver 
tcp6       0      0 :::10256                :::*                    LISTEN      8042/kube-proxy     
tcp6       0      0 :::10257                :::*                    LISTEN      6993/kube-controlle 
tcp6       0      0 :::10259                :::*                    LISTEN      7164/kube-scheduler 
tcp6       0      0 :::22                   :::*                    LISTEN      483/sshd            
tcp6       0      0 :::10250                :::*                    LISTEN      7621/kubelet      

Set up kubeconfig

mkdir -p $HOME/.kube
cp -rp /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

Check the state of the cluster with kubectl

[root@k8s-master-node1 /home/test]# kubectl get no
NAME               STATUS     ROLES                  AGE   VERSION
k8s-master-node1   NotReady   control-plane,master   94s   v1.20.5

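The node stays NotReady because no CNI plugin is installed yet.

Join the other masters to the cluster

Run the following on the other master nodes to join the remaining two masters to the cluster.

Configure the audit policy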

cat << EOF > /etc/kubernetes/audit-policy.yaml
# Log all requests at the Metadata level.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
EOF

Join the cluster

# Point the API domain at k8s-master-node1 first
echo '192.168.77.140 apiserver.cluster.local' >> /etc/hosts

kubeadm join apiserver.cluster.local:6443 --token nj9wia.n8pmj18em5cfzg7e \
  --discovery-token-ca-cert-hash sha256:64ddf85513a7d0321cc19bad82209c83783276daa4cd8ebcb114f63804e649be \
  --control-plane --certificate-key 7bd6f9874e65df89af33f8f235494ec2a3544e58be89f96b5118be9e8bbc7a15

mkdir -p $HOME/.kube
cp -rp /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Point the API domain at the local node
sed -i 's#192.168.77.140 apiserver.cluster.local#127.0.0.1 apiserver.cluster.local#g' /etc/hosts

Node list once the master nodes have joined

# kubectl get nodes
NAME               STATUS     ROLES                  AGE     VERSION
k8s-master-node1   NotReady   control-plane,master   2m      v1.20.5
k8s-master-node2   NotReady   control-plane,master   2m22s   v1.20.5
k8s-master-node3   NotReady   control-plane,master   2m42s   v1.20.5

Deploy the worker nodes

Install the kube components

Install kubeadm and kubelet

echo 'deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main' > /etc/apt/sources.list.d/kubernetes.list
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
apt-get update

export KUBE_VERSION="1.20.5"
apt-get install -y kubeadm=$KUBE_VERSION-00 kubelet=$KUBE_VERSION-00
apt-mark hold kubelet kubeadm

[ -d /etc/bash_completion.d ] && kubeadm completion bash > /etc/bash_completion.d/kubeadm
      
[ ! -d /usr/lib/systemd/system/kubelet.service.d ] && mkdir -p /usr/lib/systemd/system/kubelet.service.d
cat << EOF > /usr/lib/systemd/system/kubelet.service.d/11-cgroup.conf
[Service]
CPUAccounting=true
MemoryAccounting=true
BlockIOAccounting=true
ExecStartPre=/usr/bin/bash -c '/usr/bin/mkdir -p /sys/fs/cgroup/{cpuset,memory,systemd,pids,"cpu,cpuacct"}/{system,kube,kubepods}.slice'
Slice=kube.slice
EOF
systemctl daemon-reload
 
systemctl enable kubelet.service

Configure haproxy

haproxy load-balances requests to the Kubernetes API servers

apt-get install -y haproxy

cat <<EOF > /etc/haproxy/haproxy.cfg
global
  log 127.0.0.1 local0
  log 127.0.0.1 local1 notice
  tune.ssl.default-dh-param 2048

defaults
  log global
  mode http
  option dontlognull
  timeout connect 5000ms
  timeout client 600000ms
  timeout server 600000ms

listen stats
  bind :9090
  mode http
  balance
  stats uri /haproxy_stats
  stats auth admin:admin123
  stats admin if TRUE

frontend kube-apiserver-https
   mode tcp
   bind :6443
   default_backend kube-apiserver-backend

backend kube-apiserver-backend
  mode tcp
  balance roundrobin
  stick-table type ip size 200k expire 30m
  stick on src
  server apiserver1 192.168.77.140:6443 check
  server apiserver2 192.168.77.141:6443 check
  server apiserver3 192.168.77.142:6443 check
EOF
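Before starting the service it helps to confirm that the backend lists exactly the three master addresses. The sketch below extracts the server addresses from the configuration; the function name haproxy_backend_servers is hypothetical.

```shell
# Print the address:port of every "server" line in an haproxy configuration.
haproxy_backend_servers() {
  awk '$1 == "server" { print $3 }' "${1:-/etc/haproxy/haproxy.cfg}"
}

# Usage:
#   haproxy_backend_servers
# haproxy can also validate the syntax of the file itself:
#   haproxy -c -f /etc/haproxy/haproxy.cfg
```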

Start haproxy

systemctl start haproxy
systemctl enable --now haproxy

Join the cluster

# echo '127.0.0.1 apiserver.cluster.local' >> /etc/hosts

# kubeadm join apiserver.cluster.local:6443 --token nj9wia.n8pmj18em5cfzg7e \
    --discovery-token-ca-cert-hash sha256:64ddf85513a7d0321cc19bad82209c83783276daa4cd8ebcb114f63804e649be 

Configure the cluster

Configure the worker role

Set the role label of the worker nodes to worker

# kubectl get node --selector='!node-role.kubernetes.io/master' | grep '<none>' | awk '{print "kubectl label node " $1 " node-role.kubernetes.io/worker= --overwrite" }' | bash

# kubectl get node
NAME               STATUS     ROLES                  AGE   VERSION
k8s-master-node1   NotReady   control-plane,master   40m   v1.20.5
k8s-master-node2   NotReady   control-plane,master   20m   v1.20.5
k8s-master-node3   NotReady   control-plane,master   17m   v1.20.5
k8s-worker-node1   NotReady   worker                 85s   v1.20.5
k8s-worker-node2   NotReady   worker                 81s   v1.20.5
k8s-worker-node3   NotReady   worker                 79s   v1.20.5

Configure the network

Flannel is a simple way to set up a layer 3 network fabric designed for Kubernetes. This guide uses flannel as the cluster network component.

# Download flannel
wget https://cdn.jsdelivr.net/gh/coreos/flannel@v0.13.0/Documentation/kube-flannel.yml

# Set the pod network CIDR (the manifest default already equals 10.244.0.0/16, so this is a no-op here)
sed -i 's#10.244.0.0/16#10.244.0.0/16#g' kube-flannel.yml

# Enable direct routing
sed -i 's#"Type": "vxlan"#"Type": "vxlan", "DirectRouting": true#g' kube-flannel.yml

# Apply flannel
kubectl apply -f kube-flannel.yml

# Wait for the flannel pods to become ready
kubectl wait --namespace kube-system --for=condition=ready pods --selector=app=flannel --timeout=60s
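
With DirectRouting enabled, flannel's vxlan backend routes directly between hosts on the same subnet (as host-gw does) and only encapsulates traffic to hosts on other subnets. To see what the substitution does, here is the same sed expression applied to a sample net-conf.json. The sample is assumed to match the excerpt in kube-flannel.yml, and the function name enable_direct_routing is made up for this example.

```shell
# Same substitution as above, applied to stdin instead of the file.
enable_direct_routing() {
  sed 's#"Type": "vxlan"#"Type": "vxlan", "DirectRouting": true#g'
}

# Assumed excerpt of net-conf.json from kube-flannel.yml.
cat <<'EOF' | enable_direct_routing
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}
EOF
```

The Backend block in the output should now carry both the Type and the DirectRouting keys on one line.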

Once flannel is running normally, the nodes become Ready.

# kubectl  get nodes
NAME               STATUS   ROLES                  AGE    VERSION
k8s-master-node1   Ready    control-plane,master   41m   v1.20.5
k8s-master-node2   Ready    control-plane,master   22m   v1.20.5
k8s-master-node3   Ready    control-plane,master   18m   v1.20.5
k8s-worker-node1   Ready    worker                 2m    v1.20.5
k8s-worker-node2   Ready    worker                 2m    v1.20.5
k8s-worker-node3   Ready    worker                 2m    v1.20.5

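Configure metrics-server

Metrics Server implements the Metrics API. It is meant to replace Heapster as the source of resource usage metrics for Pods and Nodes, and it collects metrics from the Summary API exposed by the Kubelet on each Kubernetes node.

Run the kubectl top command on any master node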

# kubectl top node
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

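The top command cannot get any metrics, which means the Kubernetes cluster has neither Heapster nor Metrics Server installed to serve the Metrics API that top reads resource usage from.

Deploy the metrics-server component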

# wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.4.2/components.yaml -O metrics-server.yaml

# sed -i -e 's#k8s.gcr.io/metrics-server#registry.cn-hangzhou.aliyuncs.com/kainstall#g' \
       -e '/--kubelet-preferred-address-types=.*/d' \
	   -e 's/\(.*\)- --secure-port=4443/\1- --secure-port=4443\n\1- --kubelet-insecure-tls\n\1- --kubelet-preferred-address-types=InternalIP,InternalDNS,ExternalIP,ExternalDNS,Hostname/g' \
	   metrics-server.yaml
             
# kubectl apply -f metrics-server.yaml
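
The sed call does three things: it points the image at a mirror registry, drops the existing --kubelet-preferred-address-types argument, and inserts --kubelet-insecure-tls plus a new address-type list right after --secure-port. A sketch that applies the same expressions to a sample args list; the sample is only assumed to resemble the one in components.yaml, the function name patch_metrics_args is made up, and GNU sed is needed for \n in the replacement.

```shell
# Same expressions as above, reading from stdin instead of editing the file.
patch_metrics_args() {
  sed -e 's#k8s.gcr.io/metrics-server#registry.cn-hangzhou.aliyuncs.com/kainstall#g' \
      -e '/--kubelet-preferred-address-types=.*/d' \
      -e 's/\(.*\)- --secure-port=4443/\1- --secure-port=4443\n\1- --kubelet-insecure-tls\n\1- --kubelet-preferred-address-types=InternalIP,InternalDNS,ExternalIP,ExternalDNS,Hostname/g'
}

# Assumed excerpt of the container args in components.yaml.
cat <<'EOF' | patch_metrics_args
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
EOF
```

Note that --kubelet-insecure-tls skips verification of the kubelet serving certificates, which is a trade-off suited to a lab cluster like this one.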

Check the aggregated API

# kubectl get apiservices.apiregistration.k8s.io  | grep v1beta1.metrics.k8s.io
v1beta1.metrics.k8s.io                 kube-system/metrics-server   True        55M

# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/metrics.k8s.io/v1beta1/nodes"},"items":[{"metadata":{"name":"k8s-master-node1","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/k8s-master-node1","creationTimestamp":"2021-04-03T13:56:08Z"},"timestamp":"2021-04-03T13:55:41Z","window":"30s","usage":{"cpu":"224628621n","memory":"1255272Ki"}},{"metadata":{"name":"k8s-master-node2","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/k8s-master-node2","creationTimestamp":"2021-04-03T13:56:08Z"},"timestamp":"2021-04-03T13:55:45Z","window":"30s","usage":{"cpu":"207660844n","memory":"984580Ki"}},{"metadata":{"name":"k8s-master-node3","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/k8s-master-node3","creationTimestamp":"2021-04-03T13:56:08Z"},"timestamp":"2021-04-03T13:55:39Z","window":"30s","usage":{"cpu":"217148133n","memory":"1053340Ki"}},{"metadata":{"name":"k8s-worker-node1","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/k8s-worker-node1","creationTimestamp":"2021-04-03T13:56:08Z"},"timestamp":"2021-04-03T13:55:42Z","window":"30s","usage":{"cpu":"87641005n","memory":"565416Ki"}},{"metadata":{"name":"k8s-worker-node2","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/k8s-worker-node2","creationTimestamp":"2021-04-03T13:56:08Z"},"timestamp":"2021-04-03T13:55:38Z","window":"30s","usage":{"cpu":"85071223n","memory":"454372Ki"}},{"metadata":{"name":"k8s-worker-node3","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/k8s-worker-node3","creationTimestamp":"2021-04-03T13:56:08Z"},"timestamp":"2021-04-03T13:55:37Z","window":"30s","usage":{"cpu":"47689888n","memory":"408796Ki"}}]}

After that, wait a while for metrics to be collected, then run kubectl top again

# kubectl top node
NAME               CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k8s-master-node1   238m         15%    1272Mi          74%       
k8s-master-node2   220m         14%    956Mi           55%       
k8s-master-node3   208m         13%    1036Mi          60%       
k8s-worker-node1   83m          5%     552Mi           32%       
k8s-worker-node2   86m          5%     443Mi           25%       
k8s-worker-node3   48m          9%     399Mi           23%  

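Configure coredns

The coredns deployed by kubeadm has no anti-affinity configured by default, so its pods can end up on the same node, and a failure of that node would then take out every DNS replica.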

# kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
NAME                       READY   STATUS    RESTARTS   AGE    IP           NODE               NOMINATED NODE   READINESS GATES
coredns-85bb79f4b4-4r6jw   1/1     Running   0          164m   10.244.2.3   k8s-master-node3   <none>           <none>
coredns-85bb79f4b4-gwptq   1/1     Running   0          164m   10.244.2.2   k8s-master-node3   <none>           <none>

Add anti-affinity to coredns to keep its pods from concentrating on one node.

# kubectl -n kube-system patch deployment coredns --patch '{"spec": {"template": {"spec": {"affinity":{"podAntiAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"weight":100,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"k8s-app","operator":"In","values":["kube-dns"]}]},"topologyKey":"kubernetes.io/hostname"}}]}}}}}}' --record

Check the pod placement again: the pods are now spread across different nodes.

# kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE               NOMINATED NODE   READINESS GATES
coredns-8496bbfb78-dvs5r   1/1     Running   0          10m   10.244.3.3   k8s-worker-node1   <none>           <none>
coredns-8496bbfb78-x274f   1/1     Running   0          10m   10.244.4.4   k8s-worker-node2   <none>           <none>
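
Because the patch uses preferredDuringSchedulingIgnoredDuringExecution, the spread is a scheduling preference rather than a guarantee, so it is worth re-checking after later rollouts. A small sketch that counts how many distinct nodes a list of node names covers; count_distinct_nodes is a made-up helper name.

```shell
# Count distinct node names read from stdin (space or newline separated).
count_distinct_nodes() {
  tr -s ' ' '\n' | sed '/^$/d' | sort -u | wc -l | tr -d ' '
}

# Node names taken from the output above.
echo "k8s-worker-node1 k8s-worker-node2" | count_distinct_nodes
```

On a live cluster the input can come from kubectl get pods -n kube-system -l k8s-app=kube-dns -o jsonpath='{.items[*].spec.nodeName}'. A result of 1 with two replicas means both pods share a node.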

Configure scheduled etcd backups

Here a CronJob resource periodically backs up the etcd data to the node directory /var/lib/etcd/backups and deletes backups older than 30 days.

master_num=3
etcd_image=$(kubeadm config images list --config=/etc/kubernetes/kubeadm-config.yaml 2>/dev/null | grep etcd:)

cat << EOF | kubectl apply -f -
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-snapshot
  namespace: kube-system
spec:
  schedule: '0 */6 * * *'
  successfulJobsHistoryLimit: 3
  suspend: false
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 6
      parallelism: ${master_num}
      completions: ${master_num}
      template:
        metadata:
          labels:
            app: etcd-snapshot
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - etcd-snapshot
                topologyKey: 'kubernetes.io/hostname'
          containers:
          - name: etcd-snapshot
            image: ${etcd_image}
            imagePullPolicy: IfNotPresent
            args:
            - -c
            - etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
              --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
              snapshot save /backup/etcd-snapshot-\$(date +%Y-%m-%d_%H:%M:%S_%Z).db
              && echo 'delete old backups' && find /backup -type f -mtime +30 -exec rm -fv {} \\; || echo error
            command:
            - /bin/sh
            env:
            - name: ETCDCTL_API
              value: '3'
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup
              mountPath: /backup
            - name: etc
              mountPath: /etc
            - name: bin
              mountPath: /usr/bin
            - name: lib64
              mountPath: /lib64
            - name: lib
              mountPath: /lib
          dnsPolicy: ClusterFirst
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/master: ''
          tolerations:
          - effect: NoSchedule
            operator: Exists
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 30
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
              type: DirectoryOrCreate
          - name: backup
            hostPath:
              path: /var/lib/etcd/backups
              type: DirectoryOrCreate
          - name: etc
            hostPath:
              path: /etc
          - name: bin
            hostPath:
              path: /usr/bin
          - name: lib64
            hostPath:
              path: /lib64
          - name: lib
            hostPath:
              path: /lib
EOF

kubectl -n kube-system get cronjob
NAME            SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
etcd-snapshot   0 */6 * * *   False     0        <none>          70s


jobname="etcd-snapshot-$(date +%s)"
kubectl create job --from=cronjob/etcd-snapshot ${jobname} -n kube-system && \
kubectl wait --for=condition=complete job/${jobname} -n kube-system
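
The cleanup step in the CronJob selects files by age: find -mtime +30 matches files last modified more than 30 days ago, regardless of how many backups exist. The sketch below reproduces that expression against a throwaway directory. prune_old_backups is a made-up helper name, and touch -d relies on GNU coreutils.

```shell
# Same cleanup expression as in the CronJob, for an arbitrary directory.
prune_old_backups() {
  find "$1" -type f -mtime +30 -exec rm -f {} \;
}

# Throwaway directory standing in for /var/lib/etcd/backups.
dir=$(mktemp -d)
touch -d '40 days ago' "$dir/etcd-snapshot-old.db"
touch "$dir/etcd-snapshot-new.db"

prune_old_backups "$dir"
ls "$dir"   # only etcd-snapshot-new.db is left
rm -r "$dir"
```

With the 6-hour schedule above, a 30-day window corresponds to roughly 120 snapshot files per master, so the backup directory needs to be sized accordingly.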

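Configure Ingress

Ingress is an abstract resource in Kubernetes. It uses the virtual host concept of a web server to route requests by domain name to internal Services, which avoids the limitations of the NodePort and LoadBalancer Service types (such as the cap on the number of ports). The Ingress functionality itself is implemented by an Ingress Controller, which watches the Ingress and Service resources in the Kubernetes API and, when they change, configures the web server to match the desired state. There are many Ingress Controller implementations to choose from:

  • Ingress NGINX: maintained by the Kubernetes project, and the controller installed in this guide.
  • F5 BIG-IP Controller: developed by F5; it lets administrators manage F5 BIG-IP devices from Kubernetes and OpenShift through the CLI or API.
  • Ingress Kong: the Kubernetes Ingress Controller maintained by the well-known open-source API Gateway project.
  • Træfik: an open-source HTTP reverse proxy and load balancer.
  • Voyager: an Ingress Controller built on HAProxy.

Deploy ingress-nginx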

wget https://cdn.jsdelivr.net/gh/kubernetes/ingress-nginx@controller-v0.44.0/deploy/static/provider/baremetal/deploy.yaml -O ingress-nginx.yaml

sed -i -e 's#k8s.gcr.io/ingress-nginx#registry.cn-hangzhou.aliyuncs.com/kainstall#g' \
       -e 's#@sha256:.*$##g' ingress-nginx.yaml

kubectl apply -f ingress-nginx.yaml

kubectl wait --namespace ingress-nginx --for=condition=ready pods --selector=app.kubernetes.io/component=controller --timeout=60s
pod/ingress-nginx-controller-67848f7b-2gxzb condition met

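The official manifest enables the admission webhook by default. Our apiserver uses the host's DNS rather than coredns, so it cannot reach the service address of the ingress-nginx controller. Here we remove the admission webhook so that creating Ingress resources no longer goes through the controller for validation.

A brief summary of what admission webhooks do: when a user request reaches the k8s apiserver, the apiserver consults the MutatingWebhookConfiguration and ValidatingWebhookConfiguration objects. It first calls the webhooks in MutatingWebhookConfiguration to modify the submitted object, and finally calls those in ValidatingWebhookConfiguration to verify that the modified object is valid.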

kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission

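Configure Dashboard

Dashboard is the web-based UI developed by the Kubernetes project. It aims to make managing Kubernetes cluster resources more convenient, and it visualizes resources so the state of the whole cluster can be seen at a glance.

Deploy dashboard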

wget https://cdn.jsdelivr.net/gh/kubernetes/dashboard@v2.2.0/aio/deploy/recommended.yaml -O dashboard.yaml
kubectl apply -f dashboard.yaml

Deploy the ingress

cat << EOF | kubectl apply -f -
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/secure-backends: 'true'
    nginx.ingress.kubernetes.io/backend-protocol: 'HTTPS'
    nginx.ingress.kubernetes.io/ssl-passthrough: 'true'
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard 
spec:
  tls:
  - hosts:
    - kubernetes-dashboard.cluster.local
    secretName: kubernetes-dashboard-certs
  rules:
  - host: kubernetes-dashboard.cluster.local
    http:
      paths:
      - path: /
        backend:
          serviceName: kubernetes-dashboard
          servicePort: 443
EOF

Create a service account and use its token to log in to the dashboard

kubectl create serviceaccount kubernetes-dashboard-admin-sa -n kubernetes-dashboard
kubectl create clusterrolebinding kubernetes-dashboard-admin-sa --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:kubernetes-dashboard-admin-sa -n kubernetes-dashboard

kubectl describe secrets $(kubectl describe sa kubernetes-dashboard-admin-sa -n kubernetes-dashboard | awk '/Tokens/ {print $2}') -n kubernetes-dashboard | awk '/token:/{print $2}'

eyJhbGciOiJSUzI1NiIsImtpZCI6IkFpMWkxemI3cnFlUmNmVzFJbno5a3IzWktpVGxlaXJlaXZna0NxRlRRTWMifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC1hZG1pbi1zYS10b2tlbi05YnY0bCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC1hZG1pbi1zYSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjI4ODIyOWQ0LTA3NTItNGIyNi05ZjhjLWE1N2ZlYzNjNGYwMyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZC1hZG1pbi1zYSJ9.CEBpIcs8bS5960T-F4Bdo4Di4Y61CBDG9SoRWosjxIPm8RuRD2CwSX_fNBMbXLxPVPZME90EwJkaBTSHBwTb1pYOphety4OvFXStpnqj6tyJl9EWrzLdibfnJnZ1cq0X9cIGyjQ6gcuboqQJQEzxgBJdTuELU3LHHQhXOzQZHH_pMPVjntRIQEQ5wGDvRz50Eig-KMaK3IislFEgA3a8mkdVppkJA3gPprEAnUp6of8bKFa6rrZL1Kcx0u-C9We6sgzZcLYPahx6zKO4l0XkaxHrwiNodgBbCaqJ2C3V78p1HD9u16XMKjoz5rzkakajJmB0zMeRaFdcFHlKXrQ1gQ

Get the dashboard's ingress URL

echo https://$(kubectl get node -o jsonpath='{range .items[*]}{ .status.addresses[?(@.type=="InternalIP")].address} {.status.conditions[?(@.status == "True")].status}{"\n"}{end}' | awk '{if($2=="True")a=$1}END{print a}'):$(kubectl get svc --all-namespaces -o go-template="{{range .items}}{{if eq .metadata.name \"ingress-nginx-controller\" }}{{range.spec.ports}}{{if eq .port "443"}}{{.nodePort}}{{end}}{{end}}{{end}}{{end}}")


https://192.168.77.145:34239

After adding the hosts entry, log in with the token

192.168.77.145 kubernetes-dashboard.cluster.local

https://kubernetes-dashboard.cluster.local:34239

Test the cluster

Reboot the cluster

  1. Reboot all cluster nodes
  2. Get the node information
# kubectl get node
NAME               STATUS   ROLES                  AGE    VERSION
k8s-master-node1   Ready    control-plane,master   50m   v1.20.5
k8s-master-node2   Ready    control-plane,master   31m   v1.20.5
k8s-master-node3   Ready    control-plane,master   27m   v1.20.5
k8s-worker-node1   Ready    worker                 11m   v1.20.5
k8s-worker-node2   Ready    worker                 11m   v1.20.5
k8s-worker-node3   Ready    worker                 11m   v1.20.5

Deploy the whoami app

Deploy the application

cat <<EOF | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-demo-app
  labels:
    app: ingress-demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ingress-demo-app
  template:
    metadata:
      labels:
        app: ingress-demo-app
    spec:
      containers:
      - name: whoami
        image: traefik/whoami:v1.6.1
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-demo-app
spec:
  type: ClusterIP
  selector:
    app: ingress-demo-app
  ports:
    - name: http
      port: 80
      targetPort: 80
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-demo-app
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: app.demo.com
    http:
      paths:
      - path: /
        backend:
          serviceName: ingress-demo-app
          servicePort: 80
EOF

Get the application's pods

kubectl get pods -l app=ingress-demo-app
NAME                                READY   STATUS    RESTARTS   AGE
ingress-demo-app-694bf5d965-8zqlb   1/1     Running   0          41s
ingress-demo-app-694bf5d965-h4bcm   1/1     Running   0          41s

Access it through the ingress

echo http://$(kubectl get node -o jsonpath='{range .items[*]}{ .status.addresses[?(@.type=="InternalIP")].address} {.status.conditions[?(@.status == "True")].status}{"\n"}{end}' | awk '{if($2=="True")a=$1}END{print a}'):$(kubectl get svc --all-namespaces -o go-template="{{range .items}}{{if eq .metadata.name \"ingress-nginx-controller\" }}{{range.spec.ports}}{{if eq .port "80"}}{{.nodePort}}{{end}}{{end}}{{end}}{{end}}")
http://192.168.77.145:38825

kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP            NODE               NOMINATED NODE   READINESS GATES
ingress-nginx-controller-67848f7b-2gxzb   1/1     Running   2          4d    10.244.3.11   k8s-worker-node1   <none>           <none>



curl -H 'Host:app.demo.com' http://192.168.77.145:38825
Hostname: ingress-demo-app-694bf5d965-8zqlb
IP: 127.0.0.1
IP: 10.244.5.2
RemoteAddr: 10.244.3.11:59762
GET / HTTP/1.1
Host: app.demo.com
User-Agent: curl/7.64.0
Accept: */*
X-Forwarded-For: 192.168.77.145
X-Forwarded-Host: app.demo.com
X-Forwarded-Port: 80
X-Forwarded-Proto: http
X-Real-Ip: 192.168.77.145
X-Request-Id: 1c5c76949fe909222a76d59c631ac82b
X-Scheme: http

The information returned by the whoami application shows that we reached the whoami app through the ingress.
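
Since the Deployment runs two replicas, repeated requests should be answered by both pods over time. A sketch for pulling the pod name out of a whoami response; whoami_hostname is a made-up helper name, and the sample input is taken from the output above.

```shell
# Extract the value of the "Hostname:" line from a whoami response on stdin.
whoami_hostname() {
  awk '/^Hostname:/ {print $2}'
}

# Sample response lines copied from the curl output above.
printf 'Hostname: ingress-demo-app-694bf5d965-8zqlb\nIP: 127.0.0.1\n' | whoami_hostname
```

Piping the curl command above through whoami_hostname in a loop should eventually show both pod names.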

Reset the cluster

If the installation runs into problems, the cluster can be reset with the following commands

kubeadm reset -f
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear 
systemctl stop kubelet
docker rm -f -v $(docker ps -q)
find /var/lib/kubelet | xargs -n 1 findmnt -n -t tmpfs -o TARGET -T | uniq | xargs -r umount -v
rm -r -f /etc/kubernetes /var/lib/kubelet /var/lib/etcd ~/.kube/config

Original article: https://lework.github.io/2021/04/03/debian-kubeadm-install/
