Lustre Software-RAID (ZFS + Pacemaker) High-Availability Deployment

Contents

I. Preface
  1. Hardware configuration
  2. Network topology
  3. Overall design
II. Software installation
III. Cluster deployment
  1. Configure multipath
  2. Configure the high-availability cluster
  3. Configure the zpools
  4. Deploy Lustre
  5. Configure high availability for the Lustre roles
  6. Configure Lustre health monitoring
    6.1 LNet network monitoring
    6.2 Lustre cluster health monitoring
    6.3 Tuning
      6.3.1 Disable failback after node recovery
      6.3.2 Resource timeout settings
IV. Configuration notes
V. Maintenance operations
  1. pcs cluster management
  2. pcs resource management

I. Preface

Reference: Category:Lustre_High_Availability

1. Hardware configuration

Disk enclosure: 5U84 JBOD; disks: 70 x 16TB SAS HDD and 4 x 800GB SAS SSD; 2 I/O modules (3 SAS ports each); 4 Mini SAS HD cables.
Server test-lustre2: 1GbE 10.1.80.12/16, 10GbE 172.16.21.12/24; 4U 4-socket server; 4 x Intel Gold 5120 CPU @ 2.20GHz; 512GB RAM; 1 x 1GbE and 4 x 10GbE (bond4); 1 x LSI SAS 9300-8e dual-port SAS HBA; system disks 2 x 1.2TB SAS SSD (RAID1).
Server test-lustre3: 1GbE 10.1.80.13/16, 10GbE 172.16.21.13/24; same server configuration as test-lustre2.
Client test-lustre1: 1GbE 10.1.80.11/16, 10GbE 172.16.21.11/24; 4U 4-socket server; 4 x Intel Gold 5120 CPU @ 2.20GHz; 512GB RAM; 1 x 1GbE and 4 x 10GbE (bond4); system disks 2 x 1.2TB SAS SSD (RAID1).
Client test-lustre4: 1GbE 10.1.80.14/16, 10GbE 172.16.21.14/24; same client configuration as test-lustre1.

2. Network topology

(Topology diagram not reproduced here: the JBOD is attached to both servers through four SAS cables, and all nodes share the 1GbE management and 10GbE Lustre networks.)

3. Overall design

This article describes a Lustre deployment on software RAID. The overall configuration is as follows:

a. Configure multipath: one JBOD is connected through four SAS cables to the dual-port SAS HBAs of the two servers; on each server, multipath is used to set up active/active multipathing for all disks.
b. Configure the high-availability cluster: build the HA cluster with pacemaker and establish trust between the cluster nodes.
c. Configure the zpools: use ZFS for software RAID; seven pools of 10 HDDs each in raidz2 (RAID6) hold OST data, and one pool of 4 SSDs in RAID10 holds MGT/MDT data.
d. Deploy Lustre: format the RAID volumes for the Lustre roles (MGT/MDT/OST) with mkfs.lustre.
e. Configure high availability for the Lustre roles: create ocf:heartbeat:ZFS and ocf:lustre:Lustre resources with pcs; ocf:heartbeat:ZFS manages zpool import/export, and ocf:lustre:Lustre manages mounting the Lustre targets.
f. Configure Lustre health monitoring: create ocf:lustre:healthLNET and ocf:lustre:healthLUSTRE monitoring resources with pcs and apply their rules to all resources; healthLNET monitors LNet connectivity and healthLUSTRE monitors the health of the Lustre servers. When a node is detected as unhealthy, its resources are moved to the other node.

II. Software installation

Install the Lustre packages:

# Install the Lustre packages
yum install -y lustre lustre-iokit kmod-lustre lustre-osd-zfs-mount lustre-zfs-dkms
# Install lustre-resource-agents so that pacemaker can manage the Lustre services;
# it places the agents under /usr/lib/ocf/resource.d/lustre/
yum install -y lustre-resource-agents

Install the ZFS packages:

yum install -y zfs

Install the multipath packages:

yum install -y device-mapper-multipath

Install the pacemaker packages:

yum install -y pacemaker pcs

III. Cluster deployment

1. Configure multipath

Generate the configuration file: running mpathconf --enable creates /etc/multipath.conf.

Edit /etc/multipath.conf for active/active multipathing, adding:

defaults {
    path_selector "round-robin 0"
    path_grouping_policy multibus
    user_friendly_names yes
    find_multipaths yes
}
devices {
    device {
        path_grouping_policy multibus
        path_checker tur
        path_selector "round-robin 0"
        hardware_handler "1 alua"
        prio alua
        failback immediate
        rr_weight uniform
        no_path_retry queue
    }
}

Start the service and enable it at boot:

systemctl restart multipathd
systemctl enable multipathd

Check the result: the active/active configuration is in place and every disk has two active paths:

[root@test-lustre2 ~]# multipath -ll
mpathbp (35000c500d75452bb) dm-71 SEAGATE ,ST16000NM004J
size=15T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 1:0:77:0  sdca 68:224  active ready running
  `- 1:0:160:0 sdew 129:128 active ready running
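The zpool commands in section III.3 reference disks by their /dev/disk/by-id/dm-uuid-mpath-<WWID> paths rather than by the friendly mpathXX names that multipath -ll prints. A small sketch like the following (illustrative only; it assumes the default device-mapper udev links) lists the mapping so the right WWID paths are easy to pick out:

# Map each dm-uuid-mpath-* by-id link to its friendly multipath name and dm node.
for link in /dev/disk/by-id/dm-uuid-mpath-*; do
    dm=$(readlink -f "$link")                            # e.g. /dev/dm-71
    name=$(dmsetup info -c --noheadings -o name "$dm")   # e.g. mpathbp
    printf '%-10s %-8s %s\n' "$name" "${dm##*/}" "$link"
done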
2. Configure the high-availability cluster

On all cluster nodes: start pcsd and disable the firewall and SELinux:

systemctl restart pcsd
systemctl enable pcsd

systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config

On all cluster nodes: add hosts entries for the cluster network:

[root@test-lustre2 ~]# cat /etc/hosts
127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4
::1          localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.21.12 test-lustre2
172.16.21.13 test-lustre3

[root@test-lustre3 ~]# cat /etc/hosts
127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4
::1          localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.21.12 test-lustre2
172.16.21.13 test-lustre3

On all cluster nodes: set the same password for the hacluster user:

[root@test-lustre2 ~]# echo 123456 | passwd --stdin hacluster
[root@test-lustre3 ~]# echo 123456 | passwd --stdin hacluster

On any one cluster node: authenticate the nodes, create the two-node pacemaker cluster with dual heartbeat networks (a single heartbeat network makes split-brain much more likely), and start corosync and pacemaker on all nodes.
Note: for a two-node pacemaker cluster, ignore quorum and disable the STONITH component.

[root@test-lustre2 ~]# pcs cluster auth 172.16.21.12 172.16.21.13 -u hacluster -p 123456
[root@test-lustre2 ~]# pcs cluster setup --start --name my_cluster 172.16.21.12,10.1.80.12 172.16.21.13,10.1.80.13 --force --transport udpu --token 7000
[root@test-lustre2 ~]# pcs cluster start --all
[root@test-lustre2 ~]# pcs property set no-quorum-policy=ignore
[root@test-lustre2 ~]# pcs property set stonith-enabled=false

On all cluster nodes: disable pacemaker/corosync autostart and instead start pacemaker with a delay from rc.local:

[root@test-lustre2 ~]# pcs cluster disable 172.16.21.12
[root@test-lustre2 ~]# pcs cluster disable 172.16.21.13
[root@test-lustre2 ~]# cat /etc/rc.local | tail -n 2
sleep 20
systemctl start pacemaker
[root@test-lustre2 ~]# chmod +x /etc/rc.d/rc.local

[root@test-lustre3 ~]# cat /etc/rc.local | tail -n 2
sleep 20
systemctl start pacemaker
[root@test-lustre3 ~]# chmod +x /etc/rc.d/rc.local
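Because the cluster was created with two heartbeat rings (the 172.16.21.x and 10.1.80.x networks), it is worth confirming both rings are healthy before adding any resources. A quick check along these lines uses standard corosync/pcs tooling; the exact output varies by environment:

corosync-cfgtool -s      # each ring should report "no faults"
pcs status corosync      # corosync membership as seen by pcs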
3. Configure the zpools

On any one cluster node, create all of the zpools.

Create mdtpool1 from the four SSDs as RAID10 (two mirrored pairs), used for MDT/MGT data:

zpool create -O canmount=off -o cachefile=none -f mdtpool1 \
  mirror /dev/disk/by-id/dm-uuid-mpath-35000c500a19bb39b /dev/disk/by-id/dm-uuid-mpath-35000c500a19b9c0f \
  mirror /dev/disk/by-id/dm-uuid-mpath-35000c500a19b9ce7 /dev/disk/by-id/dm-uuid-mpath-35000c500a19b9db3

Create ostpool1 through ostpool7, each from ten HDDs as raidz2 (RAID6), used for OST data:

zpool create -O canmount=off -o cachefile=none -f ostpool1 raidz2 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75452bb /dev/disk/by-id/dm-uuid-mpath-35000c500d75aac43 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75a6ebb /dev/disk/by-id/dm-uuid-mpath-35000c500d75a9c57 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d7618177 /dev/disk/by-id/dm-uuid-mpath-35000c500d75eb73f \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d7547f2b /dev/disk/by-id/dm-uuid-mpath-35000c500d75a6f8b \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d758dc9f /dev/disk/by-id/dm-uuid-mpath-35000c500d761a0bf
zpool create -O canmount=off -o cachefile=none -f ostpool2 raidz2 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d7554ea3 /dev/disk/by-id/dm-uuid-mpath-35000c500d752814b \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d7601487 /dev/disk/by-id/dm-uuid-mpath-35000c500d76175ef \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d761b28f /dev/disk/by-id/dm-uuid-mpath-35000c500d761d31f \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75b238f /dev/disk/by-id/dm-uuid-mpath-35000c500d752d64b \
  /dev/disk/by-id/dm-uuid-mpath-35000c500cb28ca87 /dev/disk/by-id/dm-uuid-mpath-35000c500d7616417
zpool create -O canmount=off -o cachefile=none -f ostpool3 raidz2 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500cac14f1b /dev/disk/by-id/dm-uuid-mpath-35000c500d758d977 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d758186b /dev/disk/by-id/dm-uuid-mpath-35000c500cadd3ce7 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75a99ef /dev/disk/by-id/dm-uuid-mpath-35000c500d75a9bb7 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75c37bf /dev/disk/by-id/dm-uuid-mpath-35000c500d7587f7b \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75aa373 /dev/disk/by-id/dm-uuid-mpath-35000c500ca90ab77
zpool create -O canmount=off -o cachefile=none -f ostpool4 raidz2 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75a9b7b /dev/disk/by-id/dm-uuid-mpath-35000c500d753adbf \
  /dev/disk/by-id/dm-uuid-mpath-35000c500cadab52b /dev/disk/by-id/dm-uuid-mpath-35000c500ca84573b \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75b1b1b /dev/disk/by-id/dm-uuid-mpath-35000c500d75901ab \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75fd637 /dev/disk/by-id/dm-uuid-mpath-35000c500d755544f \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75fdea7 /dev/disk/by-id/dm-uuid-mpath-35000c500d75dbd9b
zpool create -O canmount=off -o cachefile=none -f ostpool5 raidz2 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d7529d23 /dev/disk/by-id/dm-uuid-mpath-35000c500d75dc3b7 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d76137fb /dev/disk/by-id/dm-uuid-mpath-35000c500d75fceaf \
  /dev/disk/by-id/dm-uuid-mpath-35000c500cb267be7 /dev/disk/by-id/dm-uuid-mpath-35000c500d75204fb \
  /dev/disk/by-id/dm-uuid-mpath-35000c500d75e9d1b /dev/disk/by-id/dm-uuid-mpath-35000c500d7588383 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500cb310e3b /dev/disk/by-id/dm-uuid-mpath-35000c500d75f32c3
zpool create -O canmount=off -o cachefile=none -f ostpool6 raidz2 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500ae3fe1df /dev/disk/by-id/dm-uuid-mpath-35000c500ae415cfb \
  /dev/disk/by-id/dm-uuid-mpath-35000c500ae43608f /dev/disk/by-id/dm-uuid-mpath-35000c500ae409607 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500ae42344b /dev/disk/by-id/dm-uuid-mpath-35000c500ae412df3 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500ae3fcfeb /dev/disk/by-id/dm-uuid-mpath-35000c500ae2aca6b \
  /dev/disk/by-id/dm-uuid-mpath-35000c500ae3fd05f /dev/disk/by-id/dm-uuid-mpath-35000c500ae42288f
zpool create -O canmount=off -o cachefile=none -f ostpool7 raidz2 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500ae40855f /dev/disk/by-id/dm-uuid-mpath-35000c500ae34d007 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500ae408dfb /dev/disk/by-id/dm-uuid-mpath-35000c500ae42ea27 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500ae40363b /dev/disk/by-id/dm-uuid-mpath-35000c500ae2b39d3 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500ae41b827 /dev/disk/by-id/dm-uuid-mpath-35000c500ae40c133 \
  /dev/disk/by-id/dm-uuid-mpath-35000c500ae42ddb7 /dev/disk/by-id/dm-uuid-mpath-35000c500ae40b6d7

Check the zpools:

[root@test-lustre2 ~]# zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
mdtpool1  1.45T   408K  1.45T         -     0%     0%  1.00x  ONLINE  -
ostpool1   145T  1.20M   145T         -     0%     0%  1.00x  ONLINE  -
ostpool2   145T  1.20M   145T         -     0%     0%  1.00x  ONLINE  -
ostpool3   145T  1.23M   145T         -     0%     0%  1.00x  ONLINE  -
ostpool4   145T  1.20M   145T         -     0%     0%  1.00x  ONLINE  -
ostpool5   145T  1.20M   145T         -     0%     0%  1.00x  ONLINE  -
ostpool6   145T  1.44M   145T         -     0%     0%  1.00x  ONLINE  -
ostpool7   145T  1.15M   145T         -     0%     0%  1.00x  ONLINE  -
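The seven ostpool commands differ only in the pool number and the ten WWIDs they consume. If the 70 HDD WWIDs are kept in a helper file (one dm-uuid-mpath-* name per line, in the intended order), the pools could be created with a loop such as the sketch below; the file name ost_disks.txt and the loop itself are illustrative, not part of the original procedure:

# Build ostpool1..ostpool7 from a flat WWID list, ten disks per raidz2 vdev.
i=1
while mapfile -t -n 10 disks && (( ${#disks[@]} == 10 )); do
    vdevs=()
    for d in "${disks[@]}"; do
        vdevs+=("/dev/disk/by-id/$d")
    done
    zpool create -O canmount=off -o cachefile=none -f "ostpool${i}" raidz2 "${vdevs[@]}"
    (( i++ ))
done < ost_disks.txt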
4. Deploy Lustre

On all cluster nodes: configure the Lustre network (LNet), generate the spl hostid, refresh the module dependencies and start the service:

echo "options lnet networks=tcp0(bond0)" > /etc/modprobe.d/lustre.conf
genhostid
depmod -a
systemctl restart lustre

On any one cluster node: format the MGT/MDT and OST targets:

# MGT/MDT
mkfs.lustre --mgs --mdt --index=0 --fsname=lustrefs \
  --mgsnode=172.16.21.12@tcp0 --mgsnode=172.16.21.13@tcp0 \
  --servicenode=172.16.21.12@tcp0 --servicenode=172.16.21.13@tcp0 \
  --backfstype=zfs mdtpool1/mdt1
# OSTs
mkfs.lustre --ost --index=0 --fsname=lustrefs --mgsnode=172.16.21.12@tcp0 --mgsnode=172.16.21.13@tcp0 \
  --servicenode=172.16.21.12@tcp0 --servicenode=172.16.21.13@tcp0 --backfstype=zfs \
  --mkfsoptions="recordsize=1024K" ostpool1/ost1
mkfs.lustre --ost --index=1 --fsname=lustrefs --mgsnode=172.16.21.12@tcp0 --mgsnode=172.16.21.13@tcp0 \
  --servicenode=172.16.21.12@tcp0 --servicenode=172.16.21.13@tcp0 --backfstype=zfs \
  --mkfsoptions="recordsize=1024K" ostpool2/ost2
mkfs.lustre --ost --index=2 --fsname=lustrefs --mgsnode=172.16.21.12@tcp0 --mgsnode=172.16.21.13@tcp0 \
  --servicenode=172.16.21.12@tcp0 --servicenode=172.16.21.13@tcp0 --backfstype=zfs \
  --mkfsoptions="recordsize=1024K" ostpool3/ost3
mkfs.lustre --ost --index=3 --fsname=lustrefs --mgsnode=172.16.21.12@tcp0 --mgsnode=172.16.21.13@tcp0 \
  --servicenode=172.16.21.12@tcp0 --servicenode=172.16.21.13@tcp0 --backfstype=zfs \
  --mkfsoptions="recordsize=1024K" ostpool4/ost4
mkfs.lustre --ost --index=4 --fsname=lustrefs --mgsnode=172.16.21.12@tcp0 --mgsnode=172.16.21.13@tcp0 \
  --servicenode=172.16.21.12@tcp0 --servicenode=172.16.21.13@tcp0 --backfstype=zfs \
  --mkfsoptions="recordsize=1024K" ostpool5/ost5
mkfs.lustre --ost --index=5 --fsname=lustrefs --mgsnode=172.16.21.12@tcp0 --mgsnode=172.16.21.13@tcp0 \
  --servicenode=172.16.21.12@tcp0 --servicenode=172.16.21.13@tcp0 --backfstype=zfs \
  --mkfsoptions="recordsize=1024K" ostpool6/ost6
mkfs.lustre --ost --index=6 --fsname=lustrefs --mgsnode=172.16.21.12@tcp0 --mgsnode=172.16.21.13@tcp0 \
  --servicenode=172.16.21.12@tcp0 --servicenode=172.16.21.13@tcp0 --backfstype=zfs \
  --mkfsoptions="recordsize=1024K" ostpool7/ost7
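The seven OST format commands are identical apart from the index and pool number, so they could equally be generated with a loop like the sketch below (a convenience sketch rather than an original step; double-check the fsname and MGS NIDs before running it):

# Format ost1..ost7 in a loop; the Lustre index is the pool number minus one.
mgs_opts=(--mgsnode=172.16.21.12@tcp0 --mgsnode=172.16.21.13@tcp0
          --servicenode=172.16.21.12@tcp0 --servicenode=172.16.21.13@tcp0)
for n in {1..7}; do
    mkfs.lustre --ost --index=$(( n - 1 )) --fsname=lustrefs "${mgs_opts[@]}" \
        --backfstype=zfs --mkfsoptions="recordsize=1024K" "ostpool${n}/ost${n}"
done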
5. Configure high availability for the Lustre roles

On all cluster nodes: install the ocf:heartbeat:ZFS resource agent and create the Lustre mount directories:

wget -P /usr/lib/ocf/resource.d/heartbeat https://raw.githubusercontent.com/ClusterLabs/resource-agents/master/heartbeat/ZFS
chmod 755 /usr/lib/ocf/resource.d/heartbeat/ZFS
chown root:root /usr/lib/ocf/resource.d/heartbeat/ZFS

mkdir -p /lustrefs
mkdir -p /lustre/mdt1
mkdir -p /lustre/ost1
mkdir -p /lustre/ost2
mkdir -p /lustre/ost3
mkdir -p /lustre/ost4
mkdir -p /lustre/ost5
mkdir -p /lustre/ost6
mkdir -p /lustre/ost7

On any one cluster node: create the mount resources for the Lustre roles so that pcs controls the mounts.
Note: by default pcs spreads the mounts evenly across the two nodes.

Taking the MDT mount service as an example:
a. Create the zfs-mdtpool1 resource, which manages importing the mdtpool1 zpool.
b. Create the mdt1 resource, which manages mounting the mdt1 target.
c. Passing --group group_mdt1 puts zfs-mdtpool1 and mdt1 into the same resource group group_mdt1, which ensures the zpool is imported before the target is mounted.

pcs resource create zfs-mdtpool1 ocf:heartbeat:ZFS pool=mdtpool1 --group group_mdt1
pcs resource create mdt1 ocf:lustre:Lustre target=mdtpool1/mdt1 mountpoint=/lustre/mdt1/ --group group_mdt1

pcs resource create zfs-ostpool1 ocf:heartbeat:ZFS pool=ostpool1 --group group_ost1
pcs resource create ost1 ocf:lustre:Lustre target=ostpool1/ost1 mountpoint=/lustre/ost1/ --group group_ost1

pcs resource create zfs-ostpool2 ocf:heartbeat:ZFS pool=ostpool2 --group group_ost2
pcs resource create ost2 ocf:lustre:Lustre target=ostpool2/ost2 mountpoint=/lustre/ost2/ --group group_ost2

pcs resource create zfs-ostpool3 ocf:heartbeat:ZFS pool=ostpool3 --group group_ost3
pcs resource create ost3 ocf:lustre:Lustre target=ostpool3/ost3 mountpoint=/lustre/ost3/ --group group_ost3

pcs resource create zfs-ostpool4 ocf:heartbeat:ZFS pool=ostpool4 --group group_ost4
pcs resource create ost4 ocf:lustre:Lustre target=ostpool4/ost4 mountpoint=/lustre/ost4/ --group group_ost4

pcs resource create zfs-ostpool5 ocf:heartbeat:ZFS pool=ostpool5 --group group_ost5
pcs resource create ost5 ocf:lustre:Lustre target=ostpool5/ost5 mountpoint=/lustre/ost5/ --group group_ost5

pcs resource create zfs-ostpool6 ocf:heartbeat:ZFS pool=ostpool6 --group group_ost6
pcs resource create ost6 ocf:lustre:Lustre target=ostpool6/ost6 mountpoint=/lustre/ost6/ --group group_ost6

pcs resource create zfs-ostpool7 ocf:heartbeat:ZFS pool=ostpool7 --group group_ost7
pcs resource create ost7 ocf:lustre:Lustre target=ostpool7/ost7 mountpoint=/lustre/ost7/ --group group_ost7

On the clients, mount the filesystem using both MGS NIDs.
Note: if one MGS fails, the client automatically connects to the other MGS to access the filesystem.

mount -t lustre 172.16.21.12@tcp0:172.16.21.13@tcp0:/lustrefs /lustrefs/

Check the current pcs cluster status:

[root@test-lustre2 ~]# pcs status
Cluster name: my_cluster

WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)

Stack: corosync
Current DC: test-lustre2 (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Thu Jun  8 18:17:39 2023
Last change: Thu Jun  8 18:15:39 2023 by root via cibadmin on test-lustre2

2 nodes configured
16 resource instances configured

Online: [ test-lustre2 test-lustre3 ]

Full list of resources:

 Resource Group: group_mdt
     zfs-mdtpool  (ocf::heartbeat:ZFS):    Started test-lustre2
     mgt          (ocf::lustre:Lustre):    Started test-lustre2
 Resource Group: group_ost1
     zfs-ostpool1 (ocf::heartbeat:ZFS):    Started test-lustre3
     ost1         (ocf::lustre:Lustre):    Started test-lustre3
 Resource Group: group_ost2
     zfs-ostpool2 (ocf::heartbeat:ZFS):    Started test-lustre2
     ost2         (ocf::lustre:Lustre):    Started test-lustre2
 Resource Group: group_ost3
     zfs-ostpool3 (ocf::heartbeat:ZFS):    Started test-lustre3
     ost3         (ocf::lustre:Lustre):    Started test-lustre3
 Resource Group: group_ost4
     zfs-ostpool4 (ocf::heartbeat:ZFS):    Started test-lustre2
     ost4         (ocf::lustre:Lustre):    Started test-lustre2
 Resource Group: group_ost5
     zfs-ostpool5 (ocf::heartbeat:ZFS):    Started test-lustre3
     ost5         (ocf::lustre:Lustre):    Started test-lustre3
 Resource Group: group_ost6
     zfs-ostpool6 (ocf::heartbeat:ZFS):    Started test-lustre2
     ost6         (ocf::lustre:Lustre):    Started test-lustre2
 Resource Group: group_ost7
     zfs-ostpool7 (ocf::heartbeat:ZFS):    Started test-lustre3
     ost7         (ocf::lustre:Lustre):    Started test-lustre3

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
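Since the seven OST groups follow one pattern, the group_ost* resources created above could also be produced with a short loop; this sketch uses exactly the same resource names, pools and mount points as the commands listed earlier:

# Create the ZFS + Lustre resource pair for each OST group.
for n in $(seq 1 7); do
    pcs resource create "zfs-ostpool${n}" ocf:heartbeat:ZFS pool="ostpool${n}" --group "group_ost${n}"
    pcs resource create "ost${n}" ocf:lustre:Lustre target="ostpool${n}/ost${n}" \
        mountpoint="/lustre/ost${n}/" --group "group_ost${n}"
done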
6. Configure Lustre health monitoring

6.1 LNet network monitoring

LNet connectivity is monitored with lctl ping; when a problem is detected, the resources are forced over to the other cluster node.

On any one cluster node, create the LNet monitoring resource:

pcs resource create healthLNET ocf:lustre:healthLNET lctl=true multiplier=1000 device=bond0 host_list="172.16.21.12@tcp0 172.16.21.13@tcp0" --clone

On any one cluster node, apply the rule to all Lustre resources:

pcs constraint location zfs-mdtpool1 rule score=-INFINITY pingd lte 0 or not_defined pingd
pcs constraint location mdt1 rule score=-INFINITY pingd lte 0 or not_defined pingd

pcs constraint location zfs-ostpool1 rule score=-INFINITY pingd lte 0 or not_defined pingd
pcs constraint location ost1 rule score=-INFINITY pingd lte 0 or not_defined pingd

pcs constraint location zfs-ostpool2 rule score=-INFINITY pingd lte 0 or not_defined pingd
pcs constraint location ost2 rule score=-INFINITY pingd lte 0 or not_defined pingd

pcs constraint location zfs-ostpool3 rule score=-INFINITY pingd lte 0 or not_defined pingd
pcs constraint location ost3 rule score=-INFINITY pingd lte 0 or not_defined pingd

pcs constraint location zfs-ostpool4 rule score=-INFINITY pingd lte 0 or not_defined pingd
pcs constraint location ost4 rule score=-INFINITY pingd lte 0 or not_defined pingd

pcs constraint location zfs-ostpool5 rule score=-INFINITY pingd lte 0 or not_defined pingd
pcs constraint location ost5 rule score=-INFINITY pingd lte 0 or not_defined pingd

pcs constraint location zfs-ostpool6 rule score=-INFINITY pingd lte 0 or not_defined pingd
pcs constraint location ost6 rule score=-INFINITY pingd lte 0 or not_defined pingd

pcs constraint location zfs-ostpool7 rule score=-INFINITY pingd lte 0 or not_defined pingd
pcs constraint location ost7 rule score=-INFINITY pingd lte 0 or not_defined pingd

6.2 Lustre cluster health monitoring

Lustre health is monitored with lctl get_param health_check; when a problem is detected, the resources are forced over to the other cluster node.

On any one cluster node, create the Lustre health monitoring resource:

pcs resource create healthLUSTRE ocf:lustre:healthLUSTRE --clone

On any one cluster node, apply the rule to all Lustre resources (a looped form of both constraint sets is sketched after this list):

pcs constraint location zfs-mdtpool1 rule score=-INFINITY lustred lte 0 or not_defined lustred
pcs constraint location mdt1 rule score=-INFINITY lustred lte 0 or not_defined lustred

pcs constraint location zfs-ostpool1 rule score=-INFINITY lustred lte 0 or not_defined lustred
pcs constraint location ost1 rule score=-INFINITY lustred lte 0 or not_defined lustred

pcs constraint location zfs-ostpool2 rule score=-INFINITY lustred lte 0 or not_defined lustred
pcs constraint location ost2 rule score=-INFINITY lustred lte 0 or not_defined lustred

pcs constraint location zfs-ostpool3 rule score=-INFINITY lustred lte 0 or not_defined lustred
pcs constraint location ost3 rule score=-INFINITY lustred lte 0 or not_defined lustred

pcs constraint location zfs-ostpool4 rule score=-INFINITY lustred lte 0 or not_defined lustred
pcs constraint location ost4 rule score=-INFINITY lustred lte 0 or not_defined lustred

pcs constraint location zfs-ostpool5 rule score=-INFINITY lustred lte 0 or not_defined lustred
pcs constraint location ost5 rule score=-INFINITY lustred lte 0 or not_defined lustred

pcs constraint location zfs-ostpool6 rule score=-INFINITY lustred lte 0 or not_defined lustred
pcs constraint location ost6 rule score=-INFINITY lustred lte 0 or not_defined lustred

pcs constraint location zfs-ostpool7 rule score=-INFINITY lustred lte 0 or not_defined lustred
pcs constraint location ost7 rule score=-INFINITY lustred lte 0 or not_defined lustred
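Both monitoring rules repeat the same pair of location constraints for every resource. Under the naming used above, the whole set could be applied with a loop along these lines (an illustrative sketch, not a step from the original procedure):

# Apply the pingd and lustred location rules to every ZFS and Lustre resource.
resources="zfs-mdtpool1 mdt1"
for n in $(seq 1 7); do resources+=" zfs-ostpool${n} ost${n}"; done
for attr in pingd lustred; do
    for res in $resources; do
        pcs constraint location "$res" rule score=-INFINITY "$attr" lte 0 or not_defined "$attr"
    done
done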
Check the current resource status:

[root@test-lustre2 ~]# pcs resource
 Resource Group: group_mdt
     zfs-mdtpool  (ocf::heartbeat:ZFS):    Started test-lustre2
     mdt          (ocf::lustre:Lustre):    Started test-lustre2
 Resource Group: group_ost1
     zfs-ostpool1 (ocf::heartbeat:ZFS):    Started test-lustre3
     ost1         (ocf::lustre:Lustre):    Started test-lustre3
 Resource Group: group_ost2
     zfs-ostpool2 (ocf::heartbeat:ZFS):    Started test-lustre2
     ost2         (ocf::lustre:Lustre):    Started test-lustre2
 Resource Group: group_ost3
     zfs-ostpool3 (ocf::heartbeat:ZFS):    Started test-lustre3
     ost3         (ocf::lustre:Lustre):    Started test-lustre3
 Resource Group: group_ost4
     zfs-ostpool4 (ocf::heartbeat:ZFS):    Started test-lustre2
     ost4         (ocf::lustre:Lustre):    Started test-lustre2
 Resource Group: group_ost5
     zfs-ostpool5 (ocf::heartbeat:ZFS):    Started test-lustre3
     ost5         (ocf::lustre:Lustre):    Started test-lustre3
 Resource Group: group_ost6
     zfs-ostpool6 (ocf::heartbeat:ZFS):    Started test-lustre2
     ost6         (ocf::lustre:Lustre):    Started test-lustre2
 Resource Group: group_ost7
     zfs-ostpool7 (ocf::heartbeat:ZFS):    Started test-lustre3
     ost7         (ocf::lustre:Lustre):    Started test-lustre3
 Clone Set: healthLNET-clone [healthLNET]
     Started: [ test-lustre2 test-lustre3 ]
 Clone Set: healthLUSTRE-clone [healthLUSTRE]
     Started: [ test-lustre2 test-lustre3 ]

6.3 Tuning

6.3.1 Disable failback after node recovery

References: how-can-i-set-resource-stickiness-in-pacemaker; Red Hat documentation, 6.4 Resource Meta Options

In a two-node pcs HA cluster, when node A fails its resources are automatically migrated to and started on node B, and the cluster recovers. Once node A comes back, by default a rebalance is triggered: node B migrates some of the healthy resources back to node A so that resources are again spread evenly across the two nodes.
pcs provides the resource-stickiness option, a resource meta option that specifies how strongly a resource prefers to stay where it is; the default is 0. Setting it to 100 here means resources are not migrated back after a failed node recovers, which avoids resources bouncing back and forth and the cluster failing to settle.
Note: once the cluster is healthy again, to spread the load across both nodes, pick a maintenance window and manually move some resources to the idle node.

# Set the default resource-stickiness; this only affects resources created afterwards
pcs resource defaults resource-stickiness=100
# Set resource-stickiness on an existing resource group, e.g. group_ost3
pcs resource meta group_ost3 resource-stickiness=100

6.3.2 Resource timeout settings

By default the OST mount (start) timeout is 300s. Since a start can exceed this in extreme cases, the start timeout can be extended with pcs resource update <resource-id> op start interval=0s timeout=600s.

# By default the OST start timeout is 300s
[root@test-lustre2 ~]# pcs resource show group_ost4
 Group: group_ost4
  Resource: zfs-ostpool4 (class=ocf provider=heartbeat type=ZFS)
   Attributes: pool=ostpool4
   Operations: monitor interval=5s timeout=30s (zfs-ostpool4-monitor-interval-5s)
               start interval=0s timeout=60s (zfs-ostpool4-start-interval-0s)
               stop interval=0s timeout=60s (zfs-ostpool4-stop-interval-0s)
  Resource: ost4 (class=ocf provider=lustre type=Lustre)
   Attributes: mountpoint=/lustre/ost4/ target=ostpool4/ost4
   Operations: monitor interval=20s timeout=300s (ost4-monitor-interval-20s)
               start interval=0s timeout=300s (ost4-start-interval-0s)
               stop interval=0s timeout=300s (ost4-stop-interval-0s)

# During failover testing, ost4 timed out while mounting and the resource failed to start
Jun 16 14:55:02 Lustre(ost4)[40028]: INFO: Starting to mount ostpool4/ost4
Jun 16 15:00:02 [9194] test-lustre2 lrmd: warning: child_timeout_callback: ost4_start_0 process (PID 40028) timed out
Jun 16 15:00:02 [9194] test-lustre2 lrmd: warning: operation_finished: ost4_start_0:40028 - timed out after 300000ms
....
Jun 16 15:00:04 Lustre(ost4)[44092]: INFO: Starting to unmount ostpool4/ost4
Jun 16 15:00:04 Lustre(ost4)[44092]: INFO: ostpool4/ost4 unmounted successfully

# Manually raise the ost4 start timeout to 600s
[root@test-lustre2 ~]# pcs resource update ost4 op start interval=0s timeout=600s
[root@test-lustre2 ~]# pcs resource show ost4
 Resource: ost1 (class=ocf provider=lustre type=Lustre)
  Attributes: interval=0s mountpoint=/lustre/ost1/ target=ostpool1/ost1 timeout=600s
  Operations: monitor interval=20s timeout=300s (ost1-monitor-interval-20s)
              start interval=0s timeout=600s (ost1-start-interval-0s)
              stop interval=0s timeout=300s (ost1-stop-interval-0s)

IV. Configuration notes

Reference: Creating_Pacemaker_Resources_for_Lustre_Storage_Services

[root@node93 ~]# pcs resource list ocf:lustre
ocf:lustre:healthLNET - LNet connectivity
ocf:lustre:healthLUSTRE - lustre servers healthy
ocf:lustre:Lustre - Lustre management

V. Maintenance operations

1. pcs cluster management

View the cluster status: pcs cluster status
Stop all nodes or a specific node: pcs cluster stop [--all | <node-ip>]
Start all nodes or a specific node: pcs cluster start [--all | <node-ip>]
Disable autostart for all nodes or a specific node: pcs cluster disable [--all | <node-ip>]
Enable autostart for all nodes or a specific node: pcs cluster enable [--all | <node-ip>]

2. pcs resource management

View resource status: pcs resource show [resource-id]
Note: without a resource-id this lists the status of all resources; with a resource-id it shows that resource's attributes, such as the start timeout and resource stickiness.
List all available resource agents: pcs resource list
Allow the cluster to run a resource: pcs resource enable <resource-id>
Stop a resource and prevent the cluster from starting it: pcs resource disable <resource-id>
Manually move a running resource to another node: pcs resource move <resource-id> <destination-node>
Manually start a resource for debugging: pcs resource debug-start <resource-id>
Manually stop a resource for debugging: pcs resource debug-stop <resource-id>
Clear failed-operation records and recompute resource placement: pcs resource cleanup [resource-id]
Note: without a resource-id all resources are re-placed; with a resource-id only the specified resource is.

When a resource shows FAILED test-lustre2 (blocked), run cleanup on that resource manually; if the operation fails, check /var/log/cluster/corosync.log for the reason.

# Example: a cleanup of group_ost5 failed because the zfs-ostpool5 stop operation failed;
# in this case check the zpool state on test-lustre2
Jun 16 14:25:17 test-lustre2 pengine[94332]: warning: Processing failed stop of zfs-ostpool5 on test-lustre2: unknown error
Jun 16 14:25:17 test-lustre2 pengine[94332]: warning: Processing failed stop of zfs-ostpool5 on test-lustre2: unknown error
Jun 16 14:25:17 test-lustre2 pengine[94332]: warning: Forcing zfs-ostpool5 away from test-lustre2 after 1000000 failures (max=1000000)
Jun 16 14:25:17 test-lustre2 pengine[94332]: notice: Scheduling shutdown of node test-lustre2
Jun 16 14:25:17 test-lustre2 pengine[94332]: notice:  * Shutdown test-lustre2
Jun 16 14:25:17 test-lustre2 pengine[94332]: crit: Cannot shut down node test-lustre2 because of zfs-ostpool5: unmanaged failed (zfs-ostpool5_stop_0)
Jun 16 14:25:17 test-lustre2 pengine[94332]: notice: Calculated transition 3, saving inputs in /var/lib/pacemaker/pengine/pe-input-154.bz2

If zpool list shows the pool in the SUSPENDED state (ONLINE is the normal state), first run zpool clear ostpool5 to recover the pool, then run cleanup on the resource again.
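For routine checks, the maintenance commands above can be combined into a quick sweep of the whole stack; the following sketch (resource and pool names as used in this deployment) is one possible form:

# Quick health sweep: cluster, pools, LNet and Lustre health.
pcs status                              # pacemaker view of nodes and resource groups
zpool list -H -o name,health            # every pool should report ONLINE
lctl ping 172.16.21.13@tcp0             # LNet reachability of the peer server
lctl get_param health_check             # should report "healthy" on the Lustre servers

# Planned failover of one OST group, then removal of the temporary location constraint:
pcs resource move group_ost1 test-lustre2
pcs resource clear group_ost1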