Installation Steps for raid1_data_integrity_patch
---------------------------------------------------------------

Issue(s) addressed by this patch (PL-9146):

An issue was introduced into the md/raid1 kernel driver that causes
failed writes not to be reported. As a result, DataKeeper was unable to
rewrite the failed data blocks when a mirror was resynchronized, leaving
the target out of sync with the source.

This patch installs a fix developed by SIOS to the md/raid1 kernel
driver on SPS for Linux versions 9.3.2 - 9.5.1. The SIOS fix has been
submitted to and accepted by the maintainer of the md/raid1 kernel
driver. Bug reports have been opened with the Linux OS vendors to
incorporate this fix into future kernels.

NOTE: If you are installing SPS for Linux on a new cluster with any of
the affected kernels, you will need to perform steps 1 - 5, 9 and 10
after installing SPS for Linux and BEFORE creating your DataKeeper
resources.

------------------------
Patch Description:

This patch provides an updated md/raid1 kernel module that will properly
report all writes that fail. LifeKeeper will verify that the proper
md/raid1 module with the SIOS fix is loaded.

NOTE: The updated md/raid1 kernel module requires Secure Boot to be
disabled.

---------------------------
Data Integrity check:

It is important to verify that the resource is in-service on the server
with the correct or best data before doing the full resync in step 13
of the instructions below. If there is corrupted data on all servers
then a restore from backup may be necessary.
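
The Secure Boot requirement noted in the patch description can be
checked before starting. The following is a minimal sketch, not part of
the patch: the function names are hypothetical, and it assumes the
mokutil utility is installed and prints "SecureBoot enabled" or
"SecureBoot disabled" for "mokutil --sb-state". It reports only the
firmware state, so a system where signature verification was turned off
with "mokutil --disable-validation" may still be reported as enabled.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not SIOS tooling.

# sb_state_from_output TEXT: map `mokutil --sb-state` output to one of
# "enabled", "disabled" or "unknown".
sb_state_from_output() {
    case "$1" in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}

# secure_boot_state: query the state through mokutil when it is
# available; report "unknown" otherwise (for example on non-UEFI systems).
secure_boot_state() {
    if command -v mokutil >/dev/null 2>&1; then
        sb_state_from_output "$(mokutil --sb-state 2>&1)"
    else
        echo unknown
    fi
}

# Example:
#   [ "$(secure_boot_state)" = disabled ] || echo "check Secure Boot first"
```

A result of "unknown" should be treated as a prompt to check the UEFI
configuration by hand rather than as confirmation either way.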
-----------------------
Getting the patch:

This patch can be found on the SIOS FTP site at the following location:

    http://ftp.us.sios.com/pickup/HOTFIX-PL-9146-raid1_data_integrity_patch/

To download the patch and associated files on Linux, run the following
commands:

    wget http://ftp.us.sios.com/pickup/HOTFIX-PL-9146-raid1_data_integrity_patch/raid1_data_integrity_patch
    wget http://ftp.us.sios.com/pickup/HOTFIX-PL-9146-raid1_data_integrity_patch/raid1_data_integrity_patch.md5sum
    wget http://ftp.us.sios.com/pickup/HOTFIX-PL-9146-raid1_data_integrity_patch/readme.txt

NOTE: Alternative download methods can be used but must include all
files.

---------------------------
IMPORTANT NOTE: Do NOT perform a rolling upgrade during initial
installation of this patch. The instructions below require LifeKeeper to
be running on all servers and DataKeeper resources to be out-of-service
on all servers to install the patch. Do not bring DataKeeper resources
in-service until all servers in the cluster have installed the patch.

-----------------------
Patch Installation:

On each cluster node perform the following steps:

1) Download the patch file and md5sum file:

    raid1_data_integrity_patch
    raid1_data_integrity_patch.md5sum

2) Verify the download by running the following command:

    # md5sum -c raid1_data_integrity_patch.md5sum

3) The raid1_data_integrity_patch must be executable:

    # chmod +x raid1_data_integrity_patch

4) Verify that you have a DataKeeper version that requires this patch:

    # rpm -q steeleye-lkDR

   The output should be one of the following packages:

    steeleye-lkDR-9.3.2-6863.x86_64
    steeleye-lkDR-9.4.0-6959.x86_64
    steeleye-lkDR-9.4.1-6983.x86_64
    steeleye-lkDR-9.5.0-7075.x86_64
    steeleye-lkDR-9.5.1-7154.x86_64

   The patch is not intended or needed for any other version of
   LifeKeeper.

5) Verify that you are running an affected kernel version:

    # uname -r

    Distribution:          Affected Kernels
    ---------------------------------------
    RHEL/CentOS/OEL 8.2:   All
    RHEL 8.3:              All
    OEL 7.x UEK 5:         4.14.35-2025.400.8 <= Kernel <= 4.14.35-2047.504.1
    SUSE 12 SP 4:          Kernel >= 4.12.14-95.51
    SUSE 12 SP 5:          4.12.14-122.20 <= Kernel <= 4.12.14-122.74.0
    SUSE 15 SP 1:          Kernel >= 4.12.14-197.37.1
    SUSE 15 SP 2:          5.3.18-22.2 <= Kernel <= 5.3.18-24.67.1

   Note: Please check the following documentation page for the most
   up-to-date list of affected kernel ranges:

    https://docs.us.sios.com/Linux/current/LK4L/important-raid1-kernel-issue

   If the kernel version you are running is not affected, then this
   patch is not needed.

6) Resume any paused DataKeeper mirrors:

    # /opt/LifeKeeper/bin/mirror_action resume

7) Stop all DataKeeper resources in the cluster. On each server run:

    # /opt/LifeKeeper/bin/perform_action -a remove -t

   If there are DataKeeper resources in-service the patch will fail with
   the following error:

    ERROR: All DataKeeper resources must be out of service before
    applying the patch. Please refer to the patch procedure in the
    documentation.

8) LifeKeeper should be running on all nodes in the cluster while
   installing the patch.

    # /opt/LifeKeeper/bin/lcdstatus -q

   Output should show the list of resources. The DataKeeper resources
   should be OSU.

9) Disable Secure Boot by taking one of the following actions:

   a) Disable Secure Boot in the UEFI configuration.

   b) Disable signature verification with the
      "mokutil --disable-validation" command. See mokutil documentation
      for details.

10) Install the patch (self extracting binary) on all servers:

   a) Install using the default HADR packages delivered in the patch:

       # ./raid1_data_integrity_patch

      If the default HADR packages do not support the currently loaded
      kernel the following error will occur:

       ERROR: Unable to locate a kernel module package for running kernel .
       Please contact SIOS Customer Support (support@us.sios.com)

      Use the command provided in step 10b to install the patch using
      the SIOS provided HADR package.

   b) If you encountered the error described in step 10a, install using
      a custom HADR package provided by SIOS:

       # ./raid1_data_integrity_patch --addHADR

   The patch installs a patched raid1 kernel module, an nbd kernel
   module (on RHEL, CentOS, and OEL), and LifeKeeper changes to verify
   that the proper md/raid1 module is loaded. The following rpm packages
   are installed:

    steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64
    HADR-generic-9.5.2-7273.x86_64
    HADR---9.5.2-.x86_64

   NOTE: Three fields of the last package name are system-specific and
   are left blank above: the first is RHAS, SuSE, etc.; the second is
   the version that the HADR modules are built for; and the third (after
   9.5.2-) is the LifeKeeper HADR revision.

11) Bring the DataKeeper resources in-service on the server where the
    data is correct. This is most likely the server where the DataKeeper
    resource was last in service.

    # /opt/LifeKeeper/bin/perform_action -a restore -t

    NOTE: This will bring all resources in the hierarchy in-service.
    Include the '-b' option to bring only the DataKeeper resource
    in-service if you do not want all resources active.

12) It is important to verify that the data is correct. If there have
    been partial resyncs and switchovers then the data may be corrupt on
    both servers. If there is corrupted data on all servers then a
    restore from backup may be necessary.

13) Force a full resync on the server where each DataKeeper resource is
    in-service:

    # /opt/LifeKeeper/bin/mirror_action pause
    # /opt/LifeKeeper/bin/mirror_action fullresync

14) When the full resync is complete, any inconsistencies between the
    source and target will be resolved.

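
The kernel comparison in step 5 can be scripted instead of read by eye.
The following is a minimal sketch under stated assumptions, not part of
the patch: the function names are hypothetical, it relies on the
version ordering of "sort -V" from GNU coreutils, and it assumes a SUSE
style "-default" flavor suffix on the "uname -r" output that has to be
removed before comparing against the table.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not SIOS tooling.

# version_le A B: succeeds when version A sorts at or before version B
# under `sort -V` (GNU coreutils version ordering).
version_le() {
    [ "$1" = "$2" ] && return 0
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$1" ]
}

# kernel_in_range KERNEL LOW HIGH: succeeds when LOW <= KERNEL <= HIGH.
kernel_in_range() {
    version_le "$2" "$1" && version_le "$1" "$3"
}

# strip_flavor KERNEL: drop a trailing "-default" flavor suffix (assumed
# SUSE naming) so the value can be compared with the table entries.
strip_flavor() {
    printf '%s\n' "${1%-default}"
}

# Example for the SUSE 12 SP 5 row of the table in step 5:
#   k=$(strip_flavor "$(uname -r)")
#   if kernel_in_range "$k" 4.12.14-122.20 4.12.14-122.74.0; then
#       echo "kernel $k is in the affected range"
#   fi
```

The bounds are passed as arguments rather than hard-coded because the
affected ranges may change; the documentation page referenced in step 5
remains the authoritative list.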
-----------------------
Upgrading SIOS Protection Suite for Linux after the patch is applied:

Note: If upgrading both the Linux kernel and SIOS Protection Suite for
Linux within the same maintenance window, please follow the steps in the
"Performing a planned kernel upgrade after the patch has been installed"
section below, then follow the steps in this section to perform the
upgrade of SPS-L.

To upgrade SIOS Protection Suite for Linux after applying the patch,
perform the following steps.

1) Resume any paused DataKeeper mirrors.

    # /opt/LifeKeeper/bin/mirror_action resume

2) Either take all mirror resources out of service on all servers or
   bring all resources in-service on a single cluster server (the
   "primary server").

3) Perform the following steps on each backup server. To avoid potential
   issues when using quorum functionality, only one backup server may be
   upgraded at a time.

   a) Uninstall the steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64
      package and delete the DR-PL-9146 directory.

       # rpm -e steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64
       # rm -rf /opt/LifeKeeper/SIOS_Hotfixes/DR-PL-9146

   b) Mount the SIOS Protection Suite for Linux installation image and
      run the setup script, ensuring that all required Application
      Recovery Kits are selected for installation.

       # mkdir /media
       # mount sps.img /media -t iso9660 -o loop
       # /media/setup

   c) IMPORTANT: If upgrading to SIOS Protection Suite for Linux 9.5.1
      or earlier, the PL-9146 patch (raid1_data_integrity_patch) must be
      reapplied. This step is not required when upgrading to a version
      of SIOS Protection Suite for Linux later than 9.5.1.

       # chmod +x raid1_data_integrity_patch
       # ./raid1_data_integrity_patch

4) If performing a rolling upgrade, perform a manual switchover of all
   protected resources to one of the upgraded backup servers. Repeat
   steps (a)-(c) given in step 3 to upgrade SIOS Protection Suite for
   Linux on the original primary server.

5) Resources may now be brought in-service on any desired server.

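
The decision in step 3c above (reapply the patch only when the upgraded
version is 9.5.1 or earlier) can be expressed as a version comparison.
The following is a minimal sketch, not part of the patch: the function
names are hypothetical, it relies on "sort -V" from GNU coreutils, and
it assumes the upgraded version can be read from the VERSION field of
the steeleye-lkDR package once setup has completed.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not SIOS tooling.

# version_le A B: succeeds when version A sorts at or before version B
# under `sort -V` (GNU coreutils version ordering).
version_le() {
    [ "$1" = "$2" ] && return 0
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$1" ]
}

# needs_patch_reapply VERSION: succeeds when VERSION is 9.5.1 or earlier,
# the case in which step 3c requires the PL-9146 patch to be reapplied.
needs_patch_reapply() {
    version_le "$1" "9.5.1"
}

# Example (assumes rpm and the steeleye-lkDR package are present):
#   installed=$(rpm -q --qf '%{VERSION}\n' steeleye-lkDR)
#   if needs_patch_reapply "$installed"; then
#       echo "SPS-L $installed: reapply raid1_data_integrity_patch"
#   fi
```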
-----------------------------
Uninstalling the patch:

To uninstall the patch, perform the following steps on each cluster
node:

1) Resume any paused DataKeeper mirrors.

    # /opt/LifeKeeper/bin/mirror_action resume

2) Take all DataKeeper mirrors out of service.

    # /opt/LifeKeeper/bin/perform_action -a remove -t

3) Remove the LifeKeeper HOTFIX (LifeKeeper startup raid1 check):

    # rpm -e steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64

4) Find and remove the HADR package with the SIOS patched md/raid1
   module:

    # rpm -qa | grep HADR-
    # rpm -e HADR---9.5.2-.x86_64

   NOTE: Please be aware that uninstalling the patch while running on an
   affected kernel will expose you to potential data corruption. The
   three fields left blank in the package name above are specific to the
   HADR package that was installed and should match the relevant package
   found in the 'rpm -qa | grep HADR-' output.

5) Remove the PL-9146 README:

    # rm -f /opt/LifeKeeper/SIOS_Hotfixes/DR-PL-9146/README.txt
    # rmdir /opt/LifeKeeper/SIOS_Hotfixes/DR-PL-9146

6) For RHEL, CentOS, and OEL, reinstall the currently installed version
   of SIOS Protection Suite for Linux. This step is required in order to
   reinstall the distribution-specific HADR package included with the
   particular SPS-L release.

Resources may now be brought in-service on any desired server.

-----------------------------
Performing a planned kernel upgrade after the patch has been installed:

To upgrade the running Linux kernel after applying the patch, perform
the following steps.

1) Resume any paused DataKeeper mirrors.

    # /opt/LifeKeeper/bin/mirror_action resume

2) Either take all mirror resources out of service on all servers or
   bring all resources in-service on a single cluster server (the
   "primary server").

3) Perform the following steps on each backup server. To avoid potential
   issues when using quorum functionality, only one backup server may be
   upgraded at a time.

   a) Delete the DR-PL-9146 directory and uninstall the
      HADR---9.5.2-.x86_64 and
      steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64 packages.

       # rm -rf /opt/LifeKeeper/SIOS_Hotfixes/DR-PL-9146
       # rpm -e $(rpm -qa | grep HADR- | grep -v generic)
       # rpm -e steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64

   b) Upgrade the kernel and reboot.

   c) Important: If the upgraded kernel version is still in the range
      affected by PL-9146, then the PL-9146 patch must be reapplied.

       # ./raid1_data_integrity_patch

      If the raid1 kernel module provided by the upgraded kernel package
      is no longer affected by the issue described in PL-9146 (see
      https://docs.us.sios.com/Linux/current/LK4L/important-raid1-kernel-issue),
      then reinstallation of the PL-9146 patch is not required. However,
      users running RHEL, CentOS, or OEL RHCK must re-run the SIOS
      Protection Suite for Linux setup script for their currently
      installed SPS-L version to reinstall the required
      distribution-specific HADR package. This step is not required on
      SLES or OEL UEK.

       # mkdir /media
       # mount sps.img /media -t iso9660 -o loop
       # /media/setup

4) If performing a rolling kernel upgrade, perform a manual switchover
   of all protected resources to one of the upgraded backup servers.
   Repeat steps (a)-(c) given in step 3 to upgrade the kernel on the
   original primary server.

5) Resources may now be brought in-service on any desired server in the
   cluster.

-----------------------------
Recovering from an unplanned kernel upgrade:

Performing an inadvertent kernel upgrade (i.e., without following the
steps in the "Performing a planned kernel upgrade after the patch has
been installed" section) will cause the OS vendor-provided raid1 kernel
module to be reloaded. The PL-9510 hotfix (installed as part of the
PL-9146 patch) will perform a check during LifeKeeper startup to ensure
that the SIOS-provided patched raid1 module is still loaded. When this
check fails, LifeKeeper startup will fail until the issue is corrected.

In this situation, the user must either:

1) roll back to the previous kernel,

2) reinstall the PL-9146 patch (if the upgraded kernel version is still
   within the range of kernel versions affected by PL-9146), or

3) uninstall the PL-9146 patch (if the upgraded kernel version is no
   longer affected by PL-9146). See the "Uninstalling the patch" section
   above for more details.

Note: Please check the following documentation page for the most
up-to-date list of affected kernel ranges:

    https://docs.us.sios.com/Linux/current/LK4L/important-raid1-kernel-issue

If necessary, please refer to your Operating System documentation for
steps to restrict automatic kernel updates. You may also refer to
Solution 995 in the SIOS Self Service portal for steps to restrict
automatic kernel updates on RHEL (log into the Customer Portal first to
access).