An issue was introduced into the md/raid1 kernel driver that causes failed writes not to be reported.
As a result, DataKeeper was unable to rewrite the failed data blocks when a mirror was resynchronized,
causing the target to be out of sync with the source. This patch installs a fix developed by SIOS for
the md/raid1 kernel driver on SPS for Linux versions 9.3.2 - 9.5.1. The SIOS fix has been submitted to
and accepted by the maintainer of the md/raid1 kernel driver. Bug reports have been opened with the
Linux OS vendors to incorporate this fix into future kernels.
NOTE: If you are installing SPS for Linux on a new cluster with any of the affected kernels, you will
need to perform steps 1 - 5, 9, and 10 after installing SPS for Linux and BEFORE creating your DataKeeper resources.
This patch provides an updated md/raid1 kernel module that will properly report all writes that fail.
LifeKeeper will verify that the proper md/raid1 module with the SIOS fix is loaded.
NOTE: The updated md/raid1 kernel module requires Secure Boot to be disabled.
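As an illustration only, the Secure Boot state can be checked before installing the patch. The sketch below classifies the text printed by "mokutil --sb-state". The function name sb_state_summary is hypothetical, and the matched strings are assumptions about mokutil's output that may differ between mokutil versions.

```shell
#!/bin/sh
# Classify the text printed by "mokutil --sb-state".
# Assumption: the output contains "SecureBoot disabled" when Secure Boot is
# off, and "validation is disabled" when shim signature validation has been
# turned off with "mokutil --disable-validation".
sb_state_summary() {
    case "$1" in
        *"SecureBoot disabled"*)    echo "disabled" ;;
        *"validation is disabled"*) echo "validation-disabled" ;;
        *"SecureBoot enabled"*)     echo "enabled" ;;
        *)                          echo "unknown" ;;
    esac
}
```

With the function loaded into a root shell, it could be used as follows. A result of "enabled" would mean that step 9 of the instructions below is still required.
# sb_state_summary "$(mokutil --sb-state 2>&1)"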
It is important to verify that the resource is in-service on the server with the correct or best data before
doing the full resync in step 13 of the instructions below. If there is corrupted data on all servers, then
a restore from backup may be necessary.
This patch can be found on the SIOS FTP site at the following location:
http://ftp.us.sios.com/pickup/HOTFIX-PL-9146-raid1_data_integrity_patch/
To download the patch and associated files on Linux, run the following commands:
wget http://ftp.us.sios.com/pickup/HOTFIX-PL-9146-raid1_data_integrity_patch/raid1_data_integrity_patch
wget http://ftp.us.sios.com/pickup/HOTFIX-PL-9146-raid1_data_integrity_patch/raid1_data_integrity_patch.md5sum
wget http://ftp.us.sios.com/pickup/HOTFIX-PL-9146-raid1_data_integrity_patch/readme.txt
NOTE: Alternative download methods can be used but must include all files.
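The three downloads above can also be scripted. The following is a minimal sketch that assumes wget is installed; the names BASE_URL, patch_urls, and download_patch_files are hypothetical and are not part of the patch.

```shell
#!/bin/sh
# Base location of the patch files, taken from the URL listed above.
BASE_URL="http://ftp.us.sios.com/pickup/HOTFIX-PL-9146-raid1_data_integrity_patch"

# Print the URL of every file that must be downloaded, one per line.
patch_urls() {
    for f in raid1_data_integrity_patch raid1_data_integrity_patch.md5sum readme.txt; do
        printf '%s/%s\n' "$BASE_URL" "$f"
    done
}

# Download each file into the current directory, stopping at the first failure.
download_patch_files() {
    for url in $(patch_urls); do
        wget "$url" || return 1
    done
}
```

Running download_patch_files from an empty working directory fetches all three files. The md5sum check in step 2 below is still required afterwards.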
IMPORTANT NOTE: Do NOT perform a rolling upgrade during initial installation of this patch. To install the
patch, LifeKeeper must be running on all servers and all DataKeeper resources must be out of service on all servers.
Do not bring DataKeeper resources in-service until the patch has been installed on every server in the cluster.
On each cluster node perform the following steps:
1) Download the patch file and md5sum file:
raid1_data_integrity_patch
raid1_data_integrity_patch.md5sum
2) Verify the download by running the following command:
# md5sum -c raid1_data_integrity_patch.md5sum
3) The raid1_data_integrity_patch must be executable:
# chmod +x raid1_data_integrity_patch
4) Verify that you have the correct DataKeeper version that requires this patch:
# rpm -q steeleye-lkDR
steeleye-lkDR-9.3.2-6863.x86_64
steeleye-lkDR-9.4.0-6959.x86_64
steeleye-lkDR-9.4.1-6983.x86_64
steeleye-lkDR-9.5.0-7075.x86_64
steeleye-lkDR-9.5.1-7154.x86_64
The output should show that one of the packages listed above is installed. The patch is not intended or needed for any other version of LifeKeeper.
5) Verify that you are running an affected kernel version:
# uname -r
Distribution           Affected Kernels
-------------------------------------------------------------------------
RHEL/CentOS/OEL 8.2    All
RHEL 8.3               All
OEL 7.x UEK 5          4.14.35-2025.400.8 <= Kernel <= 4.14.35-2047.504.1
SUSE 12 SP 4           Kernel >= 4.12.14-95.51
SUSE 12 SP 5           4.12.14-122.20 <= Kernel <= 4.12.14-122.74.0
SUSE 15 SP 1           Kernel >= 4.12.14-197.37.1
SUSE 15 SP 2           5.3.18-22.2 <= Kernel <= 5.3.18-24.67.1
Note: Please check the following documentation page for the most up-to-date list of affected kernel ranges:
https://docs.us.sios.com/Linux/current/LK4L/important-raid1-kernel-issue
If the kernel version you are running is not affected, then this patch is not needed.
6) Resume any paused DataKeeper mirrors:
# /opt/LifeKeeper/bin/mirror_action <datakeeper tag> resume
7) Stop all DataKeeper resources in the cluster. On each server run:
# /opt/LifeKeeper/bin/perform_action -a remove -t <datakeeper tag>
If any DataKeeper resources are still in-service, the patch installation will fail with the following error:
ERROR: All DataKeeper resources must be out of service before applying the patch. Please refer to the patch procedure in the documentation.
8) LifeKeeper must be running on all nodes in the cluster while the patch is installed. Verify this on each node:
# /opt/LifeKeeper/bin/lcdstatus -q
The output should show the list of resources, and the DataKeeper resources should be in the OSU state.
9) Disable Secure Boot by taking one of the following actions:
a) Disable Secure Boot in the UEFI configuration
b) Disable signature verification with the “mokutil --disable-validation” command. See mokutil documentation for details.
10) Install the patch (self-extracting binary) on all servers:
a) Install using the default HADR packages delivered in the patch:
# ./raid1_data_integrity_patch
If the default HADR packages do not support the currently loaded kernel, the following error will occur:
ERROR: Unable to locate a kernel module package for running kernel <kernel>. Please contact SIOS Customer Support (support@us.sios.com)
In that case, use the command provided in step 10b to install the patch using the SIOS-provided HADR package.
b) Execute the following only if you encountered an error in step 10a.
Install using a custom HADR package provided by SIOS:
# ./raid1_data_integrity_patch --addHADR <hadr-rpm-file>
The patch installs a patched raid1 kernel module, an nbd kernel module (on RHEL, CentOS, and OEL), and LifeKeeper changes to verify the proper md/raid1 module is loaded.
The following rpm packages are installed:
steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64
HADR-generic-9.5.2-7273.x86_64
HADR-<VENDOR>-<KERNEL>-9.5.2-<REVISION>.x86_64
NOTE: <VENDOR> is RHAS, SuSE, etc., <KERNEL> is the kernel version that the HADR modules are built for, and <REVISION> is the LifeKeeper HADR revision.
11) Bring the DataKeeper resources in-service on the server where the data is correct. This is most likely the server where the DataKeeper resource was last in service.
# /opt/LifeKeeper/bin/perform_action -a restore -t <datakeeper tag>
NOTE: This will bring all resources in the hierarchy in-service. Include the ‘-b’ option to bring only the DataKeeper resource in-service, if you do not want all resources active.
12) Verify that the data is correct. If there have been partial resyncs and switchovers, then the data may be corrupt on both servers.
If there is corrupted data on all servers, then a restore from backup may be necessary.
13) Force a full resync on the server where each DataKeeper resource is in-service:
# /opt/LifeKeeper/bin/mirror_action <datakeeper tag> pause
# /opt/LifeKeeper/bin/mirror_action <datakeeper tag> fullresync
14) When the full resync is complete, any inconsistencies between the source and target will be resolved.
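The package check in step 4 can be scripted so that it gives a yes/no answer. The sketch below is an illustration only: the function name is_patch_target_package is hypothetical, and it accepts exactly the five package strings listed in step 4.

```shell
#!/bin/sh
# Return 0 if the argument is one of the steeleye-lkDR packages listed in
# step 4, and 1 for anything else.
is_patch_target_package() {
    case "$1" in
        steeleye-lkDR-9.3.2-6863.x86_64) return 0 ;;
        steeleye-lkDR-9.4.0-6959.x86_64) return 0 ;;
        steeleye-lkDR-9.4.1-6983.x86_64) return 0 ;;
        steeleye-lkDR-9.5.0-7075.x86_64) return 0 ;;
        steeleye-lkDR-9.5.1-7154.x86_64) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the function loaded into the shell, it could be used as follows:
# is_patch_target_package "$(rpm -q steeleye-lkDR)" && echo "patch applies"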
Note: If upgrading both the Linux kernel and SIOS Protection Suite
for Linux within the same maintenance window, please follow
the steps in the “Performing a planned kernel upgrade after
the patch has been installed” section below, then follow the
steps in this section to perform the upgrade of SPS-L.
To upgrade SIOS Protection Suite for Linux after applying the patch,
perform the following steps.
1) Resume any paused DataKeeper mirrors.
# /opt/LifeKeeper/bin/mirror_action <datakeeper tag> resume
2) Either take all mirror resources out of service on all servers or bring
all resources in-service on a single cluster server (the "primary server").
3) Perform the following steps on each backup server. To avoid potential
issues when using quorum functionality, only one backup server may be
upgraded at a time.
a) Uninstall the steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64 package
and delete the DR-PL-9146 directory.
# rpm -e steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64
# rm -rf /opt/LifeKeeper/SIOS_Hotfixes/DR-PL-9146
b) Mount the SIOS Protection Suite for Linux installation image and run
the setup script, ensuring that all required Application Recovery Kits
are selected for installation.
# mkdir -p /media
# mount sps.img /media -t iso9660 -o loop
# /media/setup
c) IMPORTANT: If upgrading to SIOS Protection Suite for Linux 9.5.1 or
earlier, the PL-9146 patch (raid1_data_integrity_patch) must be
reapplied. This step is not required when upgrading to a version of
SIOS Protection Suite for Linux later than 9.5.1.
# chmod +x raid1_data_integrity_patch
# ./raid1_data_integrity_patch
4) If performing a rolling upgrade, perform a manual switchover of all
protected resources to one of the upgraded backup servers. Repeat steps
(a)-(c) given in step 3 to upgrade SIOS Protection Suite for Linux on
the original primary server.
5) Resources may now be brought in-service on any desired server.
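The rule in step 3c (reapply the patch only when the target SPS-L version is 9.5.1 or earlier) can be expressed as a version comparison. The following is a minimal sketch that assumes GNU sort with the -V (version sort) option; the function names version_le and needs_patch_reapply are hypothetical.

```shell
#!/bin/sh
# Return 0 if version $1 is less than or equal to version $2,
# using the ordering of "sort -V" (GNU coreutils).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Return 0 if the PL-9146 patch must be reapplied after upgrading to the
# SPS-L version given as $1 (that is, 9.5.1 or earlier).
needs_patch_reapply() {
    version_le "$1" "9.5.1"
}
```

For example, needs_patch_reapply 9.5.1 succeeds, while needs_patch_reapply 9.5.2 fails, matching the two cases described in step 3c.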
To uninstall the patch, perform the following steps on each cluster node:
1) Resume any paused DataKeeper mirrors.
# /opt/LifeKeeper/bin/mirror_action <datakeeper tag> resume
2) Take all DataKeeper mirrors out of service.
# /opt/LifeKeeper/bin/perform_action -a remove -t <datakeeper tag>
3) Remove the LifeKeeper HOTFIX (LifeKeeper startup raid1 check):
# rpm -e steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64
4) Find and remove the HADR package with the SIOS patched md/raid1 module:
# rpm -qa | grep HADR-
# rpm -e HADR-<VENDOR>-<KERNEL>-9.5.2-<REVISION>.x86_64
NOTE: Please be aware that uninstalling the patch while running on an affected
kernel will expose you to potential data corruption. The <VENDOR>, <KERNEL>,
and <REVISION> are specific to the HADR package that was installed and should
match the relevant package found in the ‘rpm -qa | grep HADR-’ output.
5) Remove PL-9146 README:
# rm -f /opt/LifeKeeper/SIOS_Hotfixes/DR-PL-9146/README.txt
# rmdir /opt/LifeKeeper/SIOS_Hotfixes/DR-PL-9146
6) For RHEL, CentOS, and OEL, reinstall the currently installed version of
SIOS Protection Suite for Linux. This step is required in order to reinstall
the distribution-specific HADR package included with the particular SPS-L release.
Resources may now be brought in-service on any desired server.
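To find the kernel-specific HADR package named in step 4 of the uninstall procedure, the package list can be filtered so that HADR-generic is left out. This is a sketch only; the function name select_kernel_hadr_packages is hypothetical.

```shell
#!/bin/sh
# Read package names on standard input (for example from "rpm -qa") and
# print only the kernel-specific HADR packages, excluding HADR-generic.
select_kernel_hadr_packages() {
    grep '^HADR-' | grep -v '^HADR-generic-'
}
```

With the function loaded into the shell, it could be used as follows. Review the output before passing any package name to "rpm -e".
# rpm -qa | select_kernel_hadr_packages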
To upgrade the running Linux kernel after applying the patch, perform the following steps.
1) Resume any paused DataKeeper mirrors.
# /opt/LifeKeeper/bin/mirror_action <datakeeper tag> resume
2) Either take all mirror resources out of service on all servers or
bring all resources in-service on a single cluster server (the “primary server”).
3) Perform the following steps on each backup server. To avoid
potential issues when using quorum functionality, only one backup
server may be upgraded at a time.
a) Delete the DR-PL-9146 directory and uninstall the
HADR-<VENDOR>-<KERNEL>-9.5.2-<REVISION>.x86_64 and
steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64 packages.
# rm -rf /opt/LifeKeeper/SIOS_Hotfixes/DR-PL-9146
# rpm -e $(rpm -qa | grep HADR- | grep -v generic)
# rpm -e steeleye-lkHOTFIX-DR-PL-9510-9.5.1-7154.x86_64
b) Upgrade the kernel and reboot.
c) Important: If the upgraded kernel version is still in the range
affected by PL-9146, then the PL-9146 patch must be reapplied.
# ./raid1_data_integrity_patch
If the raid1 kernel module provided by the upgraded kernel
package is no longer affected by the issue described in PL-9146
(see https://docs.us.sios.com/Linux/current/LK4L/important-raid1-kernel-issue),
then reinstallation of the PL-9146 patch is not required.
However, users running RHEL, CentOS, or OEL RHCK must re-run the
SIOS Protection Suite for Linux setup script for their currently
installed SPS-L version to reinstall the required
distribution-specific HADR package. This step is not required on
SLES or OEL UEK.
# mkdir -p /media
# mount sps.img /media -t iso9660 -o loop
# /media/setup
4) If performing a rolling kernel upgrade, perform a manual switchover
of all protected resources to one of the upgraded backup servers.
Repeat steps (a)-(c) given in step 3 to upgrade the kernel on the
original primary server.
5) Resources may now be brought in-service on any desired server in the cluster.
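The test in step 3c (whether the upgraded kernel is still within an affected range) can be sketched as a version comparison. This is an illustration under stated assumptions, not a replacement for the documentation page of affected kernels: it assumes GNU sort with the -V option, the function names normalize_kernel, version_le, and kernel_in_range are hypothetical, and the suffix stripping is a guess at how "uname -r" output relates to the listed versions. On some distributions "uname -r" may omit part of the package version, so compare results against the installed kernel package as well.

```shell
#!/bin/sh
# Strip distribution suffixes such as ".el7uek.x86_64" or "-default".
# Assumption: the suffix starts at the first component that begins with a letter.
normalize_kernel() {
    printf '%s\n' "$1" | sed 's/[.-][A-Za-z].*$//'
}

# Return 0 if version $1 is less than or equal to version $2,
# using the ordering of "sort -V" (GNU coreutils).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Return 0 if LOW <= KERNEL <= HIGH. Arguments: KERNEL LOW HIGH.
kernel_in_range() {
    version_le "$2" "$1" && version_le "$1" "$3"
}
```

For example, on SUSE 12 SP 5 the check could be run as shown below. For the open-ended ranges (SUSE 12 SP 4 and SUSE 15 SP 1), only the lower bound applies, so version_le alone is enough.
# kernel_in_range "$(normalize_kernel "$(uname -r)")" 4.12.14-122.20 4.12.14-122.74.0 && echo "affected"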
Performing an inadvertent kernel upgrade (i.e., without following the
steps in the “Performing a planned kernel upgrade after the patch has
been installed” section) will cause the OS vendor-provided raid1
kernel module to be reloaded. The PL-9510 hotfix (installed as part of
the PL-9146 patch) will perform a check during LifeKeeper startup to
ensure that the SIOS-provided patched raid1 module is still loaded.
When this check fails, LifeKeeper startup will fail until the issue is
corrected. In this situation, the user must do one of the following:
1) roll back to the previous kernel,
2) reinstall the PL-9146 patch (if the upgraded kernel version is
still within the range of kernel versions affected by PL-9146), or
3) uninstall the PL-9146 patch (if the upgraded kernel version is no
longer affected by PL-9146). See the “Uninstalling the patch”
section above for more details.
Note: Please check the following documentation page for the most up-to-date
list of affected kernel ranges:
https://docs.us.sios.com/Linux/current/LK4L/important-raid1-kernel-issue
If necessary, please refer to your Operating System documentation for
steps to restrict automatic kernel updates. You may also refer to
Solution 995 in the SIOS Self Service portal for steps to restrict
automatic kernel updates on RHEL (log in to the Customer Portal first
to access it).