вторник, 25 апреля 2017 г.

Hot plug/unplug SATA drives on Linux

From time to time I need these commands.

Before hot-unplugging a disk drive sdX make sure that no file systems from the disk are mounted, no swap active, no LVM volumes active (N - partition number, VG - volume group, LV - logical volume, PV - physical volume):
lsblk /dev/sdX                 # see what's there
swapon -s | grep sdX           # check for active swap
swapoff /dev/sdXN              # deactivate swap if found
mount | grep sdX               # check for mounted FS
umount /dev/sdXN               # umount them if found
pvs | grep sdX                 # check for PVs
pvdisplay -m /dev/sdXN         # find LVs if a PV found
umount /dev/mapper/VG-LV       # umount LVs if found
lvchange -an /dev/mapper/VG-LV # deactivate LVs if any 
If it is the last PV of a volume group, the VG itself should be deactivated:
vgchange -an VG
Then prepare the device for unplugging (it makes the kernel send power down command to the drive and unregister it):
echo 1 > /sys/block/sdX/device/delete
And then unplug it physically.

After hot-plugging a disk drive on a host adapter X:
echo - - - > /sys/class/scsi_host/hostX/scan 
Then you can mount file systems, activate swap, activate LVs and mount them if needed.

вторник, 18 апреля 2017 г.

Hanoi tower algorithm to move from any to any state (1993)

In the year 1993 I have developed an algorithm to solve a generalized Hanoi tower problem: move from any state to any state.

The algorithm is very simple and uses a nice state representation: a ternary number. The right digit represents the number of rod where the largest disk resides, then the number of the next disk and so on to the number of rod of the smallest disk. 64 bit unsigned integer can hold state of up to 40 disks.

For the original Hanoi tower problem we have this encoding: source=0000...0, destination=2222...2.

It was also possible to use strings (with the largest disk at the left), but it was harder to allocate memory in this case, so I chose the integer representation.

It is easy to note that for moving a large disk it is required to move all smaller disks to an intermediate rod. So the algorithm first reduces the source and destination states if some large disks are on their target positions, then recursively moves the smaller disks to an intermediate rod, then moves the large disk, then recursively moves the smaller disks to their final destinations.

Here is the algorithm encoded in C (written in 1993, adapted now for ANSI C):

#include <stdio.h>
void Move(int f,int t)
        printf("%c%c  ",f+'0',t+'0');

void Hanoi3(unsigned from,unsigned to,int n)
        unsigned via,f,t,v;
        int i;

                return; /* nothing to do */
        /* reduce equal positions of largest disks */
        while(f==t && n>=0);
                return; /* nothing to do */

        /* now f is the source position of the largest disk,
           and t is its target position */

        v=3-f-t; /* intermediate rod */
        via=0; /* intermediate state */
        for(i=0; i<n; i++)


среда, 22 февраля 2017 г.

Recovering after loss of a physical volume with LVM2 on Linux

Recently I have lost a hard drive which contained a physical volume of LVM2.

It contained an insignificant file system with scratch data and a part of a RAID1 logical volume.

To recover, it was necessary to repair the RAID1 LV and remove other partial LV. I was surprised that the network lacked information on how to repair a RAID1 LV.

So here are the commands:

lvconvert --repair /dev/mapper/VG-LVraid1
lvremove /dev/mapper/VG-LVpartial
vgreduce --removemissing VG

четверг, 16 февраля 2017 г.

A few glitches with Fedora 24 installations

I have noticed multiple times that when a process seems to hang, strace sometimes can kick it forward and cause it to proceed and either terminate as it should or fail with a signal.

There was a strange hang within systemd (PID 1), it apparently created a forked copy of itself which hung, and when I tried to strace the child process it failed with a signal. Then the parent (PID 1) ran pause() but any tries to send it SIGTERM or SIGUSR1 caused nothing, the pause() system call did not interrupt. I think that all signals were blocked. The systemd (PID 1) did not process parentless zombies either. Reboot was required.

One time when restarting mysqld it took forever to terminate, but finished in a second when I ran strace on it.

Another time, df hung due to unresponsive nfs. After I had unmounted the nfs filesystem using "umount -l -f" the df still lingered for minutes until I ran strace on it.

I suspect there is a bug in the kernel. And a bug in systemd.