Wednesday, February 22, 2017

Recovering after loss of a physical volume with LVM2 on Linux

Recently I lost a hard drive that held an LVM2 physical volume.

It held an unimportant file system with scratch data and one of the images of a RAID1 logical volume.

To recover, I had to repair the RAID1 LV and remove the other, partial LV. I was surprised at how little information there was online about repairing a RAID1 LV.

So here are the commands:

lvconvert --repair /dev/mapper/VG-LVraid1
lvremove /dev/mapper/VG-LVpartial
vgreduce --removemissing VG
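
The same three steps can be wrapped in a small Python helper, as a sketch (the function names are hypothetical; VG, LVraid1 and LVpartial are the placeholders from above). It only prints the commands unless dry_run=False, which is a useful safety net because lvremove and vgreduce --removemissing are destructive. It uses the VG/LV form that LVM commands also accept instead of /dev/mapper paths, because device-mapper doubles any hyphen inside VG or LV names.

```python
import shlex
import subprocess

def lvm_recovery_commands(vg, raid_lvs, partial_lvs):
    """Build the recovery commands in the order used above."""
    cmds = []
    for lv in raid_lvs:
        # Replace the failed RAID image; lvconvert allocates it from the VG's free space.
        cmds.append(["lvconvert", "--repair", f"{vg}/{lv}"])
    for lv in partial_lvs:
        # LVs that lived (partly) on the lost PV cannot be repaired, so drop them.
        cmds.append(["lvremove", f"{vg}/{lv}"])
    # Finally forget the missing PV so the VG is consistent again.
    cmds.append(["vgreduce", "--removemissing", vg])
    return cmds

def run(cmds, dry_run=True):
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            # stdin is inherited, so LVM's interactive confirmations still work.
            subprocess.run(cmd, check=True)
```

For example, run(lvm_recovery_commands("VG", ["LVraid1"], ["LVpartial"])) prints the three commands from this post without executing anything.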

Thursday, February 16, 2017

A few glitches with Fedora 24 installations

I have noticed several times that when a process seems to hang, attaching strace to it can sometimes kick it forward, so that it proceeds and either terminates as it should or fails with a signal.

There was a strange hang in systemd (PID 1): it apparently created a forked copy of itself that hung, and when I tried to strace the child process, the child died with a signal. After that the parent (PID 1) sat in pause(), and attempts to send it SIGTERM or SIGUSR1 had no effect: the pause() system call was not interrupted. I think all signals were blocked. systemd (PID 1) did not reap orphaned zombie processes either. A reboot was required.
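
To check a suspicion like "all signals were blocked", the SigBlk, SigIgn and SigCgt masks in /proc/<pid>/status can be decoded. Below is a minimal Linux-only Python sketch; the helper names are my own. Keep in mind that the kernel does not deliver signals sent from user space to PID 1 unless init has installed a handler for them, so SigCgt is worth reading alongside SigBlk.

```python
import signal

MASK_FIELDS = ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt")

def decode_sigmask(hex_mask):
    """Bit n-1 of the hexadecimal mask corresponds to signal number n."""
    value = int(hex_mask, 16)
    return [bit + 1 for bit in range(value.bit_length()) if (value >> bit) & 1]

def signal_names(numbers):
    names = []
    for n in numbers:
        try:
            names.append(signal.Signals(n).name)
        except ValueError:
            names.append(str(n))  # e.g. real-time signals without an enum name
    return names

def read_signal_masks(pid="self"):
    """Parse the signal masks from /proc/<pid>/status (Linux only)."""
    masks = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in MASK_FIELDS:
                masks[key] = signal_names(decode_sigmask(value.strip()))
    return masks
```

Running print(read_signal_masks(1)) as root shows which signals PID 1 blocks, ignores and catches.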

Once, when I restarted mysqld, it took forever to terminate, but it finished within a second after I ran strace on it.

Another time, df hung because of an unresponsive NFS server. After I had unmounted the NFS file system with "umount -l -f", df still lingered for minutes until I ran strace on it.

I suspect a bug in the kernel, and another one in systemd.

kernel-4.9.9-100.fc24.x86_64
systemd-229-18.fc24.x86_64

Tuesday, December 6, 2016

Graceful squid restart

Squid is a well-known open-source HTTP proxy server.

Sometimes its configuration has to be changed, for example to update ACLs. Fortunately, it has the command "squid -k reconfigure". Unfortunately, squid refuses new connections while it reconfigures, and with a complex configuration and large ACLs this can take several seconds.

Some people recommend setting up multiple squid servers behind a load balancer to solve this, but I believe that is overkill for small installations.

So here is my approach.

To avoid service disruption, start a second copy of squid on the same machine with an identical configuration, but on different TCP ports and without persistent storage (the cache_dir lines are dropped):
a_conf=/etc/squid/squid.conf
b_conf=/etc/squid/squid-b.conf
b_pid=/var/run/squid-b.pid

sed -e 's/^http_port \([0-9]\+\)/http_port 1\1/' \
    -e '/^cache_dir/d' < "$a_conf" > "$b_conf"
echo "pid_filename $b_pid" >> $b_conf

squid -f $b_conf
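
The sed transformation above can be mirrored in Python, for instance to preview the second configuration before starting it. This is a sketch; derive_b_conf is a hypothetical helper name. As with the sed version, the port is shifted by prefixing a 1, which only gives a valid port for ports of up to four digits (at most 19999), and only lines that begin exactly with http_port or cache_dir are affected.

```python
import re

def derive_b_conf(a_text, pid_file="/var/run/squid-b.pid"):
    """Return the configuration text for the temporary second squid instance."""
    lines = []
    for line in a_text.splitlines():
        if line.startswith("cache_dir"):
            continue  # the second copy runs without persistent storage
        # 3128 -> 13128, like the sed expression
        lines.append(re.sub(r"^http_port (\d+)", r"http_port 1\1", line))
    lines.append(f"pid_filename {pid_file}")  # avoid clashing with the first copy
    return "\n".join(lines) + "\n"
```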
Then redirect new connections to the new port (13128 for the default port 3128) using iptables NAT.

Reconfigure the first copy of squid with "squid -k reconfigure", then remove the NAT redirection and shut down the second copy with "squid -f $b_conf -k shutdown".
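
The whole switch-over can be sketched as a Python script. The assumptions are mine, not from the procedure above: clients reach squid from other hosts (so the REDIRECT rule goes into the nat PREROUTING chain; locally generated traffic would need the OUTPUT chain), and a fixed settle delay is enough for each instance to be ready, since "squid -k reconfigure" only signals the running process and returns. Redirecting only new connections works because the nat table is consulted only for the first packet of a connection, so established connections keep their original destination.

```python
import shlex
import subprocess
import time

def nat_redirect(action, port, alt_port):
    """iptables rule that sends new TCP connections for `port` to `alt_port`.

    action is "-I" to insert the rule or "-D" to delete the same rule again.
    """
    return ["iptables", "-t", "nat", action, "PREROUTING", "-p", "tcp",
            "--dport", str(port), "-j", "REDIRECT", "--to-ports", str(alt_port)]

def show(cmd):
    print(shlex.join(cmd))  # dry run: only print the command

def execute(cmd):
    subprocess.run(cmd, check=True)

def graceful_reconfigure(b_conf, port=3128, alt_port=13128, settle=5.0, run=show):
    run(["squid", "-f", b_conf])                    # start copy B on the shifted port
    time.sleep(settle)                              # assumed: enough for B to listen
    run(nat_redirect("-I", port, alt_port))         # new connections now go to B
    run(["squid", "-k", "reconfigure"])             # copy A re-reads its config
    time.sleep(settle)                              # assumed: enough for A to finish
    run(nat_redirect("-D", port, alt_port))         # new connections go to A again
    run(["squid", "-f", b_conf, "-k", "shutdown"])  # stop copy B
```

As root, graceful_reconfigure("/etc/squid/squid-b.conf", run=execute) performs the steps; the default run=show only prints them. On shutdown, squid waits up to shutdown_lifetime (30 seconds by default) for clients still connected to the second copy.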

Sunday, November 13, 2016

Port scan reporting

Internet worms scan the Internet to infect new hosts, build botnets and abuse services.

For a long time I have had a port scan detection and blocking script in place for our local users. The Perl script analyzed NetFlow data and, when a certain threshold was exceeded, informed the user by e-mail and blocked the port.

It had to handle a few special cases. For example, mail servers (MTAs) easily exceeded the threshold on ports 25 and 113, so the script probed port 25 on the suspected host before taking action. A few IP addresses were added to a whitelist.
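
The detection rule can be sketched in Python. The post does not say which metric or threshold the Perl script used, so both are assumptions here: a source counts as a scanner when it contacts more than `threshold` distinct destination addresses on one port within the analysed batch of flow records, given as (src, dst, dst_port) tuples. The whitelist and the port-25 probe mirror the special cases described above.

```python
import socket
from collections import defaultdict

def probe_port(host, port, timeout=3.0):
    """True if host accepts a TCP connection on port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def find_scanners(flows, threshold=100, whitelist=frozenset(), probe=probe_port):
    """flows: iterable of (src, dst, dst_port); threshold is an assumed value."""
    targets = defaultdict(set)
    for src, dst, dport in flows:
        if src not in whitelist:
            targets[(src, dport)].add(dst)
    suspects = []
    for (src, dport), dsts in sorted(targets.items()):
        if len(dsts) <= threshold:
            continue
        # MTAs legitimately contact many hosts on 25 (SMTP) and 113 (ident):
        # if the host itself answers on port 25, treat it as a mail server.
        if dport in (25, 113) and probe(src, 25):
            continue
        suspects.append((src, dport, len(dsts)))
    return suspects
```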

Recently I had the idea of informing other providers too. Remote scanners were already being detected, so I only had to add an action. The whois service provides an abuse e-mail address, so composing a letter template was the only truly creative task.
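
Pulling the abuse address out of whois output could look like the Python sketch below. It assumes the system whois client is installed and that the contact appears in an abuse-mailbox: field (RIPE-style registries) or an OrgAbuseEmail: field (ARIN), falling back to any address on a line that mentions "abuse"; the helper names are hypothetical.

```python
import re
import subprocess

# Fields that carry the abuse contact in RIPE/APNIC/AFRINIC-style and ARIN output.
FIELD_RE = re.compile(r"^\s*(?:abuse-mailbox|OrgAbuseEmail):\s*(\S+@\S+)", re.I | re.M)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def abuse_addresses(whois_text):
    found = []
    for match in FIELD_RE.finditer(whois_text):
        if match.group(1) not in found:
            found.append(match.group(1))
    if not found:
        # Fallback: any address on a line that mentions "abuse".
        for line in whois_text.splitlines():
            if "abuse" in line.lower():
                for addr in EMAIL_RE.findall(line):
                    if addr not in found:
                        found.append(addr)
    return found

def whois(ip):
    """Run the system whois client (assumed to be installed)."""
    return subprocess.run(["whois", ip], capture_output=True, text=True).stdout
```

Usage: abuse_addresses(whois("192.0.2.1")) returns the candidate recipients for the letter template.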

To avoid sending letters too often, I added dynamic firewall rules with a timeout that block the scan traffic. As a result, warnings are sent after 1, 2, 4, 8 ... 32 days if the activity does not stop.
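
The doubling schedule can be sketched in Python. This covers only the timing logic; the firewall rule and the letter are left to the caller, the names are hypothetical, and I assume that the interval stays at 32 days once reached and is never reset.

```python
DAY = 86400  # seconds

def handle_scan(state, src, now, cap_days=32):
    """Decide whether a scan from src seen at time `now` triggers a new warning.

    state maps src -> (last block length in days, block expiry timestamp).
    Returns the new block length in days, or None while src is still blocked.
    """
    prev_days, until = state.get(src, (None, 0))
    if now < until:
        return None  # still blocked: no new letter
    days = 1 if prev_days is None else min(prev_days * 2, cap_days)
    state[src] = (days, now + days * DAY)
    return days
```

When it returns a number, the caller sends the warning and installs a firewall rule that expires after that many days.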

The script works, and although Chinese Internet providers largely ignore the warnings, I have received many replies indicating that a real problem was noticed and fixed thanks to my messages.

It would be nice, though, if all ISPs implemented local scan detection themselves.

Friday, October 28, 2016

Internet scanning activity

Recently I analyzed the scanning activity from the Internet as seen on our ASBR routers.

Here are the most popular TCP ports being scanned (>1% of all scan attempts):
Port   Service                 Percent   Purpose
22     ssh                     12.6877   Shell access
23     telnet                  9.4989    Shell access
1433   SQL Server              7.13683   Data compromise, exploits
21320  SpyBot proxy            5.80395   Spam
3389   Remote Desktop          5.71959   Desktop access
3128   Squid proxy             3.88055   Spam
8080   potential HTTP proxy    3.82993   Spam
3306   MySQL                   3.74557   Data compromise
445    SMB                     1.56909   File compromise
103    ?                       1.48473
8888   potential HTTP proxy?   1.45099   Spam
110    POP3                    1.36663   E-mail compromise, spam
20     FTP-data                1.33288   FTP exploit?
8000   DVR control port        1.14729   DVR access
3398   ?                       1.14729
79     finger                  1.13042   User identity leak
789    ?                       1.09668
3397   ?                       1.0798
119    nntp                    1.06293   NNTP exploit
3396   ?                       1.04606
21     FTP                     1.04606   File compromise
465    smtps                   1.02919   Spam

Some of these ports and their purposes are unknown to me.
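
A share table like the one above can be produced with a Counter. This is a sketch under the assumption that the input is one destination port per detected scan attempt; the post does not say how attempts were counted.

```python
from collections import Counter

def top_ports(attempts, min_share=1.0):
    """Return (port, percent) pairs above min_share, most frequent first."""
    counts = Counter(attempts)
    total = sum(counts.values())
    rows = []
    for port, n in counts.most_common():
        share = 100.0 * n / total
        if share > min_share:
            rows.append((port, round(share, 5)))
    return rows
```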

P.S. I have found this useful resource on port scanning statistics.

P.P.S. Top scanning providers: chinanet.cn.net (21%), cnnic.cn (10%), chinaunicom.cn (8%), chinamobile.com (3%).

Thursday, August 11, 2016

GSM modem woes

I had a task to send SMS messages through a bunch of USB GSM modems.

Gnokii turned out to work well only with Nokia phones, so I wrote a custom Perl script that sends AT commands directly.

It took some time to implement SMS encoding in PDU mode, but the main difficulty was the unreliability of the modems and/or the Linux USB subsystem.
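
For reference, here is a minimal Python sketch of PDU-mode encoding: GSM 7-bit packing plus an SMS-SUBMIT PDU. It is not the Perl code from this post, and its assumptions are deliberate simplifications: the text is restricted to letters, digits and spaces (whose GSM 7-bit codes equal ASCII), the number is international, the SMSC stored in the modem is used, and no validity period is set.

```python
import re

GSM7_SAFE = re.compile(r"[A-Za-z0-9 ]*")

def pack_septets(text):
    """Pack 7-bit characters into octets, least significant bits first."""
    if not GSM7_SAFE.fullmatch(text):
        raise ValueError("sketch only handles letters, digits and spaces")
    out = bytearray()
    acc = bits = 0
    for ch in text:
        acc |= ord(ch) << bits
        bits += 7
        while bits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            bits -= 8
    if bits:
        out.append(acc)
    return bytes(out)

def semi_octets(number):
    """Swap digit pairs, padding an odd-length number with F."""
    digits = number + ("F" if len(number) % 2 else "")
    return "".join(digits[i + 1] + digits[i] for i in range(0, len(digits), 2))

def submit_pdu(number, text):
    """Return (pdu_hex, length for AT+CMGS)."""
    number = number.lstrip("+")
    if len(text) > 160:
        raise ValueError("a single SMS holds at most 160 septets")
    pdu = (
        "00"                      # SMSC length 0: use the SMSC stored in the modem
        "01"                      # first octet: SMS-SUBMIT, no validity period
        "00"                      # TP-MR: message reference 0
        f"{len(number):02X}91"    # destination length in digits, international type
        + semi_octets(number)
        + "0000"                  # TP-PID 0, TP-DCS 0 (GSM 7-bit default alphabet)
        + f"{len(text):02X}"      # TP-UDL: number of septets
        + pack_septets(text).hex().upper()
    )
    # AT+CMGS expects the length in octets, excluding the SMSC part.
    return pdu, len(pdu) // 2 - 1
```

To send it, switch the modem to PDU mode with AT+CMGF=0, issue AT+CMGS=<length>, wait for the "> " prompt, then send the PDU followed by Ctrl-Z (0x1A).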

Sometimes the /dev/ttyUSBx devices stopped responding. Unplugging the modems by hand did not seem like a proper solution, so I searched the Internet for one.

First I found usbreset.c, and fortunately it could recover the /dev/ttyUSBx devices, but it needed a /dev/bus/usb/... path to work on, so I searched again and found a method of mapping ttyUSB to /dev/bus/usb here.

Then I added the following function to my Perl script and called it during modem initialization whenever the modem did not reply to the first AT command.


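sub usbreset($) {
        my ($device)=@_;
        use File::Slurp;
        require 'sys/ioctl.ph';   # generated by h2ph, provides _IO()
        my ($rdev)=(stat($device))[6];
        return if !$rdev;
        # Linux dev_t encoding, as decoded by the kernel's new_decode_dev()
        my $major=($rdev>>8)&0xfff;
        my $minor=($rdev&0xff)|(($rdev>>12)&0xfff00);
        # ttyUSBx -> tty -> interface -> USB device directory with busnum/devnum
        my $path="/sys/dev/char/$major:$minor/../../../..";
        my $bus=read_file("$path/busnum", err_mode=>'quiet');
        my $dev=read_file("$path/devnum", err_mode=>'quiet');
        return if !defined $bus || !defined $dev;
        my $ctl=sprintf('/dev/bus/usb/%03d/%03d',$bus,$dev);
        # "or", not "||": with "||" a failed open would never return here
        open(my $fd,'>',$ctl) or return;
        # USBDEVFS_RESET from linux/usbdevice_fs.h
        my $res=ioctl($fd,_IO(ord('U'),20),0);
        close $fd;
        return if !defined $res;
        return 1;
}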

I'll have to monitor the modems for some time to find out if there are other problems with them.

Feel free to use this piece of code if you like (public domain).
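
The call site described above (reset only when the first AT command gets no reply) can be sketched in Python with the transport passed in as functions. Both callables are hypothetical stand-ins: send_at for whatever writes to the tty and reads the reply, usb_reset for something like the usbreset function above. After a reset the tty device may need to be reopened, which is left to send_at.

```python
def init_modem(send_at, usb_reset, attempts=2):
    """Probe the modem with "AT"; if it does not answer OK, reset it and retry.

    send_at(cmd) -> reply text, or None on timeout.
    usb_reset() -> True if the USB reset succeeded.
    """
    for attempt in range(attempts):
        reply = send_at("AT")
        if reply is not None and reply.strip().endswith("OK"):
            return True
        if attempt == attempts - 1 or not usb_reset():
            return False  # out of attempts, or the reset itself failed
    return False
```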

P.S. Using a quality USB hub solved the problem completely.

Friday, March 18, 2016

VLAN calculation algorithm design

I had a task to automate the calculation and configuration of VLANs on the switches of our regional network.

The task was naturally split into two phases:
  1. calculate VLAN numbers on the switches and trunks;
  2. synchronize switch configurations to the calculated result.
The first was not straightforward: a new algorithm had to be designed. The second was done by another programmer.

I have designed and implemented the algorithm. Key points:
  • The input data is a list of devices, trunks, and client-port VLANs on each device. The output is a VLAN set for each trunk. The network has multiple loops for redundancy.
  • A VLAN should be included on a trunk if it is two-way on that trunk, i.e. it arrives from both sides. This means we can spread VLANs from their endpoints across the graph and then compute the VLANs on each trunk as the intersection of the sets propagated in one direction and in the other.
  • We spread the VLAN set from each device as a bit vector. Each trunk has two associated VLAN sets, one for each direction.
  • If STP is used, VLANs should be unified across each group of connected cycles (cycles that share a common trunk).
  • If ERPS is used, VLANs should be unified starting from the outermost half-loop down to the base loop.
  • VLAN propagation can be pruned when the trunk already contains all VLANs from the propagated set (an exception was necessary for multiply connected non-switch devices), which reduces the runtime significantly.
I used Perl and the Bit::Vector module. The first version did not prune VLAN propagation and took 1-2 minutes on our network topology (more than 1600 devices); with pruning it takes just 4 seconds (including database queries).
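
The core propagation and intersection steps can be sketched in Python, with plain integers standing in for Bit::Vector. This is a simplified sketch under my own assumptions, not the production code: there are no parallel trunks, every device forwards VLANs, and both the STP/ERPS unification rules and the exception for multiply connected non-switch devices are left out. Pruning is done by forwarding only the VLAN bits that are new for a given trunk direction, so each directed trunk only ever grows and the loop terminates.

```python
from collections import defaultdict, deque

def vlan_mask(vlans):
    mask = 0
    for v in vlans:
        mask |= 1 << v
    return mask

def mask_to_vlans(mask):
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

def trunk_vlans(devices, trunks):
    """devices: {name: client-port VLAN ids}; trunks: list of (a, b) pairs."""
    neighbors = defaultdict(set)
    for a, b in trunks:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = defaultdict(int)  # (u, v) -> VLAN bits already sent from u towards v
    queue = deque()
    for dev, vlans in devices.items():
        for nb in neighbors[dev]:
            queue.append((dev, nb, vlan_mask(vlans)))
    while queue:
        u, v, mask = queue.popleft()
        new = mask & ~seen[(u, v)]
        if not new:
            continue  # prune: this direction already carries all these VLANs
        seen[(u, v)] |= new
        for w in neighbors[v]:
            if w != u:
                queue.append((v, w, new))
    # A VLAN belongs on a trunk only if it was propagated in both directions.
    return {(a, b): mask_to_vlans(seen[(a, b)] & seen[(b, a)]) for a, b in trunks}
```

In a chain A-B-C where A has VLAN 10 and C has VLANs 10 and 20, both trunks get only VLAN 10; in a ring, a VLAN present on any two devices ends up on every trunk of the ring.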

The following article was invaluable for understanding multi-ring ERPS topologies:
D. Lee, K. Lee, S. Yoo and J. K. K. Rhee, "Efficient Ethernet Ring Mesh Network Design," Journal of Lightwave Technology, vol. 29, no. 18, pp. 2677-2683, Sept. 15, 2011.