Fun with Linux server migrations, part 1

Server migrations with file system structure changes

Last night i completed a P2V migration of a 2 TB Linux file server. It was running on an old IBM x306 server with cheap SATA disks, and we were migrating it to a VMware environment with a SAS-connected disk array. This server is going to be rebuilt in the near future, so we didn't want to use the same amount of disk space (it was only about 60% full). Also, it was running Linux software RAID, which is not necessary under the new environment - the disk array handles RAID.

So i needed to rebuild the file systems and copy at the file level in order to migrate the server. Preserving the old personality but allowing for a new disk layout and a VM environment requires some care. I wanted to maximise my options in the case of something going wrong, so i made sure the system was plugged into a managed switch which i control. Here's the process i followed:

Create a new VM with the appropriate settings, including CPU, RAM, disk, and network. On ESXi 5, i prefer to use LSI Logic SAS emulation for disk controllers, and Intel E1000 emulation for NICs, because:
- both of these drivers are in the mainline Linux kernel, therefore
  - you don't end up with unmountable root file systems or unreachable networks when you first start up the VM, and
  - you don't have to run proprietary VMware drivers at all if you don't want
- they seem (anecdotally) to offer improved performance over the other emulated driver choices
Do a minimal install of the OS in the new VM; use a different IP address from the source server.
Set up file systems as desired. In this case, all non-system data is in /home, so i made that a separate virtual disk and created a file system on it.
From the target server, pre-sync the data in /home. I used the command
```
rsync -avx sourceserver:/home/ /home/ --delete
```
The initial sync was the largest, but i ran it again several times over a week to ensure that the final sync was as short as possible.
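To see whether the repeated pre-syncs really are getting shorter, each run can be timed and logged. This is only a sketch: the function name timed_run and the log path in the usage note are made up for illustration, and it assumes a `date` that supports `+%s` (as GNU date does).

```shell
# Run a command and append its exit status and duration to a log file,
# so that successive pre-sync runs can be compared.
timed_run() {
    log=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$(date '+%Y-%m-%d %H:%M') status=$status seconds=$((end - start)) cmd=$*" >> "$log"
    return "$status"
}
```

It could be used as `timed_run /root/presync.log rsync -avx sourceserver:/home/ /home/ --delete`; the last few lines of the log then give a rough idea of how long the final sync will take.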
Create an out-of-band network connection to the source server. You might already have this. In this case, the source server had a spare NIC which i put on our network management VLAN. Start an ssh session on the new network connection to ensure that the old system is still reachable while you're testing the new VM.
If the system runs a Red Hat-based distribution (this system uses CentOS 5), ensure that any MAC addresses (the HWADDR= lines) are commented out in /etc/sysconfig/network-scripts/ifcfg-eth*. This ensures that when services are cut over, the new virtual NIC is not considered a new device, but takes on the settings of the old NIC.
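The MAC address step can be scripted along these lines. This is a minimal sketch which assumes the addresses appear as HWADDR= or MACADDR= lines; the function name comment_out_macs and the backup directory are hypothetical. Backups go to a separate directory so that no stray ifcfg-* copies are left beside the real files.

```shell
# Comment out HWADDR=/MACADDR= lines in the given ifcfg files.
# The first argument is a directory that receives a copy of each original.
comment_out_macs() {
    backup_dir=$1
    shift
    mkdir -p "$backup_dir" || return 1
    for f in "$@"; do
        [ -f "$f" ] || continue
        cp -p "$f" "$backup_dir/" || return 1
        # "#&" prefixes the matched text with a comment marker.
        sed -e 's/^[[:space:]]*HWADDR=/#&/' \
            -e 's/^[[:space:]]*MACADDR=/#&/' \
            "$backup_dir/$(basename "$f")" > "$f"
    done
}
```

For example: `comment_out_macs /root/ifcfg-backup /etc/sysconfig/network-scripts/ifcfg-eth*`. Lines that are already commented out are left alone, so running it twice does no harm.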
Create an exclude file for the system data. I used these resources from OpenVZ and Slicehost to help me come up with an appropriate list of files to exclude. Here's what i ended up with:
```
/boot
/dev
/etc/fstab
/etc/lvm
/etc/mdadm.conf
/etc/modprobe*
/etc/modules
/etc/mtab
/etc/sysconfig/hwconf
/etc/udev
/lib/modules
/mnt
/net
/proc
/root/exclude.*
/sys
/tmp
/var/cache
/var/lock
/var/tmp
```
Some of the entries in the list above are not necessary due to the -x flag on rsync, which prevents it from crossing file system boundaries, but i wanted a fairly generic list that could be reused. This list should be a good start for CentOS 5 systems, but may need tweaking for other distros. The exclude file lists itself because i ran the rsync from the target and did not want to lose it when copying the root file system.
Ensure that an independent backup of the source server exists. Run it just before the outage window.

When the outage window arrives, shut down all services on the source and target which are not essential for the purposes of the copy. Here's the list of services i stopped on my system - your list will likely be different:

```
service acpid stop
service anacron stop
service apmd stop
service atd stop
service autofs stop
service bluetooth stop
service crond stop
service gpm stop
service hidd stop
service iscsid stop
service iscsi stop
service isdn stop
service netfs stop
service nfslock stop
service nfs stop
service pcscd stop
service portmap stop
service radiusd stop
service rawdevices stop
service rpcgssd stop
service rpcidmapd stop
service sendmail stop
service smartd stop
service smb stop
service syslog stop
service xfs stop
service ypbind stop
service yum-updatesd stop
```

Some of these might seem essential (e.g. syslog), but they're needed for the normal running of the system, not for copying its personality to a new server. The basic idea is to minimise the amount of churn (especially logging) in the file systems being copied, while leaving networking and sshd running.
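Since the same list has to be run on both source and target, a small loop saves some typing. The sketch below is an assumption about how one might do it, not what was run during this migration; stop_services and the DRY_RUN variable are made-up names.

```shell
# Stop each named service in turn.
# With DRY_RUN=1, only print the commands that would be run.
stop_services() {
    for svc in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "service $svc stop"
        else
            # Keep going if one service fails to stop or is not installed.
            service "$svc" stop || echo "warning: could not stop $svc" >&2
        fi
    done
}
```

Running `DRY_RUN=1 stop_services crond syslog sendmail` first shows exactly what will be stopped; make sure that network and sshd never appear in the list.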

From the target server, run rsync with the delete flag for any non-root system partitions/LVs on the system drive. In my case, there was a separate /var partition. Note that the exclude file entries need to be relative to the partition being copied, and should start with a slash so that rsync anchors them to the top of the transfer rather than matching the same name at any depth. So to copy /var, you might use an exclude file like this:
```
/cache
/lock
/tmp
```
and a command like this:
```
rsync -avx sourceserver:/var/ /var/ --exclude-from=/root/exclude.var --delete
```
Be sure to run it with --dry-run first to make sure you're not trashing something you don't expect.
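Rather than writing each per-partition exclude file by hand, it can be derived from the main list. This is a rough sketch: excludes_for is a hypothetical helper, and it assumes the mount point contains no characters that are special to sed. The leading slash it leaves on each entry anchors the pattern to the top of the transfer, so /cache matches only /var/cache when copying /var/.

```shell
# Print the entries of a root-relative exclude list that fall under the
# given mount point, rewritten relative to that mount point.
excludes_for() {
    mount_point=${1%/}    # e.g. /var (a trailing slash is dropped)
    exclude_file=$2       # e.g. /root/exclude.root
    sed -n "s|^${mount_point}/|/|p" "$exclude_file"
}
```

For example, `excludes_for /var /root/exclude.root > /root/exclude.var` turns the three /var entries of the main list into /cache, /lock and /tmp.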
Copy the root partition/LV in a similar fashion:
```
rsync -avx sourceserver:/ / --exclude-from=/root/exclude.root --delete
```
The exclude file has the contents as shown in the main exclude list above. Again, don't forget --dry-run to test first.
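Because a --delete run against / is unforgiving, the dry run can be made impossible to skip. The wrapper below is a sketch under my own assumptions: careful_rsync is a made-up name, and the RSYNC variable exists only so that a stand-in command can be substituted when trying the wrapper out.

```shell
# Preview an rsync with --dry-run, then run it for real only after an
# explicit "yes" on standard input. RSYNC may name a substitute command.
careful_rsync() {
    "${RSYNC:-rsync}" --dry-run "$@" || return 1
    printf 'Run for real? (yes/no) ' >&2
    read -r answer
    [ "$answer" = yes ] || { echo "aborted" >&2; return 1; }
    "${RSYNC:-rsync}" "$@"
}
```

Usage mirrors the plain command, e.g. `careful_rsync -avx sourceserver:/ / --exclude-from=/root/exclude.root --delete`. Anything other than a literal "yes" leaves the target untouched.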
Now the target VM has all the settings of the original server and is ready for the changeover. From the managed switch, disable the frontend port(s) leading to the source server, leaving the out-of-band port active. This prevents client traffic from going to the server.
After the rsyncs are finished, reboot the target VM, watching its startup with the VMware console. There will probably be a few services that will not be applicable under VMware (e.g. lm_sensors) - you can disable and/or remove these when convenient. The new VM should now have all the personality of the old server, including services, IP address, and data.
Once you've tested the target server and ensured that it is performing the source server's job appropriately, shut down the source server from the ssh session you started on the out-of-band port earlier, then shut down the out-of-band port. This ensures that even if you're remote from the server and it is powered up again (either by mistake, or due to mains power loss and recovery), it won't be able to interfere with the operation of the new system.

This process went very smoothly for me last night. So smoothly, in fact, that i was a bit worried and ran a lot of extra tests afterwards to ensure that it really was successful. Fortunately, my fears were unfounded. ;-)