Fun with Linux server migrations, part 1

Server migrations with file system structure changes

Last night i completed a P2V migration of a 2 TB Linux file server.  It was running on an old IBM x306 server with cheap SATA disks, and we were migrating it to a VMware environment with a SAS-connected disk array.  This server is going to be rebuilt in the near future, so we didn't want to use the same amount of disk space (it was only about 60% full).  Also, it was running Linux software RAID, which is not necessary under the new environment - the disk array handles RAID.

So i needed to rebuild the file systems and copy at the file level in order to migrate the server.  Preserving the old personality but allowing for a new disk layout and a VM environment requires some care.  I wanted to maximise my options in the case of something going wrong, so i made sure the system was plugged into a managed switch which i control.  Here's the process i followed:

  1. Create a new VM with the appropriate settings, including CPU, RAM, disk, and network.  On ESXi 5, i prefer to use LSI Logic SAS emulation for disk controllers, and Intel E1000 emulation for NICs, because:

    • both of these drivers are in the mainline Linux kernel, therefore
      • you don't end up with unmountable root file systems or unreachable networks when you first start up the VM, and
      • you don't have to run proprietary VMware drivers at all if you don't want
    • they seem (anecdotally) to offer improved performance over the other emulated driver choices
  2. Do a minimal install of the OS in the new VM; use a different IP address from the source server.

  3. Set up file systems as desired.  In this case, all non-system data is in /home, so i made that a separate virtual disk and created a file system on it.

  4. From the target server, Pre-sync the data in /home.  I used the command

    rsync -avx sourceserver:/home/ /home/ --delete
    

    The initial sync was the largest, but i ran it again several times over a week to ensure that the final sync was as short as possible.

  5. Create an out-of-band network connection to the source server.  You might already have this.  In this case, the source server had a spare NIC which i put on our network management VLAN.  Start an ssh session on the new network connection to ensure that the old system is still reachable while you're testing the new VM.

  6. If the system runs a Red Hat-based distribution (this system uses CentOS 5), ensure that any MAC addresses are commented out in /etc/sysconfig/network-scripts/ifcfg-eth*.  This ensures that when services are cut over, the new virtual NIC is not considered a new device, but takes on the settings of the old NIC.

  7. Create an exclude file for the system data.  I used these resources from OpenVZ and Slicehost to help me come up with an appropriate list of files to exclude.  Here's what i ended up with:

    /boot
    /dev
    /etc/fstab
    /etc/lvm
    /etc/mdadm.conf
    /etc/modprobe*
    /etc/modules
    /etc/mtab
    /etc/sysconfig/hwconf
    /etc/udev
    /lib/modules
    /mnt
    /net
    /proc
    /root/exclude.*
    /sys
    /tmp
    /var/cache
    /var/lock
    /var/tmp
    

    Some of the entries in the list above are not necessary due to the -x flag on rsync, which prevents it from crossing file system boundaries, but i wanted a fairly generic list that could be reused.  This list should be a good start for CentOS 5 systems, but may need tweaking for other distros.  The exclude file lists itself because i ran the rsync from the target and did not want to lose it when copying the root file system.

  8. Ensure that an independent backup of the source server exists.  Run it just before the outage window.

  9. When the outage window arrives, shut down all services on the source and target which are not essential for the purposes of the copy.  Here's a list of the ones i used for my system - your list will likely be different:

    service acpid stop
    service anacron stop
    service apmd stop
    service atd stop
    service autofs stop
    service bluetooth stop
    service crond stop
    service gpm stop
    service hidd stop
    service iscsid stop
    service iscsi stop
    service isdn stop
    service netfs stop
    service nfslock stop
    service nfs stop
    service pcscd stop
    service portmap stop
    service radiusd stop
    service rawdevices stop
    service rpcgssd stop
    service rpcidmapd stop
    service sendmail stop
    service smartd stop
    service smb stop
    service syslog stop
    service xfs stop
    service ypbind stop
    service yum-updatesd stop
    

    Some of these might seem essential (e.g. syslog), but they're necessary for normal running of the system, not copying its personality to a new server.  The basic idea is to minimise the amount of churn (especially logging) in the file systems being copied, while leaving networking and sshd running.

  10. From the target server, run rsync with the delete flag for any non-root system partitions/LVs on the system drive.  In my case, there was a separate /var partition.  Note that the exclude file entries need to be relative to the partition being copied, so to copy /var, you might use an exclude file like this:

    cache
    lock
    tmp
    

    and a command like this:

    rsync -avx sourceserver:/var/ /var/ --exclude-from=/root/exclude.var --delete
    

    Be sure to run it with --dry-run first to make sure you're not trashing something you don't expect.

  11. Copy the root partition/LV in a similar fashion:

    rsync -avx sourceserver:/ / --exclude-from=/root/exclude.root --delete
    

    The exclude file has the contents as shown in the main exclude list above.  Again, don't forget --dry-run to test first.

  12. Now the target VM has all the settings of the original server and is ready for the changeover.  From the managed switch, disable the frontend port(s) leading to the source server, leaving the out-of-band port active.  This prevents client traffic from going to the server.

  13. After the rsyncs are finished, reboot the target VM, watching its startup with the VMware console.  There will probably be a few services that will not be applicable under VMware (e.g. lm_sensors) - you can disable and/or remove these when convenient.  The new VM should now have all the personality of the old server, including services, IP address, and data.

  14. Once you've tested the target server and ensured that it is performing the source server's job appropriately, shut down the source server from ssh session you started on the out-of-band port earlier, then shut down the out-of-band port.  This ensures that even if you're remote from the server and it is powered up again (either by mistake, or due to mains power loss and recovery), it won't be able to interfere with the operation of the new system.

This process went very smoothly for me last night.  So smoothly, in fact, that i was a bit worried and ran a lot of extra tests afterwards to ensure that it really was successful.  Fortunately, my fears were unfounded.  ;-)