VM timekeeping: Using the PTP Hardware Clock on KVM

Background

In my last post I described the setup I use to provide time synchronisation to the hosts I maintain in the NTP pool. I only recently learned about the PTP Hardware Clock (PHC) device driver available in KVM, and started testing it in earnest earlier this year.

The main reason to use the PHC is to more closely track the hypervisor host. This is possible because reading from the local PHC is designed to be very efficient and it incurs much less overhead than performing an NTP request and response over the network.

Obviously this requires the host to have a clock worth tracking. My previous post explained the setup I use, but the general guidelines for a VM host would be the same as for any quality NTP setup:

  1. Use at least four clock sources, with a diversity of reference clocks.
  2. Peers should be selected on the basis of reliable network connectivity and reliable timekeeping performance. If you're using a host from the public pool, checking its pool score page is highly recommended. Here's an example: 150.101.186.48.
  3. All other things being equal, closer hosts (in terms of network delay) are better than further away hosts, because they allow NTP to constrain its error estimations to a narrower range.
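Point 3 can be made concrete: for a single NTP exchange, the true offset lies within ± delay/2 of the measured offset, so a lower round-trip delay directly narrows the error bound. A minimal sketch of the standard on-wire calculation, using hypothetical timestamps of my own invention:

```python
def ntp_sample(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive (all in seconds).
    Returns (offset, delay). The true offset lies within
    offset +/- delay/2, so closer servers give tighter bounds.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# Example: client 1 ms behind the server, 0.5 ms network delay each way.
offset, delay = ntp_sample(0.0, 0.0015, 0.0016, 0.0011)
# offset is ~1 ms, delay ~1 ms, so the error bound is ~0.5 ms.
```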

Performance improvements - chrony

So how much difference does it make using the PHC? The first system I enabled (after validation and playbook development on a test VM) was my chronyd pool server. It is using 2 vCPUs and 1 GB RAM on a host with a 6-core AMD Ryzen 5 Pro 3600 CPU, with a maximum clock speed of 3.6 GHz.

I was already quite happy with the time sync performance of this VM - it was reporting a system offset within ± 50 µs:

(One week graph of chronyd system offset)

The system frequency error ranged between about -19.44 and -19.64 ppm:

(One week graph of chronyd system frequency error)

And the root dispersion ranged between 80 and 280 µs:

(One week graph of chronyd system root dispersion)

After enabling the PHC device, the same VM actually reported a larger range of system frequency error, now ranging between -19.1 and -19.8 ppm:

(One week graph of chronyd system frequency error after enabling PHC)

But system offset reduced to ± 4 µs:

(One week graph of chronyd system offset after enabling PHC)

And the maximum root dispersion dropped to less than 2 µs:

(One week graph of chronyd system root dispersion after enabling PHC)

Here are the graphs showing a few days on either side of this configuration change.

System offset:

(One week graph of chronyd system offset)

Frequency error:

(One week graph of chronyd system frequency error)

Root dispersion:

(One week graph of chronyd system root dispersion)

One point to note is that by default chrony sees the PHC device as stratum 0, rather than the stratum 2 equivalent it effectively is. More importantly, its figures for root dispersion and root delay are misleadingly low because they only account for the time taken to read the PHC device.

To address this, chrony allows the stratum and root delay of the PHC reference clock to be set manually in the config file. I set the root delay by looking at the host's root delay over a few days prior to the change, and picking a slightly higher value than the minimum (in my case I used 400 µs). Unfortunately, there's no way to obtain the host's root dispersion from the PHC device (which is why the AWS Nitro PHC driver reports its clock error bound via a separate sysfs interface), and no mechanism that I'm aware of in chrony to adjust it (although it is influenced by the configured root delay).

Setting the stratum to an accurate value is also potentially problematic, because chrony uses stratum as part of its sync peer selection algorithm. I eventually settled on setting the PHC to stratum 1, so that it doesn't appear to be lower stratum than the stratum 1 servers on my local network, but is still likely to be selected as the sync peer under most circumstances.

Common prerequisite

Before setting up an NTP service to use the PHC device, there's one prerequisite: loading the ptp_kvm driver. It takes no options and should be available in all mainstream Linux kernel builds. Activating it is as simple as:

# modprobe ptp_kvm

On my systems this does not produce any kernel log message, because the core PTP driver is compiled into the default kernel. So the only indication that the module is loaded is the presence of a /dev/ptp0 device file, along with a symlink to it indicating that it belongs to KVM:

# ls -la /dev/ptp*
crw------- 1 root root 248, 0 Apr 21 12:24 /dev/ptp0
lrwxrwxrwx 1 root root      4 Apr 21 12:24 /dev/ptp_kvm -> ptp0

To ensure that this is always loaded on boot, I add the driver name to /etc/modules on my Debian and Ubuntu systems:

ptp_kvm

Other Linux distributions may use a slightly different mechanism for this, e.g. a file in /etc/modules-load.d/ (which also works on Debian-based distros).
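On systems using /etc/modules-load.d/, the equivalent is a one-line file; the filename here is my own choice, any name ending in .conf works:

```shell
# echo ptp_kvm > /etc/modules-load.d/ptp_kvm.conf
```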

Chronyd configuration

Chrony has built-in support for the PHC device as a reference clock. To enable it, I use the following configuration line:

refclock PHC /dev/ptp0 poll 0 delay 0.0004 stratum 1

As mentioned above, the delay and stratum options are to tune these variables so that they're not reported as artificially low. The only other options are the name of the device file, and the poll interval (in powers of 2, so poll 0 means the reference clock is polled once every second). Chrony supports various other options for this and other reference clocks which can be found in man 5 chrony.conf.
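For the curious: one way to read a PHC directly from userspace is clock_gettime(2) on a dynamic POSIX clock ID derived from the device's file descriptor (tools like phc2sys use this alongside more precise ioctl-based methods). Here's a minimal Python sketch of that technique, assuming a 64-bit Linux struct timespec layout; fd_to_clockid() mirrors the kernel's FD_TO_CLOCKID macro:

```python
import ctypes
import os

def fd_to_clockid(fd):
    """Mirror the kernel's FD_TO_CLOCKID macro: the dynamic
    POSIX clock ID for an open chardev fd is ((~fd) << 3) | 3."""
    return ((~fd) << 3) | 3

class Timespec(ctypes.Structure):
    # Assumes 64-bit Linux, where time_t and long are both 64 bits.
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]

def read_phc(path="/dev/ptp0"):
    """Read the PTP hardware clock via clock_gettime(2);
    returns the PHC time as a float in seconds."""
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    fd = os.open(path, os.O_RDONLY)
    try:
        ts = Timespec()
        if libc.clock_gettime(fd_to_clockid(fd), ctypes.byref(ts)) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        return ts.tv_sec + ts.tv_nsec / 1e9
    finally:
        os.close(fd)
```

Calling read_phc() on a VM with the ptp_kvm module loaded should return a timestamp very close to the hypervisor's system clock.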

After restarting chronyd, the PHC device shows up as a reference clock source:

# chronyc -n sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PHC0                          1   0   377     1  -8135ns[-8131ns] +/-  200us
^+ 2001:44b8:2100:3f11::7b:6     1   8   377   185    +30us[  +34us] +/-  264us
^- 2001:44b8:2100:3f00::7b:102   2   8   377    36   -201us[ -200us] +/- 2192us
...

Because they're not NTP sources, chrony does not report the statistics for reference clocks in its measurements log. Instead, they are reported in the statistics and tracking logs. This means that at the moment I don't have measurements for them available via NTPmon. I may add support for the tracking log in future, but for the time being the changes in system offset and root dispersion are the main metrics I use to evaluate the effects of enabling the PHC device.
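For reference, those logs are enabled with directives like the following in chrony.conf (the logdir path shown is the Debian default):

```shell
logdir /var/log/chrony
log statistics tracking
```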

Ntpd configuration

The configuration of traditional ntpd for PHC support is slightly more complicated, because it doesn't have a reference clock driver for PTP devices. Instead, the phc2sys utility from the linuxptp package is used to provide a bridge between PTP devices and ntpd's shared memory driver. So a prerequisite for ntpd to use the PHC is to install this package:

# apt install linuxptp

By default phc2sys assumes that its time is coming via PTP from a supported NIC or similar, so the default systemd configuration for it is inappropriate. Instead, I created a local systemd service file:

# cat /etc/systemd/system/phc2sys.service
[Unit]
Description=Synchronize PTP hardware clock (PHC) to NTP SHM driver
Documentation=man:phc2sys

[Service]
CapabilityBoundingSet=cap_sys_time
EnvironmentFile=-/etc/default/phc2sys
ExecStart=/usr/sbin/phc2sys $PHC2SYS_OPTIONS
Restart=always
RestartSec=12s
Type=simple

[Install]
WantedBy=ntp.service

And a matching defaults file:

# cat /etc/default/phc2sys
PHC2SYS_OPTIONS=-E ntpshm -s /dev/ptp0 -O 0 -l 5

These options instruct phc2sys to use /dev/ptp0 as its time source and the NTP SHM driver as its destination, to use an offset of 0 between slave and master clocks (which is only relevant if you're using a PTP source which uses TAI rather than UTC), and to log at level 5 (LOG_NOTICE) rather than the default of level 6 (LOG_INFO). The log level adjustment is optional, but if it's not used, phc2sys will log a message like this every second:
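With the unit and defaults file in place, the service is enabled and started in the usual way:

```shell
# systemctl daemon-reload
# systemctl enable --now phc2sys
```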

Apr 20 22:05:06 ntp102 phc2sys[1132]: [213.454] CLOCK_REALTIME phc offset   -311328 s0 freq      +0 delay      0

Then to configure ntpd to use the SHM driver, I added these lines to /etc/ntp.conf:

server 127.127.28.0
fudge  127.127.28.0 stratum 1

As mentioned above, fudging the stratum is optional, but it means the source is treated more like other sources on the network when the sync peer is selected.

After restarting ntpd the SHM source shows up in our list of sources:

# ntpq -np
remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
...
*127.127.28.0    .SHM.            1 l   49   64  377    0.000   +0.005   0.001
-2001:44b8:2100: 80.72.67.48      2 u   32   64  377    1.296   -0.148   0.283
+2001:44b8:2100: .PPS.            1 u    1   64  377    1.350   -0.018   0.168
+2001:44b8:2100: 80.72.67.48      2 u   22   64  377    1.180   +0.006   0.139
-2001:44b8:2100: 80.72.67.48      2 u   12   64  377    0.921   -0.018   0.136
...

Performance improvements - ntpd

Unlike chronyd, ntpd treats the SHM driver as a virtual NTP source, recording the same statistics in /var/log/ntpstats/peerstats as it does for NTP sources, and enabling the measurements to be directly compared.

Here's a Grafana dashboard snapshot for a one week period on either side of the point where I enabled the PHC device on my first pool server running ntpd, 150.101.186.48: https://snapshots.raintank.io/dashboard/snapshot/jQUiLFkHKNKJJSJZLz6ZiYpaKA3X2EH4

This VM uses 2 vCPUs and 512 MB RAM on a host with an older dual-core Intel Celeron 1037U CPU with a maximum clock speed of 1.8 GHz.

The individual peer values are a little hard to see in that dashboard and apparently can't be singled out, so here's a snapshot of the chronyd pool server mentioned above (which was already using the PHC on its own host), graphed alongside the PHC device:

(One week graph of ntpd source offsets)

Even though they aren't on the same VM host, and therefore aren't tracking the same source clock via their PHCs, their offset from each other still dropped once both were using the PHC.

Here are a few of the other highlights: system offset went from ± 225 µs to ± 50 µs:

(One week graph of ntpd system offset)

Frequency error, like chrony, didn't change much:

(One week graph of ntpd system frequency error)

Root dispersion went from a maximum of 40 ms to 2 ms:

(One week graph of ntpd system root dispersion)

Jitter (an ntpd-specific metric) went from a maximum of 0.00034 to 0.00015:

(One week graph of ntpd system jitter)

Next steps

In my next post I'll explore how using the ptp_kvm driver on AWS compares with using their Nitro-based microsecond-accurate time service.

Thanks

Special thanks to Dan Drown and Miroslav Lichvar for their advice and pointers as I learned about using the KVM PHC (although any inaccuracies in this post are mine!).