A bad runner’s journey into bad running, part 3 – pushing for the 10K

[I originally started this post several years ago, but never got around to finishing it at the time. I think this instalment is more than overdue, and I hope to finish a few more over the coming weeks.]

A big change in my running came around October 2015, when my doctor did a routine blood test, and diagnosed fatty liver (like I didn’t know that already from the shape of my waistline?). He told me I needed to lose weight. He didn’t say how much or over how long, but I needed to lose some, then come back to him for another blood test.

That was the kick I needed. With a lot of support from my wife, I fired up MyFitnessPal, changed my diet, and managed to drop from 90 kg to under 78 kg over the following 6 months. Over the same period I set myself a goal of completing a 10K race, the Bridge to Brisbane. Running a 10K road race was a lot different from the running I had done so far (which was still 99% on the beach), and I was still not up to 5K without stopping, but I felt up to the challenge.

Getting to 10K was still a mental battle for me. When I think back on it now it seems quite strange, but back then I wanted to stop all the time. I had to distract myself from the effort of running by playing mental tricks, including:

  • observing the environment and doing things like counting how many parrot or raptor species I saw
  • rehearsing entire albums in my head, including reciting all the lyrics and humming all the guitar solos to myself
  • visualising myself doing other fun things, like singing my favourite love song to my wife, or finally standing up properly on a surfboard

In the 12 months leading up to the Bridge to Brisbane, I managed 5K without stopping, did a couple of practice 10K runs, and covered a total of 423 km in training. However, I also managed to pick up my first significant injury, some upper shin pain which was at its worst going down hills. The B2B is a relatively hilly course, and I ended up running up the hills and walking down them. I was still reasonably happy with my result, and raised $400 for my chosen charity, Destiny Rescue.

Once I knew I could do 10K, I started making bigger plans. But first, I needed to overcome the niggling shin injury from my newly-formed habit of running in shoes on hard surfaces. My brother gave me the advice I needed in this particular case, and I’ve followed it ever since: don’t try to slow down on downhill segments – fighting gravity is a waste of energy and it’s really hard on your legs. Instead, let gravity naturally speed you up, and only slow down to stay in control. Obviously, I need to take into account my skill level, my shoes, and the terrain, but generally I’ve found this to be great advice.

VyOS Certified Network Engineer

This morning before work I sat for (and passed) the newly-minted VyOS Certified Network Engineer certification. Mostly this post is just to let folks know that the certification is out there and encourage them to take it, but also I want to compare it to another certification I recently passed, the AWS Certified Solutions Architect Associate.

I’ve liked VyOS (and its predecessor Vyatta Core) for a long time. It’s always my first choice when I want to test a new BGP or OSPF scenario or set up an IPsec VPN. Its compelling value proposition to me is that it turns Debian Linux into a network appliance with a Juniper-like CLI. Or to put it another way, VyOS is to routing as Cumulus Linux is to switching – a router that makes sense to both network engineers and Linux geeks.

The certification is different from most others I’ve done, being 100% practical. There are no written examination requirements, no multiple-choice questions. It presents a practical scenario with a number of broken configurations, which need to be fixed in order to pass the certification. (I’ve been told this is how the Red Hat Certified Engineer test is structured as well, though I haven’t experienced it first-hand.) It uses a browser-based VNC client to hook up to a dedicated training/certification scenario platform (find all the details in their blog announcing the certification).

The announcement claimed:

We tried to avoid obscure options so that an experienced VyOS user and network admin can pass it without preparing specially for that certification, but it still requires a pretty broad experience.

I think the exam stands up pretty well to that claim. To prepare, I read through the blueprint, made sure I could get at least 90% of the sample questions right without additional study, labbed up a BGP + IPsec/VTI scenario between my home network and AWS (learning a little about compatibility between IKEv1 and IKEv2 along the way!), and then booked the exam. Experienced network and Linux admins should find the certification relatively straightforward, and easily achievable within the two hours allotted.

I had a couple of administrative difficulties (mostly due to my time zone being a long way from theirs) and a couple of very minor technical gotchas in the exam. (I never realised how dependent I was on Ctrl-W for VyOS command-line editing, and it doesn’t work through a browser-based VNC client.) The VyOS team were very apologetic about the administrative dramas, but honestly they were not really even an inconvenience. Typos, errors, and failed technology are quite common in certification exams, but because the VCNE exam is based on actual VyOS running in a VM, there’s not a lot of text to get wrong, and you don’t get the level of quirkiness that simulations offer.

Contrast this with the AWS Certified Solutions Architect – Associate, which is a traditional multiple-choice exam administered by Pearson. I studied it from a paper book (I’ve never really learned well from the video training that many people swear by) for about 3 months off & on, and although I passed well, I never felt that it tested my knowledge in the right ways. And the multiple-choice format has given rise to the whole question-dumping industry which lurks in the shadows of many vendor certification programmes.

On the negative side for the VyOS exam, there was no IPv6, which I think is a serious gap in any network-oriented certification nowadays. I also found the IPsec problem a little on the easy side. It’s hard for me to judge, but I think that the difficulty might be on the low end of intermediate level, which is where this certification is aimed.

Overall I think the VyOS CNE exam was my most pleasant certification experience yet, and one which demonstrates skills which actually matter in real life. I’m really glad to see Sentrium getting enough traction in the marketplace that a certification platform is commercially viable, and I’m keen to keep going with the certifications they offer.

Pros & cons of chronyd & ntpd

A friend asked me today: what are the pros and cons of chronyd and ntpd? I’ve used both for a while, but never actually sat down to think about this question. So here are some initial thoughts:

  • ntpd (the older of the two implementations):
    • has had more time to mature
    • has had more time for the codebase to become more kludgy and less maintainable
    • supports a wider range of hardware
    • supports more different operating systems
    • has more features generally
    • was initially developed in an era when security wasn’t as big a concern, so its codebase wasn’t designed with security in mind

  • chronyd has almost the opposite benefits and drawbacks:
    • it’s a newer implementation (although I’d still call it a mature codebase)
    • it has fewer features (although most people don’t need that many features)
    • its hardware and OS support is more limited
    • its code was developed with security in mind
    • I personally find its monitoring output harder to read – compare the following outputs:
root@db:~# ntpq -np  
     remote           refid      st t  when poll reach   delay   offset  jitter
==============================================================================
 ntp.gear.dyndns .POOL.          16 p     -   64     0   0.000    0.000   0.004
+172.22.160.61   172.22.254.53    2 u    46   64     1   0.466   -0.116   0.431
*172.22.254.53   .PPS.            1 u    47   64     1   0.798   -0.097   0.985
+172.22.254.1    172.22.254.53    2 u    46   64     1   1.098   -0.180   0.735
-172.22.254.20   172.22.254.53    2 u    43   64     1   0.618   -0.216   0.062
 172.22.254.2    172.22.254.53    2 u    42   64     1   0.292   -0.278   0.061
root@db:~# tail -n 1 /var/log/ntpstats/peerstats
58582 35057.111 172.22.254.20 1314 -0.000215695 0.000617722 0.187592440 0.000062219

root@maas:~# chronyc -n sources
210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 172.22.254.53                 1   10   377    33m    +36us[ +496us] +/- 1481us
^- 172.22.254.20                 2   10   377     19   +447us[ +447us] +/-   18ms
^- 172.22.254.2                  2   10   377    194   -115us[ -115us] +/- 6183us
^- 172.22.160.61                 2   10   377     90   -726us[ -726us] +/-   28ms

root@maas:~# tail -n 1 /var/log/chrony/measurements.log
2019-04-09 09:42:54 172.22.160.61 N 2 111 111 1111 6 6 0.00 5.516e-04 1.142e-03 1.005e-05 8.240e-04 3.461e-02 AC16FE35 4B D K

I find chrony’s switching of units from milliseconds to microseconds (and in some cases nanoseconds) and its floating point formatting in logs difficult to parse visually. However, chrony’s excellent security track record means it has been adopted as the default NTP implementation on both Ubuntu and Red Hat. Both implementations claim excellent accuracy, and although I’ve never compared them empirically, they both seem to deliver on that claim. I tend to use whichever is the default for the distribution I’m working on.
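As a postscript on the readability gripe above: when I do need to eyeball chrony’s measurements.log, a throwaway filter can do the unit conversion for me. The following is purely an illustrative sketch (the script name is hypothetical, and it isn’t part of chrony or ntp); it rewrites any whitespace-separated field that looks like scientific notation as signed milliseconds and leaves everything else alone.

#!/usr/bin/env python3
# sci2ms.py -- hypothetical helper, not part of chrony or ntp.
# Rewrites any whitespace-separated field in scientific notation
# (assumed to be seconds) as signed milliseconds.
import re
import sys

SCI_NOTATION = re.compile(r'^-?\d+\.\d+e[+-]\d+$')

def to_ms(field):
    return '{:+.3f}ms'.format(float(field) * 1000.0)

for line in sys.stdin:
    fields = [to_ms(f) if SCI_NOTATION.match(f) else f for f in line.split()]
    print(' '.join(fields))

Piping tail -n 1 /var/log/chrony/measurements.log through it turns 5.516e-04 into +0.552ms, which my brain parses much faster.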

Making NTP best practices easy with Juju charms

NTP: a behind-the-scenes protocol

As I’ve mentioned before, Network Time Protocol is one of those oft-ignored-but-nonetheless-essential subsystems which is largely unknown, except to a select few.  Those who know it well generally fall into the following categories:

  1. time geeks who work on protocol standardisation and implementation,
  2. enthusiasts who tinker with GPS receivers or run servers in the NTP pool, or
  3. sysadmins who have dealt with the consequences of inaccurate time in operational systems. (I fall mostly into this third category, with some brief forays into the others.)

One of the consequences of NTP’s low profile is that many important best practices aren’t widely known and implemented, and in some cases, myths are perpetuated.

Fortunately, Ubuntu & other major Linux distributions come out of the box with a best-practice-informed NTP configuration which works pretty well.  So sometimes taking a hands-off approach to NTP is justified, because it mostly “just works” without any special care and attention.  However, some environments require tuning the NTP configuration to meet operational requirements.

When best practices require more

One such environment is Canonical’s managed OpenStack service, BootStack.  A primary service provided in BootStack is the distributed storage system, Ceph.  Ceph’s distributed architecture requires the system time on all nodes to be synchronised to within 50 milliseconds of each other.  Ordinarily NTP has no problem achieving synchronisation an order of magnitude better than this, but some of our customers run their private clouds in far-flung parts of the world, where reliable Internet bandwidth is limited, and high-quality local time sources are not available.  This has sometimes resulted in time offsets larger than Ceph will tolerate.

A technique for dealing with this problem is to select several local hosts to act as a service stratum between the global NTP pool and the other hosts in the environment.  The Juju ntp charms have supported this configuration for some time, and historically in BootStack we’ve achieved this by configuring two NTP services: one containing the manually-selected service stratum hosts, and one for all the remaining hosts.

We select hosts for the service stratum using a combination of the following factors:

  • Reasonable upstream Internet connectivity is needed.  It doesn’t have to be perfect – NTP can achieve less than 5 milliseconds offset over an ADSL line, and most of our customer private clouds have better than that.
  • Bare metal systems are preferred over VMs (but the latter are still workable).  Containers are not viable as NTP servers because the system clock is not virtualised; time synchronisation for containers should be provided by their host.
  • There should be no “choke points” in the NTP strata – these are bad for both accuracy and availability.  A minimum of 3 (but preferably 4-6) servers should be included in each stratum, and these should point to a similar number of higher-stratum NTP servers.
  • Because consistent time for Ceph is our primary goal, the Ceph hosts themselves should be clients rather than part of the service stratum, so that they get a consistent set of servers offering reliable response at local LAN latencies.

A manual service stratum deployment

Here’s a diagram depicting what a typical NTP deployment with a manual service stratum might look like.

To deploy this in an existing BootStack environment, the sequence of commands might look something like this (application names are examples only):

# Create the two ntp applications:
$ juju deploy cs:ntp ntp-service
    # ntp-service will use the default pools configuration
$ juju deploy cs:ntp ntp-client
$ juju add-relation ntp-service:ntpmaster ntp-client:master
    # ntp-client uses ntp-service as its upstream stratum

# Deploy them to the cloud nodes:
$ juju add-relation infra-node ntp-service
    # deploys ntp-service to the existing infra-node service
$ juju add-relation compute-node ntp-client
    # deploys ntp-client to the existing compute-node service

Updating the ntp charm

It’s been my desire for some time to see this process made easier, more accurate, and less manual.  Our customers come to us wanting their private clouds to “just work”, and we can’t expect them to provide the ideal environment for Ceph.

One of my co-workers, Stuart Bishop, started me thinking with this quote:

“[O]ne of the original goals of charms [was to] encode best practice so software can be deployed by non-experts.”

That seemed like a worthy goal, so I set out to update the ntp charm to automate the service stratum host selection process.

Design criteria

My goals for this update to the charm were to:

  • provide a stable NTP service for the local cloud and avoid constantly changing upstream servers,
  • ensure that we don’t impact the NTP pool adversely, even if the charm is widely deployed to very large environments,
  • provide useful feedback in juju status which is sufficient to explain its choices,
  • use only functionality available in stock Ubuntu, Juju, and charm helpers, and
  • improve testability of the charm code and increase test suite coverage.

What it does

  • This functionality is enabled using the auto_peers configuration option; this option was previously deprecated, because it could be better achieved through juju relations.
  • On initial configuration of auto_peers, each host tests its latency to the configured time sources.
  • The charm inspects the machine type and the software running on the system, using this knowledge to reduce the likelihood of a Ceph, Swift, or Nova compute host being selected, and to increase the likelihood that bare metal hosts are used.  (This usually means that the Neutron gateways and infrastructure/monitoring hosts are more likely to be selected.)
  • The above factors are then combined into an overall suitability score for the host.  Each host compares its score to the other hosts in the same juju service to determine whether it should be part of the service stratum.
  • The results of the scoring process are used to provide feedback in the charm status message, visible in the output of juju status.
  • If the charm detects that it’s running in a container, it sets the charm state to blocked and adds a status message indicating that NTP should be configured on the host rather than in the container (a minimal sketch of this check follows this list).
  • The charm makes every effort to restrict load on the configured NTP servers by testing connectivity a maximum of once per day if configuration changes are made, or once a month if running from the update-status hook.

All this means that you can deploy a single ntp charm across a large number of OpenStack hosts, and be confident that the most appropriate hosts will be selected as the NTP service stratum.

Here’s a diagram showing the resulting architecture:

How it works

  • The new code uses ntpdate in test mode to measure the latency to each configured source.  This results in a delay in seconds for each IP address responding to the configured DNS name.
  • The delays for responses are combined using a root mean square, then converted to a score using the negative of the natural logarithm, so that delays approaching zero result in a higher score, and larger delays result in a lower score.
  • The scores for all host names are added together.  If the charm is running on a bare metal machine, the overall score is given a 25% increase in weighting.  If the charm is running in a VM, no weight adjustment is made.  If the charm is running in a container, the above scoring is skipped entirely and the weighting is set to zero.  (A rough sketch of this scoring and selection logic follows this list.)
  • The weight is then reduced by between 10% and 25% based on the presence of the following running processes: ceph, ceph-osd, nova-compute, or swift.
  • Each unit sends its calculated scores to its peer units on the peer relation.  When the peer relation is updated, each unit calculates its position in the overall scoring results, and determines whether it is in the top 6 hosts (by default – this value is tunable).  If so, it updates /etc/ntp.conf to use the configured NTP servers and flags itself as connecting to the upstream stratum.  If the host is not in the top 6, it configures those 6 hosts as its own servers and flags itself as a client.
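To make the scoring and selection concrete, here’s a rough sketch of the logic just described. This is an illustration only, not the charm’s actual code: the function names are mine, and the exact per-process penalties are assumptions within the 10-25% range mentioned above.

# Illustrative sketch only -- not the ntp charm's actual implementation.
import math

def score_source(delays):
    # delays: delay in seconds for each IP address that answered for one
    # configured source name (gathered via ntpdate's test mode).
    rms = math.sqrt(sum(d * d for d in delays) / len(delays))
    # Negative natural log: delays approaching zero score high,
    # long delays score low (or negative).
    return -math.log(rms)

def host_score(delays_by_source, machine_type, running_processes):
    if machine_type == 'container':
        return 0.0                      # containers never join the service stratum
    score = sum(score_source(d) for d in delays_by_source.values() if d)
    if machine_type == 'bare-metal':
        score *= 1.25                   # bare metal gets a 25% boost over VMs
    # Reduce the weighting for latency-sensitive workloads; the charm uses
    # reductions between 10% and 25% (the exact values here are made up).
    penalties = {'ceph': 0.25, 'ceph-osd': 0.25, 'nova-compute': 0.15, 'swift': 0.10}
    for proc, penalty in penalties.items():
        if proc in running_processes:
            score *= 1.0 - penalty
    return score

def select_service_stratum(score_by_unit, top_n=6):
    # Each unit publishes its score on the peer relation; the top scorers
    # (6 by default) talk to the upstream servers, everyone else uses them.
    ranked = sorted(score_by_unit, key=score_by_unit.get, reverse=True)
    return set(ranked[:top_n])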

How to use it

This updated ntp charm has been tested successfully with production customer workloads.  It’s available now in the charm store.  Those interested in the details of the code change can review the merge proposal – if you’d like to test and comment on your experiences with this feature, that would be the best place to do so.

Here’s how to deploy it:

# Create a single ntp service:
$ juju deploy --channel=candidate cs:ntp ntp
    # ntp service still uses default pools configuration
$ juju config ntp auto_peers=true

# Deploy to existing nodes:
$ juju add-relation infra-node ntp
$ juju add-relation compute-node ntp

You can see an abbreviated example of the juju status output for the above deployment at http://pastebin.ubuntu.com/25901069/.

Disclosure statement

I wrote most of this post as part of my day job as a Site Reliability Engineer for Canonical.  We’re hiring.