The School for Sysadmins Who Can’t Timesync Good and Wanna Learn To Do Other Stuff Good Too, part 1 – the problem with NTP

(With apologies to Derek Zoolander and Justin Steven.  And to whoever had to touch the HP-UX NTP setup at Queensland Police after I left. And to anyone who prefers the American spelling “synchronization”.)

(This is the first of a series on NTP.  Part 2 is an overview of how NTP works.)

The problem with NTP

In my experience, Network Time Protocol (NTP) is one of the least well-understood of the fundamental Internet application-layer protocols, and very few IT professionals operate it effectively.  Part of the reason for this is that the documentation for NTP is highly technical and assumes a certain level of background knowledge.

I first encountered NTP more than 20 years ago, and my first efforts with it were an unmitigated disaster due to my ignorance of how the protocol was designed to function.  Since then virtually every IT environment I’ve encountered has had a less-than-optimal NTP setup.

I am still far from an expert on NTP, but I’ve learned quite a lot about operating it since my early days.  I hope this series of posts will help you develop a working knowledge of NTP faster and get the basics of NTP configuration right in your environment.

Why learn NTP?

Why bother learning this rather obscure corner of Internet lore?  I mean, the Internet mostly works, despite this alleged widespread lack of expertise in time sync, right?

Here are some of the reasons you might want to learn more about NTP:

  1. You run Ceph, MongoDB, Kerberos, or a similar distributed system, and you want it to actually work.
  2. You want your logs to match up across multiple systems, potentially on multiple continents.
  3. You like learning about new things and tinkering with embedded systems.
  4. You think bandwidth-efficient, high-precision time synchronisation is just a fun, nerdy problem.
  5. You think this is cool:

    A scenario where the latter behavior [the PPS driver disciplining the local clock in the absence of external sources] can be most useful is a planetary orbiter fleet, for instance in the vicinity of Mars, where contact between orbiters and Earth [occurs] only one or two times per Sol (Mars day). These orbiters have a precise timing reference based on an Ultra Stable Oscillator (USO) with accuracy in the order of a Cesium oscillator. A PPS signal is derived from the USO and can be disciplined from Earth on rare occasion or from another orbiter via NTP. In the above scenario the PPS signal disciplines the spacecraft clock between NTP updates.

    (Personally, they had me at “planetary orbiter fleet”. 🙂 )


In this series, I’ll describe a few best practices for setting up NTP in a standard 64-bit Ubuntu Linux 16.04 LTS environment.  Bear in mind this quite limited scope; this advice will not apply in all circumstances and intentionally ignores the less common use cases.  Further caveats:

    1. I have no looks.
    2. I am not an expert.  My descriptions of the algorithms are based on the documentation and operational experience.  I’m not a member of the NTP project; I’ve never submitted a patch; I’ve never compiled ntpd from source (I hate reading & writing C/C++).
    3. I’ve only worked with the reference implementation of NTP, and only on Linux, with only one reference clock driver (NMEA), and a limited range of configuration options.
    4. I will be glossing over a lot of detail.  Sometimes it’s because I don’t think it’s necessary in order to work with NTP successfully; sometimes it’s because I haven’t looked into that particular corner and so I don’t understand it; sometimes it’s because I have looked into that particular corner and I still don’t understand it. 🙂  But mostly it’s because I’m attempting to keep this series accessible for those who are newcomers.  If you’re an experienced NTP operator, you probably won’t find much of interest (if anything) until later in the series.
    5. We won’t cover much history or theory of time sync in this series.  If you’d like to know a little more about that, check out Julien Goodwin’s previous LCA & SLUG talks.
