Making NTP best practices easy with Juju charms

NTP: a behind-the-scenes protocol

As I’ve mentioned before, Network Time Protocol is one of those oft-ignored-but-nonetheless-essential subsystems which is largely unknown, except to a select few.  Those who know it well generally fall into the following categories:

  1. time geeks who work on protocol standardisation and implementation,
  2. enthusiasts who tinker with GPS receivers or run servers in the NTP pool, or
  3. sysadmins who have dealt with the consequences of inaccurate time in operational systems. (I fall mostly into this third category, with some brief forays into the others.)

One of the consequences of NTP’s low profile is that many important best practices aren’t widely known and implemented, and in some cases, myths are perpetuated.

Fortunately, Ubuntu & other major Linux distributions come out of the box with a best-practice-informed NTP configuration which works pretty well.  So sometimes taking a hands-off approach to NTP is justified, because it mostly “just works” without any special care and attention.  However, some environments require tuning the NTP configuration to meet operational requirements.

When best practices require more

One such environment is Canonical’s managed OpenStack service, BootStack.  A primary service provided in BootStack is the distributed storage system, Ceph.  Ceph’s distributed architecture requires the system time on all nodes to be synchronised to within 50 milliseconds of each other.  Ordinarily NTP has no problem achieving synchronisation an order of magnitude better than this, but some of our customers run their private clouds in far-flung parts of the world, where reliable Internet bandwidth is limited, and high-quality local time sources are not available.  This has sometimes resulted in time offsets larger than Ceph will tolerate.

A technique for dealing with this problem is to select several local hosts to act as a service stratum between the global NTP pool and the other hosts in the environment.  The Juju ntp charms have supported this configuration for some time, and historically in BootStack we’ve achieved this by configuring two NTP services: one containing the manually-selected service stratum hosts, and one for all the remaining hosts.

We select hosts for the service stratum using a combination of the following factors:

  • Reasonable upstream Internet connectivity is needed.  It doesn’t have to be perfect – NTP can achieve less than 5 milliseconds offset over an ADSL line, and most of our customer private clouds have better than that.
  • Bare metal systems are preferred over VMs (but the latter are still workable).  Containers are not viable as NTP servers because the system clock is not virtualised; time synchronisation for containers should be provided by their host.
  • There should be no “choke points” in the NTP strata – these are bad for both accuracy and availability.  A minimum of 3 (but preferably 4-6) servers should be included in each stratum, and these should point to a similar number of higher-stratum NTP servers.
  • Because consistent time for Ceph is our primary goal, the Ceph hosts themselves should be clients rather than part of the service stratum, so that they get a consistent set of servers offering reliable response at local LAN latencies.

A manual service stratum deployment

Here’s a diagram depicting what a typical NTP deployment with a manual service stratum might look like.

To deploy this in an existing BootStack environment, the sequence of commands might look something like this (application names are examples only):

# Create the two ntp applications:
$ juju deploy cs:ntp ntp-service
    # ntp-service will use the default pools configuration
$ juju deploy cs:ntp ntp-client
$ juju add-relation ntp-service:ntpmaster ntp-client:master
    # ntp-client uses ntp-service as its upstream stratum

# Deploy them to the cloud nodes:
$ juju add-relation infra-node ntp-service
    # deploys ntp-service to the existing infra-node service
$ juju add-relation compute-node ntp-client
    # deploys ntp-client to the existing compute-node service

Updating the ntp charm

It’s been my desire for some time to see this process made easier, more accurate, and less manual.  Our customers come to us wanting their private clouds to “just work”, and we can’t expect them to provide the ideal environment for Ceph.

One of my co-workers, Stuart Bishop, started me thinking with this quote:

“[O]ne of the original goals of charms [was to] encode best practice so software can be deployed by non-experts.”

That seemed like a worthy goal, so I set out to update the ntp charm to automate the service stratum host selection process.

Design criteria

My goals for this update to the charm were to:

  • provide a stable NTP service for the local cloud and avoid constantly changing upstream servers,
  • ensure that we don’t impact the NTP pool adversely, even if the charm is widely deployed to very large environments,
  • provide useful feedback in juju status which is sufficient to explain its choices,
  • use only functionality available in stock Ubuntu, Juju, and charm helpers, and
  • improve testability of the charm code and increase test suite coverage.

What it does

  • This functionality is enabled using the auto_peers configuration option; this option was previously deprecated because its effect could be better achieved through juju relations.
  • On initial configuration of auto_peers, each host tests its latency to the configured time sources.
  • The charm inspects the machine type and the software running on the system, using this knowledge to reduce the likelihood of a Ceph, Swift, or Nova compute host being selected, and to increase the likelihood that bare metal hosts are used.  (This usually means that the Neutron gateways and infrastructure/monitoring hosts are more likely to be selected.)
  • The above factors are then combined into an overall suitability score for the host.  Each host compares its score to the other hosts in the same juju service to determine whether it should be part of the service stratum.
  • The results of the scoring process are used to provide feedback in the charm status message, visible in the output of juju status.
  • If the charm detects that it’s running in a container, it sets the charm state to blocked and adds a status message indicating that NTP should be configured on the host rather than in the container.
  • The charm makes every effort to restrict load on the configured NTP servers by testing connectivity a maximum of once per day if configuration changes are made, or once a month if running from the update-status hook.

All this means that you can deploy a single ntp charm across a large number of OpenStack hosts, and be confident that the most appropriate hosts will be selected as the NTP service stratum.
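As an aside, the container check mentioned above is straightforward on stock Ubuntu. Here’s a rough Python sketch (the function name and structure are mine, not the charm’s actual code) using systemd-detect-virt:

```python
import subprocess

def in_container():
    """Return True when running inside a container (LXD, Docker, etc.)."""
    try:
        # systemd-detect-virt --container exits 0 (and prints the
        # container type) inside a container; outside one it exits
        # non-zero and prints "none".
        result = subprocess.run(["systemd-detect-virt", "--container"],
                                capture_output=True, text=True)
    except FileNotFoundError:
        # No systemd on this host; assume we are not in a container.
        return False
    return result.returncode == 0
```

A charm performing this check would then set a blocked status instead of configuring ntpd.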

Here’s a diagram showing the resulting architecture:

How it works

  • The new code uses ntpdate in test mode to test the latency to each configured source.  This results in a delay in seconds for each IP address responding to the configured DNS name.
  • The delays for responses are combined using a root mean square, then converted to a score using the negative of the natural logarithm, so that delays approaching zero result in a higher score, and larger delays result in a lower score.
  • The scores for all host names are added together.  If the charm is running on a bare metal machine, the overall score is given a 25% increase in weighting.  If the charm is running in a VM, no weight adjustment is made.  If the charm is running in a container, the above scoring is skipped entirely and the weighting is set to zero.
  • The weight is then reduced by between 10% and 25% based on the presence of the following running processes: ceph, ceph-osd, nova-compute, or swift.
  • Each unit sends its calculated scores to its peer units on the peer relation.  When the peer relation is updated, each unit calculates its position in the overall scoring results, and determines whether it is in the top 6 hosts (by default – this value is tunable).  If so, it updates /etc/ntp.conf to use the configured NTP servers and flags itself as connecting to the upstream stratum.  If the host is not in the top 6, it configures those 6 hosts as its own servers and flags itself as a client.
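To make the scoring concrete, here’s a short Python sketch of the scheme just described. It’s an illustration in the spirit of the charm (Juju charms are written in Python), not the charm’s actual code; the function names are invented, and the per-service penalty factors are my own assumptions within the stated 10–25% range:

```python
import math

def source_score(delays):
    """Score one time source from its measured delays (in seconds)."""
    # Combine the delays with a root mean square...
    rms = math.sqrt(sum(d * d for d in delays) / len(delays))
    # ...then take the negative natural log: delays near zero give a
    # high score, larger delays a low (possibly negative) one.
    return -math.log(rms)

def host_score(per_source_delays, machine="metal", services=()):
    """Overall suitability of this host as a service-stratum member."""
    if machine == "container":
        return 0.0  # containers are never eligible
    score = sum(source_score(d) for d in per_source_delays)
    if machine == "metal":
        score *= 1.25  # 25% boost for bare metal; VMs get no adjustment
    # Penalise hosts running workloads we'd rather not disturb.
    # These particular factors are assumptions, not the charm's values.
    penalties = {"ceph": 0.75, "ceph-osd": 0.75,
                 "nova-compute": 0.90, "swift": 0.90}
    for svc in services:
        score *= penalties.get(svc, 1.0)
    return score

def service_stratum(scores, n=6):
    """Pick the n highest-scoring units (unit name -> score) as servers."""
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Each unit would publish its host_score on the peer relation, then check whether its own name appears in service_stratum(scores) to decide between the server and client roles.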

How to use it

This updated ntp charm has been tested successfully with production customer workloads.  It’s available now in the charm store.  Those interested in the details of the code change can review the merge proposal – if you’d like to test and comment on your experiences with this feature, that would be the best place to do so.

Here’s how to deploy it:

# Create a single ntp service:
$ juju deploy --channel=candidate cs:ntp ntp
    # ntp service still uses default pools configuration
$ juju config ntp auto_peers=true

# Deploy to existing nodes:
$ juju add-relation infra-node ntp
$ juju add-relation compute-node ntp

You can see an abbreviated example of the juju status output for the above deployment at

Disclosure statement

I wrote most of this post as part of my day job as a Site Reliability Engineer for Canonical.  We’re hiring.

My essential Ubuntu applications

A few times recently i’ve had to think about the essential applications i use on my desktop.  The latest was Anthony Burke’s tweet, but the recent churn in the Linux desktop world and my unhappiness with Unity mean that i need to be prepared to move away from Ubuntu when Unity becomes the only option.  (I’m currently on Ubuntu 10.04 LTS “lucid lynx”, so this hopefully will be some time away.)  So here’s my list of desktop essentials, mostly for my benefit, but hopefully of use to others.  Many of these are simply Ubuntu’s default applications for their respective tasks and are provided in the base install.

Everyday essentials

Most of these applications are open all the time on my laptop:

  • Mozilla Firefox – recently updated to the fast-release versions on 10.04 LTS, which offers some great speed improvements.
  • Mozilla Thunderbird – I’ve been a Netscape/Thunderbird user for more than 10 years now, and i still can’t understand why people put up with a less-capable email client.
  • Pidgin – instant messaging for IRC, XMPP (Jabber, Google Talk), and Twitter (via a plugin)
  • Amarok 1.4 – music player.  I use the older version because it pulls down my podcasts automatically, rescans my library automatically, and i can just type “198” into the search box and get a great selection of music from the 1980s, instead of having to write match queries (which the new version of Amarok seems to require).
  • Workrave – rest break software for preventing RSI
  • evince – the default PDF reader; it’s rare that i don’t have 2 or 3 PDF files open for reference, sometimes more like 20 or 30
  • Google Desktop – search for all of those local PDFs filling up my hard disk
  • OpenOffice – newer versions of Ubuntu & Debian have moved to LibreOffice now, but 10.04 LTS still uses the Sun/Oracle version
  • Tomboy – simple desktop notes
  • Getting Things Gnome (gtg) – my much-ignored todo list
  • icewm – I replace the default GNOME 2 window manager with icewm, which is a very simple, fast, customisable window manager
  • J-Pilot – my password database dates back to my PalmOS days, so J-Pilot is my password manager even though i don’t use PalmOS devices any more.  People starting fresh would more likely find KeePass or something similar a better choice.
  • By far the most-used applications on my laptop are gnome-terminal, ssh, vim, git, bash, and the suite of Linux/Unix shell script utilities.  I do most of my coding in vim, and rarely have less than 10-15 shell sessions open to different servers or network devices.  Most of my important work happens on the servers, not on the desktop.

Other applications

These are less frequently-used, and aren’t open all the time:

  • Liferea – RSS feed reader
  • Opera – i use this to keep my finance-related web sites separate from my main browser
  • Google Picasa – photo organiser which offers automatic sync to Picasa Web, and simple export to Facebook & Gallery; runs as a Windows application under WINE
  • baobab (Disk Usage Analyser) – great for tracking down where i need to trim back on my disk usage; its exploded pie charts are outstanding (Users of other operating systems might like to check out JDiskReport, which offers the same type of chart.)
  • Unison – file synchronisation
  • gns3/dynamips – Cisco IOS emulator
  • dia – diagram editor; much more rudimentary than Microsoft Visio, but still very usable.  I mostly work with pencil & paper when it comes to network diagrams and the like anyway.
  • GnuCash – double-entry accounting system; all of my business accounting happens here
  • Adobe Acrobat Reader 9 – for some reason, my bank produces PDFs that evince can’t read but Acrobat Reader can
  • Eclipse – IDE for Java coding
  • Audacity – sound editor
  • grip – old-school CD ripper which gives complete control over MP3/OGG encoding options


Back to the future for the Ubuntu desktop

The Register has a review of the Ubuntu 11.04 beta release which suggests there are some rocky times ahead for existing Ubuntu users.  The part of the article that stuck out to me reads:

The highlight of the current launcher is the plethora of keyboard shortcuts, which let you launch applications, open file browsers and call up system-wide searching without taking your hands off the keyboard.

This is basic functionality which X11 window managers have had for years.  I use IceWM which has had these features available through editing simple text configuration files for as long as i can remember (probably more than 10 years, since the SourceForge history for icewm’s 1.2 branch extends back to the year 2000).  And icewm provides many keyboard features which are simply not exposed to the user in current versions of the default Ubuntu GNOME desktop (e.g. go back to the previous virtual desktop used regardless of which number it was).

The paragraph continues:

There are also a few nice touches in the various indicator apps – for example you can simply hover your mouse over the volume indicator and use the scrollwheel to adjust the volume without ever actually clicking anything.

Again, this is basic functionality.  I use this feature in Amarok 1.4 (a really old version that i’m not supposed to admit that i still use – but that’s the subject of another blog post 😉) all the time.  Is it really so innovative?  Not only that, Ubuntu has been pulling functionality (like tooltips which tell you how much battery time is remaining) out of the indicator apps for the past several releases.

What this all suggests to me is that we’re about to embark on a period in Ubuntu’s history where functionality will be back to basics.  (Similar to what happened when Apple first released iOS and it lacked basic functionality like cut & paste.)  As for me, i’ll stick with Ubuntu classic desktop or perhaps take refuge on Debian while things settle down.  At the moment, Ubuntu 10.04 LTS actually fulfills all of my desktop/mobile computing needs, and i’m not prepared to iron out the bugs for them on a user interface which is targeted at users with very basic skills and with much more limited functionality needs than my own.


Getting Adobe Digital Editions to work in Ubuntu

Today we have a guest blog post from my wife Angela.  She is an avid Ubuntu user (in fact, so are our kids) and a great web researcher in general.  She wanted to borrow digital editions of books from our local library; however, the books are only available in the form of digital handcuffs.  So she set herself the task of getting Adobe Digital Editions (DE) working on Ubuntu lucid.  Without further ado, here’s her story:

  1. You need to install Wine (available in the package manager).
  2. On the Adobe Digital Editions page is a link to an installation TechNote (not far below the message that your system doesn’t meet the minimum requirements).
  3. Under the heading “Windows Solutions”, subheading “Manually install Adobe Digital Editions” is a link to download the latest installer for Windows. Click this and choose to SAVE it to your computer (not the RUN option).
  4. On my computer it appeared as “Adobe Digital Editions.desktop” which is an unknown file when you click on it. Right click on this icon and in the drop down box go to Properties -> Permissions and make sure the box “Allow Executing File as a Program” is ticked.
  5. Downloading an ePub book that runs in DE gives you an .acsm file. This is just a digital key or certificate. The first time I used it, I found I needed to drag and drop the .acsm file onto the DE window for the book to load into the library. After that I only had to click on the .acsm file, and it worked as it apparently does in Windows.

I have now installed this on three separate computers, and have noticed inconsistencies, probably related to the version of Ubuntu and/or Wine installed.  These are some possible bugs I have noticed – sadly I didn’t pay attention to which versions they correspond to:

  • For some reason DE did not always appear in the list of applications available in Wine (under the Applications drop down menu). Not sure why.
  • I noticed that although everything else seems to work okay (bookmarking the pages in the book I am reading, etc.) books “borrowed” from the library that have expired stay on the bookshelf marked “expired” instead of disappearing. However this might be normal for Adobe DE even in Windows.
  • On one install, the next time I restarted my computer after shutdown, the digital editions application icon on my desktop had lost the permission to allow it as an executable program (see above) and I had to re-enable this.
  • On one install it won’t close from the [x] in the window; I have to click on the little book icon on the upper left hand side and choose “close” from the drop down menu.
  • Occasionally I get an .acsm file that doesn’t work and find I need to go into Properties -> Permissions and tick the “Allow Executing File as a Program” option (this appears to be random and may be related to the certificate, not the DE install.)
  • An I/O error has almost always been a problem at my end, and I have needed to check the settings in the Wine registry.


An interesting performance difference between perl and awk

(I used to love Stan Kelly-Bootle’s column in Unix World, so i thought i’d share an experience a little like the ones he used to write about.  Hope some old-timers out there can get into it…)

The task i was working on involved taking a file containing a very large directory listing (about 158 MB) and determining the total size of all the files listed in it.  The file’s contents looked like this:

$ head -5 transaction.list 
-rw-r--r-- 1 root root 6575 Aug 5 2009 file-7647833002.log
-rw-r--r-- 1 root root 8223 Aug 5 2009 file-8304157181.log
-rw-r--r-- 1 root root 6929 Aug 5 2009 file-7605687521.log
-rw-r--r-- 1 root root 6802 Aug 5 2009 file-8408844563.log
-rw-r--r-- 1 root root 6787 Aug 5 2009 file-8420786471.log

So to sum the size of the files, i thought i’d write a one-line awk script.  But then i second-guessed myself.  I thought: for a file this size, perl has to be faster, right?  So i wrote a perl one-liner instead.  When i first ran it, it took a lot longer than i expected, so i checked the time it took:

$ time perl -we 'my $sum = 0; while (<>) { my @F = split; 
$sum += $F[4]; } printf "%d\n", $sum; ' transaction.list

real    0m8.062s
user    0m7.970s
sys    0m0.080s

This seemed a little excessive to me, so i went back and ran the awk script which i had originally intended to write, and it turned out to be more than 4 times faster:

$ time awk '{ SUM+=$5 } END {printf "%d\n", SUM}' transaction.list

real    0m1.474s
user    0m1.390s
sys    0m0.040s

Then i thought, “obviously i’m just a hack and i don’t know how to make perl sing”.  So here was the next cut:

$ time perl -we 'my $sum = 0; while (<>) { my ($size) = /\d+[^\d]+(\d+)/; 
$sum += $size; } printf "%d\n", $sum; ' transaction.list

real 0m4.387s
user 0m4.300s
sys 0m0.070s

Nearly twice as fast as the first perl version, but still nearly 3 times slower than the awk version.

I couldn’t be bothered optimising it any further, but i wondered: is there an inherent performance limitation in perl’s split function, or is it just that the overhead in booting up the perl interpreter is higher?

I ran these scripts on my laptop, a Lenovo ThinkPad X200s, with an Intel Core 2 Duo SL9400 CPU and 4 GB RAM, running Ubuntu Linux 10.04 (lucid) 64-bit.  A few of my normal desktop apps were also running.  I ran the scripts a few times each in succession to ensure that i was getting reasonably reliable results.

Any thoughts?  How could i have written the perl version more efficiently?
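For comparison’s sake, here’s the same summation sketched in Python. This is my addition, not benchmarked against the original 158 MB file, so i make no claim about where it would land between awk and perl:

```python
import sys

def total_size(lines):
    """Sum the size column (0-indexed field 4) of an ls -l style listing."""
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) > 4:
            total += int(fields[4])
    return total

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python total_size.py transaction.list
    with open(sys.argv[1]) as f:
        print(total_size(f))
```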