I have my MikroTik configured as an NTP server for my local network – in particular for my Raspberry Pis. Why? Honestly, I’m not too sure; it’s not as if I have the usual business needs, such as domain-joined computers and servers or Lync Phone Edition devices that all have to agree on the time to within 5 minutes. NTP just feels like something that should be served locally wherever possible, like DNS. The MikroTik of course obtains its time from public NTP servers and then serves it to clients on the local network on request, which seems the right thing to do to lessen the load on public NTP servers.
For those who don’t know, Raspberry Pis have no built-in hardware clock – if they reboot, they lose the time. Raspbian uses a package called fake-hwclock which saves the current time to a file on shutdown and loads it on startup. This obviously doesn’t ensure the time is correct, but it ensures that Linux never goes back in time – without this Linux would start with time 01/01/1970, which can cause issues. The time is then corrected forward when NTP is started.
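To make the fake-hwclock idea concrete, here is a minimal sketch of the save-and-restore logic in Python. It is not the real package (which is a shell script); the function names and the file format are assumptions made for illustration only.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

FMT = "%Y-%m-%d %H:%M:%S"  # assumed file format for this sketch


def save_clock(path: Path, now: datetime) -> None:
    """On shutdown: record the current UTC time in a plain-text file."""
    path.write_text(now.strftime(FMT) + "\n")


def restore_clock(path: Path, boot_time: datetime) -> datetime:
    """On startup: return the later of the saved time and the boot-time clock."""
    try:
        text = path.read_text().strip()
        saved = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    except (OSError, ValueError):
        return boot_time  # no usable file: keep whatever the kernel started with
    # Never move the clock backwards, only forwards to the saved time.
    return max(saved, boot_time)


if __name__ == "__main__":
    data_file = Path(tempfile.mkdtemp()) / "fake-hwclock.data"
    save_clock(data_file, datetime(2017, 2, 11, 9, 0, 0, tzinfo=timezone.utc))
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    print(restore_clock(data_file, epoch))
```

The restored time is only a lower bound: after a crash it is as stale as the last save, which is why NTP still has to correct it afterwards.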
Back in February – specifically on February 11th (this post heavily involves time after all) – I discovered that my Raspberry Pis were no longer syncing their time automatically. For example, if one crashed and the crash went unnoticed for a few hours, then after a reboot its clock would be a few hours behind. The year was not 1970, so fake-hwclock was working fine; the problem was NTP.
I began troubleshooting by stopping the NTP service and running ntpd once by hand (-g allows a large initial correction, -q makes ntpd quit once the clock is set).
> sudo systemctl stop ntp
> sudo ntpd -gq
ntpd: no servers found
This was an odd result as my MikroTik’s NTP server was clearly up and running. For example, I tested it on Windows (yes, Windows) as follows:
> w32tm /stripchart /computer:192.168.100.254
Tracking 192.168.100.254 [192.168.100.254:123].
The current time is 11/02/2017 21:00:57.
21:00:57, d:+00.0003486s o:-00.3827807s [ *| ]
21:00:59, d:+00.0004585s o:-00.3827951s [ *| ]
21:01:01, d:+00.0004114s o:-00.3827643s [ *| ]
21:01:03, d:+00.0004050s o:-00.3827735s [ *| ]
On one Raspberry Pi, after I modified ntp.conf to use ntp.org’s pool instead of the MikroTik, the time would sync just fine. So what’s wrong with the MikroTik as a time source?
I was getting nowhere so I decided to run a packet capture of the NTP process.
This packet is the MikroTik’s reply to the Raspberry Pi. The Origin Timestamp is the time at which the Raspberry Pi sent its NTP request, according to its own clock – so you can see it has lost nearly 12 hours.
But the Reference Timestamp is over a month off! What’s this timestamp? RFC 1305 says it best: “This is the local time, in timestamp format, when the local clock was last updated.” The local clock is the MikroTik’s in this case – this means that our MikroTik has not updated from the NTP server for over a month!
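For anyone who wants to check this without a packet analyser, the following Python sketch decodes the timestamps from a raw 48-byte NTP reply and reports how stale the server's clock was when it answered. The field offsets follow the standard NTP header layout; the function names are made up for this example, and it assumes NTP era 0 (dates before 2036).

```python
import struct
from datetime import datetime, timedelta, timezone

# NTP timestamps count seconds from 1900-01-01 UTC (era 0 assumed here).
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_timestamp_to_datetime(raw: bytes) -> datetime:
    """Decode an 8-byte NTP timestamp: 32-bit seconds + 32-bit fraction."""
    seconds, fraction = struct.unpack("!II", raw)
    return NTP_EPOCH + timedelta(seconds=seconds) + timedelta(seconds=fraction / 2**32)


def reference_age(packet: bytes) -> timedelta:
    """How long before sending this reply the server last set its clock."""
    reference = ntp_timestamp_to_datetime(packet[16:24])  # Reference Timestamp
    transmit = ntp_timestamp_to_datetime(packet[40:48])   # Transmit Timestamp
    return transmit - reference
```

A healthy server should report an age of minutes or hours at most; an age of several weeks is the symptom described above.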
It turned out that Linux’s NTP implementation was rejecting the MikroTik as a time source because the MikroTik’s own clock had gone too long without being synchronised, hence “ntpd: no servers found”. That is the immediate cause, but not the root cause: why was the MikroTik out of sync?
I had configured the MikroTik NTP client to use the NTP servers 0.uk.pool.ntp.org and 1.uk.pool.ntp.org.
However, RouterOS’s NTP client only supports IP addresses! When I entered the domain names while first configuring this months ago, RouterOS did a one-time resolution of the names into IP addresses and stored only those. Both of these IPs have since stopped serving NTP – I confirmed this by testing them from Windows.
> w32tm /stripchart /computer:193.188.xxx.xxx
Tracking 193.188.xxx.xxx [193.188.xxx.xxx:123].
The current time is 11/02/2017 21:51:13.
21:51:13, error: 0x800705B4
21:51:16, error: 0x800705B4
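The same kind of check can be scripted. The sketch below is a rough Python equivalent of the w32tm test: it sends a single SNTP client request and treats a timeout as "this address no longer serves NTP" (0x800705B4 is, to my knowledge, Windows' timeout error). The function names and the 2-second timeout are assumptions for illustration, not part of any tool mentioned here.

```python
import socket
import struct

NTP_PORT = 123
UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    """48-byte client request: LI=0, VN=4, Mode=3 (0b00_100_011 = 0x23)."""
    return bytes([0x23]) + bytes(47)


def parse_reply(reply: bytes):
    """Return (stratum, transmit time as Unix seconds), or None if too short."""
    if len(reply) < 48:
        return None
    stratum = reply[1]
    seconds, fraction = struct.unpack("!II", reply[40:48])
    return stratum, seconds - UNIX_OFFSET + fraction / 2**32


def probe(host: str, timeout: float = 2.0):
    """Query one server; None means no usable answer within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (host, NTP_PORT))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and name-resolution failures
            return None
    return parse_reply(reply)
```

Calling `probe("192.168.100.254")` against the MikroTik should return a tuple, whereas a dead pool address should return `None` after the timeout.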
So first of all, to get things up and running for the short term, I simply entered the uk.pool.ntp.org domains into WinBox again so that the MikroTik would actually be using some active pool servers. After this the Raspberry Pis began to sync just fine. A packet capture showed that the MikroTik’s Reference Timestamp had updated.
Now for the long term. We basically need the MikroTik to periodically re-resolve the pool domain names to IP addresses. As a manual one-off, this can be done in a one-liner:
> /system ntp client set primary-ntp=[:resolve 0.uk.pool.ntp.org] secondary-ntp=[:resolve 1.uk.pool.ntp.org]
We need to do this periodically, though I am unsure how often is best practice. The DNS records of *.uk.pool.ntp.org have a TTL of under 3 minutes, but I am sure an NTP client is not expected to look the records up again that often. I decided I couldn’t go far wrong by refreshing them twice per day:
/system scheduler add interval=12h name="ntp refresh" on-event=\
    "/system ntp client set primary-ntp=[:resolve 0.uk.pool.ntp.org] secondary-ntp=[:resolve 1.uk.pool.ntp.org]" policy=read,write,test \
    start-date=feb/11/2017 start-time=00:10:00
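The logic of that scheduled job, re-resolving each pool name and storing the fresh address, can be sketched in Python as follows. This is an illustration of the idea rather than anything that runs on RouterOS; `refresh_servers` is a hypothetical helper, and keeping the previous address when a lookup fails is my own assumption about sensible behaviour, not something the scheduler entry above is known to do.

```python
import socket
from typing import Callable, Dict


def resolve_ipv4(hostname: str) -> str:
    """Resolve a hostname to one IPv4 address using the system resolver."""
    return socket.gethostbyname(hostname)


def refresh_servers(
    current: Dict[str, str],
    resolve: Callable[[str], str] = resolve_ipv4,
) -> Dict[str, str]:
    """Map each pool hostname to a freshly resolved IP address.

    `current` holds the last known address per hostname. If a lookup
    fails, the old address is kept (an assumption of this sketch).
    """
    updated = {}
    for hostname, old_ip in current.items():
        try:
            updated[hostname] = resolve(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            updated[hostname] = old_ip
    return updated
```

Run every 12 hours, something like this would replace stale pool addresses long before a month could pass unnoticed.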
Over three months on, I haven’t had any problems. The Raspberry Pis have remained synced, and I no longer get “no servers found”.
> sudo systemctl stop ntp
> sudo ntpd -gq
ntpd: time slew +0.000659s