Redundancy in the NTP protocol

Breakdowns and malfunctions are inevitable for both networks and equipment. This also applies to the time synchronisation mechanisms on which several services and applications rely. While it is impossible to make a network completely fault-tolerant, it is possible, with good planning, to make it resistant to a significant number of breakdowns.

The most effective way to withstand breakdowns and malfunctions is to build redundancy into equipment and connections. A device breakdown has no consequences if another device takes over immediately. For networks using NTP (Network Time Protocol) as their time synchronisation system, it is both possible and recommended to use redundancy mechanisms to make the system more robust.

When it comes to redundancy, it is paramount to assess all aspects of the network that can fail, such as each power supply, each server, each router, and so on. For example, it is important to make sure to have several power supply sources available, some of which are independent of the electrical grid.

When installing an NTP network, it is essential to know what measures are to be implemented for it to withstand breakdowns of different network equipment, such as servers and routers.

Redundancy of time sources

The first thing to do to ensure the continuity of time synchronisation during breakdowns is to synchronise one’s network with several time sources, which the NTP protocol supports natively. This redundancy ensures continuity of operation when a time source breaks down or fails. It is important not only to use multiple time sources but also to use different types of time sources. Indeed, by only using public time servers such as stratum 1 servers, one becomes dependent on the Internet and therefore on its breakdowns. Ideally, it is also recommended to use stratum 0 time sources such as GPS.

If each clock simply synchronises whenever it receives a message, connecting to several time sources does not protect against one or more of those sources suffering an offset. Therefore, it is important to select the best time source or to calculate a satisfactory average time to guarantee the best synchronisation with the reference clocks.

To do so, the NTP protocol uses a variant of Marzullo’s algorithm, called the intersection algorithm. This algorithm aims at finding the best server with which to synchronise the local clock. Based on the features of the different servers sending their timestamps, the algorithm creates a list of servers from which it selects the best one to use.

Then, it filters this list to exclude servers that do not meet the eligibility criteria, for example because their distance is too large or because the server belongs to a stratum that is too far down the NTP hierarchy.
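The filtering step can be illustrated with a minimal Python sketch. The Source record, the eligible() helper and the two thresholds are assumptions made for this illustration, not values taken from a particular NTP implementation.

```python
from dataclasses import dataclass

# Illustrative thresholds (assumptions for this sketch).
MAX_STRATUM = 15      # stratum 16 conventionally means "unsynchronised"
MAX_DISTANCE = 1.5    # seconds of accumulated delay and dispersion


@dataclass
class Source:
    name: str
    stratum: int
    offset: float      # seconds, measured against the local clock
    distance: float    # seconds
    reachable: bool = True


def eligible(sources):
    """Keep only the sources that pass the basic eligibility checks."""
    return [
        s for s in sources
        if s.reachable
        and 1 <= s.stratum <= MAX_STRATUM
        and s.distance < MAX_DISTANCE
    ]


candidates = [
    Source("gps-receiver", 1, 0.0002, 0.001),
    Source("public-a", 2, 0.0041, 0.030),
    Source("far-away", 3, 0.0100, 2.500),   # rejected: distance too large
    Source("unsynced", 16, 0.0, 0.010),     # rejected: stratum too high
]

print([s.name for s in eligible(candidates)])   # ['gps-receiver', 'public-a']
```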

Once the list has been cleaned, the algorithm searches for the best server from the list in order to update the local clock.

For each server, a time interval is considered which is based on the timestamp sent, extended according to the distance between the clock and the server. The main purpose is to find an intersection of intervals that includes at least half of the servers. If such an intersection exists, servers that make it up will be taken into account for the algorithm. These servers are then classified according to the quality of their clock, which depends on the offset, the distance to the server and the jitter (latency variation). This is an iterative algorithm, since it eliminates an outlier in each round, until all outliers have disappeared.
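The interval search can be sketched in Python as follows. This is the classic form of Marzullo’s algorithm rather than the exact variant used by NTP, and the function names, the sample sources and the use of milliseconds are assumptions for the example. The agreement test follows the "at least half" rule described above; a stricter variant would demand a strict majority.

```python
def marzullo(intervals):
    """Return (count, lo, hi): the sub-interval covered by the most intervals."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))   # an interval starts here
        edges.append((hi, +1))   # an interval ends here
    edges.sort()                 # at equal positions, starts sort before ends
    best = count = 0
    best_lo = best_hi = None
    for i, (pos, kind) in enumerate(edges):
        count -= kind            # +1 on a start, -1 on an end
        if count > best:
            best = count
            best_lo = pos
            best_hi = edges[i + 1][0]   # safe: the last edge is always an end
    return best, best_lo, best_hi


def select_agreeing(sources):
    """sources maps a name to (offset_ms, distance_ms)."""
    intervals = {n: (o - d, o + d) for n, (o, d) in sources.items()}
    count, lo, hi = marzullo(list(intervals.values()))
    if 2 * count < len(sources):
        return []                # fewer than half of the sources agree
    # Keep every source whose interval overlaps the intersection.
    return [n for n, (a, b) in intervals.items() if a <= hi and b >= lo]


sources = {
    "gps": (10, 5),        # interval [5, 15]
    "ntp-a": (12, 4),      # interval [8, 16]
    "ntp-b": (11, 2),      # interval [9, 13]
    "drifting": (50, 3),   # interval [47, 53], overlaps nobody
}

print(marzullo([(o - d, o + d) for o, d in sources.values()]))   # (3, 9, 13)
print(select_agreeing(sources))   # ['gps', 'ntp-a', 'ntp-b']
```

Three of the four sources share the interval from 9 ms to 13 ms, so the drifting source is set aside without any manual intervention.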

In cases where there are no more outliers, if there are enough servers left according to the algorithm’s settings (1 by default; for greater security, this value can be increased), the best remaining server is used to synchronise the local clock.
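The elimination rounds and the final check can be sketched as below. This is a deliberately simplified illustration: the helper names, the stopping rule (stop once the worst disagreement is no larger than the smallest measured jitter) and the choice of the lowest-jitter survivor as "best" are assumptions, whereas a real implementation also weighs stratum and distance.

```python
import math


def selection_jitter(offset, other_offsets):
    """RMS difference between one offset and the offsets of the other survivors."""
    if not other_offsets:
        return 0.0
    return math.sqrt(sum((offset - o) ** 2 for o in other_offsets) / len(other_offsets))


def prune_outliers(survivors):
    """survivors maps a name to (offset_ms, jitter_ms); drops one outlier per round."""
    survivors = dict(survivors)
    while len(survivors) > 1:
        spread = {
            name: selection_jitter(
                offset, [o for other, (o, _) in survivors.items() if other != name]
            )
            for name, (offset, _) in survivors.items()
        }
        worst = max(spread, key=spread.get)
        quietest = min(jitter for _, jitter in survivors.values())
        if spread[worst] <= quietest:
            break                 # disagreement is within measurement noise
        del survivors[worst]
    return survivors


def choose_server(survivors, min_required=1):
    """Return the chosen server name, or None if too few servers remain."""
    pruned = prune_outliers(survivors)
    if len(pruned) < min_required:
        return None
    return min(pruned, key=lambda name: pruned[name][1])   # lowest jitter


survivors = {
    "gps": (10.0, 1.2),
    "ntp-a": (10.6, 1.1),
    "ntp-b": (10.2, 0.9),
    "ntp-c": (14.5, 1.0),   # several milliseconds away from the others
}

print(sorted(prune_outliers(survivors)))   # ['gps', 'ntp-a', 'ntp-b']
print(choose_server(survivors))            # ntp-b
```

Raising min_required makes the selection more conservative: with min_required=4, the example above returns None because only three servers survive the pruning.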

With Marzullo’s algorithm, equipment connected to a sufficient number of time sources is unlikely to be left without a reliable time source for synchronising network clocks.

Network redundancy

However, having multiple time sources is not sufficient to ensure continuity of service in the event of a breakdown. Indeed, what is happening inside the network must be taken into account. One solution for maintaining time synchronisation when reference clocks are out of service is peering.

Peering allows two devices to act alternately as client and server for each other to maintain synchronisation. The clocks therefore play a symmetrical role. This slows down synchronisation, since the servers exchange more messages. As both clocks are on an equal footing, their goal is to synchronise with each other and not with an external source. Peering thus keeps a pair of clocks close to each other. Although peering is paramount in some applications, it does not guarantee timestamping that is consistent with an external source, which can become an issue for applications such as the logging of data manipulations.

Another redundancy mechanism within an NTP network is the Anycast mode, in which several servers share the same Anycast address. As such, clients do not know exactly which server will respond to their request. Routers determine which server should respond to which request by choosing the closest server, either topologically or in terms of latency. The Anycast mode offers further advantages in addition to redundancy. Since the routing network automatically chooses the closest server to respond to the request, both the network load and the transmission delays between the client and the server are reduced. Synchronisation accuracy is therefore better.

However, the Anycast configuration is more complex to set up than the traditional client/server mode. It is then necessary to be equipped with compatible time servers and hardware. Besides, it is paramount to make sure that the servers sharing the same Anycast address are correctly synchronised with each other so that the timestamp returned does not depend on which server is responding.
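A consistency check of this kind can be sketched in a few lines of Python. It assumes that each server behind the Anycast address can also be measured individually, for example through a separate management address. The inconsistent_members() helper, the 1 ms tolerance and the sample addresses (taken from a documentation range) are hypothetical.

```python
from statistics import median


def inconsistent_members(offsets_ms, tolerance_ms=1.0):
    """Return the servers whose offset deviates too much from the group median.

    offsets_ms maps each server's individual address to its measured
    offset in milliseconds.
    """
    if not offsets_ms:
        raise ValueError("no measurements supplied")
    mid = median(offsets_ms.values())
    return sorted(
        name for name, offset in offsets_ms.items()
        if abs(offset - mid) > tolerance_ms
    )


measurements = {
    "192.0.2.11": 0.12,
    "192.0.2.12": -0.08,
    "192.0.2.13": 4.70,   # this member has drifted away from the others
}

print(inconsistent_members(measurements))   # ['192.0.2.13']
```

Comparing against the median rather than the mean keeps a single drifting server from shifting the reference it is measured against.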

For these reasons, it can be tempting to use public “pool” servers. This works on a principle similar to Anycast, since several public servers answer behind the same host name: the pool’s DNS hands out the addresses of different servers in rotation. This ensures that a server is always available to respond to a request.

However, while it is not important to identify which server is responding when one controls every machine on one’s network, it becomes essential when relying on public servers for time synchronisation. Consequently, this approach is not recommended.

Monitoring and security

The implementation of redundancy mechanisms must not overshadow network security. Indeed, it is important to secure connections between the different pieces of equipment in order to withstand both breakdowns and attacks. Efficient monitoring makes it possible to make the most of NTP redundancy: by monitoring server activity, equipment breakdowns can be detected as early as possible. As a result, in the event of a server failure, it is easier to synchronise equipment with the other servers of the network so that the service is not interrupted while the faulty equipment is being repaired.
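As an illustration of such monitoring, the following Python sketch sends a basic client request to a server and reports its stratum and a rough offset. It is a minimal probe using only the standard library, not a full NTP client: the function names and the two-second timeout are assumptions, and the offset estimate ignores any asymmetry in the network path.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800   # seconds between 1900-01-01 and 1970-01-01


def build_request():
    """48-byte client request: leap indicator 0, version 4, mode 3 (client)."""
    return bytes([0x23]) + bytes(47)


def parse_response(packet):
    """Return (stratum, server transmit time in Unix seconds)."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    stratum = packet[1]
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return stratum, seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def probe(host, timeout=2.0):
    """Return (stratum, rough offset in seconds), or None if the server is unusable."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sent = time.time()
            sock.sendto(build_request(), (host, 123))
            packet, _ = sock.recvfrom(512)
            received = time.time()
        stratum, server_time = parse_response(packet)
    except (OSError, ValueError):
        return None              # timeout, network error or malformed reply
    if stratum == 0 or stratum >= 16:
        return None              # server reports that it is not synchronised
    # Compare the server's time with the midpoint of the local round trip.
    return stratum, server_time - (sent + received) / 2
```

Calling probe() periodically for each server of the network and raising an alert when it returns None, or when the offset exceeds a chosen threshold, is enough to detect a failed or drifting server early.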

Expert in time management and present in over 140 countries, Bodet Time is a major French leader in time synchronisation and time frequency.
