Computer redundancy and time servers

The breakdown of a service within a computer network should be avoided wherever possible and, when it cannot be avoided, resolved as quickly as possible.

When the service in question is an essential component of several other services, as is the case for time synchronisation via a time server implementing the PTP or NTP protocol, a breakdown can have disastrous consequences. Fortunately, there are several ways to prevent such breakdowns.

Understanding the potential causes of failure

Before carrying out any action on the hardware or the software, it is necessary to identify the risks, that is to say all the possible causes of each failure. Every element of the network must be analysed, from the hardware (power supply, cables, etc.) to the configuration of the different software components, including the time sources, their location in the network architecture, and so on. The more extensive and complex the network, the greater the number of potential points of failure. However, a minimalist network is not necessarily more robust than a network comprising several machines, because in a minimal network a single failure can paralyse everything.

Is redundancy the solution?

Once potential breakdowns have been identified, their impact can be avoided by using redundancy. For example, the failure of one power supply has no impact if a second one takes over until the first is repaired.

Creating redundancy does not necessarily mean exact replication. What matters is redundancy of the service rather than exact duplication of the hardware: for example, a second, less powerful power supply can compensate for the failure of the first one, backed up by an inverter and a battery.
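
The benefit of a second, weaker unit can be estimated with a short calculation. The sketch below is only an illustration: it assumes that the components fail independently of one another, and the availability figures (0.99 for the main supply, 0.95 for the backup) are invented for the example, not measured values.

```python
def parallel_availability(availabilities):
    """Availability of a service that keeps running as long as at least
    one of its redundant components works.

    Assumption: component failures are independent of each other.
    """
    unavailable = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        # The service is down only if every component is down at once.
        unavailable *= 1.0 - a
    return 1.0 - unavailable


# Hypothetical figures: a main supply and a less capable backup.
single = parallel_availability([0.99])
redundant = parallel_availability([0.99, 0.95])
print(f"single supply:    {single:.4f}")
print(f"with backup unit: {redundant:.4f}")
```

Under these assumptions, even a backup that is less reliable than the main unit raises the availability of the service from 99% to roughly 99.95%, which is why the replica does not need to be identical.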

Moreover, duplicating every piece of equipment to guard against any failure that may occur is not sufficient. The means of communication are as important as the hardware. When a network architecture is being defined, several paths must be created between each pair of points in the network, so that messages can always get through. The main purpose is to increase tolerance to network failure through good planning: a degraded service is more tolerable than a complete breakdown.

External redundancy

Sometimes, the failure will be caused by the unavailability of a service external to the organisation’s computer system. Such external issues include power cuts, the unavailability of an external cloud (and therefore of the associated data), and the breakdown of a reference clock (or a degradation of its performance in the event of GNSS jamming).

For example, if a reference clock stops functioning or starts to drift, the impact is total, since most services within an organisation rely on time synchronisation.

Even if some protocols are more tolerant than others, all of them (NTP, PTP, IRIG, etc.) need several reference clocks. Having several reference clocks makes it possible to compensate effectively for the failure of one of them, and also to detect that one of them is starting to drift before the network is affected. In protocols where this is possible, it is important to synchronise with several time reference sources so that a device failure has no consequences further down the network.
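
The idea of spotting a drifting reference by comparing it with the others can be sketched as follows. This is a simplified illustration, not the selection algorithm that NTP or PTP implementations actually use: it compares each source with the median of all sources, and the source names, the offsets and the 1 ms tolerance are assumptions made up for the example.

```python
from statistics import median


def find_drifting_sources(offsets_ms, tolerance_ms=1.0):
    """Return the names of sources whose measured offset differs from the
    median of all sources by more than tolerance_ms.

    offsets_ms maps a source name to its offset, in milliseconds,
    relative to the local clock.
    """
    if len(offsets_ms) < 3:
        # With only two sources that disagree, there is no way to tell
        # which of the two is wrong.
        raise ValueError("at least three sources are needed")
    reference = median(offsets_ms.values())
    return sorted(
        name
        for name, offset in offsets_ms.items()
        if abs(offset - reference) > tolerance_ms
    )


# Hypothetical measurements: two sources agree, the third has drifted.
measured = {"gnss-a": 0.12, "gnss-b": -0.08, "atomic-c": 4.70}
print(find_drifting_sources(measured))  # ['atomic-c']
```

The check needs at least three sources: with two, a disagreement can be detected but the faulty clock cannot be identified.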

Anticipation for better resistance

Managing a network breakdown is more a question of anticipation than of reaction. A service that is disrupted or stopped will always have consequences, even if it is repaired very quickly. With good anticipation, it is possible to make failures and breakdowns transparent to users and to provide high availability for services, which should be the goal. Failures will inevitably occur in a computer network, so their impact must be limited as much as possible.

It is therefore necessary to guard against breakdowns, whether they come from inside or outside the network. External failures such as reference clock breakdowns or electrical network problems can only be anticipated by using redundancy, because in these specific cases the organisation has no control over the repair of the service.

Prioritising critical services

The more critical the service in question, the more important it is to make it fault-tolerant. Numerous applications rely on the time synchronisation of a network, so it is paramount to ensure its continuous operation. Anticipating breakdowns starts with an architecture that avoids bottleneck nodes acting as single points of failure: if all synchronisation routes pass through a single server and that server breaks down, no machine will be synchronised. It is therefore better to create an architecture with two time servers that do not physically depend on the same routes within the computer network.
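
One way to check an architecture for such nodes is to remove each device in turn and test whether every client can still reach a time server. The sketch below does this on a toy model: the topology, the device names (`ts1`, `sw1`, `pc1`, and so on) and the representation of the network as a dictionary of undirected links are all assumptions made for the illustration.

```python
from collections import deque


def reachable(links, start, removed):
    """Breadth-first search from start, ignoring the removed node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def single_points_of_failure(links, servers, clients):
    """Return the non-client nodes whose failure leaves at least one
    client without a path to any surviving time server."""
    critical = []
    for candidate in sorted(links):
        if candidate in clients:
            continue
        surviving = [s for s in servers if s != candidate]
        for client in clients:
            visible = reachable(links, client, candidate)
            if not any(s in visible for s in surviving):
                critical.append(candidate)
                break
    return critical


# Hypothetical topology 1: one time server behind one switch.
one_server = {
    "ts1": ["sw1"],
    "sw1": ["ts1", "pc1", "pc2"],
    "pc1": ["sw1"],
    "pc2": ["sw1"],
}
print(single_points_of_failure(one_server, {"ts1"}, {"pc1", "pc2"}))

# Hypothetical topology 2: two time servers on physically separate routes.
two_servers = {
    "ts1": ["sw1"],
    "ts2": ["sw2"],
    "sw1": ["ts1", "pc1", "pc2"],
    "sw2": ["ts2", "pc1", "pc2"],
    "pc1": ["sw1", "sw2"],
    "pc2": ["sw1", "sw2"],
}
print(single_points_of_failure(two_servers, {"ts1", "ts2"}, {"pc1", "pc2"}))
```

In the first topology both the switch and the server are reported as critical; in the second, no single device failure cuts the clients off from time synchronisation.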

Improving QoS

Some network protocols such as PTP limit the effect of breakdowns within the network by selecting master clocks automatically. However, it is impossible to rely solely on this type of mechanism. To offer a reliable and highly available service, it should be assumed that each part of the network can and will break down, and it must be ensured that no individual breakdown stops the service. Once again, the only effective way of preventing potential breakdowns from interrupting the service is to use hardware redundancy.
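
The limits of master selection can be illustrated with a deliberately simplified election. This is not the full best master clock algorithm defined by IEEE 1588, which compares more attributes (clock accuracy and variance among them); the sketch only ranks candidates by `priority1`, `clock_class`, `priority2` and identity, with lower values winning, and the clock names and values are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clock:
    identity: str
    priority1: int
    clock_class: int
    priority2: int
    alive: bool = True


def elect_master(clocks):
    """Pick the best clock that is still alive, or None if none is left.

    Simplified ranking (lower wins): priority1, then clock_class,
    then priority2, with the identity as a tie-breaker.
    """
    candidates = [c for c in clocks if c.alive]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda c: (c.priority1, c.clock_class, c.priority2, c.identity),
    )


# Hypothetical grandmaster candidates.
primary = Clock("gm-a", priority1=128, clock_class=6, priority2=128)
backup = Clock("gm-b", priority1=128, clock_class=7, priority2=128)

print(elect_master([primary, backup]).identity)      # gm-a
failed = replace(primary, alive=False)
print(elect_master([failed, backup]).identity)       # gm-b
print(elect_master([failed]))                        # None
```

The last call shows why the mechanism is not enough on its own: an election can only hand over to a second clock if that second clock has actually been installed.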

Hardware redundancy has another advantage: the additional hardware does not have to sit idle. It can be used to reduce the load on each device within the network, which increases the quality of service and the availability of systems.

To conclude

Adding hardware to the network to create redundancy makes it more complex to configure and maintain, but this is necessary to make it fault-tolerant. The initial investment will avoid many problems when a breakdown occurs. As fault-tolerance specialists say: before the breakdown it is too expensive, but after the breakdown it is too late.

Present in over 140 countries, Bodet Time is a leading French company in time management, time synchronisation and time frequency. Do you need assistance in creating an efficient, secure and highly available time distribution architecture?

Contact us
