Backups usually serve two goals:

  • recovery after a hardware failure
  • restoring an earlier version of the data after a software failure or after unwanted changes made by people

There are two main metrics:

  • RPO (Recovery Point Objective) – the maximum acceptable time between the last copy and the failure, i.e. how much data may be lost. For example, if losing up to 20 hours of data is acceptable, RPO = 20h.
  • RTO (Recovery Time Objective) – the maximum acceptable time from the failure until work is restored, i.e. how long restoring from the last copy may take. For example, if the system must be running again within an hour of the failure, RTO = 1h.
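To make the two metrics concrete, here is a minimal Python sketch, assuming copies are taken at a fixed interval, that each copy captures the data as of its start time, and that the restore time has been measured in a test restore. In the worst case the failure hits just before the next copy finishes, so up to one interval plus the duration of a copy is lost. The function names and parameters are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Longest stretch of data that can be lost between two copies.

    Assumes each copy captures the data as of its start time, and the failure
    happens just before the next copy completes.
    """
    return backup_interval + backup_duration


def meets_objectives(backup_interval: timedelta, restore_duration: timedelta,
                     rpo: timedelta, rto: timedelta,
                     backup_duration: timedelta = timedelta(0)) -> bool:
    """Check the schedule against the RPO and a measured restore time against the RTO."""
    loss_ok = worst_case_data_loss(backup_interval, backup_duration) <= rpo
    time_ok = restore_duration <= rto
    return loss_ok and time_ok


# Hourly copies, a test restore took 40 minutes; objectives from the text.
print(meets_objectives(timedelta(hours=1), timedelta(minutes=40),
                       rpo=timedelta(hours=20), rto=timedelta(hours=1)))
# One nightly copy that takes 2 hours: up to 26 hours of data can be lost.
print(meets_objectives(timedelta(days=1), timedelta(minutes=40),
                       rpo=timedelta(hours=20), rto=timedelta(hours=1),
                       backup_duration=timedelta(hours=2)))
```

With hourly copies the 20-hour RPO from the example is met with a wide margin; a single nightly copy would not meet it.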

Copies come in two kinds:

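  • full (contains all the data)
  • incremental (contains only the changes since the previous copy): saves space, but is less reliable, because restoring requires the last full copy plus every incremental copy made after it, and one damaged copy breaks the chain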
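The reliability trade-off of incremental copies can be illustrated with a toy model in which the data is a dictionary mapping file names to contents. This is a sketch, not a real backup format: `Incremental`, `make_incremental` and `restore` are hypothetical names, and restoring simply replays the chain of incremental copies on top of the full copy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Incremental:
    """Changes since the previous copy: new or modified files, and deleted files."""
    changed: dict[str, str] = field(default_factory=dict)
    deleted: set[str] = field(default_factory=set)


def make_incremental(previous: dict[str, str], current: dict[str, str]) -> Incremental:
    changed = {name: body for name, body in current.items() if previous.get(name) != body}
    deleted = set(previous) - set(current)
    return Incremental(changed, deleted)


def restore(full: dict[str, str], chain: list[Incremental]) -> dict[str, str]:
    """Start from the full copy and replay every incremental copy in order."""
    state = dict(full)
    for inc in chain:
        state.update(inc.changed)
        for name in inc.deleted:
            state.pop(name, None)
    return state


day0 = {"a.txt": "v1", "b.txt": "v1"}
full = dict(day0)                      # full copy: all the data
day1 = {"a.txt": "v2", "b.txt": "v1"}
inc1 = make_incremental(day0, day1)    # only a.txt changed
day2 = {"a.txt": "v2", "c.txt": "v1"}
inc2 = make_incremental(day1, day2)    # c.txt added, b.txt deleted

print(restore(full, [inc1, inc2]) == day2)  # True
print(restore(full, [inc2]) == day2)        # False: inc1 is lost, a.txt is stale
```

If any link in the chain is missing, the restore does not fail; it silently produces data that differs from the real state, which is why incremental copies are considered less reliable.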

Different copies can be kept for different periods: some for a day, others for a week, and others indefinitely.

For greater safety, copies are usually stored in several physical locations, so that a single incident such as a fire or a theft cannot destroy all of them.

Separately, make sure that copies, once created, cannot be modified; otherwise malware or an attacker can corrupt them. This usually means writing to removable media (CDs, tapes). Another option is to encrypt the copies and upload them to the cloud under an account with minimal rights, one that can add new copies but cannot change or delete existing ones.
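The idea behind an account with minimal rights can be sketched as a store that accepts new copies but refuses to overwrite them and offers no way to delete them. `WriteOnceStore` and the object name below are hypothetical, an in-memory stand-in only; in practice the restriction must be enforced by the storage itself (removable media or the cloud provider's permissions), not by the backup program, and encryption is not shown.

```python
class WriteOnceStore:
    """Hypothetical stand-in for storage reached through an account with
    minimal rights: the backup job can add new copies and read them back,
    but cannot overwrite or delete existing ones."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        if name in self._objects:
            raise PermissionError(f"{name!r} already exists and cannot be overwritten")
        self._objects[name] = bytes(data)  # keep an immutable copy

    def get(self, name: str) -> bytes:
        return self._objects[name]

    # Deliberately no delete() or update(): stolen backup credentials
    # can add junk copies, but cannot destroy the existing ones.


store = WriteOnceStore()
store.put("2024-03-01.tar.gz.enc", b"...encrypted copy...")
try:
    store.put("2024-03-01.tar.gz.enc", b"ransomware")
except PermissionError as err:
    print(err)
```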

It is extremely important to test restoring from backups regularly. It may turn out, for example, that the backup file is empty or corrupted, or that the software has changed and the old recovery procedure no longer works, so a new one has to be developed and documented.
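Part of a test restore can be automated. The following Python sketch, assuming the backup is a gzip-compressed tar archive of one directory, restores the archive into a temporary directory and compares the SHA-256 digest of every file with the source. `make_backup` and `verify_backup` are hypothetical helpers; the comparison only makes sense right after the copy is made, before the source changes, and it does not replace checking that the documented recovery procedure still works.

```python
import hashlib
import tarfile
import tempfile
import zlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def make_backup(source: Path, archive: Path) -> None:
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname="data")


def verify_backup(source: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it with the source."""
    if not archive.is_file() or archive.stat().st_size == 0:
        return False  # the backup file is missing or empty
    # Use the safe "data" extraction filter where this Python version supports it.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    try:
        with tempfile.TemporaryDirectory() as scratch, tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch, **extract_kwargs)
            return tree_digests(Path(scratch) / "data") == tree_digests(source)
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False  # the backup file is corrupted


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "sub").mkdir(parents=True)
    (src / "a.txt").write_text("hello")
    (src / "sub" / "b.txt").write_text("world")
    good = Path(tmp) / "backup.tar.gz"
    make_backup(src, good)
    broken = Path(tmp) / "broken.tar.gz"
    broken.write_bytes(good.read_bytes()[:20])  # simulate a truncated copy
    print(verify_backup(src, good), verify_backup(src, broken))  # True False
```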

A backup schedule is usually set up, for example:

  • copies are made every hour
  • the first copy of each day is full, the rest are incremental
  • by default, copies are kept for a day; the full copy from Monday is kept for a week, the copy from the 1st of the month for a year, and the copy from January 1 indefinitely
  • monthly and annual copies are additionally sent to the cloud
  • each monthly copy is test-restored
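The example schedule can be expressed as a policy function. This is a minimal Python sketch under several assumptions: copies run on the hour, so the 00:00 copy is the day's full copy; the longer retention periods apply to that full copy; and "a year" is approximated as 365 days. `Policy`, `policy_for` and `is_expired` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Policy:
    kind: str                      # "full" or "incremental"
    keep_for: Optional[timedelta]  # None means "keep without limit"
    send_to_cloud: bool
    test_restore: bool


def policy_for(taken_at: datetime) -> Policy:
    # Assumption: copies run every hour on the hour, so the 00:00 copy is
    # the first copy of the day and therefore the full one.
    if taken_at.hour != 0:
        return Policy("incremental", timedelta(days=1), False, False)
    # The rules below are checked from the longest retention down.
    if taken_at.month == 1 and taken_at.day == 1:   # January 1 (also January's monthly copy)
        return Policy("full", None, True, True)
    if taken_at.day == 1:                           # 1st of the month
        return Policy("full", timedelta(days=365), True, True)
    if taken_at.weekday() == 0:                     # Monday full copy
        return Policy("full", timedelta(weeks=1), False, False)
    return Policy("full", timedelta(days=1), False, False)


def is_expired(taken_at: datetime, now: datetime) -> bool:
    keep_for = policy_for(taken_at).keep_for
    return keep_for is not None and now - taken_at > keep_for


print(policy_for(datetime(2024, 3, 1, 0)))
print(is_expired(datetime(2024, 3, 5, 14), datetime(2024, 3, 7, 9)))  # True
```

When several rules apply to the same copy (for example, January 1 falling on a Monday), the sketch keeps the longest retention.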

All of these details usually have to be defined and documented separately for each information system.