The Foolproof Method of Maintaining your Backup System

As you might expect, setting up backup is just the beginning. You will need to keep it running in perpetuity. Likewise, you cannot simply assume that everything will work. You must keep constant vigilance over the backup system, its media, and everything that it protects.

Monitoring Your Backup System

Start with the easiest tools. Your backup program almost certainly has some sort of notification system. Configure it to send messages to multiple administrators. If it creates logs, use operating system or third-party monitoring software to track those as well. Where available, prefer programs that will repeatedly send notifications until someone manually acknowledges them or the software detects that the problem has been resolved.
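
If your software only writes log files, a small script can bridge the gap to active notifications. The sketch below is a minimal example in Python; the log location, the failure keywords, and the mail server details are assumptions that you would replace with whatever your backup product and mail system actually use.

    # Minimal backup-log watcher: scan the most recent log for failure keywords
    # and email every administrator on the list. The path, keywords, and SMTP
    # details below are placeholders, not values from any specific product.
    import smtplib
    from email.message import EmailMessage
    from pathlib import Path

    LOG_FILE = Path("/var/log/backup/backup-latest.log")    # assumed location
    FAILURE_KEYWORDS = ("error", "failed", "aborted")        # assumed log wording
    ADMINS = ["backup-admin@example.com", "it-oncall@example.com"]
    SMTP_HOST = "mail.example.com"

    def find_problems(log_file: Path) -> list[str]:
        """Return every log line that contains a failure keyword."""
        if not log_file.exists():
            return [f"Log file {log_file} is missing - the backup may not have run."]
        lines = log_file.read_text(errors="replace").splitlines()
        return [line for line in lines if any(k in line.lower() for k in FAILURE_KEYWORDS)]

    def notify(problems: list[str]) -> None:
        """Alert several administrators so a single unread inbox cannot hide a failure."""
        msg = EmailMessage()
        msg["Subject"] = f"Backup check: {len(problems)} problem line(s) found"
        msg["From"] = "backup-monitor@example.com"
        msg["To"] = ", ".join(ADMINS)
        msg.set_content("\n".join(problems))
        with smtplib.SMTP(SMTP_HOST) as server:
            server.send_message(msg)

    if __name__ == "__main__":
        issues = find_problems(LOG_FILE)
        if issues:
            notify(issues)

Scheduled to run after every backup window, and re-run until someone clears the underlying problem, a script like this approximates the nagging behavior described above.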

Set up a schedule to manually check on backup status. Partially, you want to verify that its notification system has not failed. Mostly, you want to search through job history for things that didn’t trigger the monitoring system. Check for minor warnings and correct what you can. Watch for problems that recur frequently but work after a retry. These might serve as early indications of a more serious problem.

Testing Backup Media and Data

You cannot depend on even the most careful monitoring practices to keep your backups safe. Data at rest can become corrupted. Thieves, including insiders with malicious intent, can steal media. You must implement and follow procedures that verify your backup data. After all, a backup system is only valuable if the data can be restored when needed.

Keep an inventory of all media. Set a schedule to check on each piece. When you retire media due to age or failure, destroy it. Strong magnets work for tapes and spinning drives. Alternatively, drill a hole through mechanical disks to render them unreadable. Break optical media and SSDs any way that you like.

Organizations that do not track personal or financial information may not need to keep such meticulous track of media. However, anyone with backup data must periodically check that it has not lost integrity. The only way you can ever be certain that your data is good is to restore it.

Establish a regular schedule to try restoring from older media. If successful, make spot checks through the retrieved information to make sure that it contains what you expect.
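
Checksums make those spot checks objective. The sketch below assumes you restored a sample of data to a scratch directory and still have the corresponding source copies (or previously recorded hashes) to compare against; both paths are illustrative.

    # Spot-check a test restore by hashing files in the restored tree and
    # comparing them to the source tree. Both paths are illustrative.
    import hashlib
    from pathlib import Path

    SOURCE_ROOT = Path("/data/finance")            # assumed live data location
    RESTORE_ROOT = Path("/restore-test/finance")   # assumed restore target

    def sha256(path: Path) -> str:
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()

    mismatches = []
    for restored in RESTORE_ROOT.rglob("*"):
        if not restored.is_file():
            continue
        original = SOURCE_ROOT / restored.relative_to(RESTORE_ROOT)
        if not original.exists():
            mismatches.append(f"{restored}: no matching source file")
        elif sha256(original) != sha256(restored):
            mismatches.append(f"{restored}: contents differ from source")

    print(f"Checked restore under {RESTORE_ROOT}")
    print("\n".join(mismatches) if mismatches else "All sampled files match.")

Keep in mind that live files change legitimately, so differences against the current source are not always corruption; comparing against hashes recorded at backup time gives a cleaner signal.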

Treat this article as a basic discussion of testing best practices. We will revisit the topic of testing in a dedicated post toward the end of this article series.

The activities in this article will take time to set up and perform. Do not allow fatigue to prevent you from following these items or tempt you into putting them off. You need to:

  • Configure your backup system to send alerts, at a minimum, on failed jobs;
  • Establish accountability for manually verifying, on a regular basis, that the backup program is functioning;
  • Configure a monitoring system to notify you if your backup software ceases running;
  • Establish a regular schedule and accountability system to test that you can restore data from backup. Test a representative sampling of online and offline media.

Too many organizations do not realize until they’ve lost everything that their backup media did not successfully preserve anything. Some have had backup systems sit in a failed state for months without discovering it. A few minutes of occasional checking can prevent such catastrophes.

Monitoring backup, especially testing restores, is admittedly tedious work. However, it is vital. Many organizations have suffered irreparable damage because they found out too late that no one knew how to restore data properly.

Maintaining Your Systems

The intuitive scope of a business continuity plan includes only its related software and equipment. When you consider that the primary goal of the plan is data protection, it makes sense to think beyond backup programs and hardware. Furthermore, all the components of your backup system belong to your larger technological environment, so you must maintain that environment accordingly.

Fortunately, you can automate common maintenance. Microsoft Windows will update itself over the Internet. The package managers on Linux distributions have the same ability. Windows also allows you to set up an update server on-premises to relay patches from Microsoft. Similarly, you can maintain internal repositories to keep your Linux systems and programs current.
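
On Linux hosts, you can script that automation yourself. The following sketch assumes a Debian- or Ubuntu-style system with apt available and root privileges; Windows hosts are better served by Windows Update or an on-premises update server as described above.

    # Apply pending OS updates on a Debian/Ubuntu-style system and log the result.
    # Assumes apt is present and the script runs with root privileges.
    import datetime
    import subprocess

    LOG = "/var/log/auto-patch.log"

    def run(cmd: list[str]) -> str:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout

    with open(LOG, "a") as log:
        log.write(f"--- patch run {datetime.datetime.now().isoformat()} ---\n")
        log.write(run(["apt-get", "update"]))
        log.write(run(["apt-get", "-y", "upgrade"]))   # -y answers prompts automatically

Schedule it through cron outside your backup window, in line with the checklist later in this article.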

In addition to the convenience that such in-house systems provide, you can also leverage them as a security measure. You can automatically update systems without allowing them to connect directly to the Internet. In addition to software, keep your hardware in good working order.

Of course, you cannot simply repair modern computer boards and chips. Instead, most manufacturers will offer a replacement warranty of some kind.

If you purchase fully assembled systems from a major systems vendor, such as Dell or Hewlett-Packard Enterprise, you can get warranties that cover the system as a whole. These vendors also have options for rapid delivery or in-person service by a qualified technician. If at all possible, do not allow out-of-warranty equipment to remain in service.

Putting It into Action

Most operating systems and software have automated or semi-automated updating procedures. Hardware typically requires manual intervention. It falls to the system administrators to keep everything current.

  • Where available, configure automated updating. Ensure that it does not coincide with backup, or that your backup system can successfully navigate operating system outages.
  • Establish a pattern for checking for firmware and driver updates. These should not occur frequently, so you can schedule updates as one-off events.
  • Monitor the Internet for known attacks against the systems that you own. Larger manufacturers have entries on common vulnerabilities and exposures (CVE) lists. Sometimes they maintain their own, but you can also look them up at: https://cve.mitre.org/. Vendors usually release fixes in standard patches, but some will issue “hotfixes”. Those might require manual installation and other steps.
  • If your hardware has a way to notify you of failure, configure it. If your monitoring system can check hardware, configure that as well. Establish a regular routine for visually verifying the health of all hardware components.
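
A script can handle the routine part of that last item. The sketch below assumes smartmontools is installed and that the listed device paths exist on the host; substitute your own disks and run it from your monitoring system or a scheduled task.

    # Query SMART health for a list of disks with smartctl (from smartmontools)
    # and report anything that does not come back healthy. Device paths are
    # placeholders for your own hardware.
    import subprocess

    DISKS = ["/dev/sda", "/dev/sdb"]   # assumed device paths

    unhealthy = []
    for disk in DISKS:
        proc = subprocess.run(["smartctl", "-H", disk], capture_output=True, text=True)
        if "PASSED" not in proc.stdout:
            unhealthy.append(f"{disk}:\n{proc.stdout or proc.stderr}")

    if unhealthy:
        print("Disks needing attention:")
        print("\n".join(unhealthy))
    else:
        print("All monitored disks report a healthy SMART status.")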

To properly protect your virtualization environment and all the data, use Hornetsecurity VM Backup to securely back up and replicate your virtual machine.

We ensure the security of your Microsoft 365 environment through our comprehensive 365 Total Protection Enterprise Backup and 365 Total Backup solutions.

For complete guidance, get our comprehensive Backup Bible, which serves as your indispensable resource containing invaluable information on backup and disaster recovery.

To keep up to date with the latest articles and practices, pay a visit to our Hornetsecurity blog now.

Final Words

Maintenance activities consume a substantial portion of the typical administrator’s workload, so these procedures serve as a best practice for all systems, not just those related to backup. However, since your disaster recovery plan hinges on the health of your backup system, you cannot allow it to fall into disrepair.

FAQ

What is a data backup system?

A data backup system is a method or process designed to create and maintain duplicate copies of digital information to ensure its availability in the event of data loss, corruption, or system failures.

What is an example of a data backup?

An example of a data backup is storing copies of files, documents, or entire systems on external hard drives, cloud services, or other storage media. This safeguards against potential data loss and facilitates recovery if the original data is compromised.

How do companies backup their data?

Companies use a variety of methods to backup their data, including regular backups to external servers, cloud-based solutions, tape drives, or redundant storage systems. Automated backup software is often employed to streamline and schedule the backup process, ensuring data integrity and accessibility.

Hornetsecurity, as cloud security experts, is here to assist global organizations and empower IT professionals with the necessary tools, all delivered with a positive and supportive attitude.

How to Get the Absolute Most Out of Your Backup Software

In the past, we could not capture a consistent backup. Operations would simply read files on disk in order as quickly as possible.

But, if a file changed after the backup copied it but before the job completed, then the backup’s contents were inconsistent. If another program had a file open, then the backup would usually skip it.

Microsoft addressed these problems with the Volume Shadow Copy Service (VSS). A backup application can notify VSS when it starts a job. In response, VSS will pause disk I/O and create a “snapshot” of the system.

The snapshot isolates the state of all files as they were at that moment from any changes that occur while the backup job runs. The backup signals VSS when it has finished backing up, and VSS merges the changed data into the checkpoint and restores the system to normal operation.

With this technique, on-disk files are completely consistent.

However, it cannot capture memory contents. If you restore that backup, it will be exactly as though the host had crashed at the time of backup. For this reason, we call this type of backup “crash-consistent”. It only partially addresses the problem of open files.

VSS-aware applications can ensure complete consistency of the files that they control. Their authors can write a component that registers with VSS (called a “VSS Writer”). When VSS starts a snapshot operation, it will notify all registered VSS writers. In turn, they can write all pending operations to disk and prevent others from starting until the checkpoint completes.

Because the application has no active I/O (sometimes called “in-flight” I/O) at the time the snapshot is taken, the backup will capture everything about the program. We call this an “application-consistent” backup.
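
The handshake is easier to follow laid out as code. The sketch below is purely illustrative: the class and function names are invented stand-ins rather than the real VSS interfaces, but the ordering of freeze, snapshot, thaw, and release follows the description above.

    # Illustrative mock of the VSS sequence described above. These names are
    # hypothetical stand-ins, not the real Windows VSS API; only the ordering
    # of the steps is meant to be instructive.

    class MockVssWriter:
        """Stands in for a VSS-aware application, such as a database engine."""
        def __init__(self, name: str):
            self.name = name

        def freeze(self):
            # A real writer flushes pending operations and holds new I/O here.
            print(f"{self.name}: flushed pending writes, holding new I/O")

        def thaw(self):
            # A real writer resumes normal operation here.
            print(f"{self.name}: resumed normal I/O")

    def run_backup_job(writers: list):
        print("requester: asking VSS for a snapshot")   # 1. the backup app requests
        for w in writers:                                # 2. registered writers quiesce
            w.freeze()
        print("VSS: snapshot created")                   # 3. point-in-time capture
        for w in writers:                                # 4. the system resumes at once
            w.thaw()
        print("requester: backing up from the snapshot, then releasing it")  # 5.

    if __name__ == "__main__":
        run_backup_job([MockVssWriter("SQL Server"), MockVssWriter("Exchange")])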

As you shop for backup programs, keep in mind that not everyone uses the terms “crash-consistent” and “application-consistent” in the same way. Also, Linux distributions do not have a native analog to VSS. Research the way that each candidate application deals with open files and running applications.

Hypervisor-Aware Backup Software

If you employ any hypervisors in your environment, you should strongly consider a backup solution that can work with them directly.

You can back up client operating systems using agents installed just like physical systems if you prefer. However, hypervisor-aware backup applications can appropriately time guest backups to not overlap and employ optimization strategies that greatly reduce time, bandwidth, and storage needs.

When it comes to your hypervisors, investigate applications with the same level of flexibility as Hornetsecurity VM Backup.

You can install it directly on a Hyper-V host and operate it from there, use a management console from your PC, or make use of Hornetsecurity’s Cloud Management Console to manage all of your backup systems from a web browser. Such options allow you to control your backup in a way that suits you.

Agent-Based Versus Agentless

Usually, backup solutions require you to install a software component on each system you want to protect. That software will gather the data from its system and send it directly to media or to a central system. You saw examples of both in the article “The Golden Rules to Choosing a Backup Provider”. The software piece that installs on the targets is called an “agent”.

Other products can back up a system without installing an agent. You won’t find much in that category for taking complete backups of physical servers. Some software will back up networked file storage.

These “agentless” products rule the world of virtualization. Hornetsecurity VM Backup serves as a prime example. You install the software in your Hyper-V or VMware environment, and it backs up virtual machines without modifying them.

While VM Backup and similar programs can interact with guest operating systems to give them an opportunity to prepare for a backup operation, they can also work on virtual machines without affecting them.

Without such an agentless solution, you would need to place some piece of software inside every virtual machine. That introduces more potential failure points, increases your attack surface, and burdens you with more overhead.

With agents, you need to schedule all backup jobs carefully so that they do not interfere with each other; agentless systems coordinate operations automatically. They also have greater visibility over your data, making it easier for them to perform operations such as deduplication for smaller, faster backups.

Standard Physical Systems Backup Software

Few organizations have moved fully to virtualized deployments. So, you likely have physical systems to protect in addition to your virtual machines. Some vendors, such as Hornetsecurity, provide a separate solution to cover physical systems.

Others use customized agents or modules within a single application. However, some companies have chosen to focus on one type of system and cannot protect the other.

Single Vendor vs. Hybrid Application Solutions

In small environments, administrators rarely even consider using solutions that involve multiple vendors. Each separate product has its own expertise requirements and licensing costs. You cannot manage backup software from multiple vendors using a single console.

You may not be able to find an efficient way to store backup data from different manufacturers. Using a single vendor allows you to cover most systems with the least amount of effort.

On the other hand, organizations with more than a handful of servers almost invariably have some hybridization – in operating systems, third-party software, and hardware. Using different backup programs might not pose a major challenge in those situations. Using multiple programs allows you to find the best solution for all your problems instead of accepting one that does “enough”.

I once had a customer that was almost fully virtualized. They placed high priority on a granular backup of Microsoft Exchange with the ability to rapidly restore individual messages. Several vendors offer that level of coverage for Exchange in addition to virtual machine backup.

Unfortunately, no single software package could handle both to the customer’s satisfaction.

To solve this problem, we selected one application to handle Exchange and another to cover the virtual machines. The customer achieved all their goals and saved substantially on licensing.

Putting It in Action

Using the above guidance and the plan that you created in earlier articles in this series, you have enough information to start investigating programs that will satisfy your requirements.

Phase one: Candidate software selection

Begin by collecting a list of available software. You will need to find a way to quickly narrow down the list.

To that end, you can apply some quick criteria while you search, or you can build the list first and work through it later. Maintain this list and the reasons that you decided to include or exclude a product.

Create a table to use as a tracking system. As an example:

[Example tracking table: one row per candidate product, a column for each quick-check criterion marked yes or no, and a final include/exclude decision with the reason noted.]

It might seem like a bit much to create this level of documentation, but it has benefits:

  • Historical purposes: Someone might want to know why a program was tested or skipped
  • Reporting: You may need to provide an accounting of your selection process
  • Comparisons: Such a table forms a feature matrix

Because this activity only constitutes the first phase of selection, use criteria that you can quickly verify. To hasten the process, check for any deal-breaking problems first; if you find one, you can skip the remaining checks for that product. While the table above shows simple yes/no options, you can use a more nuanced grading system where it makes sense.

Keep in mind that you want to shorten this list, not make a final decision.

Phase two: In-depth software testing

You will spend the most time in phase two. Phase one should have left you with a manageable list of programs to explore more completely. Now you need to spend the time to work through them to find the solution that works best for your organization.

Keep in mind that you can use multiple products if that works better than a single solution.

For this phase, you will need to acquire and install software trials. Some recommendations:

  • Install trialware on templated virtual machines that you can quickly rebuild;
  • Use test systems that run the same programs as your production systems;
  • Test backing up multiple systems;
  • Test encryption/decryption;
  • Test complete and partial restores.

Extend the table that you created in phase one. If you used spreadsheet software to create it, consider creating tabs for each program that you test. You could also use a form that you build in a word processor.

Make sure to thoroughly test each program. Never assume that any given program will behave like any other.

Phase three: Final selection

Hopefully, you will end phase two with an obvious choice. Either way, you will need to notify the key stakeholders from phase one of your selection status. If you need additional input or executive sign-off to complete the process, work through those processes.

Unless you choose a completely cloud-based disaster recovery approach, you will still need to acquire hardware. Remember that, due to threats of malware and malicious actors, all business continuity plans should include some sort of in-house solution that you can take offline and offsite.

To properly protect your virtualization environment and all the data, use Hornetsecurity VM Backup to securely back up and replicate your virtual machine.

We ensure the security of your Microsoft 365 environment through our comprehensive 365 Total Protection Enterprise Backup and 365 Total Backup solutions.

For complete guidance, get our comprehensive Backup Bible, which serves as your indispensable resource containing invaluable information on backup and disaster recovery.

To keep up to date with the latest articles and practices, pay a visit to our Hornetsecurity blog now.

Conclusion

Optimizing your backup software is crucial for ensuring the integrity and consistency of your data. When dealing with virtualization and hypervisors, consider solutions that are hypervisor-aware and agentless, as they can offer greater flexibility and efficiency.

For organizations with both physical and virtual systems, it’s essential to select a solution that can cover both adequately.

When deciding between a single-vendor or hybrid approach, weigh the pros and cons carefully to meet your unique needs, as the phased approach to selecting the right backup software involves candidate selection, in-depth testing, and final selection, ensuring you make the best choice for your organization’s data protection and recovery needs.

FAQ

What is the backup software?

Backup software is a type of computer program designed to create and manage copies of data, files, or entire systems for the purpose of data protection, disaster recovery, and data preservation. These software applications automate the process of backing up data to ensure that it can be restored in case of data loss, hardware failure, or other unforeseen events.

What is an example of backup software?

An example of backup software is our Hornetsecurity VM Backup, which is a comprehensive backup solution provided by Hornetsecurity. Hornetsecurity VM Backup is a virtual machine backup solution provided by Hornetsecurity.

It’s designed specifically for virtualized environments and focuses on creating backups of virtual machines. This type of backup software is essential for protecting and recovering data in virtualized server environments.

What is free backup software?

Free backup software refers to backup solutions that are available at no cost, typically with limited features compared to their paid counterparts. These free backup software options are suitable for individuals or small organizations with basic backup needs.

The Golden Rules to Choosing a Backup Provider

The connection point is usually when you have received the bulk of your hardware and software purchase and can put it to use. If you have not even submitted orders yet, that’s ideal. If you already have everything, that’s fine as well.

You must design the architecture, which you might find easier to perform before you decide what to buy.

In simple terms, you must move on from deciding what to protect to deciding how to protect it. For some things, your organization might choose to use printed hard copies. Those survive power outages, require no technical expertise, and can last essentially forever. You will need to find a way to adequately keep these items safe.

Consider their risk from events such as fire, flood, and theft. If the contents of the documents are vital but not a risk to security, then perhaps creating and distributing multiple copies is the best answer. Technology may not help much for these types of problems.

To guard your digital information, you need three major things:

  • Backup software
  • Backup storage
  • Security strategy

If you start by selecting your backup application, that can guide you toward the most appropriate hardware platform and security approach. You could also start with a physical storage system that you like, but this may restrict your options for software solutions.

In the past, companies rarely put much thought or effort into backup security. Soon, they learned – the hard way – that bad actors found enough value in data backups to steal them. That prompted the backup industry to introduce security features into their products.

Later, ransomware authors began targeting backup applications to prevent them from saving victims’ data, or even worse, corrupting that data so it can’t be recovered.

This article focuses on the topic of choosing the right backup and disaster recovery provider for you and your business.

Choosing the right backup and recovery software

Your software selection will have a monumental long-term impact on your disaster recovery and business continuity operations. Once you successfully implement your choice of application(s), inertia will set in almost immediately.

Most vendors offer renewal pricing substantially below their first-year cost, which makes loyalty attractive. Switching to another provider might prove prohibitively expensive. Even if you get attractive pricing from a competitor, you still need to invest considerable time and effort to make the switch. For these reasons, you should not rush to a determination.

At its core, every single backup application has exactly one purpose: make duplicate copies of bits. Any reasonably talented scripter can build a passable bit duplication system in a short amount of time. Due to the ease of satisfying that core function, the backup software market has a staggering level of competition.

With so many available choices, you get some good and some bad news. The good news: you have no shortage of feature-rich, mature options to choose from. The bad news: you have no shortage of feature-rich, mature options to choose from.

You likely will not try out more than a few vendors before you either run out of time or become overwhelmed. In the upcoming sections, you will find many pointers to help you quickly pare down your options to a reasonable subset before installing your first trial package.

Backup application features

To distinguish themselves in a marketplace crowded with dozens of other companies trying to sell a product that performs the same fundamental role, backup program manufacturers spend a great deal of time on the supporting features.

Like anyone else, they tend to brag about whatever they feel that they do especially well. So, you can often get an overall feeling about a product just by looking at its marketing literature.

If they frequently use words like “simple” and “easy”, then you should expect to find a product that will not need a lot of effort to use. If you see several references to “fast” and “quick” and the like, then the application likely focuses on optimizations that reduce the amount of time to perform backup or restore operations.

Businesses that work from a value angle tend to use words like “affordable” and “economical”. Words like “trusted” and “leader” tend to indicate a mature product with a dedicated following.

So, if you go to the homepage of a backup vendor and see phrases containing words that speak to you, then you are almost certainly in that company’s target market. At the very least, they think that they have something to offer that fits your needs.

You will have to do more work to determine if their product lives up to the promise. However, if you see nothing that addresses your primary concerns, take that as a warning sign.

For instance, if you mostly want a stable product with responsive support that you can afford, you might want to avoid a company that prides itself on bleeding-edge capabilities, places its support links after everything else, and makes it difficult to even find pricing.

It’s important to match the scope of the solution with your unique deployment characteristics and business requirements rather than simply opting for the cheapest or most feature-rich option.

Trial and free software offerings – what to look for

Every major backup application manufacturer offers a trial, and most offer a limited but free version of their product. You should take advantage of these opportunities. With so many quality products on the market, avoid anything that you cannot try prior to purchase.

As you test software, use your plan from the earlier article as your guide. If the program cannot satisfy anything on that list, then you must gauge the importance of that deficit. Find out if the program provides an alternative method to achieve the goal.

If it does not, then you must choose between augmenting this program with another or skipping the product altogether.

As for free software, it works perfectly well for trial purposes. However, exercise extreme caution if you intend to use it long-term. Commercial software companies need income to survive, so they invariably build their free tiers in some way that showcases the power of their software but still makes the paid tiers desirable.

You can even find a few completely free programs provided by contributors out of the goodness of their hearts. These are rarely enterprise-ready and almost never maintained for very long. In all cases, you cannot expect to receive significant support for free products.

Think long and hard before deciding to entrust your organization’s disaster recovery and business continuity to such tools.

Security considerations for backup

Organizations have always needed to consider the security of their data, whether on a live system or on backup media. However, “security” and “backup” mostly stayed separate. When security crossed into the backup conversation, it mostly meant protecting the media from data thieves. The world has changed.

Various disasters have always threatened systems and data. The appearance of ransomware has forced the world to rethink the nature of those threats. Once upon a time, backup was the security blanket for catastrophe. Backup has become a target. At the same time, nothing else can guarantee survival of a ransomware infestation.

Hornetsecurity VM Backup v9, for example, offers ransomware protection through immutability. Know what the software will handle and what will fall to you before deciding.

As you look through your software options, you will find considerable differences in deployment and management behaviors. Take note of their installation requirements and procedures. Common options:

  • Per-host installation, data direct to storage, no centralization
  • Per-host installation, data direct to storage, managed from a central console
  • Central installation, agents on hosts, data direct to storage
  • Central installation, agents on hosts, data funneled through a central system
  • Appliance-based installation, agents on hosts, data stored on or funneled through appliance

You will find other architectures. Before you purchase anything, ensure that you understand how to deploy it. If you need to rack a physical appliance or allocate capacity for a virtual appliance, you do not want that to catch you by surprise.

If your preferred program requires a dedicated server instance, that may have licensing implications beyond the backup application’s cost.

To properly protect your virtualization environment and all the data, use Hornetsecurity VM Backup to securely back up and replicate your virtual machine.

We ensure the security of your Microsoft 365 environment through our comprehensive 365 Total Protection Enterprise Backup and 365 Total Backup solutions.

For complete guidance, get our comprehensive Backup Bible, which serves as your indispensable resource containing invaluable information on backup and disaster recovery.

To keep up to date with the latest articles and practices, pay a visit to our Hornetsecurity blog now.

Conclusion

Selecting the right backup and disaster recovery provider is a critical decision for the long-term security and resilience of your data. It’s essential to move beyond merely choosing what to protect and focus on how to protect it.

This article has highlighted the key considerations, including architecture design and the choice between digital and hard-copy data protection. To safeguard your digital information, you need a robust combination of backup software, storage solutions, and a security strategy.

It’s crucial to make an informed decision, as your software selection will significantly impact your disaster recovery and business continuity operations.

FAQ

What is a cloud backup service provider?

A cloud backup service provider is a company that offers cloud-based storage and data backup solutions to help users protect and recover their digital information.

What is backup as a service?

Backup as a service (BaaS) is a cloud computing service that provides data backup and recovery capabilities. It allows users to back up their data to a remote, cloud-based server managed by a third-party provider.

What is an example of an online backup provider?

There is no better example than Hornetsecurity as we are a leading backup provider worldwide!

How to Make the Undeniable Business Case for Backup

With the input of business-oriented personnel, you can determine how IT will deliver an appropriate business continuity design. To that end, you need to discover the capabilities of the technologies available to you.

Once you know that, you can predict the costs. You can take that analysis back to the business groups to build a final plan that balances what your organization wants for disaster recovery against its willingness to pay for it.

Mapping out your backup requirements will then help you plan software subscriptions to fulfil your needs. Hornetsecurity recognizes the necessity for multiple backup solutions and as such provides data backup and recovery service for all your critical Microsoft 365 services (Exchange mailboxes, SharePoint, OneDrive, Teams etc.) and also virtual machine backup.

Discovering the Technological Capabilities of Data Protection Systems

At this point, you have an abstract list of high-level business items. Few backup solutions target Line-of-Business (LOB) applications. So, you need to break that list down into items that backup and replication programs understand.

To attract the widest range of customers, backup manufacturers target the services and products that most organizations use. Common protections include:

  • Windows Server and Windows desktop;
  • UNIX/Linux systems;
  • Database servers;
  • Mail servers;
  • Virtual machines;
  • Cloud-based resources;
  • Physical hardware configurations.

You’ll need to create a map from the prioritized business-level items to their underlying technologies. Bring in technical experts to ensure that you don’t miss anything. Gather input on what needs to happen in order to recover the various systems in use at your organization.

Many require more effort than a simple restore-from-backup procedure. Some examples:

  • Active Directory;
  • Log-based SQL recovery;
  • Mail servers;
  • Multi-tier systems;
  • Cluster nodes.

Take input from line-of-business application experts as well as server and infrastructure experts. Seek out the experience of those that have faced a recovery situation with the systems that you rely on most. You might find exceptions or special procedures that would surprise generalists.

First Line of Defense: Fault-Tolerant Systems

Ideally, you would never need to enact a recovery plan. While you can never truly eliminate that possibility, you can reduce its likelihood with fault-tolerant systems. “Fault-tolerance” refers to the ability to continue functioning with a failed component.

Most fault-tolerant systems largely function at a low level, usually on the internal components of computer systems. To provide protection, they usually employ some method of hardware-level data duplication.

In the event of a failure, they use the redundant copy to continue providing expected functionality. Examples include multiple power supplies, disks, Network Interface Cards (NICs) and so forth.

However, until someone replaces the defective part, the system does not provide redundancy. Further failures will result in an outage and possibly data loss.

Storage technologies make up the bulk of fault tolerant systems. Not coincidentally, they also have the highest failure rate. You can protect short-term storage (main system memory) and long-term storage (spinning and solid-state disks).

System memory fault tolerance

To provide full fault tolerance, memory controllers allow you to pair memory modules. Every write to one module makes an identical copy to the other. If one fails, then the other continues to function by itself.

If the computer also supports memory hot-swapping and technicians have a way to access the inside without unplugging anything, then a replacement can be installed without halting the system.

Of course, system memory continues to be one of the more expensive components, and each system has a limited number of slots. So, to use fault-tolerant memory, you must cut your overall density in half.

Doubling the number of hosts presents more of a cost than most organizations want to undertake. Fortunately, memory modules have a low rate of total failure. It is much more likely that one will experience transient problems, which can be addressed with cheaper solutions.

Server-class computer systems usually support error-correcting code (ECC) memory modules. ECC modules incorporate technologies that allow for detection and correction of memory errors.

Some vendors provide proprietary technologies to defend against problems.

In most cases, you will choose ECC memory over fully fault-tolerant schemes. ECC cannot defend against module failure, but such faults occur rarely enough to make the risk worthwhile. ECC costs more than non-ECC memory, but it still has a substantially lower price tag than doubling your host purchase.

Hard drive fault tolerance

Hard drives, especially the traditional spinning variety, have a high failure rate. Since they hold virtually all of an organization’s live data, they require the most protection. Due to the pervasiveness of the problem, the industry has produced an enormous number of fault-tolerant solutions for hard drives.

RAID (redundant array of independent disks) systems make up the bulk of hard drive fault tolerance designs. These industry-standard designs use a combination of the following technologies to protect data:

Mirroring

Every bit written to one disk is written to the same location on at least one other disk. If a disk fails, the array uses the mirror(s).

Striping

Data is split into equal-sized blocks and written in sequence across all disks in the array. On its own, striping provides no redundancy; it improves performance because multiple disks can service reads and writes in parallel. RAID levels combine striping with mirroring or parity to add protection.

Parity

Parity also uses a striping pattern, with a major difference. One or more blocks in each stripe holds parity data instead of live data. The operating system or array controller calculates parity data from the live data as it writes the stripe.

If any disk in the array fails, it can use the parity data in place of the live data. A parity array can continue to function with the loss of one disk per parity block per stripe.
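
Most parity implementations rely on the XOR operation, which makes the idea easy to demonstrate. The short sketch below builds a parity block for one stripe and then rebuilds a “lost” data block from the survivors; it is a teaching example, not how an array controller is actually written.

    # XOR parity demonstration: compute a parity block for a stripe, "lose" one
    # data block, then reconstruct it from the remaining blocks plus parity.
    from functools import reduce

    def xor_blocks(blocks: list[bytes]) -> bytes:
        """XOR equal-length blocks together, byte by byte."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    # A stripe of three equal-length data blocks, as a controller would enforce.
    stripe = [b"ACCOUNTS", b"PAYROLL!", b"INVOICES"]
    parity = xor_blocks(stripe)

    # Simulate losing the disk that held the second block.
    surviving = [stripe[0], stripe[2], parity]
    rebuilt = xor_blocks(surviving)

    assert rebuilt == stripe[1]
    print("Rebuilt block:", rebuilt)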

If you wish to use RAID, you can choose from a number of “levels”. Each level of RAID provides its own balance of redundancy, speed, and capacity. With the exception of RAID-0 (pure striping for performance, no redundancy), all RAID levels require you to sacrifice some available space for protection.

Disks present a relatively low expense when compared to system memory, and you have many expansion options beyond the base capacity of a system chassis. So, while RAID presents a higher cost per stored bit than single disk systems, it is usually not prohibitive.

You have several choices when it comes to RAID. Many levels have fallen out of favor due to insufficient protection in comparison to others, and some simply consume too much space for cost efficiency. You will typically encounter these types:

  • RAID-1 – A simple mirror of two disks. Provides adequate protection, slightly lower than normal write speeds, higher than normal read speeds, and a 50% loss of capacity.
  • RAID-5 – A stripe with a single parity block. Requires at least three disks. Each stripe alternates which disk holds the parity data; when a disk fails, the stripes whose parity lived on that disk (roughly 1/n of them) need no reconstruction at all. Can withstand the loss of a maximum of one disk. Provides adequate protection, above normal write speeds, above normal read speeds, and a loss of 1/nth capacity. Not recommended for arrays that use very large disks due to the higher probability of additional disk failure during rebuilds and the higher odds of a failure occurring between patrol reads (scheduled reads that look for bit failures).
  • RAID-6 – Like RAID-5, but with two parity blocks per stripe. Requires at least four disks. Safer than RAID-5, but with similar concerns on large disks. Slower than RAID-5 and a capacity loss of 2/n.
  • RAID-10 – Disks are first paired into mirrors, then a non-parity stripe is written on one side of the mirror set, which is then duplicated to the corresponding mirror disk. Can function with the loss of one disk in each mirror but cannot lose two disks in the same mirror. Provides better performance and a higher safety rate than parity schemes, but at a loss of 50% of total drive capacity.
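
As a quick way to compare the space cost of these levels, the helper below applies the capacity fractions from the list above. It assumes identically sized disks, which is how most arrays are built.

    # Usable capacity for the RAID levels listed above, assuming n identically
    # sized disks. The fractions follow the descriptions in the text.
    def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
        if level == "RAID-1":
            return disk_tb                    # two-disk mirror keeps 50%
        if level == "RAID-5":
            return (disks - 1) * disk_tb      # one disk's worth of parity
        if level == "RAID-6":
            return (disks - 2) * disk_tb      # two disks' worth of parity
        if level == "RAID-10":
            return disks * disk_tb / 2        # mirrored stripe keeps 50%
        raise ValueError(f"unhandled level: {level}")

    for level, disks in [("RAID-1", 2), ("RAID-5", 4), ("RAID-6", 6), ("RAID-10", 8)]:
        usable = usable_capacity_tb(level, disks, 4.0)
        print(f"{level} with {disks} x 4 TB disks: {usable:.0f} TB usable")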

Due to the preponderance of drive failures and reduced performance of standardized redundancy schemes, many vendors have introduced proprietary solutions that seek to address particular shortcomings of RAID.

Whereas RAID works at the bit and block levels, most vendor-specific systems add on some type of metadata-level techniques to provide protection or performance enhancements.

You have an overwhelming number of choices when it comes to fault-tolerant disk storage, so keep a few anchor points in mind:

  • Storage vendors naturally want you to buy their highest-cost equipment. Use planning tools to predict your capacity and performance needs before you start the purchasing process. Businesses frequently overestimate their space and performance requirements.
  • You can almost always expand your storage after initial implementation. You do not need to limit yourself to the capacity of a single chassis as you do with system memory.
  • Solid-state disks have a substantially lower failure rate than spinning disks. You can leverage hybrid systems that incorporate both as a way to achieve an acceptable balance of performance, redundancy, and cost.

The most important point: downtime costs money. Storage redundancy directly reduces the odds of an unplanned outage.

Advanced storage fault tolerance

The advent of affordable, truly high-speed networking (ten gigabit and above) has brought exciting new options in storage protection. Today’s networking speeds exceed the throughput of even high-end storage equipment.

Once the sole purview of high-end (and very high-cost) storage area network (SAN) devices, you can now acquire chassis-level, and even datacenter-level, storage redundancy at commodity prices.

These technologies depend on real-time, or synchronous, replication of data. In the simplest design, two storage units mirror each other.

Systems that depend on them can either connect to a virtual endpoint that can fail over as needed, or connect to one unit at a time in an active/passive configuration. In more complex designs, control systems distribute data across multiple storage units and broker access dynamically.

We discuss real-time replication more completely in the article titled “How to Use Replication to Easily Achieve Business Continuity”.

The most advanced examples of these technologies appear in relatively new hyper-converged solutions. These use software to combine the compute layer with the storage layer on standard server-class computing hardware.

In most cases, they involve a hypervisor to control the software layer and proprietary software to control storage.

While costs for distributed storage and hyper-converged systems have declined dramatically, they remain on the higher end of the expense spectrum.

Unlike traditional discrete systems, they require significant infrastructure and technical expertise to support properly. You can consider the duplicated data in this fashion as a “hot” copy. It’s updated instantaneously and you can fail over to it quickly.

Some synchronous replication systems even allow for transparent failover or active/active use.

Application and operating system fault tolerance

At the highest layer, you have the ability to mirror an operating system instance to another physical system. To make that work, you must run the instance under a hypervisor capable of mirroring active processes.

It’s a complex configuration with many restrictions. Few hypervisors offer it, it won’t work universally, it won’t survive every problem, and the performance hit might make it unworkable for the applications that you want to protect most.

At a more achievable level, some applications allow a measure of fault tolerance through tiering. For instance, you can often run multiple web front-ends for a single database. You can use load balancers that instantly move client connections from one web server to another in the event of failure.

Some database servers also allow for multiple simultaneous instances that can instantly redirect connections to a functioning node. These technologies have greater functionality and feasibility than operating system fault tolerance.

In most cases, when an application offers its own built-in redundancy option (Exchange Server Database Availability Groups or SQL Server Always On availability groups, for example), that option is preferred over generic OS or hypervisor high availability options; see below.

Caveats of fault tolerance

As you explore options for fault tolerance, you’ll quickly notice that it comes at a substantial cost. Almost all the technologies will require you to purchase at least two of everything. Most of them will necessitate additional infrastructure.

All of them depend on expertise to install, configure, and maintain. Those costs always need to be scoped against the cost of equivalent downtime.

The primary purpose of fault tolerance is to rely on duplicates to continue functioning during a failure. That has a negative side effect: your fault-tolerant solution might duplicate something that you don’t want.

For example, if ransomware attacks your storage system, having RAID or a geographically redundant SAN will not help you in any way. Even in the absence of a malicious actor, redundant systems will happily copy accidental data corruption or delete all instances of a vital e-mail on command.

While fault tolerance will serve your organization positively, it cannot stand alone. You will always need to employ a backup solution for asynchronous data duplication. However, you have options between fault tolerance and backup. Those technologies reside in the high availability category.

Second Line of Defense: High Availability

You can’t use fault tolerance for everything. Some systems have no way to implement it. Some have a prohibitively high price tag. Instead, you can deploy high-availability solutions. High availability has a more nebulous definition than fault tolerance. It applies less to actual technologies and more to outcomes.

Where fault tolerance means working through a failure without interruption, high availability measures actual uptime against expected uptime.

As an example, your organization sets a target of 99.99% annual availability for a system that they want always to work. To achieve that, you would need to ensure that the system does not experience more than a few minutes of total downtime in the course of a year.

365 days times 99.99% equals 364.9635 days of uptime, which allows a little less than 53 minutes of downtime per year. That’s an aggressive goal.

When you build high availability goals, ensure that you distinguish whether or not you include planned outages in the metric. If you include them, then you may substantially reduce your tolerance for failures.

If systems expected to achieve 99.99% uptime require five minutes per month to fail over from active systems to backup systems during patch cycles, and you include that in the metric, then planned maintenance alone consumes 60 minutes per year and exceeds the availability allowance by roughly seven minutes, even without unexpected outages.

Along with adjusting for planned maintenance, you can also set the scope of availability. As an example, you can keep the 99.99% goal, but indicate that it only applies from 6:00 AM to 6:00 PM on weekdays. You could exclude company holidays.
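
Because these budgets are simple arithmetic, it pays to calculate them rather than estimate. The sketch below computes the downtime allowance for a target both around the clock and for a scoped window such as the 6:00 AM to 6:00 PM weekday example.

    # Downtime allowance for an availability target, over a full year and over
    # a scoped service window (for example, 12 hours per weekday).
    def allowed_downtime_minutes(availability_pct: float, covered_hours: float) -> float:
        return covered_hours * 60 * (1 - availability_pct / 100)

    FULL_YEAR_HOURS = 365 * 24
    WEEKDAY_BUSINESS_HOURS = 52 * 5 * 12        # ~52 weeks x 5 days x 12 hours

    print(f"99.99% around the clock: {allowed_downtime_minutes(99.99, FULL_YEAR_HOURS):.1f} minutes per year")
    print(f"99.99% weekdays 6 AM-6 PM: {allowed_downtime_minutes(99.99, WEEKDAY_BUSINESS_HOURS):.1f} minutes per year")

    # Planned failovers count against the budget if you include them in the metric.
    planned = 5 * 12                             # five minutes per month
    remaining = allowed_downtime_minutes(99.99, FULL_YEAR_HOURS) - planned
    print(f"After {planned} minutes of planned maintenance, {remaining:.1f} minutes remain for real outages")

A negative remainder, as in the planned-maintenance example, means the target is already broken before a single unexpected outage occurs.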

Take care to follow two critical steps:

  1. Clearly outline any non-obvious exceptions. If you set an expectation of 99.99% in large font and subtly list conditions below, then you will eventually experience the wrath of someone that feels deceived and betrayed. Avoid that from the beginning.
  2. Define a precise standard for “uptime”. Favor the user experience in these results, but also have something that you can objectively measure. For instance, “customer can place a complete order on the website” works well as an abstract goal, but how do you measure that? If a system failure would have prevented a customer from ordering, but no customer tried, does that count as an outage? If a customer order fails, how do you know if the system was at fault?

From the technology angle, any tool that specifically helps to improve uptime falls under the high availability umbrella. All fault-tolerant technologies qualify. However, you also have some that allow a bit of downtime in exchange for reduced cost, wider application, and simpler operation. Among these, clustering is generally the most common.

High Availability with Clustering

Clustering involves using multiple computer or appliance nodes, usually in an active/passive configuration, to host a single-instance resource. Some examples that depend on Microsoft’s failover clustering technology:

Microsoft SQL

A clustered Microsoft SQL database runs on one of many nodes. In a planned failover, the database becomes unavailable for a few seconds while its active node stops and one of the passive nodes starts. In the event of active node failure, the database is offline for a few seconds while a passive node starts it. Active transactions might drop in an unplanned failover.

Hyper-V

A clustered virtual machine can quickly move online (Live Migration) or offline (Quick Migration) to another node in a planned failover. If its active node fails, the virtual machine crashes but another node can quickly restart it.

File server

The standard clustered Microsoft file server hosts its shares through an active node, with planned and unplanned failovers occurring quickly. Microsoft also provides a scale-out file server, which operates in a more fault-tolerant mode.

Storage Spaces Direct

Commonly called “S2D”, Storage Spaces Direct is Microsoft’s software-defined, distributed storage offering. It works on Windows Server for plain storage needs. Azure Stack HCI also implements it to provide a complete hyper-converged infrastructure solution.

You will find clustering technologies in other operating systems, hypervisors, and physical appliances. Remember that these differ from fault-tolerance in that they allow some downtime. However, they greatly reduce downtime risks when compared to standalone systems.

Caveats of clustering

Clustering provides a duplicate of the compute layer. It ensures that a clustered workload has somewhere to operate. It does not make any copies of data. Without additional technology, a critical storage failure can cause the entire cluster to fail.

Because of the necessity of hardware duplication, clustering costs at least twice as much as operating without a cluster. You might also need to purchase additional software features in order to enable a clustered configuration. Clustering requires staff that know how to install, configure, and maintain it.

You must also take care that the backup solution you choose can properly protect your clustered resources. Solutions such as Hornetsecurity’s VM Backup protect virtual machine clusters. You can sometimes successfully employ a backup solution that doesn’t interoperate with your high-availability solution, but it will require significantly more administrative effort.

High Availability With Asynchronous Replication

You can employ technologies that periodically copy data from one storage unit to another. Asynchronous replication can use a snapshotting technique to maintain complete file system consistency. Some replication applications use a simple file-copy mechanism, which works well enough for basic file shares but not for applications.

Some applications have their own asynchronous replication built in. Microsoft’s Active Directory will automatically send updates between domain controllers. Most SQL servers have a set of replication options. Microsoft Hyper-V can create, maintain, and control virtual machine replicas.

You can consider data created by asynchronous replication as a “warm” copy. It does require some sort of process to bring online after a failure, but you can place it in service quickly.

Caveats of asynchronous replication

Unlike clustering, asynchronous replication requires some human interaction to switch over to a copy after a failure. Clustering technologies use some sort of control technique to prevent split-brain situations in which two copies run actively and simultaneously. Most replication systems have no built-in way to do that. So, if you choose to implement replication, ensure that you plan accordingly.

Replication shares the main drawbacks of clustering: it requires duplicated hardware, special software, and expertise. It also does not protect against data corruption, including ransomware.

The Universal Fail-Safe – Backup

Out of all available disaster recovery and business continuity technologies, only backup is both sufficient on its own and necessary in all cases. You can safely operate an organization without any fault-tolerance or high availability technologies, but you cannot responsibly omit data backup and recovery service.

Please note that the following section contains many terms that you will need to understand. The glossary contains all the definitions you’ll require.

Before you start shopping, ensure that you understand common backup terms:

  • Full backup – A complete, independent duplication of data that you can use to recover all data without any dependency on any other data.
  • Differential backup – An abbreviated backup that only captures data that changed since the most recent full backup. Usually operates at the file level.
  • Incremental backup – An abbreviated backup that only captures data that changed since the most recent backup of any kind. Usually operates at the file level.
  • Media – Storage for backups. Intended as a catch-all word whether you save to solid state drives, magnetic disks, tapes, optical discs, or anything else.
  • Delta – In backup parlance, delta essentially means “difference”. Most backup vendors use it to mean a measurement of how a file or a block has changed since the last backup. You can reasonably expect the term “delta” to designate technology that operates below the file level.
  • Crash-consistent – A crash-consistent backup captures a system’s data at a precise point in time. It carries the name “crash-consistent” because, if you restore to such a backup, the system will act exactly as though it had crashed when the backup was taken. A crash-consistent backup does not protect any running processes, nor does it give them any opportunity to save active data. However, it captures all files exactly as they were at that moment.
  • Application-consistent – An application-consistent backup interacts with applications to give them an opportunity to save active data for the backup. All other data, including that of applications that the backup applications cannot notify, will save in a crash-consistent state.
  • Restore – The act of retrieving data from a backup. Restoration can return data to a live system or to a test system. Most tools allow you to choose between complete and partial restores.
  • Rotation – Re-using backup media, usually by overwriting older backups. Some backup software has intricate rotation options.
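
The distinction between full, differential, and incremental backups comes down to the reference point for “changed since”. The sketch below illustrates the selection logic at the file level using modification times; real products track changes far more precisely, often below the file level.

    # Simplified illustration of which files each backup type would include,
    # based on modification timestamps. Real software tracks changes in more
    # detail; this only demonstrates the reference points involved.
    from dataclasses import dataclass

    @dataclass
    class FileRecord:
        path: str
        modified: float        # timestamp of the last change

    def select_files(files, backup_type, last_full, last_any):
        if backup_type == "full":
            return files                                          # everything, every time
        if backup_type == "differential":
            return [f for f in files if f.modified > last_full]   # changed since last full
        if backup_type == "incremental":
            return [f for f in files if f.modified > last_any]    # changed since last backup of any kind
        raise ValueError(backup_type)

    files = [FileRecord("ledger.xlsx", 150), FileRecord("policy.docx", 90), FileRecord("logo.png", 10)]
    last_full, last_backup_of_any_kind = 50, 120

    for kind in ("full", "differential", "incremental"):
        chosen = select_files(files, kind, last_full, last_backup_of_any_kind)
        print(kind, "->", [f.path for f in chosen])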

Not everyone agrees on the definitions of “crash-consistent” and “application-consistent”, and some vendors have introduced their own labels.

Ensure that you understand how any given vendor uses these terms when you study their products and talk to their representatives. Also have them explicitly define what they mean by “delta” in their solutions.

As you explore backup solution choices, you need to use the plan created by your business teams as a guideline. You want to try to satisfy all requirements for data protection and retention. Consider these critical components of data backup and recovery service technologies:

  • Backups must create a complete, standalone duplicate of data;
  • Backups must maintain multiple unique, non-interdependent copies of data;
  • Backups should complete within your allotted time frame;
  • Backups should provide application-consistent options;
  • Backups should work with the type of backup media that you want to use;
  • Backups should work with your cloud providers, both to protect your cloud resources and to back up to your cloud storage.

The above list only constitutes a bare minimum. Realistically, all backup vendors know that they need to hit these targets, so only a few will miss. Usually, those are the built-in free options or small hobbyist-style projects. You will find the greatest variances among the last two items.

Products will distinguish themselves greatly in operation and in optional features. You should avail yourself of trial software to experience these for yourself. Some things to look for:

Ease of operation (especially restores)

In a disaster, you cannot guarantee the availability of your most technically proficient staff, so your backup tool should not require them.

Speed of operations

Backup and restore operations need to complete in a reasonable amount of time. However, they cannot sacrifice vital functionality to achieve that. Most backup vendors utilize some sort of deduplication technology to reduce time and capacity needs, but you absolutely must have a sufficient number of non-interdependent copies of your data.

Retention lengths

Most backup applications allow an infinite number of backups – except in their free editions. If your organization won’t allow you to spend money on backup software, that might prevent you from meeting its retention requirements.

Support for the products that you use

As mentioned earlier in this article, very few backup applications know anything about line-of-business software. However, they should handle your operating systems and hypervisors. Some will have advanced capabilities that target common programs, such as mail and database servers. If you choose a solution that does not natively handle your software, ensure that you know how to use it to perform a proper backup and restore.

Offsite support

Because you will use backup to protect against the loss of your primary business location, your backup tool needs to have some method that allows you to take backup data offsite. Traditionally, that meant some sort of portable media.

Today, that also means transmitting to an alternative location or a cloud provider.

Support for alternative hardware

After a disaster, you probably won’t have the luxury to restore data to the same physical hardware that it protected. Make sure that your backup application can target replacement equipment.

Technical support options

Hopefully, you’ll never need to call support for your backup product. However, you don’t know who might need to perform a restore. That task might fall to a person who will need help. You also need to consider future product updates and the possibility of bugs that need attention. Ensure that you understand your backup provider’s support stance and process. Check public sites and forums for reviews by others, although remember that happy people rarely say anything, and angry people often exaggerate. Look for complaints that highlight specific problems. If possible, try to talk to someone in support before purchase.

Finally, consider data created by backup as a “cold” copy. You must take some action to transition the data from its backup location before you can use it in production. It usually sits at a much greater time distance from the failure point than a replica.

Closing the Planning Phase

You have now seen all the basic concepts and have enough knowledge to tackle the planning phase of your disaster recovery strategy.

To properly protect your virtualization environment and all the data, use Hornetsecurity VM Backup to securely back up and replicate your virtual machines.

We ensure the security of your Microsoft 365 environment through our comprehensive 365 Total Protection Enterprise Backup and 365 Total Backup solutions.

For complete guidance, get our comprehensive Backup Bible, which serves as your indispensable resource containing invaluable information on backup and disaster recovery.

To keep up to date with the latest articles and practices, pay a visit to our Hornetsecurity blog now.

Conclusion

In conclusion, crafting an unassailable business case for backup is a multifaceted endeavor. By collaborating with business-focused experts, understanding available technology capabilities, and predicting costs, you pave the way for a robust business continuity strategy.

This process harmonizes the organization’s disaster recovery aspirations with its financial constraints, ensuring a comprehensive, cost-effective solution. With a well-founded plan, you can confidently safeguard your business against unforeseen disruptions.

FAQ

What is data backup and recovery services?

Data backup and recovery services ensure your data’s safety and availability in case of loss. They involve duplicating and archiving computer data to guard against loss from corruption or deletion.

What does a data recovery service do?

Data recovery service providers specialize in recovering lost data by understanding data storage and restoration techniques.

Is it safe to use a data recovery service?

Opting for data recovery software is a safer choice compared to physical recovery attempts. Hornetsecurity offers a comprehensive solution for your data backup and recovery needs.

Unraveling RTO vs. RPO: Key Factors in Disaster Preparedness

With sufficient funding and infrastructure, any system could theoretically achieve near-constant uptime through any situation. Reality dictates a more conservative outlook. To establish a workable budget and a practical plan, you will need to determine your organizational tolerances for outages and loss. This article explores the related terminology and processes.

Now, you will need to have the key employees of each system explore the question more deeply. If the term “business impact” does not convey the desired level of urgency, ask questions such as

“How much does this system cost us per hour when offline?” and

“How many hours of work would we need to recover after losing one hour’s worth of data?”

Establishing Recovery Time Objectives

The simple question, “How long can we operate without this system?” can get your teams started. The term “recovery time objective” (RTO) applies to the goals set by this enquiry. RTOs establish the desired maximum amount of time before a system returns to a defined usable state.

Complex systems can have different RTOs. For instance, you might set an RTO of four hours to restore a core electronic records system after a major failure but set a separate objective of one hour after a minor glitch. Your objectives might also set differing levels for acceptable functionality.

Your organization might consider a functioning receipt printer in the customer service area as a meaningful success metric; it can have its own RTO as a part of a larger recovery objective.

RTOs should feature prominently in your long-term disaster recovery planning. Rely on the managers and operators of the individual systems to provide guidance. Use executives to resolve conflicting priorities. You may also need them to grant you the ability to override decisions in order to ensure proper restoration of functionality.

Establishing Recovery Point Objectives

RTOs apply mainly to functionality. Events that trigger recovery actions also tend to cause data loss. Your organization will need to establish its tolerance for that loss. Of course, no one wants to lose anything, which will make these discussions difficult.

Because most backups occur at specific time intervals, you use them as the basis for “recovery point objectives” (RPO). An RPO sets the maximum acceptable time span between the latest backup and the data loss event.

This determination coincides with the work to determine the business impact of an outage. A system’s downtime not only prevents its users from retrieving or utilizing its contents but also requires post-recovery work: staff will need to recreate any data that was not in the backup and complete postponed operations.

You will need to establish multiple RPOs for most systems. Not all events will have the same impact, so you must set expectations accordingly. For instance, continuously created replicas and backups work well as buffers against physical hardware failure, but they work poorly against malicious attacks, especially encrypting ransomware.

You can establish a tiered recovery approach to address the various risks. As an example:

  1. First-line hardware failure or malfunction: RPO of zero hours, using continuous replication
  2. Corrupted data: RPO of 1 hour after corruption detection, using on-site hourly backups
  3. Site destruction: RPO of 24 hours, using off-site daily backups and cloud hosting providers
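As a minimal sketch, the tiers above could be recorded in a simple structure that operators and scripts can consult. The class, names, and values are purely illustrative, not part of any backup product.

```python
from dataclasses import dataclass

@dataclass
class RecoveryTier:
    risk: str           # category of event this tier addresses
    rpo_hours: float    # maximum acceptable data loss, in hours
    mechanism: str      # backup or replication technology that meets the RPO

# Illustrative tiers matching the example list above
tiers = [
    RecoveryTier("Hardware failure or malfunction", 0,  "continuous replication"),
    RecoveryTier("Corrupted data",                  1,  "on-site hourly backups"),
    RecoveryTier("Site destruction",                24, "off-site daily backups / cloud hosting"),
]

for tier in tiers:
    print(f"{tier.risk}: RPO {tier.rpo_hours}h via {tier.mechanism}")
```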

Consider the possible outcomes of each risk category as you work out RPOs. You don’t just need the data to restore; you need something to restore it to.

If you need to acquire replacement hardware or bring in third parties for assistance, that might add time. If you have a secondary site available, add an RPO item for recovery to that location. Also include an item that addresses cross-facility challenges. Do not forget to account for the availability of critical staff.

Defining Retention Policies

Your teams have a final major decision to make: how long to keep data. These decisions are highly dependent on the nature of the business and the data. For European-based operations you may also have to consider GDPR requirements. If you don’t have immediate answers, use two major guidelines:

  1. Legal requirements. As an example, you may need to keep records of taxable events for a number of years.
  2. How long will the data have value?

Ensure that the burden of answering these questions does not fall solely to IT. Because the word “data” is involved, some will assume that technology staff should naturally take ownership. However, IT typically does not hire legal experts, and corporate liability tends to fall on the shoulders of executives.

As to the second point, tech departments may have some business knowledge, but “value” often means a subjective assessment that should, at the very least, involve the data owner.

Use the answers from these questions to create “retention policies”. A retention policy dictates how long data must be retrievable. You will likely need more than a single company-wide policy. “Forever” may seem like an obvious answer for some things, but ensure that everyone understands that data storage has an associated cost.
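Purely as an illustration of the kind of artifact this exercise produces, a set of per-category retention policies might be recorded as below. The categories and durations are hypothetical and would come from your legal and business answers above.

```python
# Hypothetical retention policies: data category -> years the data must stay retrievable
retention_policies = {
    "taxable_event_records": 7,    # example legal requirement
    "customer_contracts":    10,   # business value plus liability window
    "marketing_assets":      2,    # little long-term value
}

def must_retain(category: str, age_years: float) -> bool:
    """Return True if data of this age still falls under its retention policy."""
    return age_years <= retention_policies.get(category, 0)

print(must_retain("taxable_event_records", 5))   # True: still within the 7-year window
print(must_retain("marketing_assets", 5))        # False: policy expired after 2 years
```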

Data retention has two tiers:

  1. live storage
  2. backup storage

In disaster recovery planning, IT often only considers the backup tier. However, remember that a backup captures existing data, regardless of its age. So, if a live database has records that go back a decade, then the most recent backup contains ten-year-old information.

Therefore, both your current live data and yesterday’s backup satisfy a ten-year retention policy.

To accommodate both the live and the backup tiers, retention policies must consider two things:

  1. Purge policies for active data
  2. Probability of unnoticed undesired deletion

Some electronic records systems prevent true deletion from databases without a purge action. Such a system might move “deleted” records into a historical table, or it might use a flag that hides them from client applications. Safeguards like these reduce the probability of accidents.
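A minimal sketch of the flag-based safeguard just described, using SQLite purely for illustration: “deleting” a record only hides it from clients, and a separate, deliberate purge action is what actually removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT, deleted INTEGER DEFAULT 0)")
conn.execute("INSERT INTO records (payload) VALUES ('invoice 1001')")

def soft_delete(record_id: int) -> None:
    """'Delete' from the client's point of view: set a flag, keep the row."""
    conn.execute("UPDATE records SET deleted = 1 WHERE id = ?", (record_id,))

def purge_deleted() -> None:
    """A deliberate purge action is the only way to truly remove data."""
    conn.execute("DELETE FROM records WHERE deleted = 1")

soft_delete(1)
visible = conn.execute("SELECT * FROM records WHERE deleted = 0").fetchall()
everything = conn.execute("SELECT * FROM records").fetchall()
print(visible)     # [] -- hidden from client applications
print(everything)  # row still present until someone runs purge_deleted()
```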

These safeguards also help against malicious deletion. Remember, though, that individuals with administrative access can usually override application-level security. For the greatest safety, assume that you will not achieve your retention policies for live data.

You can relax that expectation for non-critical data. Factor in the results of impact analysis from the earlier exercises.

Ask, “If we lost this data forever, how would it impact the organization?”

Adjusting RTOs, RPOs, and Retention Policies to Match Practical Restraints

Shorter RTOs and RPOs almost always require greater financial and technical resources. Short backup intervals consume more media space and network bandwidth. Lengthy retention policies increase storage and administrative costs. Layered approaches to cover the various risk profiles can multiply those needs.

Backup operations place a load on the production system, which might add more strain than your current equipment allows. Replication and continuous backup technologies need more technical expertise than typical nightly backups. Staff must periodically test the validity of backup data, adding effort and overhead.

Make all these constraints clear during early planning meetings. As executives and department heads express their wishes for speedy RTOs and short RPOs, ensure that they understand that costs will rise accordingly. They may need to adjust their expectations to match.

Your plans also need to factor in time and expense to re-establish infrastructure after a failure. You may need to replace physical systems. Vital foundational infrastructure, such as domain controllers, automatically takes precedence over anything that depends on it. Adjust RTOs and RPOs for dependent systems accordingly.

The backup software that you choose will play a role in your RTO and RPO restrictions. Hornetsecurity’s VM Backup V9 provides highly customizable backup scheduling options as well as Continuous Data Protection (CDP). You need fine-grained flexibility such as this to balance your backup needs against your available resources.

Reviewing Recovery Objectives

The major activities of this article include input from all sectors of the business. Through interviews, questionnaires, and meetings, you can assemble an organizational view of what you need to protect. Next, you need to determine how you will implement that protection.

You have not completely finished working with the non-technical departments, but you can allow them to gather the necessary data while you move to a different phase of the project.

To properly protect your virtualization environment and all the data, use Hornetsecurity VM Backup to securely back up and replicate your virtual machines.

For complete guidance, get our comprehensive Backup Bible, which serves as your indispensable resource containing invaluable information on backup and disaster recovery.

To keep up to date with the latest articles and practices, pay a visit to our Hornetsecurity blog now.

Conclusion

In conclusion, understanding the crucial distinctions between RTO (Recovery Time Objective) and RPO (Recovery Point Objective) is fundamental to disaster preparedness. While achieving constant uptime is ideal, practicality and budget constraints necessitate a balanced approach.

By clearly defining your organization’s tolerance for outages and data loss, you can develop a resilient disaster recovery strategy that aligns with your resources and objectives. Carefully considering RTO and RPO will empower your business to navigate unforeseen challenges more confidently and efficiently.

FAQ

What is the difference between RTO and RPO?

RTO (Recovery Time Objective) refers to the maximum acceptable downtime for a system or application after a disruption occurs. It represents the time it takes to fully restore functionality to a level that allows normal operations to resume. In layman’s terms, RTO answers the question, “How quickly do we need our systems back online?”

RPO (Recovery Point Objective) defines the maximum allowable data loss after a disruption. It represents the point in time to which data must be recovered to ensure minimal business impact. In essence, RPO answers the question, “How much data are we willing to lose in a disaster?”

Moreover, RTO is your organization’s goal for the maximum time it should take to restore normal operations following an outage or data loss, while RPO is your goal for the maximum amount of data the organization can tolerate losing.

What are RTO and RPO examples?

Both are expressed as time values. For example, the RTO for a reasonably critical server might be one hour, whereas the RPO for less critical transaction data might be 24 hours, a window generous enough that even backup tape storage could meet it.

RTO Example

Suppose a financial institution sets an RTO of 4 hours for its online banking system. In the event of a system failure or disaster, they must restore full functionality within 4 hours to meet their RTO.

RPO Example

A hospital establishes an RPO of 1 hour for its electronic health records system. This means that even in the event of a failure, they cannot afford to lose more than 1 hour’s worth of patient data.
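As a small illustrative sketch (the timestamps and objectives are hypothetical, based on the two examples above), both objectives can be expressed as time spans and an actual incident checked against them:

```python
from datetime import datetime, timedelta

RTO = timedelta(hours=4)   # online banking example above
RPO = timedelta(hours=1)   # health records example above

def met_objectives(failure: datetime, last_backup: datetime, service_restored: datetime):
    downtime = service_restored - failure      # measured against the RTO
    data_loss_window = failure - last_backup   # measured against the RPO
    return downtime <= RTO, data_loss_window <= RPO

ok_rto, ok_rpo = met_objectives(
    failure=datetime(2024, 3, 1, 9, 0),
    last_backup=datetime(2024, 3, 1, 8, 30),
    service_restored=datetime(2024, 3, 1, 12, 0),
)
print("RTO met:", ok_rto)   # True: 3 hours of downtime against a 4-hour objective
print("RPO met:", ok_rpo)   # True: 30 minutes of unprotected data against a 1-hour objective
```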

Which is more important, RTO or RPO?

RPO designates the variable amount of data that will be lost or will have to be re-entered during network downtime. RTO designates the amount of “real-time” that can pass before the disruption seriously and unacceptably impedes the flow of normal business operations.

RTO Importance

RTO is often more critical for systems and applications where downtime can have immediate and severe consequences. Industries such as finance or e-commerce may prioritize minimizing downtime to ensure continuous operations.

RPO Importance

RPO is crucial when data integrity and compliance are paramount. Organizations dealing with sensitive information, like healthcare or legal firms, may prioritize minimizing data loss to ensure accuracy and legal compliance.

How to Air-Gap and Isolate Your Backups

In this article, following on from How to Secure and Protect Backup Data, we look at how to isolate your backups using air-gapping and firewalls, how to perform a risk analysis for your backup strategy, and how to encrypt backups for additional protection, and we round off with some frequently asked questions.

Shielding Backup Systems With Firewalls

Your backup application should have the ability to reach out to other systems, but almost nothing needs to reach into its system. You can put up barriers to external access easily using firewalls. Every modern operating system includes a native firewall. Several third parties provide add-on software firewalls.

Your antimalware software might include a firewall module. Hardware firewalls bolster software firewalls immensely. Smaller organizations typically employ them only at the perimeter, but they can add substantial security to your internal networks as well. Even inexpensive devices provide isolation and protection.

You can also configure routers and switches with VLANs or network address translation to provide additional isolation layers.

Air-Gapping for Isolation

Among all isolation methods, air-gapping represents the strongest. It can even stand in for immutability solutions. However, it also requires the most effort to implement. Before choosing this route, take the time to understand its ramifications. It should not be undertaken on a whim or without input from executive decision makers.

The simplest description of air-gapping is that there is no remote connectivity into a given system. The most extreme example is an offline root certification authority. Administrators create it, publish its public key and revocation list, then take it offline and disconnect it. Some even take extra steps, such as removing its hard drives and placing them in locked storage.

To access such a system, a human must perform manual steps that involve physical actions and security measures.

You cannot realistically apply such a drastic procedure to your backup system. However, it serves as a beginning. Start there and add the minimum elements to make the backup operational. Backup servers need to be powered on and have some way to retrieve data from and push restores to their targets, but nothing else.

To make maintenance easier, the system should have some way of sending notifications to administrators. With all of that configured, you can function without any way to access the backup server remotely. So, you can set it up to only allow access from a physical console.

The more that you use virtualization in your environment, the easier it is to implement air-gapping. You can configure the hypervisor and backup in one network and everything else in another. If they have no overlap or interconnect, then you have created a proper air gap.

You may even choose to go so far as to create an Active Directory domain just to hold these systems. That way, you can benefit from centralized management without connecting your production network to your management network.

The greatest risk with an air-gapped system is its enormous inconvenience. Preventing remote connections includes blocking valid administrative duties, too. It makes patching very difficult. It has no ability to transfer data to a remote location either, which means that you lose replication capability. You have only two choices: cope with these restrictions or do something that breaks the air-gapping.

A poorly air-gapped system is more vulnerable than one that was designed from the start to participate on the network. If you cannot commit to a completely disconnected system into perpetuity, then connect your backup system and build defenses around it.

Caring for Offline Data

The cold data that lives on tapes and detached hard drives often does not get the protection that it deserves. IT departments usually start out with a protocol for caring for this media, but over time, diligence fades. Encryption, which we covered in an earlier section, can serve as a last-ditch safeguard. However, you must make every effort to prevent unauthorized access.

Technological advances, reduced costs, and increased convenience have made fully online backup systems viable. Today, you can easily replicate backup data to geographically remote locations without an exorbitant investment.

However, the ever-growing threat of ransomware means that you must periodically create offline copies. In the past, that could only mean data that was completely inaccessible by any automated means.

That is still the safest option. However, you can take advantage of modern technology to create alternative approaches. You could manually upload backup data to a location that requires two-factor authentication, for instance.

Whatever measures you put in place, ensure that they isolate the remote site in such a way that no compromise of your online backup system or password vault will put offline data at risk.

Putting It in Action

Think of security as a continual process, not a one-time event. We will cover the hardware portions in the next article on deployment. This portion will cover these security actions:

  • Perform a risk analysis;
  • Set policies for software-level/media redundancy;
  • Establish backup encryption policy;
  • Determine practices and policies for creating and protecting offline data.

You saw the concepts behind these activities throughout this chapter. Now you must put them into practice.

Risk analysis activities

Much of risk analysis involves asking questions. You should gather input from multiple sources. Usually, one person does not have the visibility to know all likely risks. You can use formal meetings, informal queries, or any other approach that works for your organization. Categorize and list what everyone comes up with. Share them with key stakeholders, as they might bring up other ideas. Some starting thoughts:

  • Internal vs. external categories;
  • Malicious vs. accidental damage;
  • Targeted risks (e.g. employee data, client data, account numbers, etc.);
  • Equipment failures;
  • Weather;
  • Electrical outages.

You must keep this list up to date with an explanation of how your solution mitigates each. One of the best things you can do to protect your company is to train your employees with the Security Awareness Service.
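A minimal sketch of what such a living risk list might look like; the entries and mitigations below are only examples of the format, not a complete analysis.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    category: str        # e.g. internal/external, malicious/accidental
    mitigation: str      # how the backup design addresses it

risk_register = [
    Risk("Encrypting ransomware", "external, malicious",
         "offline monthly full backups plus immutable cloud copies"),
    Risk("Accidental deletion by staff", "internal, accidental",
         "hourly on-site backups with 30-day retention"),
    Risk("Site-wide electrical outage", "external, accidental",
         "daily replication to a secondary site"),
]

for risk in risk_register:
    print(f"{risk.name} ({risk.category}): mitigated by {risk.mitigation}")
```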

Creating backup redundancy policies

You will need to have made your software and hardware selections before you can craft your redundancy approach. The features and media types used in your systems will have great influence on your decisions. However, your primary goal must be to create sufficient standalone copies to survive loss or damage.

To qualify as “standalone”, a backup copy must not require any other backup data to exist in order for you to restore it. Furthermore, to provide security, the copy must only exist in an offline, disconnected state. Due to the inconvenience of offline backups, you can either build a schedule that mixes online with the occasional offline complete backup, or you can set a special schedule for offline backups.

You also need to plan for how long these offline copies will exist. The cost, longevity, and ease of storing the media tend to have the greatest influence over that. If possible, simply keep them until you can no longer restore from them. Reality often dictates otherwise. If you can schedule full backups, then you might come up with a schedule such as:

  • Monthly: full, offline
  • Weekly: full, online
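A minimal sketch of how a schedule like the one above could be expressed so that operators (or a wrapper script) know which kind of full backup a given date calls for. The rule that the first occurrence of the backup weekday in a month goes offline is an assumption made purely for illustration.

```python
from datetime import date

BACKUP_WEEKDAY = 6  # Sunday (Monday = 0)

def full_backup_type(day: date) -> str | None:
    """Return the kind of full backup scheduled for this date, if any."""
    if day.weekday() != BACKUP_WEEKDAY:
        return None                       # no full backup today
    if day.day <= 7:
        return "full, offline (monthly)"  # first Sunday of the month
    return "full, online (weekly)"

print(full_backup_type(date(2024, 6, 2)))   # first Sunday of June -> offline full
print(full_backup_type(date(2024, 6, 9)))   # later Sunday -> online full
print(full_backup_type(date(2024, 6, 10)))  # Monday -> no full backup
```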

Some modern backup software that relies heavily on deduplication technology does not allow for scheduled full backups. Instead, such products depend on other policies to set the oldest possible backup and calculate deduplication from that point forward.

Therefore, they consider full backups to be special, so you will need to perform them manually. The inconvenience of such backups, especially for an already busy IT department, will likely prevent their weekly occurrence. Create a palatable policy that balances the security of multiple full copies against the ease of creating and storing them.

Establishing an encryption policy

You will need to build your backup encryption policy around the way that your backup hardware or software utilizes encryption. Most software requires a single secret key for encryption. You have three major decision points for this type of encryption:

  1. Where will you store the key?
  2. Who will have access to the key?
  3. How will you ensure that the key will survive catastrophe?

Remember to include the loss of your backup encryption key in your risk analysis! The location of your key directly dictates who can access it. Since you need it to remain available in the event of a total loss, your best option is likely a cloud-based password vault.

There are multiple software companies that provide such services. Microsoft’s Azure has a “Key Vault” product and Amazon Web Services offers “AWS Secrets Manager”. Find the solution that works for your organization.

Any backup created with a particular encryption key will always need that key. So, if you change the key, you still need to keep a historical record for as long as a key has protected data.
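A minimal sketch of that point using the Python cryptography package (your backup software’s own key handling will differ): after rotating to a new key, the old key must stay on record or older backups become unreadable.

```python
from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key()          # key in use when the backup was written
backup_blob = Fernet(old_key).encrypt(b"... backup data ...")

new_key = Fernet.generate_key()          # key adopted after rotation

# Keep the historical key alongside the new one; decryption tries each in turn.
keyring = MultiFernet([Fernet(new_key), Fernet(old_key)])
print(keyring.decrypt(backup_blob))      # succeeds because old_key is still on record

# Dropping the historical key makes the old backup unrecoverable:
try:
    MultiFernet([Fernet(new_key)]).decrypt(backup_blob)
except Exception as err:
    print("Old backup lost without its key:", type(err).__name__)
```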

Your hardware may offer some encryption capabilities. These features are manufacturer dependent. You will need to learn how it works before you can create a policy. If you prefer, you can simply use the software’s protection and forgo hardware-level encryption.

Shielding backup with physical and network protections

Leverage your infrastructure and network systems to build a layer of protection around your backup systems easily and efficiently. You will need to defend at layers one, two, and three.

Implement layer one (physical) protections

  1. Place backup hosts and devices in secure locations
  2. Create a chain-of-custody process for backup media
  3. If possible and cost effective, do not directly share switching hardware between backup systems and other systems

Implement layer two (Ethernet) protections

  1. Establish a VLAN just for your backup systems, or
  2. Use dedicated physical switches for your backup systems and connect them to the rest of your production network through a router

Implement layer three (TCP/IP) protections

  1. If you isolate with a VLAN or dedicated router, create an IP subnet just for backup
  2. Set up a firewall at the edge of the backup network that blocks all externally initiated ingress traffic
  3. Configure the software firewall on backup hosts with a similar configuration to the previous firewall

All, or most, of the previous isolation techniques should fit within even modest budgets. For greater protection, you have additional options:

  • Install intrusion prevention and detection solutions
  • Configure network monitoring

Data moving into your backup network will fit easily recognizable patterns. With even a rudimentary monitoring system, you should have no trouble spotting suspicious traffic.
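As an illustrative sketch (the field names and addresses are hypothetical), even a trivial check of flow records against an allow-list of backup hosts can surface traffic that does not fit the expected pattern:

```python
# Hosts that legitimately initiate traffic into the backup network
ALLOWED_SOURCES = {"10.20.0.5", "10.20.0.6"}   # e.g. backup server and hypervisor host
BACKUP_SUBNET = "10.20.0."

def suspicious(flow: dict) -> bool:
    """Flag any flow into the backup subnet from an unexpected source."""
    return flow["dst"].startswith(BACKUP_SUBNET) and flow["src"] not in ALLOWED_SOURCES

flows = [
    {"src": "10.20.0.5",    "dst": "10.20.0.10", "bytes": 5_000_000_000},  # normal backup job
    {"src": "192.168.1.44", "dst": "10.20.0.10", "bytes": 12_000},         # unexpected source
]

for flow in flows:
    if suspicious(flow):
        print("ALERT: unexpected traffic into backup network:", flow)
```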

Adding immutability

If you use tape technology, then you have two immutability choices: you can manually activate the write-protection mechanism on your tapes, or you can purchase tapes specially built as WORM media. Establish a policy for which tapes to protect. Some tapes use a sliding write-protect tab that allows an operator to enable or disable protection.

If your organization employs them, the policy must clearly state expectations. As you work through the Defining Backup Schedules chapter, you will encounter natural times to set aside unwritable tapes.

Newer software-based solutions require upfront configuration work. Services like Hornetsecurity VM Backup V9 depend on your cloud provider’s immutable storage offerings. In most cases, these require little configuration effort.

First, you set up the cloud storage. Next, decide between a policy that restricts changes to select accounts or one that prevents all changes. If you allow any changes, separate the accounts with modification privileges from those involved in routine backup operations. Otherwise, you lose the primary reason for immutable storage.

Finally, select a retention policy. That policy dictates when data becomes writable. Work with your cloud provider to discover other capabilities. Remember that your provider will happily provide you with multiple storage objects so that you can use a variety of configurations simultaneously. Make certain that everyone involved in backup operations understands these configurations and their implications.
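Hornetsecurity VM Backup V9 relies on your provider’s immutable storage, so the exact steps belong in that product’s and provider’s documentation. Purely as an illustration of the underlying idea, here is a hedged sketch of uploading an object with a retention lock to an S3-compatible bucket using boto3; the bucket name, key, and retention period are hypothetical, and the bucket must already have Object Lock enabled.

```python
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")

# The object stays unmodifiable and undeletable until this date.
retain_until = datetime.now(timezone.utc) + timedelta(days=30)

s3.put_object(
    Bucket="example-backup-bucket",          # hypothetical bucket with Object Lock enabled
    Key="backups/2024-06-02-full.bak",
    Body=b"... backup archive bytes ...",
    ObjectLockMode="COMPLIANCE",             # no account can shorten or remove the lock;
                                             # GOVERNANCE would let select accounts override it
    ObjectLockRetainUntilDate=retain_until,
)
```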

Fully isolating backup systems

Perform a complete risk analysis before you even consider an air-gapped approach. If you do not face significant exposure to malicious actors, it may be overkill. Complete isolation looks simple, but it presents substantial long-term challenges for administrators.

Review the discussion above and consult with executives, key stakeholders, and others in your IT department with deployment or maintenance responsibilities. Because it requires full isolation, this approach only works for hypervisor-based backups. To create an air-gapped backup system:

  • Designate an IP subnet for your air-gapped network;
  • Decide on a workgroup or management domain configuration;
  • If you will use a management domain, create and configure it before proceeding;
  • Connect your hypervisor and backup hosts to a physical network that has no uplink;
  • Ensure that the physical network links that you use for virtual machines do not provide any layer two or layer three connectivity to the hypervisor’s management operating system;
  • Create a policy and an accountability process for acquiring and applying software updates.

The only practical risks to a properly air-gapped system are internal actors and breakout attacks against your hypervisors or container hosts. It still makes some sense to use anti-malware software as well as intrusion prevention and detection systems.

Remember that a properly air-gapped system cannot connect to the Internet in any way. If you cannot permanently guarantee absolute isolation, then you should instead follow the steps in the previous section to allow your backups to participate on the network with adequate protections.

To properly protect your virtualization environment and all the data, use Hornetsecurity VM Backup to securely back up and replicate your virtual machines.

For complete guidance, get our comprehensive Backup Bible, which serves as your indispensable resource containing invaluable information on backup and disaster recovery.

To keep up to date with the latest articles and practices, pay a visit to our Hornetsecurity blog now.

Conclusion

Robust encryption, access controls, and authentication measures should be integrated into backup systems to thwart potential attackers. Implementing secure offsite storage is essential, and adopting immutable backups can offer an added layer of protection against ransomware attacks.

FAQ

Are cloud backups air-gapped?

Historically, air gapping has been associated mainly with tape backups. However, modern cloud backup solutions offer a virtual equivalent of the air-gapped tape concept, enhancing security and resilience.

What is the 3-2-1 rule for backups?

The 3-2-1 backup strategy holds immense value. It entails maintaining a minimum of three data copies – two on local devices stored on different media and one off-site, ensuring robust data protection.

What is air gap in storage?

The concept of air gap in storage revolves around a backup and recovery strategy. It ensures that critical data remains offline and disconnected from the internet, reinforcing security.

What is the air gap backup for ransomware?

Air gapping serves as a robust defense against ransomware threats. By isolating secondary or tertiary backup copies from the public domain, it effectively enhances data security and reduces ransomware risks.