It’s inevitable: there will come a day when one of your customers has an incident that impacts operations. A domain controller burns to a crisp, half the office is hit with ransomware, or a hurricane wipes the building out completely. Even if the incident in question only impacts one person, it still affects your customer’s ability to do business. You need to be able to respond and resolve the issue swiftly.

Just like every customer problem you’re tasked with solving, there’s a bit of diligence work to understand the nature of the issue, a period spent determining the correct course of action, and then the time it takes to actually rectify the problem (and don’t forget to include any unforeseen consequences, hiccups, etc., that may show themselves while you’re cleaning up the incident). If your techs either don’t know how to resolve an incident or simply take too long, your customer’s operations suffer – as does your reputation with the customer.

So, what can you do to improve your ability to respond to incidents? Here’s a simple 5-step guide to set up a reliable incident response procedure.

The Process

The following steps will help make incident response more of a known process to follow and less like putting out a fire. Consider working through them for each service you offer.

Step 1 – Build a List of Known Incidents

You should already know what areas of your customer’s tech you are responsible for based on the services you offer. The MSP community already has a ton of empirical and anecdotal data on what kinds of incidents are most common for each of your services. Begin by building out a list of those incidents you believe you should have a plan for. Don’t limit this list to major catastrophic events such as the loss of a major system or a widespread ransomware attack; include issues big and small – we’ll narrow the list down in the next step.

Step 2 – Talk to Your Customer

Discuss your list of common incidents with your customers, seeking to understand which ones they think are most impactful to operations. You may already have some of this detail if you plan your customer’s disaster recovery (DR) strategy by first discussing which systems, applications, and data would hurt the business most if they were unavailable. At the end of this step, you should, at the very least, have a prioritized list of incidents, with your customer’s concerns at the top of the list.
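One way to record the outcome of that conversation is as a small, sortable list. The sketch below is only an illustration: the `Incident` class, the 1-5 `customer_impact` score, and the `customer_flagged` marker are assumptions made for this example, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    customer_impact: int            # hypothetical 1-5 score agreed with the customer
    customer_flagged: bool = False  # True if the customer raised it as a top concern


def prioritize(incidents):
    """Customer-flagged concerns first, then highest impact, then name for stable ties."""
    return sorted(
        incidents,
        key=lambda i: (not i.customer_flagged, -i.customer_impact, i.name),
    )


# Illustrative entries only; scores would come from the customer discussion.
incidents = [
    Incident("Single laptop failure", 2),
    Incident("Ransomware outbreak", 5, customer_flagged=True),
    Incident("Domain controller loss", 5),
    Incident("Print server outage", 1, customer_flagged=True),
]

for rank, incident in enumerate(prioritize(incidents), start=1):
    print(rank, incident.name)
```

Sorting on the flag before the score keeps the customer’s stated concerns at the top even when you would personally rate another incident as more severe.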

Step 3 – Build Incident Response Plans

If your use of a ticketing system is mature, you likely have a knowledge base of processes to follow that address specific errors, user issues, etc. In many ways, you’re doing the same thing here: build a plan of how to respond to each incident. One difference is that response plans should also include contingency planning. For example, if you can’t get the CEO’s laptop back up and running, what’s the plan? Or if the on-prem recovery of a tier 1 workload isn’t possible because the local hardware is damaged, then what? Think of these plans as containing both tactical and strategic steps so that your techs are prepared for anything.
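A response plan with contingencies can be written down as structured data so that a tech always knows what to try next. The following is a minimal sketch under stated assumptions: the `ResponsePlan` class, its field names, and the example steps are hypothetical and only illustrate the primary-then-fallback idea.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResponsePlan:
    incident: str
    tactical_steps: List[str]                               # the primary fix, in order
    contingencies: List[str] = field(default_factory=list)  # fallbacks, in the order to try them

    def next_option(self, failed_attempts: int) -> str:
        """Return what to try next, given how many approaches have already failed."""
        if failed_attempts == 0:
            return "Primary: " + "; ".join(self.tactical_steps)
        if failed_attempts <= len(self.contingencies):
            return "Contingency: " + self.contingencies[failed_attempts - 1]
        return "Escalate: no documented options remain"


# Hypothetical plan for the CEO laptop example; the steps are illustrative only.
plan = ResponsePlan(
    incident="CEO laptop will not boot",
    tactical_steps=["Run hardware diagnostics", "Restore the OS image"],
    contingencies=["Issue a loaner laptop and restore the user profile from backup"],
)

for attempt in range(3):
    print(plan.next_option(attempt))
```

Writing the final "escalate" case into the plan makes it obvious when a tech has run out of documented options and needs to hand the incident up.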

Step 4 – Build an Incident Plan for the Unknown

It sounds a little odd, making a plan for the unknown. But it’s important to have a structured plan of what high-level steps need to be taken in case an incident occurs that you haven’t thought of. For example, what should you do if not one but three of your customers’ critical applications fail simultaneously? Or how about if you believe your customer has become the victim of data manipulation (where the data isn’t deleted or encrypted; it’s maliciously changed, putting the entire data set into question), but you’re not sure of the extent of the attack? In instances like these, your plan should include some form of triaging the affected systems/applications/data, communicating with the customer, prioritizing where to place your response focus, and planning the tactical next steps.

Step 5 – Regularly Review and Update the Incident Plans

Continual improvement is key to maintaining an effective incident response capability. As you resolve actual incidents, gather your team to review what went well and where there were challenges. Use these real-world experiences to refine your incident response plans. This should involve regularly updating the list of known incidents as new threats emerge and revisiting your strategies for both known and unknown scenarios. Make sure to involve customer feedback in these updates, especially if an incident directly impacts them. This step ensures your plans are up-to-date with the latest technological threats and business priorities and remain aligned with your customer’s evolving needs and expectations. In essence, it turns incident response from a reactive task to a proactive, dynamic process.

Tools to Help You Along the Way

It’s all well and good to tell you that you should be doing all of the above, but what about tool sets to help with this? We work in technology; shouldn’t there be some applications out there that can help with this process? Well, yes! Consider the apps and tools below when working on this list and for ongoing operations.

Ticketing System – This was mentioned above, but the importance of this tool can’t be stressed enough. A ticketing system (such as ConnectWise) contains a historical list of all incidents with a given customer. Not only will you use this tool to track issues as they arise, but this is also a good place to look for repeating issues. Maybe customer A’s print server goes down on the 3rd Friday of every month. Not only will this tool show you that this is happening, but it can also be used to find and squash repeating bugs.

Wiki & Documentation – Let’s face it: no one wants to read extended fix information inside of a trouble ticket. Ticketing systems are great at tracking issues, but some lack the ability to do meaningful, lengthy documentation. Consider a wiki or professional documentation tool for this, such as IT Glue.

Defined Playbook – You’ll have a better idea of what to plan for if you’re only offering a defined set of products/services. Try to define a primary and secondary offering for each product area. Pick two AV vendors, two encryption vendors, two storage vendors, etc. Once your team becomes familiar with these defined products, responses will improve, and you’ll also be more prepared for the unknown.
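The repeating-issue idea from the ticketing system discussion (the print server that fails on the 3rd Friday of every month) can be sketched in a few lines. This example assumes tickets have been exported as simple `(customer, asset, opened_date)` tuples; it does not use any real ticketing system’s API, and the function names and sample data are hypothetical.

```python
from collections import Counter
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def nth_weekday(d: date) -> tuple:
    """Return (n, weekday name), e.g. (3, 'Friday') for the 3rd Friday of a month."""
    return ((d.day - 1) // 7 + 1, DAYS[d.weekday()])


def recurring_patterns(tickets, min_occurrences=3):
    """Count tickets per (customer, asset, n, weekday) and keep the repeat offenders."""
    counts = Counter(
        (customer, asset, *nth_weekday(opened)) for customer, asset, opened in tickets
    )
    return {key: n for key, n in counts.items() if n >= min_occurrences}


# Made-up export for illustration; real data would come from your ticketing system.
tickets = [
    ("Customer A", "print server", date(2024, 1, 19)),
    ("Customer A", "print server", date(2024, 2, 16)),
    ("Customer A", "print server", date(2024, 3, 15)),
    ("Customer A", "mail server", date(2024, 2, 6)),
]

for (customer, asset, n, weekday), count in recurring_patterns(tickets).items():
    print(f"{customer}: {asset} failed {count} times on week {n} {weekday}s")
```

The `min_occurrences` threshold is a judgment call; three matching tickets is simply a starting point for deciding when a pattern deserves a root-cause investigation.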

Wrap-Up

Your ability to address an incident is judged by how quickly you respond to the issue and how accurately you resolve it. Without a plan, your techs are spending their time googling the problem, hoping to find answers. But by proactively building response plans (following the five steps above), your team will know what to do (or, at the very least, have some kind of idea of how to start dealing with the problem) in just about any circumstance.

What about you? Have you put any incident response plans into place? Do you feel they’ve prepared you adequately? Why or why not? Thanks for reading!