In my BCDR (business continuity and disaster recovery) work, I often tell people to practice hard and realistically, and to run sandtable BCDR exercises. Sometimes I have to explain why, and what I mean, so I thought I would put it on my blog in case it helps people.
Practice Hard and Realistic
I often tell people in BCDR roles to practice hard and realistically. Partly this is so that their test failovers succeed, and so that they learn to be good at doing failovers.
But there is something else. I have been through disasters where I was a long way away, but our datacenter was caught up in it. My co-workers were all running on adrenaline, thinking very narrowly, and easily distracted.
By practicing hard, often, and realistically, you build muscle memory, and that is very helpful when you are distracted and running on adrenaline.
Most people do not perform well on adrenaline, so muscle memory helps a little. The best way to get used to adrenaline is to experience it a lot, like a soldier or a mountaineer does. Once you are used to it, it becomes quite a good tool.
BCDR Sandtable Exercises
I suggest that BCDR people run sandtable exercises before doing the actual BCDR work, so they get an idea of what might go wrong and what they might have to do.
When I was an Airborne Infantryman in the military, the sandtable really was a table of sand, and we used to build models on it, such as the town or city we were going to parachute into. It was very useful.
In BCDR, no sand is required. You do need a few roles:
- Someone to be the disaster, who says what went wrong.
- IT Operations people, who experience the issues.
- Management, who can issue the order to fail over.
- BCDR people, who can advise and assist.
It plays out something like this. The disaster person says the datacenter has lost its roof and most of the computers. The BCDR people ask the IT Ops people how long it will take to fix, and how long it will take to get the DR site working. Then either the BCDR people or the IT Ops people ask management to order the failover. Management might ask a question or two, then order the failover. The IT people do the failover and give the BCDR people the details of how to access the DR site, which the BCDR people then distribute to all the teams.
The disaster person can complicate things! They can do things like ask "why?", or say that something did not work.
The first few times, the exercise is only about practicing decisions and discussions. After that, you can include a test failover. Once you are good at it, you can, maybe once per year, fail over one or two apps to the DR site and let them run there for a week as part of the exercise.
This really helps in a variety of ways. It helps you make sure everything is covered, and it surfaces issues and problems that you can then build mitigations for into your plan. I have used these exercises with customers with great success. It also gives management a chance to practice giving the go-ahead for a failover, which is good practice in itself!
Comments and questions welcome!