SRM Testing (and external resources)

Hello all,

This is something I have been asked about a lot. We have had labs on it, and I wrote blog posts on it when I was working at VMware, but I still get asked, so I am going to write it up on this blog. I do not have access to Site Recovery Manager (SRM) at this time, so there are no screenshots, but hopefully this will still help. BTW, if I do get access to SRM, I will add some screenshots, and maybe test some scripts to add to this article. In the meantime, if you know a bit about SRM and a bit about PowerCLI, this should help you move in the right direction.

We all know that practice makes perfect. The more often you install software, the smoother it goes. When I was a soldier, the more often I practiced sneaking around in the dark, the quieter I got. I used to help customers practice DR drills, and people improved with practice. This is why SRM has the ability to test built in. You can test very often, and that means when something bad happens you can still function smoothly.

SRM automatically provides a bubble network during a test. This means that a VM can talk to another VM in the test only if it is on the same host. The bubble network is host-specific and completely isolated. Since most customers have multiple hosts, this is an issue. You solve that first issue by using a VLAN, creating a new vSS or vDS, and using that network as the test network in the SRM configuration. Now your virtual machines on each host can talk to each other. It is of critical importance that the virtual machines on the test network never talk to virtual machines outside of the test network (really bad things can happen if they do). After all, the VMs on the test network are copies of what is on the production network.
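To make the isolation rule above a bit more concrete, here is a rough sketch of a sanity check you could run against an exported list of port groups. Everything in it is an assumption for illustration: the `PortGroup` record, the `check_test_network` function, and the idea that you have already pulled host, port group name, and VLAN ID out of your environment somehow. It only catches two simple mistakes (a host without the test port group, and another port group sharing the test VLAN), so it is not a proof of isolation. Routing and physical switch configuration can still leak traffic.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PortGroup:
    """Hypothetical record: one port group as seen on one host."""
    host: str
    name: str
    vlan_id: int


def check_test_network(port_groups: Iterable[PortGroup],
                       test_name: str,
                       hosts: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    port_groups = list(port_groups)
    problems = []

    test_pgs = [pg for pg in port_groups if pg.name == test_name]
    test_vlans = {pg.vlan_id for pg in test_pgs}

    # Every host needs the test port group, or test VMs on it are stranded.
    for host in sorted(set(hosts) - {pg.host for pg in test_pgs}):
        problems.append(f"{host}: test port group '{test_name}' is missing")

    # The test network should use a single VLAN everywhere.
    if len(test_vlans) > 1:
        problems.append(
            f"test port group uses several VLANs: {sorted(test_vlans)}")

    # No other port group may share the test VLAN.
    for pg in port_groups:
        if pg.name != test_name and pg.vlan_id in test_vlans:
            problems.append(
                f"{pg.host}: '{pg.name}' shares VLAN {pg.vlan_id} "
                "with the test network")

    return problems
```

Running something like this before each test is cheap, and an empty result just means the two obvious mistakes are not present.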

The next issue is that resources like Active Directory are not available on the isolated test network. This means that users cannot log in to desktops or servers in the test to confirm things work. We cannot have a production AD Domain Controller on the test network AND on the production network, because the machines talking to it on the test network will be deleted and gone after the test is done. How do we solve this? It is easy, in fact. We have a script that cleanly shuts down a virtualized DC and then clones it (deleting the old clone first if one already exists). The script then changes the network on the clone to the test network. Then both copies of the DC are powered up. This means a ‘fresh’ copy of the DC is now on the test network and people can log in. If this DC also runs DHCP and DNS, then that takes care of those requirements for the test network. DNS is easy, but DHCP is a little harder. You may need a script on that DC that runs at every boot, checks whether the VM is running in an SRM test, and enables DHCP only if it is. That means DHCP is disabled by default and never runs outside of an SRM test. The script is simple, so it is safe to have it run each time the VM starts.
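The boot-time DHCP check described above could look something like the sketch below. I have not said how the DC knows it is in an SRM test, so this sketch assumes two hypothetical signals: a marker file that your clone process is assumed to leave on the clone, and a probe showing that a production-only address cannot be reached. The names `tcp_reachable` and `dhcp_action` are made up for illustration, and the sketch only returns a decision. Actually starting the DHCP service is left to whatever tooling you use on that server.

```python
import socket
from pathlib import Path
from typing import Callable


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def dhcp_action(marker: Path, production_reachable: Callable[[], bool]) -> str:
    """Decide what to do with DHCP at boot.

    DHCP is started only when BOTH assumed signals agree that this is the
    test copy: the marker file exists and production cannot be reached.
    Any doubt means DHCP stays stopped, which is the safe default.
    """
    if marker.exists() and not production_reachable():
        return "start"
    return "leave-stopped"
```

Requiring both signals is a deliberate choice. If the probe alone decided, a production outage could make the real DC think it was in a test and start handing out addresses.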

Now we add the cloneDC script to the start of the SRM test recovery, or, if you do a lot of testing, you might just have a scheduled task that executes the script every day or every couple of days.
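The order of operations in the cloneDC script matters more than the individual commands, so here is a minimal sketch that only models the ordering. It is not the customer's script. The `Step` type, the `clone_dc_plan` function, the action names, and the `-srmtest` suffix are all hypothetical, and in real life each step would be carried out by PowerCLI or a similar tool. Two choices are mine and not from the description above: the old clone is removed before the production DC goes down, to keep the production outage short, and the production DC is powered on before the clone.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    """One action in the plan (hypothetical action names)."""
    action: str
    target: str
    detail: str = ""


def clone_dc_plan(dc_name: str,
                  test_network: str,
                  clone_exists: bool,
                  clone_suffix: str = "-srmtest") -> List[Step]:
    """Build the ordered steps for refreshing the test-network DC clone."""
    clone = dc_name + clone_suffix
    steps: List[Step] = []

    # Remove the stale clone first, while production is still up.
    if clone_exists:
        steps.append(Step("power_off", clone))
        steps.append(Step("delete", clone))

    # Clean guest shutdown so the clone is taken from a consistent DC.
    steps.append(Step("guest_shutdown", dc_name))
    steps.append(Step("clone", dc_name, clone))

    # Move the clone to the test network BEFORE it is ever powered on,
    # so two copies of the DC never meet on the production network.
    steps.append(Step("set_network", clone, test_network))

    # Production comes back first, then the test copy.
    steps.append(Step("power_on", dc_name))
    steps.append(Step("power_on", clone))
    return steps
```

A call such as `clone_dc_plan("DC01", "SRM-Test", clone_exists=True)` gives you a list you can print for review, or hand to whatever actually executes the steps.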

So we now have a current AD inside our private test network, and people can log in with their current passwords. Plus, we do not corrupt the production world, since we delete the test network DC every now and again. We have DNS and DHCP in the test network. What are we missing? Still several things.

We need desktops. I helped one customer get desktops into the test network in an interesting way. They wanted their customers in a cafeteria to be able to do testing at lunch, so we had a few laptops in the cafeteria with View clients on them. They could use View to connect to a View server where the desktops were two-legged: one leg in the test network and one in the production network. I had trouble making this work. My first idea was to have the View server on both networks, but I had trouble with that and had to put the desktops on both networks instead. There was some interesting work involved in this. A View professional could easily make this work, and Google could help too, I think. And yes, other VDI products could be used for this with no issue.

So we have everything we need to test Exchange, SharePoint, or a variety of other things. Oracle is common in these test cases. But don’t forget you can do patch testing in this environment as well as DR testing! It is much better to do patch testing here than in a typical lab.

There is one thing still missing, for some customers, and it is sort of tough in a way. You need some sort of resource in the test that you think is impossible to get there, like Oracle running on AIX, or something running on HP-UX. Both are in fact not that hard to get into a test. I was lucky in both cases to have very smart people helping me. It turns out both platforms have their own version of virtualization. That means in both you can have a ‘slice’ of the machine. It will have its own network, connected to the test network, and it will have some sort of copy of the data from the other part of the machine, an in-memory kind of copy. So you can copy part of a database from the public side to the private side. This allows a test network application to talk to an AIX or HP-UX application that is running in an LPAR or partition, so you can do very realistic testing.

I know some of you will be unhappy with me for not doing a step-by-step with all of the information. I do not have the resources to do that, although I would love to if I could. But I have provided hints, and shown that it is possible, and for some of you I know that is all that is needed.

Have fun! Test until you are confident you can do the same process or behaviour while you are breathing a hint of smoke, and then you are good. That means when there is an earthquake, or something is burning, you can do exactly the right thing to successfully fail over to the DR location. I can say from experience that adrenaline can really mess you up, and practice is what allows you to move through it as you should. And smoke in your nose is definitely something that gets you excited. I had the sound of water in my ears in the datacenter before I had SRM in my life, and I can assure you it does make your heart beat fast!

SRM is a very important product. I know that VMware has completely and seriously dropped the ball on it. The last several versions have had almost no new features. It seems that VMware, even when I was an employee and talked to people about it, does not realize it is the stickiest product VMware has. When people protect their financial or email systems with SRM, they are not likely to replace VMware with someone else. Maybe enough of you who understand why I say SRM is important can tell your Sales SE, or PSO guys, to tell VMware. Tell them frequently, and eventually they might learn.

Feel free to leave comments. I very much appreciate them.


=== END ===

3 thoughts on “SRM Testing (and external resources)”

  1. Great hints and fantastic points, Michael. I do find previous versions of SRM sadly lacking, though version 5.5 helped fill some of the gaps. However, the biggest problem with the product is the lack of understanding around what it provides and the time required to keep it useful. Many organizations think you implement it and you are protected, and many others neglect to place it at the heart of the provisioning process or to recognize that regular testing is critical. Thanks again for the info.

  2. Michael – thanks! This is the final point in my testing process – the shutdown/clone/restart in test network is spot-on. Can you share the script you use to perform that process?

    1. Glad you liked the article. But the scripts I used, half a dozen times, came from the customer. I would not write them myself, since I knew that was not as good as helping the customer to write them. They are not too hard to do.

      Sorry I cannot be of more help,

