In the past, during my time at VMware, I ran VSAN in my home lab, across a number of versions, and it worked great. So lately, when I needed more storage in my home lab, and wanted quiet storage too, I bought a bunch of HDDs and SSDs. That was a fair bit of money for me. I got VSAN working and all my workloads vMotioned or Storage vMotioned to it. Then I noticed I had some yellow in my VSAN health.
Plus, while I passed two of the three proactive tests, I did fail one.
It is an odd one, as one host passes (79.27 MB/s) and one fails (4.88 MB/s). Of course, while writing this and redoing the test to confirm things, it turns out that the test now passes. But it is still missing my third host. I have a 10 Gb switch and 10 Gb network cards (more expense), so I did not think I should have network issues!
But I continued on, as VSAN had worked for me previously and I was very excited to have all this storage, and likely even more excited to have VSAN in the lab again. And then it came crashing down on me, and you can read about it here. If I had checked the HCL I could have avoided this, and maybe even avoided the big expenditure I cannot use! However, in checking the HCL now, I find it very confusing. The Dell PERC 6/i Adapter looks like it is supported, even though it shows Warning in the Health app. It does link to a VMware KB article that will help you find more info on firmware for your adapter; for Dell, however, it links to a generic web site where I cannot find any VSAN support info. However, that listing is for vSphere support. For VSAN support you need to go here (you may need to toggle the mode by clicking on build your own, and select I/O controller), and it shows no support at all for the Dell PERC 6/i adapter. So I guess the warning in the VSAN Health should be an error!
I cannot get three new controllers that are supported. Too much money. So I guess VSAN is now out of my lab. Not sure what to do with the extra HDDs and SSDs I have!
So while it is obvious, make sure you check the darn HCL before you buy or build! Even if you think you know better.
Very sad to see you go, VSAN. May we meet again when I am better prepared!
- 8/18/16 – I deleted the disk groups, disabled VSAN (I had to enable it first to delete the disk groups), and rebooted each host. Have I mentioned how much I like maintenance mode? The three hosts in this cluster rebooted without the long delay on a VSAN message, and they have now run through the night. I logged in remotely via View this morning and all looks good. So it seems like VSAN was actually destabilizing my hosts. I guess that controller should not be on the HCL, and the health check should not say warning, but danger!
- 8/16/16 – I got everything off of VSAN with nothing lost. So with no virtual machines on it, roughly 12 hours later (maybe a little less), I had the issue again: the host gets disconnected, the same as last time. I am not sure, but it doesn't seem like a VSAN issue, in that there is no load. I also discovered this morning that there was a bad disk on this host, so I removed it from the host's disk group and let it sit all day. Now it is broken again. I was about to put one VM on VSAN to see if that would work for longer than a day, but I guess the problem is not I/O related since nothing was on VSAN! I have now disabled VSAN and restarted two hosts so far, and yet the problem came back maybe 3 or 4 hours later, on the same host. Very odd. And twice now the issue has repaired itself in a few minutes. But watching a host restart, I see more VSAN messages than I think I should, and they slow down boot quite a lot. So I am going to delete the disk groups.
=== END ===