Beware the long snapshot!
Note to self: If you make a snapshot of a production VM in your virtual infrastructure, don't keep it longer than a day or two at the most.
Not sure how else to say this... but, oops. We have this mail server that we use for our ISP customers. It is running in our Vi. It seemed to perform quite well until we moved a couple thousand POP accounts to it. We could not figure out where the slowdown was. We added more memory, more priority (I bet you didn't know that priority was a resource!)... The memory helped a bit, but it was still sluggish.
It seemed like this could have been one of those kinds of servers that are not meant for consolidation. But, as a last-ditch effort, we decided to add a second virtual processor. Before doing the deed we made a snapshot of the VM just in case things went badly. Everything went fine, just as one would expect. Performance did improve, but not enough to change our minds about rephysicalizing (another new word). We thought we'd give it a month to settle down and look at some long-term trends before taking the plunge back from virtual to physical.
After getting back from VMworld 2006 we thought it would be a good idea to get our Vi up-to-date. Seems we were a little early in adopting Vi3. The newest patch (3.0.1 for ESX and 2.0.1 for VC) contained over 500 bug fixes, or so I was told, and was supposed to greatly improve the overall performance of our virtual infrastructure. When it came time to VMotion this mail server off a host so we could upgrade the host, it gave an error stating something about there being an active snapshot... yeah, kinda forgot about that.
This is where the "Note-to-self" from above comes in. Apparently it is a bad idea to leave a snapshot in place for much longer than a day or two. We had been running on ours for about two months. After a little discussion, we decided to delete the snapshot, since running on vSMP seemed to be OK and after all this time we were not going to revert. Easy, right? Sure... till the task times out. The vmdk snapshot file for the mail-store drive had grown to about 35GB. When we deleted the snapshot, the 35GB file was locked and a new snapshot file was created and used until the 35GB of changes were incorporated back into the original 150GB vmdk. I guess on a very disk-busy drive, that takes a while. I panicked and called VMware. They said that it could take as long as 8 hours to finish. So we waited and hoped nothing crazy happened in the interim.
Two hours later, it was done and it finished without a hitch. The mail server was then VMotioned off and the host got its update applied.
Now that we've learned our lesson, the mail server is performing perfectly. The second processor was the answer, but we could not see the difference in performance because of the overhead of the too-long-lived snapshot. So, in the end we learned that snapshots are short-term friends, and we will not have to put our mail server back in the physical world. That leaves just a few servers to go before we've totally virtualized everything. Woo Hoo!
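For what it's worth, the "day or two" rule is easy to turn into a reminder. Below is a minimal sketch in Python, not anything VMware ships: the record shape (VM name, snapshot name, creation time), the function name `stale_snapshots`, and the sample dates are all made up for illustration, and you would have to feed it from whatever inventory listing you actually have.

```python
from datetime import datetime, timedelta

# Hypothetical record shape: (vm_name, snapshot_name, created_at).
# Nothing here talks to VirtualCenter; the list has to come from your own inventory export.
MAX_AGE = timedelta(days=2)  # the "day or two" rule of thumb from this post


def stale_snapshots(snapshots, now, max_age=MAX_AGE):
    """Return (vm, snapshot, age_in_days) for snapshots older than max_age, oldest first."""
    stale = []
    for vm_name, snap_name, created_at in snapshots:
        age = now - created_at
        if age > max_age:
            stale.append((vm_name, snap_name, age.days))
    return sorted(stale, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Made-up sample data: one snapshot about two months old, one from last night.
    now = datetime(2006, 12, 1, 9, 0)
    inventory = [
        ("mail01", "before-vsmp", datetime(2006, 10, 1, 9, 0)),
        ("web01", "pre-patch", datetime(2006, 11, 30, 18, 0)),
    ]
    for vm, snap, days in stale_snapshots(inventory, now):
        print(f"{vm}: snapshot '{snap}' is {days} days old, delete or revert it")
```

Run on a schedule, something like this would have nagged us long before the delta file reached 35GB.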
Syndicated via RSS From: http://www.vmwarez.com/

