Summary
-Install VMware Tools
-Use the Paravirtual disk controller
-Use VMXNET3 network interfaces
-Don’t use snapshots as long-term backups
-One VM per vApp
-Do not oversize CPU, RAM or disk
-Avoid creating ‘large’ disks
Are there specific best practices for my application?
VMware maintains a list of best practice documents specific to some applications (search here). It is always good to review the ones that apply to your application before getting started but, in my opinion, the best practices described here are generally valid for any application you deploy.
Guest operating system
Install VMware Tools!
There is a lot to gain by having VMware Tools installed.
-Provides enhanced support for virtualized hardware optimized to interact with vSphere (VMXNET3, paravirtual controller, ballooning, etc.).
-Provides data to the hypervisor about the virtual machine (VM) that is useful for diagnosing an incident, for example the IP addresses and hostname of the virtual machine.
-Enables several integrations at the vSphere level, e.g., the HA heartbeat can be configured to take the state of VMware Tools into account.
Most general purpose operating systems support it (Windows, Red Hat-based Linux, Debian-based Linux, etc.), and the effort to install VMware Tools is usually minimal.
Virtual disks
-Avoid having virtual disks larger than 1TB.
-At the platform management level, such large virtual disks make space maintenance, backups and recoveries of virtual machines more difficult.
-If that amount of space is necessary anyway, consider using technologies at the guest operating system level, such as LVM on Linux, to combine several smaller virtual disks.
-Yes, 1TB is an ‘arbitrary’ value, but it represents a virtual disk large enough that restoring it may not meet the recovery time objective (RTO) of the service, or that other SLAs of the service may be missed because of the performance and capacity of the storage behind it.
-Use the Paravirtual (PVSCSI) disk controller. It is the controller recommended by VMware, and it makes better use of I/O and CPU resources.
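To make the 1TB guideline above concrete, the following is a minimal sketch of the arithmetic behind it: how long a restore takes for a given disk size and sustained throughput, and whether that fits the RTO. The function names and the throughput figures are hypothetical and only illustrate the reasoning; real restore speed depends on your backup solution and storage.

```python
def estimated_restore_hours(disk_size_gb: float, throughput_mb_s: float) -> float:
    """Rough restore time for one virtual disk at a sustained throughput.

    Assumes 1 GB = 1024 MB and a constant transfer rate, which is an
    idealization; real restores are usually slower.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    seconds = (disk_size_gb * 1024) / throughput_mb_s
    return seconds / 3600


def meets_rto(disk_size_gb: float, throughput_mb_s: float, rto_hours: float) -> bool:
    """True if the estimated restore time fits within the RTO."""
    return estimated_restore_hours(disk_size_gb, throughput_mb_s) <= rto_hours


if __name__ == "__main__":
    # Hypothetical example: a 1 TB disk restored at 100 MB/s against a 2 hour RTO.
    hours = estimated_restore_hours(1024, 100)
    print(f"Estimated restore time: {hours:.1f} h, meets RTO: {meets_rto(1024, 100, 2)}")
```

With these assumed numbers a 1TB disk needs roughly 2.9 hours, so it misses a 2 hour RTO, while a 512GB disk at the same throughput fits.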
RAM and CPU
-Do not allocate more RAM or CPU to the virtual machine than is strictly necessary for its workload.
-It is easy to add resources to a VM, even while it is running, using CPU or RAM Hot Add. Taking resources away usually requires shutting down the virtual machine or changing the configuration of the applications installed in it.
-Adding more CPU resources than necessary can be counterproductive to the performance of that virtual machine and of all other virtual machines running on the same ESXi host.
-Allocating more RAM or CPU to a virtual machine than a single physical NUMA node provides negatively affects the performance of the virtual machine.
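The NUMA point above can be checked before deploying a VM. The sketch below is an illustration only: `HostTopology` and `fits_in_numa_node` are hypothetical names, and it assumes one NUMA node per physical socket with RAM spread evenly across sockets, which is common but not guaranteed on every server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostTopology:
    """Simplified view of an ESXi host (assumption: one NUMA node per socket)."""
    sockets: int
    cores_per_socket: int
    total_ram_gb: float

    @property
    def ram_per_node_gb(self) -> float:
        # Assumes memory is populated evenly across sockets.
        return self.total_ram_gb / self.sockets


def fits_in_numa_node(vcpus: int, ram_gb: float, host: HostTopology) -> bool:
    """True if the VM's vCPUs and RAM both fit inside a single NUMA node."""
    return vcpus <= host.cores_per_socket and ram_gb <= host.ram_per_node_gb


if __name__ == "__main__":
    # Hypothetical host: 2 sockets, 16 cores each, 512 GB of RAM in total.
    host = HostTopology(sockets=2, cores_per_socket=16, total_ram_gb=512)
    for vcpus, ram in [(8, 128), (20, 128), (8, 300)]:
        print(vcpus, ram, fits_in_numa_node(vcpus, ram, host))
```

A VM that fails this check is not forbidden, but it is a signal to review whether the workload really needs that much CPU or RAM.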
Network resources
-Use VMXNET3 type network interfaces.
-It has native integration with vSphere and is recommended by VMware.
-vSphere has a limit of 10 NICs per virtual machine. This must be taken into account especially when deploying virtual firewalls.
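Both network recommendations above are easy to verify from an inventory export. The following is a minimal sketch, assuming the adapter types of a VM are available as a plain list of strings such as "vmxnet3" or "e1000"; the function name and the input format are hypothetical and not part of any VMware API.

```python
from typing import List

MAX_NICS_PER_VM = 10  # limit quoted in the text above


def nic_findings(adapter_types: List[str]) -> List[str]:
    """Return human-readable findings for one VM's network adapters."""
    findings = []
    if len(adapter_types) > MAX_NICS_PER_VM:
        findings.append(
            f"{len(adapter_types)} NICs exceeds the limit of {MAX_NICS_PER_VM}"
        )
    for index, adapter in enumerate(adapter_types):
        # Compare case-insensitively so "VMXNET3" and "vmxnet3" both pass.
        if adapter.lower() != "vmxnet3":
            findings.append(f"NIC {index} uses {adapter}, not VMXNET3")
    return findings


if __name__ == "__main__":
    # Hypothetical VM with one legacy adapter.
    for finding in nic_findings(["vmxnet3", "e1000"]):
        print(finding)
```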
Virtual machine operation
Snapshots are not long-term backups!
-Take a snapshot before any change or maintenance operation at the guest OS level.
-Delete snapshots when:
  -Several days have passed since the change or maintenance operation on the virtual machine’s operating system was validated. The number of days depends on the characteristics of the business, but 7 days is usually enough.
  -The snapshot exceeds a predefined size. The threshold depends on the characteristics of the virtual hardware, but in our experience with enterprise hardware 10GB is a reasonable limit.
  -A space alarm is raised on the datastore where the virtual machine is hosted.
-Delete snapshots during a period of low load for the virtual machine, relative to its usual load.
  -This is especially important if the virtual machine hosts a database. If possible, shut down the virtual machine before deleting and consolidating the snapshot.
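The deletion criteria above can be expressed as a small rule check. This is a sketch under stated assumptions: the `Snapshot` record and `deletion_reasons` function are hypothetical, the defaults of 7 days and 10GB are the values suggested in the text, and the datastore alarm is passed in as a flag rather than read from vSphere.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record describing one VM snapshot."""
    name: str
    validated_at: datetime  # when the change it protects was validated
    size_gb: float


def deletion_reasons(
    snap: Snapshot,
    now: datetime,
    datastore_alarm: bool = False,
    max_age_days: int = 7,
    max_size_gb: float = 10.0,
) -> List[str]:
    """Return the criteria that say this snapshot should be deleted."""
    reasons = []
    if now - snap.validated_at > timedelta(days=max_age_days):
        reasons.append("age")
    if snap.size_gb > max_size_gb:
        reasons.append("size")
    if datastore_alarm:
        reasons.append("datastore_alarm")
    return reasons


if __name__ == "__main__":
    now = datetime(2024, 1, 20)
    old = Snapshot("pre-patch", validated_at=datetime(2024, 1, 10), size_gb=4.0)
    print(old.name, deletion_reasons(old, now))
```

An empty list means no criterion applies yet. The check only says whether to delete; when to delete still follows the low-load advice above.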
-If using vApps (for example, in vCloud Director), have only one virtual machine per vApp.
-If using a virtual machine backup solution, the solution may back up and restore at the vApp level, so restoring one virtual machine may require restoring the entire vApp.
-Powering a vApp off or on affects all virtual machines within the vApp.
-In vCloud Director, cloning a vApp requires cloning all virtual machines in the vApp.
To conclude
Best practices are useful as a guide to improve our experience with technology, but sometimes business requirements dictate otherwise. In the case of vSphere and virtual machines it is no different.
In our experience, complying with these best practices can avoid incidents related to the availability and performance of the virtual machines where they are applied.
- vSphere 6.7 Performance Best Practices
- Reference Documentation for using vCloud Director
- VMware vSphere 6.7 Host Resources Deep Dive, Denneman and Hagoort
- VMware vSphere 6.7 Clustering Deep Dive, Denneman, Epping and Hagoort
- VMware vCloud Director 5.1 best performance practices
- vSphere Best Practices for Oracle Database
- VMware Techpapers, where lists of best practices and other documentation can be found