VMware vSphere

ZenPacks.zenoss.vSphere

This ZenPack provides support for monitoring VMware vSphere. Monitoring is performed through a vCenter server using the vSphere API.

This ZenPack supersedes an earlier ZenPack named ZenPacks.zenoss.ZenVMware. If you have ZenPacks.zenoss.ZenVMware installed on your system, please read the Transitioning from ZenVMware section below.

The features added by this ZenPack can be summarized as follows. They are each detailed further below.

  • Initial discovery and continual synchronization of relevant components.
  • Performance monitoring.
  • Event management.
  • Optional auto-discovery and monitoring of VM guest operating systems.
  • vSAN Cluster Health monitoring (on vSAN-enabled clusters only, not individual ESXi hosts).
  • Service impact and root cause analysis. (Requires Zenoss Service Dynamics)
  • Operational reports.

Applications Monitored: VMware vSphere (5.5, 6.0, 6.5, 6.7, 7.0, 8.0)

Commercial

This ZenPack is developed and supported by Zenoss Inc. Commercial ZenPacks are available to Zenoss commercial customers only. Contact Zenoss to request more information regarding this or any other ZenPacks.

Releases

Version 4.2.0

Version 4.1.2

Compatibility

When monitoring vCenter with this ZenPack, it is important to also verify the version of ESX/ESXi installed on the hosts it manages. The ESX hosts must be running a compatible version according to the VMware Product Compatibility Matrix, as well as one that has not reached End of General Support in VMware's Lifecycle Matrix.

In particular, ESX/ESXi 4.x is known to cause serious problems when monitored through a current version of vCenter.

We also recommend that monitored vSphere environments be deployed in accordance with VMware's recommendations and best practices.

The vSphere ZenPack can be used to monitor VDI installations (VMware Horizon View, VMware VDM), as these are considered regular vSphere deployments. VDI sources will not look any different from regular VMs.

Discovery

The following components will be automatically discovered through the vCenter address, username, and password you provide. The properties and relationships will be continually maintained by way of a persistent subscription to vSphere's updates API.
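
For illustration, the following is a minimal sketch of this kind of persistent update subscription using pyVmomi, watching the power state of every VM. The hostname and credentials are hypothetical, and the ZenPack itself performs this internally through its own collector (see Java Subprocess under Troubleshooting), so this is illustration only, not the ZenPack's implementation:

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim, vmodl

ctx = ssl._create_unverified_context()  # lab use only; verify certificates in production
si = SmartConnect(host='vcenter.example.com', user='monitor', pwd='secret', sslContext=ctx)
content = si.RetrieveContent()

# Watch 'runtime.powerState' on all VMs via a container view.
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
traversal = vmodl.query.PropertyCollector.TraversalSpec(
    name='traverseView', type=vim.view.ContainerView, path='view', skip=False)
obj_spec = vmodl.query.PropertyCollector.ObjectSpec(obj=view, skip=True, selectSet=[traversal])
prop_spec = vmodl.query.PropertyCollector.PropertySpec(
    type=vim.VirtualMachine, pathSet=['runtime.powerState'])
filter_spec = vmodl.query.PropertyCollector.FilterSpec(objectSet=[obj_spec], propSet=[prop_spec])

pc = content.propertyCollector
pc_filter = pc.CreateFilter(filter_spec, partialUpdates=True)
version = ''
try:
    while True:
        # Blocks until vSphere reports a change, or the wait times out.
        result = pc.WaitForUpdatesEx(
            version, vmodl.query.PropertyCollector.WaitOptions(maxWaitSeconds=60))
        if result is None:
            continue  # timeout with no changes; ask again
        for filter_update in result.filterSet:
            for obj_update in filter_update.objectSet:
                for change in obj_update.changeSet:
                    print(obj_update.obj, change.name, change.val)
        version = result.version  # resume incrementally from this point
finally:
    pc_filter.Destroy()
    Disconnect(si)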

Datacenters

  • Attributes
    • Resource Pool Count
    • vApp Count
    • Cluster Count
    • Standalone Resource Count
    • Host Count
    • Virtual Machine Count
    • Network Count
    • dvPortgroup Count
    • dvSwitch Count
  • Relationships
    • Clusters and Standalone Compute Resources
    • Resource Pools
    • vApps
    • VMs
    • Hosts
    • Networks
    • dvSwitches
    • dvPortgroups
    • Datastores
    • Disk Groups
    • Cache Disks
    • Capacity Disks

Clusters and Standalone Compute Resources

  • Attributes
    • Resource Pool Count
    • vApp Count
    • Effective Host Count
    • Host Count
    • CPU Core Count
    • CPU Thread Count
    • Effective CPU
    • Total CPU
    • Effective Memory
    • Total Memory
    • Overall Status
    • vSAN Cluster UUID
    • vSAN Disk Type
    • vSAN Version
  • Relationships
    • Datacenter
    • Host(s)
    • Resource Pools

Resource Pools

  • Attributes
    • CPU Limit
    • CPU Reservation
    • Memory Limit
    • Memory Reservation
    • VM Count
    • Overall Status
  • Relationships
    • Datacenter
    • Cluster or Standalone Compute Resource
    • Parent Resource Pool
    • Child Resource Pools
    • vApps
    • VMs

vApps

  • Attributes
    • CPU Limit
    • CPU Reservation
    • Memory Limit
    • Memory Reservation
    • VM Count
    • Overall Status
  • Relationships
    • Datacenter
    • Cluster or Standalone Compute Resource
    • Resource Pool
    • VMs

VMs

  • Attributes
    • Guest Name
    • UUID
    • Is Template
    • CPU Count
    • Memory
    • Operating System Type
    • MAC Addresses
    • Overall Status
    • Power State
    • Connection State
    • VMware Tools Version
    • VMware Tools Status
    • Hardware Version
    • Committed Storage
    • Uncommitted Storage
    • IP
    • DNS Name
    • Inventory Path
    • Snapshots
    • Number of Snapshots
    • Policy Compliance Status
    • vSAN VM UUID
  • Relationships
    • Datacenter
    • Resource Pool or vApp
    • vNICs
    • Host
    • Datastores
    • Vdisks

vNICs

  • Attributes
    • MAC Address
  • Relationships
    • VM
    • Network or dvPortgroup

Hosts

  • Attributes
    • Overall Status
    • Connection State
    • Power State
    • Hostname
    • Management IP Addresses
    • Total Memory
    • In Maintenance Mode
    • Hypervisor Version
    • Vendor
    • Model
    • UUID
    • CPU Model
    • CPU Speed
    • CPU Package Count
    • CPU Core Count
    • VM Count
    • Powered On VM Count
    • Suspended VM Count
    • Powered Off VM Count
    • vSAN Node UUID
    • vSAN Fault Domain
    • vSAN Disk Type
    • vSAN Health Status
    • vSAN Node State
    • vSAN Master State
  • Relationships
    • Datacenter
    • Cluster or Standalone Compute Resource
    • VMs
    • Host NICs
    • Datastores
    • LUNs
    • Disk Groups
    • Cache Disks
    • Capacity Disks

Host NICs

  • Attributes
    • MAC Address
    • Link Status
    • Link Speed
  • Relationships
    • Host

Networks

  • Attributes
    • Accessible?
    • IP Pool Name
    • VM Count
  • Relationships
    • Datacenter
    • Attached vNICs

dvSwitches

  • Attributes
    • UUID
    • Type
    • Host Count
  • Relationships
    • Datacenter
    • dvPortgroups

dvPortgroups

  • Attributes
    • Accessible?
    • IP Pool Name
    • VM Count
    • Key UUID
  • Relationships
    • Datacenter
    • dvSwitch
    • NSX Virtual Switch
    • Attached vNICs

Datastores

  • Attributes
    • Datastore Type
    • Capacity
    • Free Space
    • Uncommitted Space
    • URL
    • NAS Remote Host
    • NAS Remote Path
    • NAS Username
    • Local Filesystem Path
    • Snapshots
    • Number of Snapshots
  • Relationships
    • Datacenter
    • LUNs
    • Attached Hosts
    • Attached VMs
    • Storage Provider
    • Vdisks
    • Disk Groups

RDMs

  • Attributes
  • Relationships
    • VM
    • LUN
    • Host
    • Storage Provider

LUNs

  • Attributes
    • LUN Name
    • LUN Key
    • LUN ID
    • Operational Status
    • Type
    • Vendor
    • Model
  • Relationships
    • Host
    • Datastores
    • Storage Provider

Vdisks

  • Attributes
    • vDisk Id
    • File Name
    • Capacity
    • vSAN Disk Object
  • Relationships
    • Datastore
    • VM

Note

Vdisks are not modeled by default; they are discovered automatically only if the zVSphereModelVdisks property is set to true.

Disk Groups (vSAN)

  • Attributes
    • Disk Group UUID
    • Disk Group Type
  • Relationships
    • Datacenter
    • Host
    • Datastore
    • Capacity Disks
    • Cache Disks

Note

Only for vSAN-enabled configurations, with zVSphereModelVSAN set to true.

Capacity and Cache Disks (vSAN)

  • Attributes

    • Disk UUID
    • SSD
    • Capacity
    • Display Name
    • Disk Key
    • Vendor
    • Model
  • Relationships

    • Datacenter
    • Host
    • Disk Group

Note

Only for vSAN-enabled configurations, with zVSphereModelVSAN set to true.

Note

The current version of the vSphere ZenPack does not support discovery or monitoring of Datastore Clusters.

Performance Monitoring

The following metrics will be collected every 5 minutes by default. Any other vSphere metrics can also be collected by adding them to the appropriate monitoring template.

Clusters

  • cpuUsagemhz: cpu.usagemhz.AVERAGE
  • cpuUsage: cpu.usage.AVERAGE
  • memConsumed: mem.consumed.AVERAGE
  • memBalloon: mem.vmmemctl.AVERAGE
  • memTotal: Cluster.totalMemory (Property)
  • effectiveMemory: Cluster.effectiveMemory (Modeled)
  • effectiveCpu: Cluster.effectiveCpu (Modeled)

Clusters (vSAN)

  • cpuUsagemhz: cpu.usagemhz.AVERAGE
  • cpuUsage: cpu.usage.AVERAGE
  • memConsumed: mem.consumed.AVERAGE
  • memBalloon: mem.vmmemctl.AVERAGE
  • memTotal: Cluster.totalMemory (Property)
  • vsanCapacity:
    • totalCapacity
    • freeCapacity
    • usedCapacity
    • usedCapacityPct
    • vmswapUsed
    • checksumOverheadUsed
    • vdiskUsed
    • fileSystemOverheadUsed
    • namespaceUsed
    • statsdbUsed
    • vmemUsed
    • otherUsed (Python)
  • vsanVersions (Python)
  • statsBackEnd:
    • oio
    • congestion
    • iopsRead
    • iopsRecWrite
    • iopsResyncRead
    • iopsWrite
    • latAvgResyncRead
    • latencyAvgRead
    • latencyAvgRecWrite
    • latencyAvgWrite
    • throughputRead
    • throughputRecWrite
    • throughputWrite
    • tputResyncRead (VSAN Performance)
  • statsFrontEnd:
    • congestion
    • oio
    • iopsRead
    • iopsWrite
    • latencyAvgRead
    • latencyAvgWrite
    • throughputRead
    • throughputWrite (VSAN Performance)
  • effectiveMemory: summary.effectiveMemory (Modeled)
  • effectiveCpu: summary.effectiveCpu (Modeled)

Datastores

  • datastoreCapacity: Datastore.capacity (Property)
  • datastoreUsed: Datastore.usedSpace (Property)
  • datastoreUsedPercent: Datastore.used_pct (Property) - disabled by default
  • datastoreUncommittedPercent: Datastore.uncommitted_pct (Property)
  • datastoreUncommitted: Datastore.uncommitted (Modeled)
  • datastoreFreeSpace: Datastore.freeSpace (Modeled)
  • datastoreRead: Datastore.numberReadAveraged.AVERAGE
  • datastoreWrite: Datastore.numberWriteAveraged.AVERAGE
  • snapshotsSize: Datastore.snapshots_size (Property)

Endpoints

  • perfCollectionTotalTime: perfCollectionTotalTime
  • perfCollectionRetryTime: perfCollectionRetryTime
  • perfCollectionRecoveryTime: perfCollectionRecoveryTime
  • perfCollectionTotalQueries: perfCollectionTotalQueries
  • perfCollectionRetryQueries: perfCollectionRetryQueries
  • perfCollectionRecoveryQueries: perfCollectionRecoveryQueries
  • perfCollectionTotalCount: perfCollectionTotalCount
  • perfCollectionDisabledCount: perfCollectionDisabledCount
  • perfCollectionTimeSinceLast: perfCollectionTimeSinceLast
  • perfCollectionQuerySUCCESS: perfCollectionQuerySUCCESS
  • perfCollectionQueryERROR: perfCollectionQueryERROR
  • perfCollectionQueryTIMEOUT: perfCollectionQueryTIMEOUT
  • perfCollectionQueryNO_DATA: perfCollectionQueryNO_DATA

Hosts

  • cpuReservedcapacity: cpu.reservedCapacity.AVERAGE
  • cpuUsage: cpu.usage.MAXIMUM
  • cpuUsagemhz: cpu.usagemhz.MAXIMUM
  • diskUsage: disk.usage.MAXIMUM
  • memUsage: mem.usage.MAXIMUM
  • memActive: mem.active.MAXIMUM
  • memBalloon: mem.vmmemctl.AVERAGE
  • memConsumed: mem.consumed.AVERAGE
  • memGranted: mem.granted.MAXIMUM
  • memSwapused: mem.swapused.MAXIMUM
  • memTotal: HostSystem.totalMemory (Property)
  • sysUpTime: sys.uptime.LATEST
  • vmCount: HostSystem.vm_count (Property)
  • vmPoweredCount: HostSystem.powered_vm_count (Property)
  • vmSuspendedCount: HostSystem.suspended_vm_count (Property)
  • vmUnpoweredCount: HostSystem.unpowered_vm_count (Property)
  • vsanNodeState (Python)

Hosts (vSAN)

  • cpuReservedcapacity: cpu.reservedCapacity.AVERAGE
  • cpuUsage: cpu.usage.MAXIMUM
  • cpuUsagemhz: cpu.usagemhz.MAXIMUM
  • diskUsage: disk.usage.MAXIMUM
  • memUsage: mem.usage.MAXIMUM
  • memActive: mem.active.MAXIMUM
  • memBalloon: mem.vmmemctl.AVERAGE
  • memConsumed: mem.consumed.AVERAGE
  • memGranted: mem.granted.MAXIMUM
  • memSwapused: mem.swapused.MAXIMUM
  • memTotal: HostSystem.totalMemory (Property)
  • sysUpTime: sys.uptime.LATEST
  • vmCount: HostSystem.vm_count (Property)
  • vmPoweredCount: HostSystem.powered_vm_count (Property)
  • vmSuspendedCount: HostSystem.suspended_vm_count (Property)
  • vmUnpoweredCount: HostSystem.unpowered_vm_count (Property)
  • vsanNodeState (Python)
  • statsBackEnd:
    • oio
    • congestion
    • readCongestion
    • writeCongestion
    • recWriteCongestion
    • resyncReadCongestion
    • iops
    • iopsRead
    • iopsRecWrite
    • iopsResyncRead
    • iopsWrite
    • latencyAvg
    • latencyAvgRead
    • latencyAvgWrite
    • latAvgResyncRead
    • latencyAvgRecWrite
    • throughput
    • throughputRead
    • throughputWrite
    • tputResyncRead (VSAN Performance)
  • statsFrontEnd:
    • oio
    • congestion
    • readCongestion
    • writeCongestion
    • iops
    • iopsRead
    • iopsWrite
    • latencyAvg
    • latencyAvgRead
    • latencyAvgWrite
    • throughput
    • throughputRead
    • throughputWrite (VSAN Performance)

LUNs

  • diskRead: disk.read.AVERAGE
  • diskReadRequests: disk.numberRead.SUMMATION
  • diskWrite: disk.write.AVERAGE
  • diskWriteRequests: disk.numberWrite.SUMMATION

Host Physical NICs

  • nicRx: net.received.AVERAGE
  • nicTx: net.transmitted.AVERAGE

Resource Pools

  • cpuUsagemhz: cpu.usagemhz.AVERAGE
  • cpuEntitlement: cpu.cpuentitlement.LATEST
  • memConsumed: mem.consumed.AVERAGE
  • memBalloon: mem.vmmemctl.AVERAGE
  • memOverhead: mem.overhead.AVERAGE
  • memEntitlement: mem.mementitlement.LATEST
  • memoryLimit: ResourcePool.memoryLimit (Modeled)
  • memoryReservation: ResourcePool.memoryReservation (Modeled)

VMs

  • cpuUsageAvg: cpu.usage.AVERAGE
  • cpuUsagemhz: cpu.usagemhz.AVERAGE
  • memUsage: mem.usage.AVERAGE
  • memConsumed: mem.consumed.AVERAGE
  • memOverhead: mem.overhead.AVERAGE
  • diskUsage: disk.usage.AVERAGE
  • datastoreReadLatency: datastore.totalReadLatency.AVERAGE
  • datastoreWriteLatency: datastore.totalWriteLatency.AVERAGE
  • netUsage: net.usage.AVERAGE
  • netReceived: net.received.AVERAGE
  • netTransmitted: net.transmitted.AVERAGE
  • snapshotsSize: VirtualMachine.snapshots_size (Property)
  • storageCommitted: VM.storageCommitted (Modeled)
  • storageUncommitted: VM.storageUncommitted (Modeled)

VMs (vSAN)

  • cpuUsageAvg: cpu.usage.AVERAGE
  • cpuUsagemhz: cpu.usagemhz.AVERAGE
  • memUsage: mem.usage.AVERAGE
  • memConsumed: mem.consumed.AVERAGE
  • memOverhead: mem.overhead.AVERAGE
  • diskUsage: disk.usage.AVERAGE
  • datastoreReadLatency: datastore.totalReadLatency.AVERAGE
  • datastoreWriteLatency: datastore.totalWriteLatency.AVERAGE
  • netUsage: net.usage.AVERAGE
  • netReceived: net.received.AVERAGE
  • netTransmitted: net.transmitted.AVERAGE
  • snapshotsSize: VirtualMachine.snapshots_size (Property)
  • stats:
    • iopsRead
    • iopsWrite
    • latencyRead
    • latencyWrite
    • throughputRead
    • throughputWrite (VSAN Performance)
  • vsanPoliciesCompliance: status (Python)
  • storageCommitted: VM.storageCommitted (Modeled)
  • storageUncommitted: VM.storageUncommitted (Modeled)

Vdisks

  • read: virtualDisk.read.AVERAGE
  • write: virtualDisk.write.AVERAGE
  • totalReadLatency: virtualDisk.totalReadLatency.AVERAGE
  • totalWriteLatency: virtualDisk.totalWriteLatency.AVERAGE

vSAN Cache Disks

  • iopsDevRead
  • iopsDevWrite
  • latencyDevRead
  • latencyDevWrite
  • latencyDevGAvg
  • latencyDevDAvg
  • throughputDevRead
  • throughputDevWrite (VSAN Performance)

vSAN Capacity Disks

  • capacityUsed
  • iopsDevRead
  • iopsDevWrite
  • iopsRead
  • iopsWrite
  • latencyDevRead
  • latencyDevWrite
  • latencyRead
  • latencyWrite
  • latencyDevGAvg
  • latencyDevDAvg
  • throughputDevRead
  • throughputDevWrite (VSAN Performance)

vSAN Disk Groups

  • capacity
  • capacityReserved
  • capacityUsed
  • wbSize
  • rcSize
  • compCongestion
  • iopsCongestion
  • slabCongestion
  • ssdCongestion
  • logCongestion
  • memCongestion
  • iopsRead
  • iopsWrite
  • iopsRcRead
  • iopsWbWrite
  • iopsWbRead
  • iopsRcWrite
  • latencyAvgRead
  • latencyAvgWrite
  • latencyRcRead
  • latencyWbWrite
  • latencyRcWrite
  • latencyWbRead
  • oioRecWrite
  • oioWrite
  • oioRecWriteSize
  • oioWriteSize
  • latencySched
  • latencySchedQueueMeta
  • latencySchedQueueNS
  • latencySchedQueueRec
  • latencySchedQueueVM
  • iopsSched
  • iopsSchedQueueMeta
  • iopsSchedQueueNS
  • iopsSchedQueueRec
  • iopsSchedQueueVM
  • throughputSched
  • throughputSchedQueueMeta
  • throughputSchedQueueNS
  • throughputSchedQueueRec
  • throughputSchedQueueVM
  • latResyncRead
  • latResyncReadDecom
  • latResyncReadFixComp
  • latResyncReadPolicy
  • latResyncReadRebalance
  • latResyncWrite
  • latResyncWriteDecom
  • latResyncWriteFixComp
  • latResyncWritePolicy
  • latResyncWriteRebalance
  • iopsResyncRead
  • iopsResyncReadDecom
  • iopsResyncReadFixComp
  • iopsResyncReadPolicy
  • iopsResyncReadRebalance
  • iopsResyncWrite
  • iopsResyncWriteDecom
  • iopsResyncWriteFixComp
  • iopsResyncWritePolicy
  • iopsResyncWriteRebalance
  • tputResyncRead
  • tputResyncReadDecom
  • tputResyncReadFixComp
  • tputResyncReadPolicy
  • tputResyncReadRebalance
  • tputResyncWrite
  • tputResyncWriteDecom
  • tputResyncWriteFixComp
  • tputResyncWritePolicy
  • tputResyncWriteRebalance
  • throughputRead
  • throughputWrite (VSAN Performance)

In addition, any other metric exposed by vSphere may be added to a Zenoss monitoring template. This must be done cautiously, however. It is critical to only add metrics that are applicable to the component that the template is bound to, and which are supported by the VMware instances you are monitoring.

Because Zenoss batches multiple performance queries together, adding an unsupported metric may cause unrelated performance graphs to break. The 'Test' function must be used to verify that any newly added datasource will work correctly.
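
For reference, the dotted names above follow vSphere's counter naming (group.counter.ROLLUP). The following is a hedged pyVmomi sketch of resolving such a name to a vSphere counter ID and querying it, reusing the si connection from the sketch in the Discovery section; host is a hypothetical vim.HostSystem reference:

from pyVmomi import vim

perf = si.RetrieveContent().perfManager
by_name = {}
for c in perf.perfCounter:
    # Full name is group.counter.rollup, e.g. 'cpu.usage.average'
    # (rollup types print in lowercase; this document writes them uppercase).
    by_name['%s.%s.%s' % (c.groupInfo.key, c.nameInfo.key, c.rollupType)] = c.key

spec = vim.PerformanceManager.QuerySpec(
    entity=host,  # hypothetical vim.HostSystem reference
    metricId=[vim.PerformanceManager.MetricId(
        counterId=by_name['cpu.usage.average'], instance='')],
    intervalId=20,  # the 20-second 'raw data feed' used for Hosts, LUNs, pNICs, VMs
)
results = perf.QueryPerf(querySpec=[spec])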

Event Management

The following event classes and their subclasses will be continually collected and passed into the Zenoss event management system.

  • Alarm
  • Event
  • ExtendedEvent
  • EventEx

Various pieces of information encoded in these event classes are used to determine, as accurately as possible, the following Zenoss event fields.

Standard Zenoss Event Fields

  • device (set to VMware vSphere Endpoint device in the /vSphere device class)
  • component
  • summary
  • severity
  • eventClassKey (for mapping specific event types)
  • eventKey (for de-duplication and auto-clear fingerprinting)

Additional Event Fields

  • description

Events collected through this mechanism are timestamped based on the time they occurred within vCenter, not the time at which they were collected.

Control Of Events Polling

There are two zProperties to control events polling:

  • zVSphereEventTypes is a list of event types to control which events are collected. If this zProperty is empty then all events will be collected from vSphere. May contain the following values:
    • Error
    • Warning
    • Information
    • User
  • zVSphereAlarmSeverities is a list of alarm severities (colors) to control which alarms are collected. If this zProperty is empty then all alarms will be collected from vSphere. May contain the following values:
    • GREEN
    • YELLOW
    • RED

Alarm severities are mapped to the event classes as follows:

  • GREEN — Clear
  • YELLOW — Warning
  • RED — Error
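
The semantics of both zProperties and the severity mapping can be sketched as follows (illustrative Python, not the ZenPack's internal code):

def event_is_collected(event_type, zVSphereEventTypes):
    # An empty list means "collect everything" for both zProperties.
    return not zVSphereEventTypes or event_type in zVSphereEventTypes

ALARM_SEVERITY_TO_EVENT_SEVERITY = {
    'GREEN': 'Clear',
    'YELLOW': 'Warning',
    'RED': 'Error',
}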

Guest Device Discovery

You can optionally configure each monitored vSphere account to attempt to discover and monitor the guest Linux or Windows operating systems running within each VM. This requires that your Zenoss system has the network and server access it needs to monitor the guest operating system.

The life cycle of the guest operating system devices can optionally be managed along with the VM. For example, the guest operating system device is set to a decommissioned production state when the VM is stopped, and the guest operating system device is deleted when the VM is destroyed.

Service Impact and Root Cause Analysis

When combined with the Zenoss Service Dynamics product, this ZenPack adds built-in service impact and root cause analysis capabilities for services running on VMware vSphere. The service impact relationships described below are automatically added. These will be included in any services that contain one or more of the explicitly mentioned components.

Internal Impact Relationships

  • vCenter access failure impacts all datacenters.
  • Datacenter failure impacts all related hosts, networks, dvSwitches, dvPortgroups, and datastores.
  • Host failure impacts the standalone compute resource and resident VMs.
  • Host NIC failure impacts the related host.
  • Network failure impacts related vNICs.
  • dvSwitch failure impacts related dvPortgroups.
  • dvPortgroup failure impacts related vNICs.
  • Datastore failure impacts attached VMs.
  • Cluster or standalone compute resource failure impacts related resource pools and vApps.
  • Resource pool failure impacts child resource pools, related vApps, and related VMs.
  • vApp failure impacts related VMs.
  • vNIC failure impacts the related VM.

External Impact Relationships

  • vNIC failure impacts guest operating system device's related NIC.
  • VM failure impacts guest operating system devices.
  • NAS file storage providers impact related datastores.
  • SAN block storage providers impact related datastores.
  • Resource pool failure impacts related vCloud provider and organizational VDCs.
  • VM failure impacts related vCloud VM.
  • Cisco UCS vNIC failure impacts related host NIC.
  • Cisco UCS service profile failure impacts related host.
  • VM failure impacts Cisco Nexus 1000V VSM running as a guest.
  • Host failure impacts related Cisco Nexus 1000V VEM.
  • Host NIC failure impacts related Cisco Nexus 1000V Ethernet uplink.
  • Cisco Nexus 1000V vEthernet impacts related vNIC.
  • Dell PowerEdge Physical Disk impacts related Datastore.

Most of the impacts described above follow the default policy of a node being in the worst state of the nodes that impact it. For example, a datacenter failure will imply that all related hosts are also failed. In some cases, this is not appropriate and custom policies are used.

Custom Impact Policies

  • vCenter access failure will only cause related datacenters to be ATRISK because they're probably still functioning but may be unmanageable.
  • Host NIC failure will only imply a host DOWN if all of the host's NICs have failed. If a subset has failed the host will be implicitly ATRISK.
  • Host failure will only imply a cluster DOWN if all of the cluster's hosts have failed. If a subset has failed the cluster will be implicitly ATRISK.
  • vNIC failure will only imply a VM DOWN if all of the VM's vNICs have failed. If a subset has failed the VM will be implicitly ATRISK.
  • Datastore failure will only imply a VM DOWN if all of the VM's datastores have failed. If a subset has failed the VM will be implicitly DEGRADED.

Dynamic View on Device Level

To provide a high-level overview of a modeled vSphere device, a separate Dynamic View diagram is provided at the device level. To keep the resulting diagram a reasonable size, it includes only the following components:

  • vSphere Endpoint
  • Datacenters
  • Clusters
  • Hosts
  • Datastores
  • Resource Pools
  • dvSwitches

vSAN Cluster Health

vSAN has self-monitoring for a large number of configuration, performance, and availability issues. These tests are run periodically by vSphere, and the results are cached and made available. For vSphere devices with vSAN-enabled clusters, this information is provided on the "vSAN Cluster Health" screen.

To view these results, choose a cluster from the combo box at the top, and the cached results will be shown. When a test is selected, details appear in the bottom part of the screen.

To run the health checks and update the results immediately, click the "Retest" button.

vSAN Objects

For vSAN clusters and the VMs located in them, a screen provides information about the vSAN objects related to those components: each object's UUID, name, attached storage policy, and compliance status. The physical location of each vSAN object is also shown.

Operational Reports

The following operational reports are included with this ZenPack. They can be found in the vSphere report organizer.

Clusters: Shows all clusters, with the count of VMs (total and powered on), hosts, and CPU/Memory utilization within each cluster.

Datastores: Shows all datastores, with the number of connected VMs (total and powered on) and the disk space available and consumed on each datastore.

Hosts: Shows all hosts, with the count of VMs (total and powered on) and the CPU/Memory reservation and utilization on each host.

LUNs: Shows all hosts and their LUNs, with LUN details.

Resource Pools: Shows all resource pools, their CPU and memory allocations, and usage.

VM to Datastore: Shows VMs mapped to datastores.

VMs: Shows all VMs, their operating system, CPU and memory utilization, snapshot sizes, and which host/cluster they reside within. VM snapshots are shown in the submenu as a list with size and date of creation.

VMware Utilization: Provides a summary of VMs, CPU, memory, and disk utilization over a specified time interval, broken down by host.

Usage

Adding vSphere Endpoint

Use the following steps to start monitoring vSphere using the Zenoss web interface.

  1. Navigate to the Infrastructure page.
  2. Choose Add VMware vSphere Endpoint from the add device button.
  3. Fill out the form.
    • Name can be anything you want.
    • Hostname or IP must be resolvable and accessible from the collector server chosen in the Collector field.
    • Username and Password are the same as what you'd use in the vSphere client.
    • Port is the HTTP/HTTPS port to connect to (443 is normal if HTTPS is being used).
    • SSL should almost certainly be left enabled.
  4. Click ADD.

Note that unlike other device types, it is possible to add the same vSphere endpoint multiple times with the same IP but under different device IDs; doing so will result in duplicate data collection.


Alternatively, you can use zenbatchload to add vSphere endpoints from the command line. To do this, you must create a file with contents similar to the following. Replace all values in angle brackets with your values minus the brackets. Multiple endpoints can be added under the same /Devices/vSphere or /Devices/vSphere/Subclass section.

/Devices/vSphere loader='VMware vSphere', loader_arg_keys=['title', 'hostname', 'username', 'password', 'ssl', 'collector']
vcenter1 hostname='<address>', username='<username>', password='<password>'

You can then load the endpoint(s) with the following command.

zenbatchload filename
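
For example, a file that adds two endpoints to /Devices/vSphere might look like this (hypothetical titles, addresses, and credentials):

/Devices/vSphere loader='VMware vSphere', loader_arg_keys=['title', 'hostname', 'username', 'password', 'ssl', 'collector']
vcenter1 hostname='vcenter1.example.com', username='monitor', password='secret'
vcenter2 hostname='vcenter2.example.com', username='monitor', password='secret'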

Monitoring ESX Host Preparation

If you are using VMware NSX to provide software-defined networking functionality in your vSphere environment, each of your ESX hosts must be properly prepared and configured in order to utilize NSX. This feature requires that you have both the NSX and vSphere ZenPacks installed on your Zenoss instance.

In monitoring ESX host preparation, Zenoss must communicate directly with the ESX host, rather than through the vCenter endpoint. This requires additional configuration, detailed in the following steps.

  1. In the Device Detail page for your vSphere endpoint, click on Configuration Properties to edit the zProperties for this device.
  2. Click in the empty box under Name to search for zProperties by name.
  3. Find the property named zVSphereHostSystemUser and set it to the username you use to access your ESX hosts via ssh and API.
  4. Find the property named zVSphereHostSystemPassword and set it to the password you use to access your ESX hosts via ssh and API.
  5. Find the property named zVSphereHostCollectionClusterWhitelist. The whitelist filters the hosts which are monitored based on cluster names. You have several options for whitelist entries.
    • You can enter the full name of each cluster whose hosts you wish to monitor.
    • You can enter a partial name that matches one or multiple clusters.
    • You can enter a regular expression that matches one or multiple clusters.
    • You can enter a * as a wildcard to match every cluster attached to the vSphere endpoint.
  6. Once these values are set, if your hosts meet the criteria described below, the zenvsphere daemon will begin collecting this data shortly, and you will see additional data fields populated in your host details section.

The zVSphereHostCollectionClusterWhitelist is a list property that accepts multiple entries. The patterns are combined with OR logic rather than AND logic: if a cluster matches ANY pattern, it will be monitored. This means that if you use the wildcard option to match all clusters, there is no reason to enter anything else in the zProperty. Remember also that this whitelist is empty by default; until values are entered here, no hosts will be monitored.
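
A sketch of that OR semantics (illustrative Python, not the ZenPack's actual matching code; the '*' wildcard is handled as a special case since it is not itself a valid regular expression):

import re

def cluster_whitelisted(cluster_name, whitelist):
    # OR logic: the cluster is monitored if it matches ANY entry.
    # An empty whitelist matches nothing, so no hosts are monitored.
    # '*' is treated as a plain "match everything" wildcard.
    return any(entry == '*' or re.search(entry, cluster_name)
               for entry in whitelist)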

The host preparation monitoring tracks the following values:

  • VXLAN VIB Install Date
  • VXLAN VIB Version
  • VSIP VIB Install Date
  • VSIP VIB Version
  • DVFilter Switch Security VIB Install Date
  • DVFilter Switch Security VIB Version
  • VDS Config Present

This monitoring is based on recommendations for troubleshooting connectivity on ESX hosts with NSX: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2107951

The checks recommended in this article are performed by the host collection, and an event is generated if the checks fail, alerting your team and allowing them to take steps to resolve the issue.

When collecting data from ESX hosts, certain assumptions are made about your host configuration:

  • SOAP API is enabled and accessible
  • API traffic is transmitted over SSL
  • API traffic is handled on the default SSL port 443
  • The ESX host's hostname is valid and routable from the Zenoss collector
  • All ESX hosts managed by a given vCenter have the same username and password as each other

If any of these statements is not correct for your host, host data collection will not be possible, and this feature will not be used for that host.

Configuring Guest Device Discovery

Use the following steps to configure VM guest device discovery. Guest device discovery must be configured individually for each vSphere endpoint.

  1. Navigate to one of the vSphere endpoints.
  2. Navigate to the Configuration Properties panel and set the device classes for Linux and Windows VMs in the zVSphereDiscoverLinuxDeviceClass and zVSphereDiscoverWindowsDeviceClass zProperties.
  3. Verify that appropriate SSH, SNMP, or Windows credentials are configured for the chosen device class(es).
  4. Populate the zVSphereDiscoverPaths zProperty with regular expression patterns that match the inventory paths of the VMs you want to discover. By default it is empty, which means no guest devices will be discovered; to discover all guest devices you may use the .* pattern. An inventory path is the entity's location in the inventory, in the form /My Folder/My Datacenter/vm/Discovered VM/VM1, and is shown on the VM's Details page.
  5. In the Zenoss UI, find the gear icon menu in the bottom left corner of the window. Under this menu, click the option labeled "Discover Guest Devices". This will schedule a job for immediate execution, which will run the discovery process for each VM.

NOTE: VMware Tools must be installed on the VMs you want to discover, in order to obtain their IP addresses.

Zenoss monitors changes to VMs (adding, removing, or a change of IP or inventory path) and will create or remove the linked guest devices as needed. A guest device will be removed only if it was created automatically by the ZenPack and the zVSphereRemoveGuests property is set to True.

Reasons a Guest Device Fails to be Discovered

Several criteria must be met in order for a guest device to be discovered by the vSphere ZenPack. Those requirements are as follows:

  • The VM's inventory path must match the zVSphereDiscoverPaths configuration property.
  • Guest device classes must be defined. See the zVSphereDiscoverLinuxDeviceClass and zVSphereDiscoverWindowsDeviceClass configuration properties.
  • The guest must have a valid IP address, which can only be provided by VMware Tools installed inside the VM.

Configuring Auto Change of the Production State for Guest Devices

You can disable the automatic change of the production state for VMs. To do so:

  1. Click on the Infrastructure tab.
  2. Select discovered VM or the appropriate device classes, in case you want to change the behavior for a group of underlying VMs.
  3. Navigate to the Configuration Properties panel.
  4. Change the zVSphereAutoChangeProdState property (default is true).

By default, the production state is changed to 'Production' (1000) for running VMs, and to 'Decommissioned' (-1) for stopped ones. These states may be customized by specifying the desired production state IDs (numbers) in zVSphereAutoChangeProdStateRunning and zVSphereAutoChangeProdStateStopped.
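
As a sketch of this behavior (the function name is illustrative; the IDs are the defaults listed above):

def target_production_state(vm_is_running,
                            running_state=1000,   # zVSphereAutoChangeProdStateRunning
                            stopped_state=-1):    # zVSphereAutoChangeProdStateStopped
    # Running VMs go to 'Production', stopped VMs to 'Decommissioned'.
    return running_state if vm_is_running else stopped_state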

Controlling which VMs are Modeled

By default, all Virtual Machines managed by the vSphere endpoint are modeled. Optionally, you may exclude certain VMs you do not wish Zenoss to model through the use of the zVSphereModelExcludePaths and zVSphereModelIncludePaths zProperties.

These work similarly to zVSphereDiscoverPaths (described above). Each property contains one or more regular expressions (one per line), which are matched against the inventory path of every VM.

Inventory paths are of the form /My Folder/My Datacenter/vm/Discovered VM/VM1. For any VM that is already modeled by Zenoss, the inventory path is shown on its Details page. In addition, in the dialogs for zVSphereModelExcludePaths and zVSphereModelIncludePaths, the Test button shows all VMs that are present on the endpoint, their inventory paths, and whether they would be modeled based on the current settings of zVSphereModelExcludePaths and zVSphereModelIncludePaths.

When the modeler matches these patterns, it first checks zVSphereModelExcludePaths against the path, and then zVSphereModelIncludePaths. This means that zVSphereModelExcludePaths must be set before zVSphereModelIncludePaths can be useful.
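
A sketch of that matching logic (illustrative Python, not the modeler's actual code; it assumes patterns are applied with an unanchored regular-expression search):

import re

def vm_is_modeled(inventory_path, exclude_patterns, include_patterns):
    # Every VM is modeled unless an exclude pattern matches; an include
    # match then overrides the exclusion (include takes precedence).
    if not any(re.search(p, inventory_path) for p in exclude_patterns):
        return True
    return any(re.search(p, inventory_path) for p in include_patterns)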

Examples:

  • Model only VMs with prod in their name
    • exclude: .*
    • include: ^/(.+/)[^/]*prod[^/]*$
  • Model only VMs in a specific datacenter
    • exclude: .*
    • include: ^/My Datacenter/
  • Exclude a specific folder of VMs
    • exclude: ^/My Datacenter/vm/Discovered
    • include: (leave blank)
  • Exclude VMs with test in their name, unless they are in the 'prod' datacenter or have prod in their name
    • exclude: ^/(.+/)[^/]*test[^/]*$
    • include: ^/Prod Datacenter/
    • include: ^/(.+/)[^/]*prod[^/]*$
  • Exclude all VMs except those with prod anywhere in their inventory path (datacenter, folder, or VM name included).
    • exclude: .*
    • include: prod

It is recommended to test any settings using the 'Test' button of the zProperty dialog carefully, to avoid accidentally excluding VMs you do not wish to model.

Transitioning from ZenVMware

If you are installing this ZenPack on an existing Zenoss system, or upgrading from an earlier Zenoss version, you may have a ZenPack named ZenPacks.zenoss.ZenVMware already installed on your system. You can check this by navigating to Advanced -> ZenPacks.

This ZenPack functionally supersedes ZenPacks.zenoss.ZenVMware, but does not automatically migrate monitoring of your VMware vSphere resources when installed. The ZenPacks can coexist gracefully to allow you time to manually transition monitoring to the newer ZenPack with better capabilities.

Depending on how heavily loaded your vCenter and Zenoss server(s) are you may wish to avoid monitoring the same vSphere resources twice in parallel. If this is the case, you should use the following instructions to first remove the existing vSphere monitoring before adding the new monitoring.

  1. Navigate to the Infrastructure page.
  2. Expand the VMware device class.
  3. Single-click to select a vCenter.
  4. Click the delete (-) button in the bottom-left.
  5. Click OK to confirm deleting the vCenter.
  6. Add the vCenter back using the Adding vSphere Endpoint instructions above. Be sure to select Add VMware vSphere Endpoint and not Add VMware Infrastructure.
  7. Repeat steps 3-6 for each vCenter.

If you're comfortable monitoring the vCenters twice in parallel for a time, you can simply follow the instructions under Adding vSphere Endpoint then delete the old vCenters from the /VMware device class once you're satisfied with the new monitoring.

Installed Items

Installing this ZenPack will add the following items to your Zenoss system.

Configuration Properties

  • zVSphereAlarmSeverities: List of alarm severities (colors) to control which alarms are collected.

    • Default value: ["RED", "YELLOW", "GREEN"]
  • zVSphereAutoChangeProdState: Whether to change production states for guest devices when the VM's state changes.

    • Default value: True
  • zVSphereAutoChangeProdStateRunning: Production state to use for guest devices in a running state.

    • Default value: 1000
  • zVSphereAutoChangeProdStateStopped: Production state to use for guest devices in a stopped state.

    • Default value: -1
  • zVSphereDiscoverLinuxDeviceClass: Device Class to use for discovered Linux VMs.

  • zVSphereDiscoverPaths: Create guest devices only for VMs whose inventory path matches one of the provided regular expressions. If not set, discovery is disabled.

  • zVSphereDiscoverWindowsDeviceClass: Device Class to use for discovered Windows VMs.

  • zVSphereEndpointHost: Hostname or IP used to connect to vSphere Endpoint.

  • zVSphereEndpointPassword: Password used to connect to vSphere Endpoint.

  • zVSphereEndpointPort: Port used to connect to vSphere Endpoint.

    • Default value: 443
  • zVSphereEndpointUser: User used to connect to vSphere Endpoint.

    • Default value: admin
  • zVSphereEndpointUseSsl: Whether to use SSL when connecting to vSphere Endpoint.

    • Default value: True
  • zVSphereEventTypes: List of event types to control which events are collected.

    • Default value: ["info", "warning", "error", "user"]
  • zVSphereHostCollectionClusterWhitelist: The whitelist filters the hosts which are monitored based on cluster names.

  • zVSphereHostPingBlacklist: List of regular expressions to control which management IP addresses are excluded from pinging (matches against hostname:nicname:ip). Note: zVSphereHostPingWhitelist takes precedence over zVSphereHostPingBlacklist.

  • zVSphereHostPingWhitelist: List of regular expressions to control which management IP address to ping (matches against hostname:nicname:ip). Note: zVSphereHostPingWhitelist takes precedence over zVSphereHostPingBlacklist.

  • zVSphereHostSystemPassword: Password used to access ESX hosts via ssh and API.

  • zVSphereHostSystemUser: Username used to access ESX hosts via ssh and API.

  • zVSphereLUNContextMetric: Whether to use LUN-specific metric names when storing performance data. Note: Changing this value will cause historical metrics to become inaccessible.

    • Default value: False
  • zVSphereModelAdmSoftLimit: Advanced tuning parameter, consult Zenoss support.

    • Default value: 500
  • zVSphereModelCache: List of regular expressions to configure caching (change detection) for a specific set of properties.

  • zVSphereModelExcludePaths: List of regular expressions (matched against VM inventory path) to control which VMs are NOT modeled. Note: zVSphereModelIncludePaths takes precedence over zVSphereModelExcludePaths.

  • zVSphereModelIgnore: List of regular expressions to suppress collection of attributes matched (See Tuning Modeling).

  • zVSphereModelIncludePaths: List of regular expressions (matched against VM inventory path) to control which VMs are modeled. Note: zVSphereModelIncludePaths takes precedence over zVSphereModelExcludePaths.

  • zVSphereModelMpIndexObjs: Advanced tuning parameter, consult Zenoss support.

    • Default value: 50
  • zVSphereModelMpLevel: Advanced tuning parameter, consult Zenoss support.

    • Default value: 0
  • zVSphereModelMpObjs: Advanced tuning parameter, consult Zenoss support.

    • Default value: 250
  • zVSphereModelVdisks: Enable modeling of Vdisks components.

    • Default value: False
  • zVSphereModelVSAN: Enable modeling of vSAN components.

    • Default value: True
  • zVSpherePerfDelayCollectionMinutes: How long in minutes to lag performance data collection (See Performance Data Collection: Tuning).

  • zVSpherePerfHealthFailsThreshold: Generate an event when the percentage of failed performance collection queries exceeds the specified threshold.

    • Default value: 10
  • zVSpherePerfHealthTimeoutsThreshold: Generate an event when the number of performance collection timeouts exceeds the specified threshold.

    • Default value: 50
  • zVSpherePerfMaxAgeMinutes: An upper limit on how far back Zenoss will query vSphere for performance data.

    • Default value: 28
  • zVSpherePerfParallelQueries: How many queries to execute against vSphere in parallel.

    • Default value: 8
  • zVSpherePerfQueryChunkSize: The maximum number of metrics to include in each query for managed objects stored on the ESX hosts (Hosts, LUNs, pNICs, VMs).

    • Default value: 250
  • zVSpherePerfQueryRaw20: Whether to actually use the 'raw data feed' query option for the managed object types it is normally used for: Hosts, LUNs, pNICs, VMs.

    • Default value: True
  • zVSpherePerfQueryTimeout: How long, in seconds, to allow each API call to respond.

    • Default value: 25
  • zVSpherePerfQueryVcChunkSize: The maximum number of metrics to include in each query for managed objects stored in the vSphere database (Clusters, Datastores, ResourcePools).

    • Default value: 64
  • zVSpherePerfQueryVcRaw20: Whether to actually use the 'raw data feed' query option for the managed object types it is not normally used for: Datacenters, Clusters, Resource Pools.

    • Default value: False
  • zVSpherePerfRecoveryMinutes: When an error occurs in querying a specific metric, it is blacklisted for this number of minutes before it is retried.

    • Default value: 20
  • zVSpherePerfTimeoutRecoveryMinutes: When a timeout error occurs in querying a specific metric, it is "blacklisted" for this number of minutes before it is retried. This avoids repeated errors due to attempts to query a metric that is not working properly. The default is fairly long (1 hour), and it can be safely lowered if desired or if gaps appear in the graphs.

    • Default value: 60
  • zVSphereRemoveGuests: Remove discovered guest device if linked VM is removed on vSphere.

    • Default value: False
  • zVSphereVMContextMetric: Whether to use VM-specific metric names when storing performance data. Note: Changing this value will cause historical metrics to become inaccessible.

    • Default value: False
  • zVSphereVSANRetriggerAlarms: Controls re-triggering of VSAN alarms based on failing health checks. One of "all" (check all alarms), "current" (only check alarms for health checks that are currently failing), or "never" (do not re-trigger alarms).

    • Default value: all

Device Classes

  • /vSphere

Modeler Plugins

  • vmware.vSphere

Datasource Types

  • VMware vSphere
  • Python vSphere
  • VMware vSphere Modeled
  • vSphere VSAN Performance

Monitoring Templates

  • Endpoint (in /vSphere)
  • Cluster (in /vSphere)
  • Cluster-VSAN (in /vSphere)
  • Datacenter (in /vSphere)
  • Datastore (in /vSphere)
  • Datastore-ESX (in /vSphere)
  • dvPortgroup (in /vSphere)
  • dvSwitch (in /vSphere)
  • Host (in /vSphere)
  • Host-VSAN (in /vSphere)
  • LUN (in /vSphere)
  • Network (in /vSphere)
  • pNIC (in /vSphere)
  • RDM (in /vSphere)
  • ResourcePool (in /vSphere)
  • ResourcePool-ESX (in /vSphere)
  • StandaloneResource (in /vSphere)
  • vApp (in /vSphere)
  • VM (in /vSphere)
  • VM-VSAN (in /vSphere)
  • VDisk (in /vSphere)
  • vNIC (in /vSphere)
  • vSANDisk (in /vSphere)
  • vSANCacheDisk (in /vSphere)
  • vSANCapacityDisk (in /vSphere)
  • vSANDiskGroup (in /vSphere)

Event Classes

  • /vSphere
  • /Perf/vSphere
  • /Perf/vSphere/Utilization

Collector Daemons

  • zenvsphere

Troubleshooting

If any issues are encountered with the functionality offered by this ZenPack, the following checklist should be followed to verify that all configurations are correct.

Configuration Checklist

  1. Verify that Zenoss has been fully restarted after the ZenPack was installed.
  2. Verify that the endpoint's address, username, and password are correct. Check the zVSphereEndpointHost, zVSphereEndpointUser and zVSphereEndpointPassword configuration properties for the endpoint device. See Login Credentials below.
  3. Verify that the Zenoss collector server to which the endpoint device is assigned has network connectivity through any firewalls to the endpoint address on port 443/TCP (HTTPS).
  4. Verify that the zenvsphere daemon is running on the collector to which the endpoint device is assigned.
  5. Check the logs. See Logging below.
  6. Kill the zenvmware-client subprocess. See Java Subprocess below.
  7. Tune configuration properties according to Performance Data Collection: Tuning instructions below if necessary.

Modeling

As of version 3.3.0, modeling for vSphere devices is performed continuously by the zenvsphere collector daemon. Because the modeling is handled by a daemon, it may not be performed immediately upon adding a new vSphere endpoint. This does not necessarily mean anything is wrong. Wait a few minutes and see if model data begins to populate.

If you do not see any data after waiting a few minutes, check the logs to see if there were any errors encountered during modeling, see Logging.

One common problem when adding a new vSphere endpoint is incorrect login details. In the logs, you should see something like this:

ERROR zen.PythonClient: Authentication error: [com.vmware.vim25.InvalidLoginFaultMsg] Cannot complete login due to an incorrect user name or password. Check `zVSphereEndpointUser` and `zVSphereEndpointPassword`.

If this occurs, you should verify and re-enter your vSphere login credentials, see Login Credentials.

If you see any other errors, consult the included documentation, this wiki, Zenoss Support, and whatever other resources you have at your disposal.

Normally, it should not be necessary to restart zenvsphere since any model updates will be received automatically from vSphere. But if you deem it necessary to restart the collector daemon, you can do so, see Daemon Restart. Please note the caveats outlined in that section before proceeding.

Daemon Restart

The vSphere ZenPack uses its own service or daemon which runs continuously in the background to collect modeling data, events, and performance data about your vSphere environment. Running as a separate service allows for greater flexibility in handling some constraints of the vSphere architecture and allows for greater performance.

Under normal circumstances, you should never need to restart the daemon. It automatically picks up data about new devices and components, as well as changed zProperties and other configuration values. Additionally, restarting will cause a significant delay in data collection: restarting the collector daemon closes the open session with the vSphere API, which means the collector has to start the modeling data collection process over from the very beginning, rather than pulling incremental updates. In a smaller environment, this may not be a serious issue, but in complex vSphere environments with many components it can be very time-consuming. Thus, if you restart the daemon in such an environment, expect to wait a while.

However, if the daemon is clearly malfunctioning, or you have otherwise determined it is necessary to force remodeling, you can restart the collector daemon using the steps described below.

If you are running Zenoss 5.0 or greater, the zenvsphere daemon runs in its own Docker container, managed by Control Center. To restart it, follow these steps:

  1. Open the Zenoss UI.
  2. Click the Advanced tab, and then select Control Center.
  3. Find 'zenvsphere' in the list and click the restart button.

If you are running an older version of Zenoss than 5.0, there is no Control Center and there are no Docker containers. Instead, follow these steps:

  1. Open the Zenoss UI.
  2. Click the Advanced tab, then go to Settings -> Daemons.
  3. Find the 'zenvsphere' daemon and click the restart button.

Login Credentials

  1. Find your vSphere device in the Infrastructure section of the Zenoss UI.
  2. Go to the Configuration Properties section.
  3. Enter 'vSphereEndpoint' in the Name filter and hit Enter.
  4. You should see a list of configuration properties, including zVSphereEndpointHost, zVSphereEndpointUser and zVSphereEndpointPassword.
  5. Double-click an entry to edit its value.
  6. Ensure the user, host, password, and any other values are correct. Specifically, make sure that you can use the vSphere client with this same information.

NTP

An often-overlooked Zenoss deployment issue is the synchronization of time between data sources and data collectors, especially when the collected data is timestamped at the source. It is not unusual for the hardware clocks in computers to drift by seconds per day, accumulating to significant error in just a week or two.

The Zenoss data collector for vSphere will adjust for time zone and clock skew differences every 2 hours (by comparing the server's current time with the collector's), so even without NTP, events and graphs will have correct timestamps in Zenoss. If the clocks are not synchronized, however, the timestamps shown in Zenoss and vSphere for the same event will not match, and this can cause considerable confusion.

  • Be sure that both the Zenoss collector and the ESX endpoint are configured for automatic time correction using NTP.
  • To further reduce confusion, it is recommended that the collector and endpoint be configured for the same time zone.
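
If you suspect skew, you can measure it against vCenter yourself with pyVmomi (a sketch reusing the si connection from the sketch in the Discovery section; the ServiceInstance's CurrentTime() call returns the server's clock):

from datetime import datetime, timezone

# Positive skew means the vCenter clock is ahead of this collector's clock.
skew = si.CurrentTime() - datetime.now(timezone.utc)
print('vCenter clock skew:', skew)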

Logging

On Zenoss 4, the log file can be found at $ZENHOME/log/<collectorname>/zenvsphere.log. On Zenoss 5, the log file is inside the zenvsphere container, and may be accessed as follows:

  1. Run serviced service attach zenvsphere.
  2. The log can be found at /opt/zenoss/log/zenvsphere.log.

By default, only INFO and higher priority messages will be logged. To temporarily increase the logging level to include DEBUG messages you can run zenvsphere debug as the Zenoss user without restarting the daemon. The next time it restarts, logging will resume at the preconfigured level. Alternatively, you can run zenvsphere debug again to return logging to the preconfigured level.

On Zenoss 5, you must attach to the zenvsphere container before running zenvsphere debug; alternatively, you may skip that step by using the command serviced service debug zenvsphere.

Note that if you have multiple collectors under Zenoss 5, you will need to specify the specific zenvsphere service ID rather than zenvsphere in the serviced commands above.

Java Subprocess

The zenvsphere daemon spawns a java process named zenvmware-client as needed to communicate with vSphere endpoints. It may be possible that this java process encounters a problem that zenvsphere is unable to detect or recover from. If you're experiencing issues you can run pkill -9 -f zenvmware-client as the Zenoss user to kill the java process. It will be automatically restarted.

On Zenoss 5, the java process runs within each zenvsphere container, so you may attach to the container (serviced service attach zenvsphere) and kill the process, or you may restart the entire service (which will restart both zenvsphere and the java subprocess) with the command serviced service restart zenvsphere. (If you have multiple zenvsphere containers, you will need to specify the service ID.)

Performance Data Collection: Health Report

For support purposes, an overall health report is generated for each vSphere device, indicating the elapsed time and any relevant errors encountered during the last polling cycle for that device.

This report may be accessed by visiting the device in the Zenoss UI, and changing the URL from

[...]/zport/dmd/Devices/vSphere/devices/[device id]/devicedetail[...]

to

[...]/zport/dmd/Devices/vSphere/devices/[device id]/health

Detailed interpretation of this report is best done by a Zenoss support engineer; however, the header will show the "Last valid data collected" time and "Last perf collection task total elapsed time" fields.

If the last time is not recent, or the elapsed time is greater than 300 seconds, this may indicate that tuning is required.

Some key metrics from the health report are now available in the device graphs. Simply navigate to your device in the web interface and click Graphs in the left navigation bar.

Performance Data Collection: maxQueryMetrics and zVSpherePerfQueryVcChunkSize

In vSphere 5.5U2d and 6.x, a limitation has been introduced by VMware on how many performance metrics may be queried at a time. Since Zenoss performs bulk queries of all monitored metrics every 5 minutes, it can run into this limit.

If it does, the query is automatically split in half and retried, so there should not be any functional problem, but it is less efficient and can crowd the zenvsphere.log file with errors. (These errors will mention querySpec.size on 5.5, or vpxd.stats.maxQueryMetrics on 6.0.)
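
The split-and-retry behavior can be sketched as follows (illustrative Python, not the collector's actual code; run_query stands in for one batched API call):

def query_with_split(metrics, run_query):
    # On failure, bisect the batch and retry each half; a single metric
    # that still fails is skipped (and would be blacklisted for
    # zVSpherePerfRecoveryMinutes before being retried).
    try:
        return run_query(metrics)
    except Exception:
        if len(metrics) == 1:
            return []
        mid = len(metrics) // 2
        return (query_with_split(metrics[:mid], run_query) +
                query_with_split(metrics[mid:], run_query))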

The limit normally defaults to 64 on current versions of vSphere, and Zenoss has a zproperty (zVSpherePerfQueryVcChunkSize) which should be set to the same, or lower, value to avoid these errors. This zProperty defaults to 64 as well, so it is not normally necessary to adjust it unless you see errors like those described above.

For more details on this limit and how to change it, see https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2107096

Performance Data Collection: Gaps in Resource Pool Graphs

You may have problems with Resource Pool graphs specifically, with the following symptoms:

  • Gaps in one or more graphs on resource pools
  • No data in the "CPU Entitlement" graph
  • No data in the "Consumed" and "Overhead" lines of the "Memory Usage" graph
  • Errors referring to querySpec.size (5.5) or vpxd.stats.maxQueryMetrics (6.0) in the Performance Health report for some resource pools.

This problem has been observed in at least some versions of vSphere 5.5 and 6.0.

The root cause is that, for these three specific datapoints (memOverhead, cpuEntitlement, and memEntitlement) on Resource Pools, vSphere does not properly calculate the number of metrics being queried.

For example, in vSphere, each Cluster has a top-level resource pool that contains all resources (sub-pools, and ultimately VMs) in the cluster. If we query for a single metric on that top-level pool, say, memEntitlement, instead of being counted as one metric, it is counted as one per VM in the entire cluster. That is, vSphere is expanding our one query into a bunch of internal queries, and counting them all against the maxQueryMetrics limit described above. In practice, this means that in any cluster that has more than 64 VMs, these three metrics will not be available on that top-level resource pool, because there is no way to query them without them being counted as more than 64 metrics.

The same problem applies to all resource pools that contain, directly or through sub-pools, more than 64 VMs.
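
To make the arithmetic concrete (hypothetical numbers):

# A resource pool containing 100 VMs, with the default limit of 64:
vm_count = 100
limit = 64

# Even a single one of these datapoints is counted once per VM ...
assert 1 * vm_count > limit   # 100 > 64: a single-metric query already fails

# ... so querying all three at once needs the limit raised past roughly
# 3x the VM count (the "bit more than 3 times" figure in the workaround below).
assert 3 * vm_count == 300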

The only workaround is to disable the vpxd.stats.maxQueryMetrics setting on the vSphere side, as described above, or to raise it to a very high number (under current versions of the ZenPack, it would need to be a bit more than 3 times the number of VMs in the cluster; future versions will subdivide these queries further so that it only needs to be the number of VMs).

Note that there will also be gaps in the "CPU Usage" graph because the Zenoss collector will stop trying to collect any metrics from a given resource pool after it encounters these errors, for 20 minutes by default (configurable via zVSpherePerfRecoveryMinutes). It will then try again, and fail again. Occasional data may get through due to the way Zenoss batches its queries, which can cause partial data to show up on affected resource pools.

If you need these graphs to work on these pools, but are unable to raise or disable vpxd.stats.maxQueryMetrics, you may disable the memOverhead, cpuEntitlement, and memEntitlement datapoints in the ResourcePool monitoring template. This should avoid the problem.

Performance Data Collection: Tuning

As of version 3.3, the performance collection mechanism of the vSphere ZenPack is more optimized and self-tuning, and generally will not require adjustment. This is primarily because the time window of data polled in each cycle is now dynamic, based on what data has already been collected. Therefore, rather than tuning the specific window as was done in previous versions of the ZenPack using the (no longer supported) zVSpherePerfWindowSize option, the collector will automatically collect as much data as is available since the last time it collected each metric.

This, in combination with other changes, means that the collector is better able to compensate for large vCenter installations or slow response times.

However, a number of configuration variables are still available for advanced tuning:

  • zVSpherePerfDelayCollectionMinutes: If set, will cause collection to be delayed by the specified number of minutes. This may be useful if, for instance, the most recent data value for each counter is untrustworthy. The collector already handles one known case, where the most recent data value may erroneously show up as 0, without the use of this option, so it would only be needed if some other similar, but unanticipated, problem occurred. This is a very safe option to experiment with, as the only side effect is delayed data on the graphs.
  • zVSpherePerfMaxAgeMinutes: An upper limit on how far back Zenoss will query vSphere for performance data. It is highly recommended not to change this value, as it reflects VMware's recommendation to only query within the last 30 minutes.
  • zVSpherePerfRecoveryMinutes: When an error occurs in querying a specific metric, it is "blacklisted" for this number of minutes before it is retried. This avoids repeated errors due to attempts to query a metric that is not working properly. The default is to wait 20 minutes, and it may be lowered if desired. Lowering it is not particularly beneficial though, since the data that was missed during this gap will normally be retroactively filled in if it starts working again within that 20-minute window.
  • zVSpherePerfTimeoutRecoveryMinutes: When a timeout error occurs in querying a specific metric, it is "blacklisted" for this number of minutes before it is retried. This avoids repeated errors due to attempts to query a metric that is not working properly. The default is fairly long (1 hour), and it can be safely lowered if desired or if gaps appear in the graphs.
  • zVSpherePerfParallelQueries: How many queries to execute against vCenter in parallel. The default is 8, but it may be lowered if this is putting excessive load on vCenter, or raised if doing so increases throughput.
  • zVSpherePerfQueryTimeout: How long to allow each API call to respond. When a timeout or other error occurs, the metrics in that query will be divided in half and each half retried; if a specific metric persistently times out or produces an error, it will be blacklisted as described under zVSpherePerfRecoveryMinutes (see the sketch after this list). The default value for this property is 25 seconds.
  • zVSpherePerfQueryChunkSize: The maximum number of metrics to include in each query for managed objects for which we normally use the raw data feed query option (data stored on the ESX hosts): Hosts, LUNs, pNICs, VMs.
  • zVSpherePerfQueryVcChunkSize: The maximum number of metrics to include in each query for managed objects for which we normally use the 5-minute query option (data stored in the vCenter database): Clusters, Datastores, ResourcePools. The default is 64, but it may be lowered or raised, depending on the needs of the environment. The collector automatically adjusts the number of metrics to request by splitting the request in half if the query fails. However, if the request consistently fails, starting with a lower chunk size will allow the collector to skip these adjustments, reducing the total number of queries required.
  • zVSpherePerfQueryRaw20: Whether to actually use the raw data feed query option for the managed object types it is normally used for: Hosts, LUNs, pNICs, VMs. Setting this to False will cause the less efficient 5-minute query option to be used, but could be useful if there is a problem with the raw data feed.
  • zVSpherePerfQueryVcRaw20: Whether to actually use the raw data feed query option for the managed object types it is not normally used for: Datacenters, Clusters, Resource Pools. It is not recommended to change this value from its default of False.
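
The divide-and-retry behavior described under zVSpherePerfQueryTimeout can be illustrated with a minimal sketch (Python, for illustration only; do_query is a hypothetical stand-in for the actual vSphere API call, and this is not the ZenPack's real implementation):

blacklisted = set()

def query_with_split(metrics, do_query):
    # Try the whole batch; on failure, split it in half and retry each
    # half, so that a single bad metric is eventually isolated.
    try:
        return do_query(metrics)
    except Exception:
        if len(metrics) == 1:
            # A metric that persistently fails is blacklisted for
            # zVSpherePerfRecoveryMinutes before being retried.
            blacklisted.add(metrics[0])
            return {}
        mid = len(metrics) // 2
        results = query_with_split(metrics[:mid], do_query)
        results.update(query_with_split(metrics[mid:], do_query))
        return results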

When adjusting the chunk sizes, be aware that values that are too large or too small each carry tradeoffs. On systems with large numbers of datastores and resource pools or vApps, it may be beneficial to raise zVSpherePerfQueryVcChunkSize. However, if vSphere 6.x is being used, zVSpherePerfQueryVcChunkSize must be 64 or less. This restriction can be lifted if vpxd.stats.maxQueryMetrics, an advanced vCenter configuration property, is also adjusted, as described at http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2107096
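
For example, to lower zVSpherePerfQueryVcChunkSize on the /vSphere device class, the zendmd shell can be used. This is a minimal sketch; the organizer path and the value of 32 are illustrative and should be adjusted for your environment (the property can also be changed through the Configuration Properties panel in the UI):

# Run inside the zendmd shell on the Zenoss master.
org = dmd.Devices.getOrganizer('/vSphere')
org.setZenProperty('zVSpherePerfQueryVcChunkSize', 32)
commit()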

NOTE: As of version 3.3 of the vSphere ZenPack, the zVSpherePerfWindowSize configuration property is no longer used, and the meaning of zVSpherePerfQueryChunkSize has been changed slightly. It used to describe a number of query specifications (groups of metrics on a specific entity), and now it describes the number of metrics instead. This change was made to align its meaning with that of vpxd.stats.maxQueryMetrics.

Tuning Modeling

Normally, the set of objects and attributes to be modeled and how they should be processed is predefined in this ZenPack. However, a limited amount of tuning is possible via the properties zVSphereModelIgnore and zVSphereModelCache.

zVSphereModelIgnore is used to suppress the collection of an attribute or attributes, perhaps because it changes too frequently or is not of interest in a particular environment. Its value is a list of regular expressions, each of which is matched against a string of the format <vmware class name>:<vmware property name>. For example, "VirtualMachine:summary.guest.toolsStatus" would be matched by "toolsStatus" or "VirtualMachine.*toolsStatus".
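
The matching can be sanity-checked outside of Zenoss with a short Python snippet (a sketch only, assuming re.search-style substring matching, which is what the examples above imply):

import re

target = 'VirtualMachine:summary.guest.toolsStatus'
for pattern in ['toolsStatus', 'VirtualMachine.*toolsStatus', 'HostSystem.*']:
    # The first two patterns match the target; the HostSystem pattern does not.
    print('%s -> %s' % (pattern, bool(re.search(pattern, target))))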

NOTE: These values are VMware classes and properties, not Zenoss classes and properties. They are generally similar, but there are exceptions. For a list of allowed values, consult $ZENHOME/ZenPacks/ZenPacks.zenoss.vSphere*/ZenPacks/zenoss/vSphere/gather_config.json

zVSphereModelCache is used to configure caching (change detection) for a specific set of properties. This is also defined as a set of regular expressions, but this time the value matched is <zenoss class name>:<zenoss property/method name>. It is especially rare that you would need to use this configuration option. It is specifically intended for situations where issues with the vSphere data model or API cause us to be repeatedly notified about a property that has not actually changed its value. When caching is turned on for the affected attribute, the collector will keep track of the prior value of the property and will only notify zenhub if the value actually changes, saving unnecessary load on zenhub. A list of properties that need this functionality is built into the collector, and this configuration option is only used to add additional properties to the list, should a new one be found. There is generally no benefit to using this caching feature where it is not needed, but no major downside either, other than increased memory usage on the collector. Should you find a property where this caching is needed, please notify Zenoss so that we may make this behavior the default in future versions of this ZenPack.

Advanced tuning of modeling to accommodate the addition of very large vSphere instances to Zenoss, which may otherwise time out during their initial modeling, may be performed using the zVSphereModelMpLevel, zVSphereModelMpObjs, zVSphereModelMpIndexObjs, and zVSphereModelAdmSoftLimit variables. At this time we do not recommend changing these values without assistance from Zenoss Support.

ESX Host Ping Monitoring

By default, the management IP reported by VMware for each host will be pinged. This may be disabled completely by setting zPingMonitorIgnore to true.

In some situations, a host may have multiple management IPs. In this case, the default is for the first one to be pinged, according to the order in which the physical NICs are reported for that host in the VMware API. If this default is not acceptable, the zVSphereHostPingBlacklist property may be used to filter out undesired IP addresses.

Its value is a list of regular expressions which must match against a string of the format <hostname>:<pnic name>:<ip address>.

For example, esx1.zenoss.com:vmk0:10.0.2.3 could be matched by esx1.zenoss.com, :vmk0, 10\.0\.2\.3, or other patterns.

This can be used to filter out specific NICs (:vmk0:), subnets (:10\.0\.2), or hosts (esx1.zenoss.com).

zVSphereHostPingBlacklist may be combined with zVSphereHostPingWhitelist to create exceptions. For instance, one could set zVSphereHostPingBlacklist to ignore vmk1 on all hosts, but then use zVSphereHostPingWhitelist to make an exception for a specific hostname or subnet.

One useful way to combine these is to set zVSphereHostPingBlacklist to .* (that is, disable all ping monitoring), and then specifically enable your management subnets in zVSphereHostPingWhitelist, one pattern for each subnet.

Note that since these are regular expressions, any . in the IP address should be escaped with a backslash, as shown above. If this is not done, the . will match any character.
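
A minimal sketch of how the blacklist/whitelist combination plays out (illustrative Python only, assuming the whitelist acts as an exception to the blacklist as described above; this is not the ZenPack's actual implementation):

import re

blacklist = [r'.*']           # disable ping monitoring everywhere...
whitelist = [r':10\.0\.2\.']  # ...except the 10.0.2.x management subnet

def should_ping(candidate):
    # candidate has the form <hostname>:<pnic name>:<ip address>
    blacklisted = any(re.search(p, candidate) for p in blacklist)
    whitelisted = any(re.search(p, candidate) for p in whitelist)
    return whitelisted or not blacklisted

print(should_ping('esx1.zenoss.com:vmk0:10.0.2.3'))     # True: whitelisted subnet
print(should_ping('esx1.zenoss.com:vmk1:192.168.9.9'))  # False: blacklisted, no exception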

Zenoss Analytics

This ZenPack provides additional support for Zenoss Analytics. Perform the following steps to install extra reporting resources into Zenoss Analytics after installing the ZenPack.

  1. Copy vsphere-analytics.zip from $ZENHOME/ZenPacks/ZenPacks.zenoss.vSphere*/ZenPacks/zenoss/vSphere/analytics/ on your Zenoss server to your local system.
  2. Navigate to Zenoss Analytics in your browser.
  3. Log in as a superuser.
  4. Remove any existing vSphere ZenPack folder.
    1. Choose Repository from the View menu at the top of the page.
    2. Expand Public in the list of folders.
    3. Right-click on vSphere ZenPack folder and choose Delete.
    4. Confirm the deletion by clicking OK.
  5. Add the new vSphere ZenPack folder.
    1. Choose Server Settings from the "Manage" menu at the top of the page.
    2. Choose Import on the left side of the page.
    3. Remove checks from all checkboxes.
    4. Click Choose File to import a data file.
    5. Choose the vsphere-analytics.zip file copied from your Zenoss server.
    6. Click Import.

You can now navigate back to the vSphere ZenPack folder in the repository to see the resources added by the bundle.

The vSphere Domain can be used to create ad hoc views using the following steps.

  1. Choose Ad Hoc View from the Create menu.
  2. Click Domains at the top of the data chooser dialog.
  3. Expand Public then vSphere ZenPack.
  4. Choose the vSphere Domain domain.

Known issues

SSL Errors with Certain VMware versions

Resource Manager 5.2.x and higher may be unable to monitor certain versions of ESXi 5.5 directly (vCenter seems to be unaffected). This is believed to be a compatibility issue between the Java 1.8 JVM used by Zenoss and the version of OpenSSL included in these versions of ESXi. If this problem is occurring, the error "HTTP transport error: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake" will be found in zenvsphere.log.

To correct it, either update ESXi to 5.5 build 4722766 or later, or edit the zenvsphere.conf file and set the following option:

vmargs -Dhttps.protocols=TLSv1

Note that setting this option may have negative side effects, so it should only be used when necessary. In particular, it will prevent zenvsphere from being able to collect from newer versions of VMware, which have disabled TLSv1 support in favor of TLSv1.1 or TLSv1.2. See https://kb.vmware.com/s/article/2145796 for a detailed breakdown of which versions of VMware have had TLSv1 support disabled. There are options (described in the KB article) to re-enable it if required.

A similar issue may occur with Zenoss 4.2.5 through 5.1.x when attempting to monitor a version of VMware newer than 6.5. Again, it will manifest as an "HTTP transport error: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake" error. This occurs because VMware now requires the use of TLSv1.2, which is not enabled by default in these older versions of Zenoss. To correct this, set the following option in zenvsphere.conf:

vmargs -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2

Note that this requires the use of Java 7, which may not be installed by default on RM 4.2.5 systems. You can check the installed version of java by running java -version on the Zenoss collector.

Analytics

Analytics stores history for devices and components. In some cases, this may cause reports to show values that are 2x/3x/… larger than expected. Removing "deleted dimensions" may help.

To change this setting permanently, so that the data warehouse does not keep deleted dimensions around, do the following on your Analytics server:

  • mysql -u root reporting -e "update meta_setting set setting_value = 1 where setting_name = 'remove_deleted_dimensions';"

  • Wait 24 hours for the next MODEL extract and load to take place and pick up this configuration change.

If you want to test this out to see its impact in your environment before making the change permanent, you can execute the following command on your Analytics server:

mysql -u root reporting -e "call remove_deleted_dimensions;"

Note that you will also need to clear all of the JasperSoft ad hoc caches, which cache query results, in order to see the impact of the changed data in your views.

To do this:

  • Log into analytics as Internal Authentication->superuser/superuser and navigate to Manage->Server Settings->Ad Hoc Cache.
  • Click on the "Clear all" button at the bottom of the screen.
  • Load the Datastore View again and confirm that the removal of deleted dimensions has corrected the issues with the "duplication" of your data.

Upgrades from 3.5.x

When upgrading from 3.5.x to a newer version, a message such as the following may be displayed:

ERROR Monitoring template /vSphere/LUN has been modified since the
ZenPacks.zenoss.vSphere ZenPack was installed. These local changes will
be lost as this ZenPack is upgraded or reinstalled. The existing
template will be renamed to 'LUN-upgrade-1484782049'. Please review and
reconcile local changes

If the only difference shown is the 'component' field on the diskReadRequests datasource and a change in the ordering of some other properties on the template, this may be disregarded, and the LUN-upgrade- template may be deleted if desired.

In addition, as described in the Resource Manager Administration Guide, all Zenoss services except zeneventserver should be stopped prior to installing or upgrading any zenpack. This is particularly *critical* with this upgrade. Failure to stop the services may cause migration steps required for this upgrade to fail.

Upgrading from any previous version to 3.7.x or 4.0.x

Installing 3.7.x or 4.0.x of this zenpack involves a migration step that adds a new relationship to all existing vSphere devices. This can take several minutes, and if a model update comes in during this time, a database conflict will occur and cause the zenpack installation to start over, potentially multiple times. On large enough systems, it may never be able to complete without a conflict.

It is therefore recommended to ensure that the zenvsphere service is shut down prior to upgrading this zenpack, and only restart it once the upgrade is complete. This will eliminate the issue and speed up the upgrade process.

When upgrading to these versions, the Snapshots submenu, "Discover Guest Devices" option, and some vSAN-related screens may not be available. If this issue is encountered, restart the zproxy service (by restarting the top-level Zenoss.resmgr application in Control Center). It is not necessary to restart the child services.

Configuring Java heap size

By default, the zenvsphere daemon does not set a heap size. The service definition assumes 1 GB, but this is often exceeded because the daemon allows the JVM to choose a default heap size based on the total RAM.

If needed, this value may be tuned in /opt/zenoss/etc/zenvsphere.conf by uncommenting the javaheap option and setting it to the desired value (in MB). To determine the current heap size, run the command java -XX:+PrintFlagsFinal -version 2>&1 | grep MaxHeapSize in the zenvsphere container.
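
For example, to give the JVM a 2 GB heap (a hypothetical value; size it to your environment), the relevant line in /opt/zenoss/etc/zenvsphere.conf would look like:

javaheap 2048

After changing this setting, restart the zenvsphere service so that the new heap size takes effect.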

Error vmodl.fault.MethodNotFound

If this error is coming from vSAN and there is no vSAN on the vSphere device, setting zVSphereModelVSAN to False should resolve this issue.

Changes

4.2.0

  • Improved error handling for vSphere modeling (ZPS-7464)
  • Fixed modeling of highly scalable VMware systems (ZPS-7836)
  • Improved error handling for VSAN datasources (ZPS-8084)
  • Added support for VMware vSphere 8 (ZPS-8641)
  • Tested with Zenoss 6.7, Zenoss Cloud and Service Impact 5.6.0.

4.1.2

  • Fixed modeling of large environments (ZPS-7961)
  • Fixed modeling for cases when totalMemory was unchanged (ZPS-7689)
  • Fixed modeling of VirtualAHCIControllers (ZPS-6759)
  • Fixed Alarm severity mapping (ZPS-7518)
  • Fixed processing of com.vmware.vc.StatelessAlarmTriggeredEvent (ZPS-7505)
  • Tested with Zenoss 6.6.0, Zenoss Cloud and Service Impact 5.5.5.

4.1.1

  • Resolve issue with huge datamaps being produced, causing failures in zenhub (ZPS-7130)
  • Fix issue where vSphere can sometimes show 100% usage for all datastore components (ZPS-7170)
  • Tested with Zenoss Cloud, Zenoss 6.5.0, and Service Impact 5.5.3.

4.1.0

  • Fix issue with linking existing guest devices (ZPS-5523)
  • Return datastoreUsedPercent datasource (ZPS-7112)
  • Better control of polling events from vConsole (ZPS-7166)
  • Add new values to the CallHome report to calculate Nutanix VMs on vSphere (ZPS-7293)
  • Support of monitoring vSphere 7 (ZPS-7296)
  • Add support for Java 11 JVM (ZEN-32976)
  • Tested with Zenoss Cloud, Zenoss 6.5.0, and Service Impact 5.5.3.

4.0.3

  • Fix modeling failure of vDisk if controller is of NVME type (ZPS-6595)
  • Fix traceback during vsphere_gather operation with apiVersion '6.7.2' (ZPS-6594)
  • Fix modeling failure caused by atypical VSAN configuration (ZPS-6526)
  • Tested with Zenoss Resource Manager 6.4.1, Zenoss Cloud and Service Impact 5.5.1

4.0.2

  • Fix invalid threshold for memory capacity on vSphere hosts (ZPS-4979)
  • Changing the zVSphereModelVSAN property also switches monitoring templates (ZPS-5216)
  • Add support for NSX OpaqueNetworks (ZPS-5076)
  • Fix "vmodl.fault.MethodNotFound" error from vSAN monitoring against vCenter v6.0 (ZPS-5545)
  • Handle a situation where VMware Tools return a guest IP address with spaces in it (ZPS-5536)
  • Fix sporadic "'NoneType' is not iterable" errors caused by model-monitored datapoints whose values are not updated frequently. (ZPS-5067)
  • Migrate Property datasources to vSphere Modeled in local copies of monitoring templates. (ZPS-5628)
  • Tested with Zenoss Resource Manager 6.3.2, Zenoss Cloud

4.0.1

  • Improve vSAN 6.2 version detection, add zVSphereModelVSAN property (ZPS-4810)

4.0.0

  • vSAN support (Disk Groups, Cache and Capacity disk modeling and monitoring, vSAN Cluster Health, vSAN Objects, Compliance status)
  • Lower default configsipsize from 25 to 5 in zenvsphere.conf in order to facilitate pulling large configs from ZenHub. (ZPS-798)
  • Changed thresholds related to performance to not affect availability in Service Impact (ZPS-3702). This changes events from being in "/vSphere/Utilization" to be in the "/Perf/vSphere/Utilization" class
  • Ensure consistent treatment of CPU Usage (%) in thresholds and Analytics (ZPS-3589)
  • Renamed ResourcePool threshold High CPU Usage to "High Load" (ZPS-3589)
  • Plot ResourcePool Memory Overhead and Memory Consumption on separate graphs (ZPS-369)
  • Added migration script for upgrade-related catalog index issues (ZPS-3922)
  • Removed transform that was introduced in 3.7.x on the /vSphere event class (ZPS-4306)
  • Fix NotInDatacenterException error when permissions are restricted on a folder without propagate to children (ZPS-4232)
  • Add impact relations between vSphere vSAN Datastores and Dell PowerEdge Physical Disks. (ZPS-4518)
  • Ignore blank lines in zVSphereModelIgnore (ZPS-1423)
  • Add Test button to zVSphereHostPingWhitelist/zVSphereHostPingBlacklist zProperties (ZPS-3464)
  • Reports do not automatically run when selected (ZPS-4172)
  • Added an authoritative list of event types to monitor ESX directly (ZPS-3695)
  • Fixed clearing for hbr.primary.NoConnectionToHbrServerEvent events (ZPS-4019)
  • Allowed devices to be placed in sub-classes of /vSphere while initially adding them (ZPS-3996)
  • Reduce excessive CPU and memory consumption by zenvsphere container (ZPS-643)
  • Tested with Zenoss Resource Manager 5.3.3, Zenoss Resource Manager 6.2.1, Zenoss Cloud and Service Impact 5.3.3

3.7.2

  • Reduce memory usage when used with the Layer2 ZenPack (ZPS-2172)
  • Tested with Zenoss Resource Manager 6.1.2, Zenoss Resource Manager 4.2.5 RPS 743, Zenoss Cloud and Service Impact 5.3.0

3.7.1

  • vSphere Report organizer not present with a fresh install (ZPS-3727)
  • Fix sporadic modeling failures related to empty snapshots data (ZPS-3586)
  • Fix KeyError that could sometimes occur during performance collection when a VM was removed or its monitoring turned off. (ZPS-3820)
  • Tested with Zenoss Resource Manager 5.3.3 and 6.1.2, Zenoss Resource Manager 4.2.5 RPS 743, and Service Impact 5.3.0

3.7.0

  • Support for vSphere 6.7
  • Upgraded from ZenPackLib v1 to ZenPackLib v2. Note: ZenPackLib ZenPack is a new prerequisite, and must be installed separately.
  • Automatically add vSphere Guest OS devices (optionally) (ZPS-340)
  • Added Dynamic View (overview) (ZPS-2227)
  • Disable event collection from hosts that are in maintenance mode.
  • Added monitoring of Snapshots (ZPS-351)
  • Added (optional) modeling and monitoring of Vdisks
  • Added additional perf collection device-level graphs
  • Fix Datastore to LUN mapping when datastore use local storage (ZPS-323)
  • Add capacity thresholds to Hosts, pNICs, Datastores, and Clusters (ZPS-2694)
  • Add zVSphereModelIncludePaths/zVSphereModelExcludePaths to control which VMs are modeled
  • When new VMs are discovered, disable monitoring for the discovered VM if the host or cluster it is in has its monitoring turned off.
  • Fix potential file descriptor leak in zenvsphere (ZPS-3435)
  • Fix situation where config changes may not be properly cached in zenvsphere. (ZPS-3427)
  • Performance improvements and bugfixes in vsphere_queryperf tool
  • Tested with Zenoss Resource Manager 5.3.3 and 6.1.2, Zenoss Resource Manager 4.2.5 RPS 743, and Service Impact 5.3.0

3.6.3

  • Fix issue that can cause duplicated components (ZPS-1614)
  • Include vSphere devices in standard device tables in analytics (ZPS-1501)

3.6.2

  • Fix zVSphereLUNContextMetric NameError in zenhub.log. (ZPS-1184)

3.6.1

  • Support new contextMetric capability in Zenoss 5.2.3. (ZEN-27004)

3.6.0

  • Support for vSphere 6.5
  • Support monitoring of disk usage on datastores on ESX endpoints (ZPS-504). ZenPacks.zenoss.PropertyMonitor >= 1.1.1 is required for this. 1.1.0 is sufficient for vCenter endpoints.
  • Alarms missing descriptions ("Alarm_alarm-123 status is red") on vSphere 6.0 targets (ZPS-397)
  • Fix modeling of certain datastores on ESX endpoints (ZPS-512)
  • Modeler suppresses objects which should not be modeled, including references from other objects. Support added for "effectiveRole" based filtering to exclude VMs from the model if the user has the NoAccess role on the host or cluster that the VM is hosted on. Fix suppression-related issues for LUNs and VM Templates. (ZEN-22233, ZEN-21904, ZEN-25884, ZEN-25625)
  • Added zVSphereHostPingWhitelist and zVSphereHostPingBlacklist to control the selection of management IPs to ping (ZPS-335)
  • Updated to zenpacklib 1.1.2.
  • Fix modeling errors with VMs that are inside nested resource pools within vApps (ZPS-387)
  • Suppress events when vSphere host is in maintenance mode (by setting Host and VM productionState accordingly) (ZPS-332)
  • Clarify resource pool counts on Datacenter components (ZPS-389)
  • Remove broken folder links from component details (ZPS-396)
  • Fix error while displaying VMs report (ZPS-337)
  • Fix calculation of missedRuns in zenvsphere by not counting event and modeler tasks (they are run constantly, and can never be "late") (ZPS-399)
  • Fix improperly rounded "Total" value on "Performance Data Collection Counters" graph (ZPS-392)
  • [Analytics] Count vSphere devices in the overall device inventory (add to dim_device) (ZPS-398)

3.5.3

  • Fix TypeError while adding vSphere hypervisor (ZEN-26197)

3.5.1

  • Fix RRD file paths on Zenoss 4.x (ZEN-25807)

3.5.0

  • General
    • Converted zenpack to use zenpacklib. Small UI differences and improved functionality throughout.
    • Hide password in the "Add Multiple Endpoint" dialog. (ZEN-25387)
    • Improve performance of "Resource Pool" and "Hosts and VMs" reports (ZEN-21039)
    • Documented analytics behavior and workaround for deleted dimensions (ZEN-24333)
    • Add HostSystem->Datastore DV/impact relation, only in the case where a datastore is stored on a local disk on that host. (ZEN-23166)
    • Add labels and descriptions to vSphere zProperties
    • Change severity of incorrect username or password in the model log from WARNING to ERROR and update message (ZEN-21003)
    • Add additional filtering options to all reports (ZEN-23157)
    • Allow addition of vSphere devices under subclasses of /Devices/vSphere (ZEN-22015)
    • Improvements to performance of component grid with large numbers of components. (componentCatalogs and linkable optimizations)
    • Disable ping monitoring based on the 'monitored' setting for host components. (ZEN-22206)
    • Fix CPU/Mem Reservation fields display. (ZEN-21112)
    • Add labels and descriptions to vSphere zProperties. (ZEN-21278)
  • Modeling
    • Add advanced tuning of modeling transactions via zVSphereModelMpLevel, zVSphereModelMpObjs, zVSphereModelMpIndexObjs.
    • Reduce likelihood of ConflictError during modeling of large vSphere devices. (ZEN-25488)
    • Add support for java 8 JVM (ZEN-25482)
    • Fix units for memEntitlement graph (ZEN-25460)
    • Extend 'Model Time' field to also show "Waiting For Modeling" (ZEN-21012)
  • Event Processing
    • Ensure that repeat count increases on open alarms until they are closed so that events do not clear in Zenoss prematurely. (ZEN-25342)
    • Auto clear internal heartbeat events on restore (ZEN-24144)
    • Fix bug where events may be timestamped in UTC time (ZEN-22724)
    • Fix misleading ping-down message on vSphere hosts. (ZEN-22203)
  • Performance data collection
    • Changed recovery behavior: recover non-timeout errors more frequently (20 minutes, rather than 4 hours). Place a limit on how much time will be spent on recovery per task, leaving remaining entries for a future polling cycle. This allows more frequent recovery to occur without risking a serious impact on the overall polling cycle time.
    • Lowered default query timeout (zVSpherePerfQueryTimeout) from 200 seconds to 25. Failures due to timeout are retried as smaller queries anyway, and a faster failure is good for overall collection time.
    • Change default value of zVSpherePerfParallelQueries from 6 to 8, to improve overall query performance, especially on large targets.
    • Group all metrics being polled through the raw data feed by the host and cluster to which they relate, when possible. Hash the hosts over a set of 6 sub-tasks, and collect each one separately. This keeps a bad host from slowing down all the collection tasks randomly. (Any such slowdown is restricted to 1/6 of the data being collected)
    • Add extensive details to the health report, with an emphasis on grouping errors by host/cluster as described above. Raise events when a particular host or cluster is showing large numbers of errors. Calculate average response times, and distinguish between transient and permanent errors in these reports where appropriate.
    • Graph high-level performance collection metrics at the device level (# of counters polled, retries vs normal queries, total elapsed time by task type, etc)
  • Impact
    • When a ping failure event is found for a host, consider the host to be DOWN, rather than ATRISK, regardless of powerState/connectionState values (ZEN-24612)

3.4.1

  • Support multiple --query options in vsphere_queryperf tool
  • Improved component grid performance with Zenoss 5.1.4+
  • Use epoll reactor in zenvsphere to support >1024 descriptors.
  • Fix off-by-one in chunk size calculation (ZEN-22805)
  • Monitor ESX host preparation for NSX networking (ZEN-21157)
  • Remove Dynamic View component from navigation, since it is not fully supported (ZEN-16277)

3.4.0

  • vSphere Devices can no longer be modeled by zenmodeler or the "Model Device" UI option. Modeling is performed automatically by zenvsphere when new devices are added. The "Model Time" timestamp will reflect the last time a full model was done; full models are only performed when needed, such as when zenvsphere is restarted or the connection to vSphere is lost. Otherwise, regular ongoing changes are applied continuously, updating only the "Last Change" timestamp.
  • Performance data collection was rewritten to improve its scalability.
  • Remove explicit host -> impacts -> cluster impact relationship. (ZEN-20722)
  • Prevent VM templates from being modeled. (ZEN-20724)
  • Add "Provider" links for datastores, RDMs and LUNs. (ZEN-18029)
  • Ensure that modeler optimization is used when possible (ZEN-10279, ZEN-19701)
  • Enabled "monitored" checkbox for vSphere components (ZEN-19125)
  • Fixed sorting for numeric columns in reports (ZEN-9023)
  • Fix for problem on Zenoss 5.0.x where some metric data is lost in large vCenter environments (ZEN-20039)
  • User Session events (vim.event.AlreadyAuthenticatedSessionEvent, vim.event.BadUsernameSessionEvent, vim.event.NoAccessUserEvent, vim.event.ServerStartedSessionEvent, vim.event.SessionTerminatedEvent, vim.event.UserLoginSessionEvent, vim.event.UserLogoutSessionEvent) are now sent directly to history.
  • Default severity of several events (vim.event.AlarmEmailFailedEvent, vim.event.AlarmScriptFailedEvent, vim.event.AlarmSnmpFailedEvent, vim.event.BadUsernameSessionEvent, vim.event.NoAccessUserEvent) is lowered to Info.
  • ESX failure events (esx.net.connectivity, esx.net.dvport.connectivity, esx.net.dvport.redundancy, esx.net.redundancy, esx.net.vmnic.linkstate, esx.scsi.device.io.latency, esx.scsi.device.state, esx.scsi.device.state.permanentloss, esx.storage.connectivity, esx.storage.redundancy) are now mapped so that their corresponding "clear" events will clear the event in Zenoss.
  • Enable ping monitoring of vSphere Hosts on their management IPs (ZEN-8412)
  • Allow use of nonstandard HTTP/HTTPS ports for vSphere endpoint (ZEN-15110)
  • Added requirement for PropertyMonitor 1.0.1 (ZEN-20688)
  • Moved "VM to Datastore" report from Enterprise Reports section to vSphere section (ZEN-16153)
  • Removed unused "Software" component from the navigation on Device Detail (ZEN-20531)
  • Updated legend for "Memory Usage" graph in Clusters for clarity (ZEN-16422)
  • Removed tracebacks for authentication errors when hitting vSphere API, because if it's a login error, we know what's wrong and don't need a traceback (ZEN-18760)
  • Added "LUNs" report

3.2.1

  • Fix for CiscoUCS startup issue (ZEN-18814)

3.2.0

  • Improve reliability of modeling of vSphere alarm events (model pre-existing alarms at startup and apply updates to alarms based on change events).
  • Make alarm events more easily filtered, with an eventGroup of vim.event.AlarmEvent and an eventClassKey of the vSphere alarm systemName for VMware's predefined events, or userdefined.<id> for user-defined alarms.
  • Remove the "overall status" column and stop collecting this value from vSphere. Compute equivalent value based upon the most severe active alarm on the element.
  • Impact now relies upon active events (alarms) and ignores the "overall status". Undesired alarms may now be more easily suppressed or transformed to improve impact view, and the root cause will reflect the responsible alarms. (ZEN-16728)
  • Add model collection tuning options (zVSphereModelIgnore, zVSphereModelCache)
  • Reduce zenhub load by suppressing duplicate (unnecessary) setDiskNames calls on datastores
  • Add several predefined thresholds (disabled by default)
  • Improved logging and idle connection cleanup in java txvsphere process
  • Improved connection performance and fixed an FD leak in java txvsphere process when an endpoint is unresponsive
  • Add support for Layer2 ZenPack and network map
  • Fix time zone handling for event collection
  • Add a link from LUN to an underlying storage device (ZEN-18029)
  • Increase default timeout for all vSphere API calls. Lowered the timeout during login, and raised it during performance querying.
  • Collect host hardware vendor/model and management IP addresses
  • Fix error encountered when a vApp contains child resource pools (ZEN-17670)

3.1.3

  • Fix potentially missed linkage of UCS service profiles to vSphere hosts. (ZEN-18631)

3.1.2

  • Fix ConnectionStateError. (ZEN-16924)

3.1.1

  • Fix bug that halted data collection after a single TimeoutError (ZEN-10927)
  • Fix handling of broken standalone resources in vSphere (ZEN-16382)
  • Background java process exits when zenvsphere is shut down (ZEN-15431)
  • Ensure java process is restarted automatically if zenpack is upgraded
  • Restart the java process with proper settings if it is started with default memory settings by zenmodeler (ZEN-16833)
  • Fixes to VMware Utilization report (ZEN-15523)
  • Add hypervisor version to UI (ZEN-15901)
  • Model optimization on Zenoss >= 4.2.4 SP728, 4.2.5 SP316 and 5.0.0. (ZEN-17058)

3.1.0

  • Add support for Zenoss Analytics
  • Add support for VMware vSphere 5.5
  • Improvements to impact model
  • This zenpack now depends upon ZenPacks.zenoss.PropertyMonitor.
  • Corrected handling of errors in performance data collection (ZEN-11086)
  • Improved vsphere_queryperf and added interactive testing of data sources in monitoring templates
  • Fix traceback that can occur after a connection has been lost and reconnected (ZEN-11033)
  • Fix datastore-related performance data collection under VMware 4.1 (ZEN-11110)
  • Fixed file descriptor leak that would result in an endless cycle of TimeoutErrors under certain conditions (ZEN-10927)
  • Avoid "pool exhaustion" errors by terminating the oldest connection to vsphere when the session limit is exceeded, rather than immediately failing. (ZEN-10325)
  • Model linkSpeed on physical NICs and compute bandwidth utilization.
  • Model used and uncommitted percentages on datastores, and add graphs of capacity utilization over time (ZEN-11256)
  • Add graphs of VM state (ZEN-11255)
  • Add total memory to Cluster and Host graphs (ZEN-10890)
  • Add VMware Tools, VM Hardware, and ESX software versions. (ZEN-13442)
  • Correct units for storageCommitted and storageUncommitted (ZEN-15233)
  • Separate vApp and Pool counts on Datacenter grid (ZEN-15272)
  • Fix potential exception in default event mapping for vSphere (ZEN-12410)
  • Improve error reporting when invalid or duplicate hostnames or IPs are specified while adding an endpoint (ZEN-11907, ZEN-14272)
  • Properly indicate if there are open events for VMs and DistributedVirtualSwitches (ZEN-12434)
  • Ignore invalid NICs with no mac address (ZEN-10154)
  • Calculate vm_count for ResourcePools recursively. (ZEN-14262)
  • Correct issue where relationships between Datastore and LUN could be missed during the initial model (ZEN-13124)
  • Add support for performance counters on vNICs (ZEN-12106)

3.0.4

  • Reduction of invalidations caused by vSphere model changes. (ZEN-10838)
  • Remove modeling of guestHeartbeatStatus. (ZEN-10838)
  • Ignore pNICs with empty MAC addresses. (ZEN-10836)

3.0.2

  • Disable modeling optimization that led to POSKeyError errors.
  • Multiple improvements in vSphere API error recovery and session management.
  • Reduce pool exhaustion issues. Add related tuning options.
  • Continue collecting data when /Status/Ping events exist for endpoints.
  • Fix error caused by /usr/bin/java being non-existent or the wrong JRE.
  • Fix error caused by noexec option on /tmp filesystem. (ZEN-10076)
  • Notify the user of an attempt to add a duplicate endpoint. (ZEN-10741)
  • More detailed logging.

3.0.1

  • Add vsphere_gather and vsphere_queryperf command-line troubleshooting tools.
  • Add power state to VM component display panel.
  • Add LUN ID to LUN component display panel.
  • Add proper counting of VMs for licensing purposes.
  • Fix bug that would prevent editing of VMware vSphere datasources on Zenoss 4.1.
  • Fix SAN storage impact relationships.
  • Fix modeling bug on some vSphere endpoints.

3.0.0

  • Initial release.