This ZenPack monitors Kubernetes (K8s) clusters deployed on Google Cloud
Platform (GKE), Amazon Web Services (EKS), Microsoft Azure (AKS), as well
as on locally-hosted environments. It uses RBAC authentication to access
all data related to modeling and monitoring.
ZenPack features include:
- Overall Cluster Health Monitoring
- Health Monitoring for Nodes, Services, Pods
- Graphs for Kubernetes Cluster, Nodes, Deployments, StatefulSets, Pods, Containers
- Dashboard Portlets for Pod CPU and Memory consumption
- Service Impact and root cause analysis
- Event Management
Commercial
This ZenPack is developed and supported by Zenoss Inc. Commercial
ZenPacks are available to Zenoss commercial customers only. Contact Zenoss to
request more information regarding this or any other ZenPacks. Click here to
view all available Zenoss Commercial ZenPacks.
Support
This ZenPack is included with commercial versions of Zenoss and
enterprise support for this ZenPack is provided to Zenoss customers with
an active subscription.
Compatible with Zenoss Resource Manager 6.7 and Zenoss Cloud
Support Requirements
Zenoss:
Zenoss 6.2+
ZenPackLib ZenPack 2.1.0+
Kubernetes:
Kubernetes versions 1.9.X - 1.21.X
Kubernetes versions 1.17.X - 1.21.X deployed on Google Cloud
Platform, GKE
Kubernetes versions 1.18.X - 1.21.X deployed on Amazon Web Services,
EKS
Kubernetes versions 1.20.X - 1.22.X deployed on Microsoft Azure, AKS
Kubernetes versions 1.16.X - 1.29.X deployed on local environment
Upgrade Notes
Version 2.0.0
Beginning with version 2.0.0, incremental modeling is significantly improved:
- The new K8sRedisCache module and class allows Zenoss to properly track changes across restarts
of the collector daemon and enables proper replication of cached data through collectorredis.
- Two new zProperties control namespaces and their child components' modeling: zKubernetesNamespaceInclude and
zKubernetesNamespaceExclude, which replaces the deprecated zKubernetesNamespaceFilter.
Additionally, Kubernetes Nodes deployed on vSphere are now linked to the corresponding virtual machines.
Two new thresholds were added to Kubernetes Pods performance collection: CPU and Memory. If no limits are defined,
a default of 0.0 is returned. If no container limits are set in Kubernetes, the default is 90% of the node the pod is running on.
Note that in this release, PersistentVolumes and PersistentVolumeClaims are split into separate components,
with the following relations created:
Version 1.2.0
Version 1.2.0 adds monitoring of a new component, StatefulSet. Like
Services, Deployments, Pods, and Containers, StatefulSets can be
selected for modeling using zKubernetesNamespaceFilter. In addition, a
new zProperty, zKubernetesPodFilter, selects Pods and Containers for
modeling.
Please note that the way identifiers are generated for Pods and
Containers changed in this release. As a result, after updating to
version 1.2.0, monitoring data collected for Pods and Containers before
the update is no longer displayed on the component graphs.
Version 1.1.0
Updating to version 1.1.0 or later (from a version prior to 1.1.0)
adds a new zProperty, zKubernetesNamespaceFilter, to filter
Deployments, Services, Pods, and Containers based on the namespace to
which they belong. These four components link together, so they all
rely on the same zProperty. Further, most Kubernetes consoles by
default hide components under the 'kube-system' namespace while
displaying everything else. The 1.1.0 Kubernetes ZenPack adopts this
behavior; the zProperties zKubernetesNamespaceFilter and
zKubernetesContainerNamesModeled may be updated during the upgrade
process to reflect the new default behavior.
If zKubernetesContainerNamesModeled was changed from its default
value, the upgrade will not overwrite it, even if the value is
'kube-system/.*/.*'. In this situation, the property
zKubernetesNamespaceFilter will have to be updated manually to allow
'kube-system'.
Service Impact relations may become out of sync when upgrading to 1.1.0.
This issue should only affect instances where Service Impact is running.
Service Impact can be manually corrected (after installation is
complete) by running this command:
zenimpactgraph run --update
Kubernetes Structure and Discovery
Objects are automatically discovered via the Kubernetes API. The ZenPack
class structure can be visualized in the diagram on the right:
The Kubernetes model will be automatically updated as changes are
detected on the cluster. New and deleted Services, Deployments,
StatefulSets, Pods, and Clusters will be updated as part of the regular
monitoring cycle. Changes detected to Namespaces, Nodes, and
PersistentVolumes will also be automatically updated. Because
incremental modeling is tied to the Zenoss monitoring cycle
(5 minutes by default), it may take several minutes before the
Zenoss Kubernetes model synchronizes with the Kubernetes Cluster.
Incremental modeling makes use of the Kubernetes Watch API to monitor
for changes to K8s clusters by tracking the resourceVersion for each API
endpoint. When a zenpython instance starts, the resourceVersion is set
to '0', and the first monitoring cycle will jump the resourceVersion to
the latest version. Occasionally, the resourceVersion may become 'Gone'
(indicating that the resourceVersion is too old and no longer in the K8s
history). In this situation, the resourceVersion is set back to '0', so
it can again jump forward to the latest version. Due to these factors,
it may take two cycles to fully synchronize the K8s model.
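The resourceVersion handling described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the WatchState class and its on_response method are hypothetical names, not part of the ZenPack, and the sketch models the 'Gone' condition as the HTTP 410 status that the Kubernetes API returns for an expired resourceVersion.

```python
from dataclasses import dataclass

GONE = 410  # HTTP status the Kubernetes API returns for an expired resourceVersion


@dataclass
class WatchState:
    """Tracks the resourceVersion for one API endpoint (hypothetical helper)."""

    resource_version: str = "0"  # value used when the collector daemon starts

    def on_response(self, status: int, latest_version: str = "") -> bool:
        """Update the state after one monitoring cycle.

        Returns True when the cycle started from a known version, meaning
        the changes it delivered bring the model in sync.
        """
        if status == GONE:
            # The version fell out of the API server's history: start over.
            self.resource_version = "0"
            return False
        was_initial = self.resource_version == "0"
        self.resource_version = latest_version
        # A cycle that starts from "0" only jumps forward to the latest version.
        return not was_initial


state = WatchState()
for status, version in [(200, "1500"), (200, "1512"), (GONE, ""), (200, "1600")]:
    synced = state.on_response(status, version)
    print(status, state.resource_version, synced)
```

In this sketch a cycle that starts from '0' never reports a synchronized model, which mirrors why two cycles may be needed after a restart or a 'Gone' response.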
The following Kubernetes zProperties also affect incremental modeling:
zKubernetesContainerNamesModeled
zKubernetesContainerLabelsModeled
zKubernetesNamespaceInclude
zKubernetesNamespaceExclude
zKubernetesPodFilter
zKubernetesWatchApiTimeout
Changes to these properties may not be picked up and applied until the
next modeling cycle.
It is possible that specific Kubernetes cluster workloads might experience a high rate of churn with Pods and Containers.
Version 2.0.0 of the Kubernetes ZenPack significantly reduces the number of datamaps sent, but high churn may still occur.
To identify Pods and their Namespaces with a large amount of churn, run the following command:
kubectl get pods --all-namespaces --watch
To address this issue, update the zKubernetesNamespaceExclude and zKubernetesPodFilter filters to exclude these
namespaces and pods from modeling.
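To decide which namespaces to exclude, it can help to count how many update lines each namespace produces in the watch output. The following is a minimal sketch, assuming the output was saved to a string and that it uses the standard column layout of kubectl get pods --all-namespaces (NAMESPACE first, then NAME). The function name and the sample lines are made up for illustration.

```python
from collections import Counter


def count_updates_by_namespace(watch_output: str) -> Counter:
    """Count watch lines per namespace; more lines suggest more churn."""
    counts = Counter()
    for line in watch_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "NAMESPACE":
            continue
        counts[fields[0]] += 1
    return counts


# Made-up sample output, for illustration only.
sample = """\
NAMESPACE   NAME           READY   STATUS              RESTARTS   AGE
batch       job-a-x7k2p    0/1     Pending             0          0s
batch       job-a-x7k2p    0/1     ContainerCreating   0          1s
batch       job-a-x7k2p    1/1     Running             0          3s
default     web-1          1/1     Running             0          5d
"""

for namespace, updates in count_updates_by_namespace(sample).most_common():
    print(namespace, updates)
```

Counting by namespace rather than by Pod name is a deliberate choice in this sketch, because short-lived Pods often receive a new generated name each time they are created.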
Device (Cluster)
Description: The device represents a single Kubernetes cluster.
Attributes:
buildDate
cluster_ip
cpu_capacity
cpu_usage
gcp_cluster
memory_capacity
memory_usage
platform
version
Relationships:
k8sNamespace
k8sNode
k8sPersistentVolume
Datasource/Datapoints:
event
metrics
cpu
memory
Graphs:
CPU Utilization
Memory Utilization
Capacity Thresholds:
CPU Capacity
Memory Capacity
Namespace
Description: Namespaces for Kubernetes.
Attributes:
container_count
namespace_uid
status
Relationships:
k8sService
k8sPod
k8sDeployments
k8sStatefulSet
Node
Description: Compute nodes that Kubernetes is built from.
Attributes:
architecture
cpu_allocatable
cpu_capacity
cpu_usage
ephemeral_storage_allocatable
ephemeral_storage_capacity
externalIP
guest_device
internalIP
kubeletVersion
manageIP
memory_allocatable
memory_capacity
memory_usage
modeled_cpu_allocatable
modeled_cpu_capacity
modeled_memory_allocatable
modeled_memory_capacity
node_hostname
node_type
node_uid
operatingSystem
pods_allocatable
pods_capacity
region
status
Relationships:
k8sCluster
k8sPod
Datasource/Datapoints:
status
status
metrics
cpu
memory
allocatable
cpu
memory
capacity
cpu
memory
Graphs:
CPU Utilization
Memory Utilization
Thresholds:
High Memory (default: disabled)
High CPU Load (default: disabled)
Persistent Volume
Description: Storage volume abstraction.
Attributes:
capacity
pv_uid
status
storageClassName
Relationships:
k8sNamespace
K8sPersistentVolumeClaim
Datasource/Datapoints:
status:
status
Persistent Volume Claim
Description: A claim that requests storage from a Persistent Volume.
Attributes:
storageClassName
pvc_uid
pv_uid
status
accessModes
volumeMode
labels
Relationships:
K8sNamespace
K8sPods
K8sPersistentVolume
Datasource/Datapoints:
status:
status
Service
Description: Kubernetes Services represent virtual services that are
realized by Pods and Containers.
Attributes:
cluster_ip
container_count
port_list
selector
service_type
service_uid
Relationships:
k8sNamespace
k8sPods
Deployments
Description: Kubernetes Deployments control automation for Pods and
Containers.
Attributes:
labels
created
Relationships:
k8sNamespace
k8sPods
Datasource/Datapoints:
replicas
availableReplicas
readyReplicas
unavailableReplicas
updatedReplicas
collisionCount
Graphs:
Replica Set
Collision Count
Thresholds:
Replica Count
StatefulSet
Description: StatefulSet controller for Kubernetes.
Attributes:
labels
created
Relationships:
k8sPod
k8sNamespace
Datasource/Datapoints:
replicas
currentReplicas
readyReplicas
updatedReplicas
collisionCount
Graphs:
Replica Set
Collision Count
Thresholds:
Replica Count
Pod
Description: A group of one or more containers with shared
storage/network, and a specification for how to run the containers.
Attributes:
labels
pod_uid
status
Relationships:
k8sNamespace
k8sNode
k8sContainer
k8sDeployments
k8sStatefulSet
Datasource/Datapoints:
metrics:
cpu
memory
status:
status
Graphs:
CPU Usage
Memory Usage
Container
Description: Lowest compute abstraction element for Pods.
Attributes:
cpu_limits
cpu_requests
image
labels
memory_limits
memory_requests
Relationships:
k8sPod
Datasource/Datapoints:
metrics:
cpu
memory
Graphs:
CPU Usage
Memory Usage
Note: It is common for some containers to report only partial
CPU/memory data, so some of these graphs may be missing
data.
Thresholds:
High CPU Load
High Memory
Dashboard Portlets
This ZenPack adds portlets that provide at-a-glance views into Pod and
Cluster memory and CPU utilization. Portlets are viewed on the first
page upon login, and can be added or removed using the dashboard and
portlet controls.
Kubernetes Portlets
The following are portlets specific to Kubernetes:
Top K8s Pods by Memory
Top K8s Pods CPU
These two portlets can be filtered by:
Cluster
Namespace
Service
Platform Portlets
In addition to Memory and CPU, the following platform portlets support
Kubernetes events and issues:
Device Issues
Event View
Open Events
Open Events Chart
Usage
RBAC Authentication
You must expose the Kubernetes V2 and metrics.k8s.io APIs on your
system. We exclusively use Role-based access control (RBAC) for cluster
API access. For more information see Using RBAC Authorization.
You generally must do at least the following steps for both GCP and
locally installed Kubernetes systems:
Set MY_PREFIX and capture ACCOUNT_ID and API_SERVER:
MY_PREFIX=zenoss
API_SERVER=$(kubectl cluster-info | head -1 | cut -d' ' -f6 | sed 's/\x1b\[[0-9;]*m//g')
If using GKE deployed on the Google Cloud Platform, first ensure
you are connected to the correct project associated with your
cluster. Now find your ACCOUNT_ID:
ACCOUNT_ID=$(gcloud info --format='value(config.account)')
If using EKS deployed on Amazon Web Services, first ensure
you are connected to the correct account associated with your
cluster. Now find your ACCOUNT_ID:
ACCOUNT_ID=$(aws sts get-caller-identity --output text --query 'Account')
If using AKS deployed on Microsoft Azure, first ensure you are
connected to the correct subscription associated with your cluster. Now
find your ACCOUNT_ID:
ACCOUNT_ID=$(az account show --query id --output tsv)
If using locally-hosted Kubernetes, determine the ACCOUNT_ID
and prepare the credentials as per Kubernetes Getting started.
The device should now load and model automatically.
Adding a Custom Datasource to Metrics
In order to add a metrics datasource, you must be familiar with the API
target you wish to call and the resulting JSON data response.
The metrics datasource provided requires three configuration parameters,
which we describe below:
api_target: The API target that gets appended to the metrics
base API URL
data_path: The path through the returned JSON that identifies
the metric
aggregator: Method to aggregate values returned by api_target
and data_path.
Together, the api_target and data_path provide the complete
information for the datasource to acquire the requested data.
The aggregator provides the method to put that data together to form a
single data value.
api_target
The api_target must be a valid path for the API. It must be in a plain
REST GET format.
<string1>/<string2>/<string3>
where each <string*> must be a valid string defined in the API.
Examples:
These examples supply the entire API path beyond the base URL, and are
required. More information can be found in Resource metrics pipeline.
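As an illustration of how an api_target is appended to a base URL, the sketch below joins the two parts. The helper name is hypothetical. The default base path is the standard Kubernetes resource metrics API group path; whether the ZenPack uses exactly this base internally is an assumption. The targets shown ("nodes" and "namespaces/default/pods") are valid paths in that API.

```python
def build_metrics_url(cluster_ip, port, api_target,
                      base_path="/apis/metrics.k8s.io/v1beta1"):
    """Join host, assumed metrics base path, and api_target into one URL."""
    # Strip stray slashes so the joined URL has exactly one separator.
    target = api_target.strip("/")
    return "https://{}:{}{}/{}".format(
        cluster_ip, port, base_path.rstrip("/"), target
    )


# Example values only; 10.0.0.1 and port 443 are placeholders.
print(build_metrics_url("10.0.0.1", 443, "nodes"))
print(build_metrics_url("10.0.0.1", 443, "namespaces/default/pods"))
```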
data_path
The data_path string represents a path through the returned JSON data
that loosely follows the jq-style format, which separates path elements
(dictionary keys) by dots.
It can include the following items:
Plain jq strings. For example: a.b
Strings with square brackets with a jq-style identifier:
items[metadata.name]
This example will scan all list elements in items and select the
metadata.name element from those items.
To clarify, this will match all items that have the JSON key
metadata with sub-key name.
Note that this form is only useful as a filter:
items[metadata.name] filters items and selects only those
which have the metadata.name structure.
Strings with square brackets with a value-qualified jq-styled
identifier. This allows you to filter list items that match a
dictionary key or value.
Examples:
Note that the last two examples show that you can use dynamic TALES
expressions instead of static strings to filter the items elements
by value.
Also note that the last three examples specify the path to the
metric that matches the item list elements in square brackets.
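To make the path semantics concrete, here is a minimal sketch of a resolver for the two simplest forms described in this section: plain dotted paths (a.b) and a bracketed key filter (items[metadata.name]). It is not the ZenPack's parser, and the function names are hypothetical. The value-qualified form is left out because its exact syntax is not shown here. The sample JSON is made up and only resembles a metrics response.

```python
import re

# One path segment: a key, optionally followed by [filter.path].
SEGMENT = re.compile(r"([^.\[\]]+)(?:\[([^\]]+)\])?")


def split_path(data_path):
    """Split 'items[metadata.name].usage.cpu' into (key, filter) pairs."""
    return [(m.group(1), m.group(2)) for m in SEGMENT.finditer(data_path)]


def lookup(value, dotted):
    """Follow a plain dotted path; return None if any key is missing."""
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def resolve(data, data_path):
    """Return every value reached by data_path as a list."""
    current = [data]
    for key, list_filter in split_path(data_path):
        selected = []
        for value in current:
            child = lookup(value, key)
            if child is None:
                continue
            if list_filter is None:
                selected.append(child)
            elif isinstance(child, list):
                # Keep only list elements that contain the filter path.
                selected.extend(
                    e for e in child if lookup(e, list_filter) is not None
                )
        current = selected
    return current


sample = {"items": [
    {"metadata": {"name": "node-a"}, "usage": {"cpu": "250m"}},
    {"metadata": {"name": "node-b"}, "usage": {"cpu": "100m"}},
    {"usage": {"cpu": "999m"}},  # no metadata.name, so it is filtered out
]}

print(resolve(sample, "items[metadata.name].usage.cpu"))
```

The resolver always returns a list, which fits the role of the aggregator: it reduces whatever the path selects to a single data value.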
aggregator
The required aggregator is selected from the drop-down. Choose from:
AVERAGE: Average all elements
FIRST: Choose the first element only
MAX: Select the maximum value
MIN: Select the minimum value
PERCENT_AVERAGE: Return average of the data multiplied by 100
PERCENT_SUM: Return sum of the data multiplied by 100
SUM_OR_ZERO: Sum the data, return zero if no data exists
SUM: Sum all the data
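The aggregator choices above can be sketched as a single function over the list of values selected by api_target and data_path. This is an illustration, not the ZenPack's implementation: the function name is hypothetical, and returning None for empty input (for every aggregator except SUM_OR_ZERO) is an assumption.

```python
def aggregate(values, aggregator):
    """Reduce a list of numeric values to one value."""
    values = [float(v) for v in values]
    if aggregator == "SUM_OR_ZERO":
        return sum(values, 0.0)  # zero when no data exists
    if not values:
        return None  # assumption: no data means no datapoint
    total = sum(values, 0.0)
    if aggregator == "AVERAGE":
        return total / len(values)
    if aggregator == "FIRST":
        return values[0]
    if aggregator == "MAX":
        return max(values)
    if aggregator == "MIN":
        return min(values)
    if aggregator == "PERCENT_AVERAGE":
        return total / len(values) * 100
    if aggregator == "PERCENT_SUM":
        return total * 100
    if aggregator == "SUM":
        return total
    raise ValueError("unknown aggregator: {}".format(aggregator))


print(aggregate([0.25, 0.75], "PERCENT_AVERAGE"))
print(aggregate([], "SUM_OR_ZERO"))
print(aggregate([], "SUM"))
```

The only behavioral difference between SUM and SUM_OR_ZERO in this sketch is the empty-input case, which is the distinction the list above draws.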
Installed Items
Installing this ZenPack will add the following items to your Zenoss
system:
Configuration and zProperties
The zProperties and default settings are as follows:
zKubernetesClusterIP: The IP address of the Kubernetes Cluster
API.
zKubernetesClusterName: Name of cluster used for association
with related resources.
zKubernetesPort: The TCP port of the API.
Default value: 443
zKubernetesServiceAccount: The Kubernetes service account
associated with the API account. See
kubectl get serviceaccounts for more information.
zKubernetesClusterToken: The token associated with
zKubernetesServiceAccount. See kubectl describe secrets for
more information.
zKubernetesGuestUseExternalIP: Boolean to set the manageIp to
the external IP for host monitoring. Set this property to False if
the guest device of an EC2 account is modeled by an internal IP, so
that links to Kubernetes Guest devices are created.
Default value: True
zKubernetesEventInterval: Polling interval for events.
Default value: 60
zKubernetesMonitoringInterval: Polling interval for metrics
collection.
Default value: 300
zKubernetesStatusInterval: Polling interval for status updates.
Default value: 300
zKubernetesContainerNamesModeled: Regex pattern of Container
names to model. Note that only Containers which are members of Pods
that match the zKubernetesNamespaceInclude and
zKubernetesPodFilter patterns may be captured. Containers that
belong to Pods that are not modeled will also not be modeled. If
left blank, no Containers will be selected for modeling by
zKubernetesContainerNamesModeled.
Format: regex
Default value: [".*"]
zKubernetesContainerLabelsModeled: Container labels to model. If
both zKubernetesContainerLabelsModeled and
zKubernetesContainerNamesModeled are set, then all containers that
match at least one property will be listed (i.e., the union of the
two sets).
Format: key: value
Default value: [""]
zKubernetesPodFilter: Pattern for Pods to model and monitor.
This affects the modeling and monitoring of Pods and Containers
related to them. By default, all Pods and Containers are allowed to
be modeled and monitored. If left blank, no Pods or Containers will
be modeled.
Format: regex
Default value: [".*"]
zKubernetesWatchApiTimeout: Timeout (in seconds) for the list/watch call.
This limits the duration of the call, regardless of any activity or inactivity.
Format: number
Default value: 2
zKubernetesNamespaceInclude: Regular expression(s) for namespaces to include when modeling.
Each pattern should be written on a new line.
When specified, only the namespaces provided are included in modeling.
Any expression provided in zKubernetesNamespaceExclude will override
any specified entry here. By default, we include everything.
Format: regex
Default value: [".*"]
zKubernetesNamespaceExclude: Regular expression(s) for namespaces to ignore when modeling, including
any specified in zKubernetesNamespaceInclude. Write each pattern on a new line.
By default, we exclude the 'kube-system' namespace and components.
Format: regex
Default value: ["kube-system"]
Device Modeling Configuration
Some zProperties, noted above, can affect the application of other
properties during modeling of a device, i.e.:
zKubernetesNamespaceInclude/zKubernetesNamespaceExclude can affect:
zKubernetesPodFilter
zKubernetesContainerNamesModeled
zKubernetesContainerLabelsModeled
zKubernetesPodFilter can affect:
zKubernetesContainerNamesModeled
zKubernetesContainerLabelsModeled
zKubernetesContainerNamesModeled can affect:
zKubernetesContainerLabelsModeled
zKubernetesContainerLabelsModeled can affect:
zKubernetesContainerNamesModeled
To configure the modeling of Kubernetes Cluster components use the
following combination of zProperties:
zKubernetesNamespaceInclude: All Deployments, StatefulSets,
and Services that belong to Namespaces matched by the
zKubernetesNamespaceInclude pattern will be modeled.
Any expression provided in zKubernetesNamespaceExclude will
override any expression specified here. By default, we include everything.
zKubernetesNamespaceExclude: All Deployments, StatefulSets,
and Services that belong to Namespaces matched by the
zKubernetesNamespaceExclude pattern will not be modeled,
even if those Namespaces are also specified in zKubernetesNamespaceInclude.
All Pods and Containers that belong to Namespaces matched by the
zKubernetesNamespaceExclude pattern will not be modeled, even if they are
specified by zKubernetesPodFilter, zKubernetesContainerNamesModeled,
and zKubernetesContainerLabelsModeled.
zKubernetesPodFilter: Pods that belong to Namespaces allowed by
the zKubernetesNamespaceInclude pattern but are not matched by
zKubernetesPodFilter will not be modeled. Containers of Pods that
are allowed by zKubernetesNamespaceInclude will not be modeled if
those Pods are not allowed by zKubernetesPodFilter, even if the
corresponding Containers are specified by
zKubernetesContainerNamesModeled and
zKubernetesContainerLabelsModeled.
zKubernetesContainerNamesModeled: Containers allowed for
modeling by zKubernetesNamespaceInclude and zKubernetesPodFilter
will not be modeled if they are not allowed by
zKubernetesContainerNamesModeled. However, if they are allowed by
zKubernetesContainerLabelsModeled, they will be modeled.
zKubernetesContainerLabelsModeled: Containers allowed for
modeling by zKubernetesNamespaceInclude and zKubernetesPodFilter
will not be modeled if they are not allowed by
zKubernetesContainerLabelsModeled. However, if they are allowed by
zKubernetesContainerNamesModeled, they will be modeled.
zKubernetesContainerNamesModeled and zKubernetesContainerLabelsModeled
together: Containers allowed for modeling by zKubernetesNamespaceInclude
and zKubernetesPodFilter will be modeled if they are allowed by either
zKubernetesContainerLabelsModeled or zKubernetesContainerNamesModeled.
This makes it possible to mix approaches, selecting Containers for
modeling by both their names and their labels.
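The interaction of these zProperties can be summarized as a decision cascade. The sketch below is one reading of the rules above, not the ZenPack's code. The function names are hypothetical, and two details are assumptions: patterns are applied with re.search, and labels are matched as regular expressions against "key: value" strings. The DEFAULTS dictionary uses the default values listed in this document.

```python
import re


def matches_any(patterns, text):
    """True when any non-empty pattern matches; [""] selects nothing."""
    return any(p and re.search(p, text) for p in patterns)


def container_modeled(namespace, pod, name, labels, zprops):
    """Decide whether a Container would be modeled under the given zProperties."""
    # 1. Namespace: must be included, and exclude overrides include.
    if not matches_any(zprops["zKubernetesNamespaceInclude"], namespace):
        return False
    if matches_any(zprops["zKubernetesNamespaceExclude"], namespace):
        return False
    # 2. Pod: Containers of filtered-out Pods are never modeled.
    if not matches_any(zprops["zKubernetesPodFilter"], pod):
        return False
    # 3. Container: union of name match and label match.
    by_name = matches_any(zprops["zKubernetesContainerNamesModeled"], name)
    label_strings = ["{}: {}".format(k, v) for k, v in labels.items()]
    by_label = any(
        matches_any(zprops["zKubernetesContainerLabelsModeled"], s)
        for s in label_strings
    )
    return by_name or by_label


DEFAULTS = {
    "zKubernetesNamespaceInclude": [".*"],
    "zKubernetesNamespaceExclude": ["kube-system"],
    "zKubernetesPodFilter": [".*"],
    "zKubernetesContainerNamesModeled": [".*"],
    "zKubernetesContainerLabelsModeled": [""],
}

print(container_modeled("default", "web-1", "nginx", {"app": "nginx"}, DEFAULTS))
print(container_modeled("kube-system", "dns-1", "coredns", {}, DEFAULTS))
```

With the defaults, everything outside 'kube-system' is modeled. Setting zKubernetesContainerNamesModeled to [""] and supplying a label pattern switches the selection to labels only.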
Common values for filter zProperties
Common values for zKubernetesNamespaceInclude, zKubernetesPodFilter,
and zKubernetesContainerNamesModeled:
[""] - no components will be selected for the modeling.
[".*"] - all available components will be selected for the
modeling.
["default|test"] - all components related to default and test
will be selected for the modeling.
["^((?!pod-1).)*$"] - all containers which do not relate to
pod-1 will be selected for the modeling.
Common values for zKubernetesContainerLabelsModeled:
[""] - no components will be selected for the modeling.
["app: mysql|app: redis"] - containers which have the label
app: mysql or app: redis will be selected for the modeling.
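The common values above can be tried against sample names with Python's re module. This is a minimal sketch: the select helper is hypothetical, the use of re.search is an assumption about how the patterns are applied, and the names are made up.

```python
import re


def select(patterns, names):
    """Return the names matched by any non-empty pattern (assumed semantics)."""
    return [n for n in names if any(p and re.search(p, n) for p in patterns)]


names = ["default", "test", "pod-1-abc", "pod-2-xyz"]

for patterns in ([""], [".*"], ["default|test"], ["^((?!pod-1).)*$"]):
    print(patterns, select(patterns, names))
```

Note that the negative lookahead pattern rejects any name that contains pod-1 anywhere, so a name such as pod-10 would be rejected as well.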
Modeler Plugins
Kubernetes.Cluster
Service Impact and Root Cause Analysis
When combined with the Zenoss Service Dynamics product, this ZenPack
adds built-in service impact and root cause analysis capabilities. The
service impact relationships shown in the diagram (right) and described
below are automatically added and maintained. These will be included in
any services that contain one or more of the explicitly mentioned
components.
The following object types would typically be added to Impact services:
Kubernetes Containers
Linux device associated with a Kubernetes Node
Impact Relationships between Kubernetes Components
GuestCluster (external): impacts Cluster
GuestDevice (external): impacts Node
Cluster: impacts Node, Persistent Volume
Node: impacts Container
Container: impacts Pod
PersistentVolume: impacts PersistentVolumeClaim
PersistentVolumeClaim: impacts Pod
Pod: impacts Deployment, StatefulSet, Service
Deployment: impacts Service
StatefulSet: impacts Service
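The relationships listed above form a directed graph, so the set of components affected by a failure can be found with a simple traversal. The sketch below encodes the list as a dictionary and walks it breadth-first. The IMPACTS name and the impacted_by function are illustrative only and are not part of the ZenPack or of Service Impact.

```python
from collections import deque

# Direct "impacts" edges, copied from the list above.
IMPACTS = {
    "GuestCluster": ["Cluster"],
    "GuestDevice": ["Node"],
    "Cluster": ["Node", "PersistentVolume"],
    "Node": ["Container"],
    "Container": ["Pod"],
    "PersistentVolume": ["PersistentVolumeClaim"],
    "PersistentVolumeClaim": ["Pod"],
    "Pod": ["Deployment", "StatefulSet", "Service"],
    "Deployment": ["Service"],
    "StatefulSet": ["Service"],
}


def impacted_by(component):
    """Return every component type reachable from the given one, in BFS order."""
    seen = set()
    order = []
    queue = deque([component])
    while queue:
        for target in IMPACTS.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(impacted_by("Node"))
print(impacted_by("PersistentVolume"))
```

For example, a problem on a Node propagates through its Containers and Pods up to the Deployments, StatefulSets, and Services that depend on them.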
Appendix: Kubernetes RBAC Setup
In order to properly enable the Core Metrics Service and provide RBAC
access permissions to other components, the following YAML must be
applied to the account in the following way:
The Amazon EKS control plane includes master nodes that run the
Kubernetes software, such as the Kubernetes API server and etcd. The
control plane runs in a separate account managed by AWS. Amazon EKS
worker nodes run in the customer's AWS account and connect to the
cluster's control plane. Therefore, on AWS EKS only worker nodes are visible.
Appendix: AKS, Azure nodes
An Azure AKS cluster consists of two components:
The Azure frontend is managed by Ingress
The Azure production cluster hosts Kubernetes nodes
Changes
2.0.0
Added links between Kubernetes nodes and corresponding vSphere VMs. (ZPS-6894)
Resolved "MISSING" status issue for Kubernetes Pods in the ZenPack. (ZPS-8063)
Fixed excessive invalidation churn caused by Kubernetes incremental modeling. (ZPS-8261)
Added CPU and Memory threshold support for Kubernetes Pods. (ZPS-8779)
Fixed large-scale DataMap generation during incremental modeling in K8sWatchDataSource. (ZPS-8780)
Fixed an issue where zKubernetesNamespaceFilter could disrupt modeling.
Introduced two new zProperties: zKubernetesNamespaceInclude and zKubernetesNamespaceExclude. (ZPS-8944)
Improved debugging by enhancing error details in Kubernetes modeling logs. (ZPS-8951)
Fixed Kubernetes event generation and clearing by addressing component ID mismatches. (ZPS-8994)
Fixed event clearing issues when the Kubernetes API returns an empty response. (ZPS-8995)
Fixed linking of services, statefulsets, and deployments to Pods through custom labels. (ZPS-9001)
Updated the relationship between PersistentVolumeClaims and Pods to many-to-many (M:M). (ZPS-9017)
Fixed modeling errors caused by Unicode ObjectMap IDs. (ZPS-9018)
Fixed modeling failures related to outdated resourceVersion values in API calls. (ZPS-9019)
Resolved incremental modeling inconsistencies caused by ConflictErrors leading to lost Object Maps. (ZPS-9027)
Fixed Pod to Deployments, Services, and StatefulSets mapping issue during incremental modeling. (ZPS-9032)
Tested with Zenoss Cloud, Zenoss 6.7.0 and Service Impact 5.7.0
1.2.0
Added monitoring of StatefulSet component (ZPS-6984)
Added zKubernetesPodFilter for filtering Pods and Containers
(ZPS-7294)
Fixed Cluster, Container, and Node templates (ZPS-7409)
Fixed modeling of Pods with the same names (ZPS-7887)
Fixed namespace setting during modeling of Containers (ZPS-7888)
Tested with Zenoss Cloud, Zenoss 6.6.0 and Service Impact 5.5.5
1.1.0
Added support for incremental modeling
Added support for EKS (AWS) and AKS (Azure)
Added Deployment component and updated Impact relations (ZPS-4625)
Improved explanation in auth related errors (ZPS-5955)
Added Operating System Relationships (ZPS-5878)
Tested with Zenoss 6.4.1, Zenoss Cloud and Impact 5.5.1
1.0.1
Fixed install issue with Zenoss 6.2.0 (ZPS-4674)
Tested with Zenoss 6.2.1, Zenoss Cloud and Impact 5.3.1
1.0.0
Initial Release
Tested with Zenoss 6.2.1, Zenoss Cloud and Impact 5.3.1