Amazon Web Services (Commercial)
ZenPacks.zenoss.AWS
This ZenPack provides support for monitoring Amazon Web Services (AWS). Monitoring for the following EC2, VPC, RDS, CloudFormation, ECS, ElastiCache and S3 entities is provided through a combination of the AWS EC2, RDS, CloudFormation, ECS, ElastiCache and CloudWatch APIs.
Commercial
This ZenPack is developed and supported by Zenoss Inc. Commercial ZenPacks are available to Zenoss commercial customers only. Contact Zenoss to request more information regarding this or any other ZenPacks. Click here to view all available Zenoss Commercial ZenPacks.
Releases
Version 5.2.1 - Download
- Released on 2024/04/25
- Requires PythonCollector ZenPack (>=1.11.0), ZenPackLib ZenPack (>=2.1.0)
- Compatible with Zenoss 6.4 - 6.7 and Zenoss Cloud
Version 5.2.0 - Download
- Released on 2022/12/01
- Requires PythonCollector ZenPack (>=1.11.0), ZenPackLib ZenPack (>=2.1.0)
- Compatible with Zenoss 6.4 - 6.7 and Zenoss Cloud
Features
The features added by this ZenPack can be summarized as follows. They are each detailed further below.
- Discovery of EC2, VPC, RDS, ECS, ElastiCache, CloudFormation, Auto Scaling Groups, Application and Network Load Balancers, and S3 entities.
- Monitoring of CloudWatch metrics.
- Monitoring of Region, S3Bucket and Subnet components.
- Event management and monitoring.
- Optional auto-discovery and monitoring of instance guest operating systems.
- Optional service impact with the addition of Zenoss Service Dynamics product.
- Monitoring of estimated charges for Amazon services.
- Expense Analysis is broken down by tag filters and service.
- Optional SQS Queues listening and consuming.
Discovery
The following entities will be automatically discovered through an account name, access key and secret key you provide. The attributes, tags and collections will be updated on Zenoss' normal remodeling interval which defaults to every 12 hours.
Regions: Attributes: ID: Collections: VPCs, Subnets, Zones, Instances, Volumes, Images, Snapshots, Gateways, Reservations, Elastic IPs, SQS Queues, CF Stacks, ECS Clusters, ElastiCache clusters
Zones: Attributes: ID, Region, State: Collections: Instances, Volumes, Subnets, Auto Scaling Groups, RDS Instances, Reserved Instances, ECS Container Instances, Load Balancers, ElastiCache Nodes
- VPCs: Attributes: ID, Region, CIDR Block, State: Tags: Name, CollectorCollections: VPCs,
- Collections: Subnets, Instances, ElastiCache Nodes
Subnets: Attributes: ID, Region, VPC, Zone, State, CIDR Block, Available IP Address Count, Zone Default, Auto-Public IP: Tags: Name: Collections: Instances
Instances: Attributes: ID, Region, VPC, Zone, Image, Subnet, State, Instance ID, Tag, Instance Type, Instance Type Details, Platform, Public DNS Name, Private IP Address, Public IP, Launch Time, Guest Device: Tags: Name: Collections: Volumes: Other: Guest Device (if monitored by Zenoss)
Volumes: Attributes: ID, Region, Zone, Instance, Type, Created Time, Size, IOPS, Status, Attach Data Status, Attach Data Device: Tags: Name: Collections: Snapshots
Elastic IPs: Attributes: ID, Region, Public IP, Private IP, Instance ID, Domain, Network interface ID, Network interface owner ID: Tags: Name
SQS Queues: Attributes: ID, Region: Tags: Name
S3 Buckets: Attributes: ID, Creation date: Tags: Name
Snapshots: Attributes: ID, Region, Volume, Volume size in Bytes, Progress, Started, Description: Tags: Name
VPN Gateways: Attributes: ID, Region, Gateway type, State: Tags: Name
Images: Attributes: ID, Region, Status, Location, Owner ID, Architecture, Image type, Kernel ID, Ramdisk ID, Description, Block device mapping, Root device type, Root device name, Virtualization type, Hypervisor: Tags: Name
Reserved Instances: Attributes: ID, Region, Zone, State, Instance Type, Reserved Instance ID: Tags: Name
RDS Instances: Attributes: ID, Region, Zone, State, Instance ID, Instance Type, Parameters Groups, Security Groups, Engine, Engine Version, VPC, VPC Subnets: Tags: Name
RDS Security Group: Attributes: ID, Region, Owner ID, Description, EC2 Groups, IP Ranges: Tags: Name
CF Stacks: Attributes: ID, Description, Creation Time, Disable Rollback, Notification ARNs, Capabilities, Tags, Status, Status Reason, Timeout, Policy, Template, Parameters, Outputs: Tags: Name: Collections: CF Resources
- CF Resources: Attributes: ID, Resource Type, Timestamp, Description, Logical
- Resource ID, Physical Resource ID, Status, Status Reason:
- Tags: Name
Auto Scaling Groups: Attributes: ID, Region, Zones, Created Time, Capacity, Default Cooldown, Health Check Grace Period, Health Check Type, Launch Configuration, Launch Template, Enabled Metrics, Protected From Scale In, Placement Group, VPC Zone Identifier: Tags: Name: Collections: Instances
Load Balancers (Application and Network): Attributes: ID, Region, Availability Zones, ARN, Canonical Hosted Zone Id, Type, Created Time, DNS Name, Scheme, State, IP Address Type, VPC Zone Identifier: Tags: Name: Note: Classic Load Balancers are not supported
Target Groups: Attributes: ID, Region, Zones, ARN, Protocol, Port, Target Type, Health Check, Unhealthy Threshold Count, Health Check Path, Matcher, VPC Zone Identifier: Tags: Name: Collections: Instances
Lambda Functions
Attributes: FunctionName, Description, FunctionArn, Handler, CodeSize, LastModified, Role, Alias, Memory, Runtime, Version, Timeout, Security Group
Tags: Name
ECS Clusters
Attributes: Name, Region, Status, Active Services, Running Tasks, Pending Tasks, Registered Container Instances, ECS Container Instances, ECS Schedule Rules
Tags: Name
Collections: ECS Container Instances, ECS Services, ECS Schedule Rules
ECS Container Instances
Attributes: Name, Region, ECS Cluster, EC2 Instance, Status, Registered CPU, Remaining CPU, Registered Memory, Remaining Memory, Running Tasks, Pending Tasks
Tags: Name
ECS Schedule Rules
Attributes: Name, Region, ECS Cluster, Schedule Expression, Event Bus Name, State, ECS Scheduled Tasks
Collections: ECS Scheduled Tasks
ECS Scheduled Tasks
Attributes: Name, Region, ECS Cluster, ECS Schedule Rule, Launch Type, Tasks, Subnets
ECS Services
Attributes: Name, Region, ECS Cluster, Status, Desired Tasks, Running Tasks, Pending Tasks, Launch Type
Tags: Name Collections: ECS Tasks
Note: For ECS Tasks that are hosted directly on the ECS Cluster additional 'virtual' ECS Service will be added to contain those tasks
ECS Tasks
Attributes: Name, Region, ECS Cluster, ECS Service, ECS Container Instance, Availability Zone, Last Status, Desired Status, ECS Containers
Tags: Name Collections: ECS Containers
ECS Containers
Attributes: Name, Region, ECS Cluster, ECS Service, ECS Container Instance, Availability Zone, Last Status, Desired Status
ElastiCache Cluster
Attributes: Name, Region, Cluster Description, Cache Cluster Status, Cluster Mode, ElastiCache Nodes, Snapshotting Node Id, Automatic Failover, Number of Shards, Number of Cache Nodes, Cache Node Type, Cluster ARN, Snapshot Retention Limit, Snapshot Window, Transit Encryption Enabled, At Rest Encryption Enabled
Tags: Name
Collections: ElastiCache Nodes
ElastiCache Node
Attributes: Name, Region, Availability Zone, ElastiCache Cluster, VPC, Subnets, Cache Node Status, Cache Node Type, Engine, Engine Version, Node Arn, Node Creation Time, Preferred Maintenance Windows, Parameter Group Name, Cache Subnet Group Name, Replication Group Id, Snapshot Retention Limit, Snapshot Window, Auto Minor Version Upgrade, Role, Address, Port, Shard
Tags: Name
Modeling
Starting from AWS ZP 5.0, the modeling was changed. The first modeling starts right after the AWS device has been added. After that, the AWS device will be modeled every 12 hours by default.
Note: Right now we use only one `aws.Base` plugin in 'Modeler Plugins' and new `zAWSEnabledPlugins` property in the Configuration Properties' was added to control a list of AWS modeler sub-plugins.
If you model the device directly by pressing the 'Model Device' button, changes will not be applied instantly after modeling is done. It will take some time to apply all new data to your device.
There are two datsources in the EstimatedCharges template. AWSModelRunPlugin is responsible for starting a special server on zenpython's side to listen for requests from zenmodeler, when modeling is performed. AWSQueueProcessingPlugin periodically checks the internal queue on zenpython's side and sends datamaps contained to zenhub.
Modeling and zAWSEnabledPlugins property
zAWSEnabledPlugins
controls a list of AWS modeler sub-plugins, such
as:
- EC2
- RDS
- S3Buckets
- CloudFormation
- AutoScalingGroups
- LoadBalancers
- LambdaFunctions
- ECS
- ElastiCache
Monitoring
The following metrics will be collected every 5 minutes by default. Any other CloudWatch metrics can also be collected by adding them to the appropriate monitoring template. The Average statistic is collected, and the graphed value is per second for anything that resembles a rate.
Account: Metrics: EstimatedCharges, EC2EstimatedCharges, S3EstimatedCharges, RDSEstimatedCharges, DynamoDBEstimatedCharges, LightsailEstimatedCharges, RedshiftEstimatedCharges, SESEstimatedCharges, SNSEstimatedCharges, CloudTrailEstimatedCharges, DataTransferEstimatedCharges, QueueServiceEstimatedCharges, KmsEstimatedCharges, ECSEstimatedCharges, ElastiCacheEstimatedCharges
Regions: Metrics: CPUUtilization, DiskReadOps, DiskWriteOps, DiskReadBytes, DiskWriteBytes, NetworkIn, NetworkOut
Note
These metrics aggregated only for EC2 Instances with detailed monitoring enabled
Instances: Metrics: CPUUtilization, DiskReadOps, DiskWriteOps, DiskReadBytes, DiskWriteBytes, NetworkIn, NetworkOut, StatusCheckFailed_Instance, StatusCheckFailed_System, CheckReserved
Volumes: Metrics: VolumeReadBytes, VolumeWriteBytes, VolumeReadOps, VolumeWriteOps, VolumeTotalReadTime, VolumeTotalWriteTime, VolumeIdleTime, VolumeQueueLength: Provisioned IOPS Metrics: VolumeThroughputPercentage, VolumeReadWriteOps
S3 Buckets: Metrics: BucketTotalSize, BucketKeysCount
RDS Instances: Metrics: CPUUtilization, FreeableMemory, FreeStorageSpace, SwapUsage, ReadIOPS, WriteIOPS, DatabaseConnections, DiskQueueDepth, ReplicaLag
SQSQueue: Metrics: NumberOfMessagesSent, NumberOfMessagesDeleted, ApproximateNumberOfMessagesVisible
Auto Scaling Groups: Metrics: GroupInServiceInstances, GroupPendingInstances, GroupStandbyInstances, GroupTerminatingInstances, GroupTotalInstances
Application Load Balancers:
Metrics: ActiveConnectionCount, NewConnectionCount, RejectedConnectionCount, TargetConnectionErrorCount, ELBAuthError, ELBAuthFailure, TargetResponseTime, TargetTLSNegotiationErrorCount, ClientTLSNegotiationErrorCount, HTTPCode_ELB_3XX_Count, TPCode_ELB_4XX_Count, HTTPCode_ELB_5XX_Count, HTTPCode_ELB_500_Count, HTTPCode_ELB_502_Count, HTTPCode_ELB_503_Count, HTTPCode_ELB_504_Count, HTTPCode_Target_2XX_Count, HTTPCode_Target_3XX_Count, HTTPCode_Target_4XX_Count, HTTPCode_Target_5XX_Count, ProcessedBytes, ConsumedLCUs, RequestCount, RuleEvaluations:
Note
Depending on the Load Balancer configuration, some metrics might be unavailable.
Network Load Balancers: Metrics: ActiveFlowCount, NewFlowCount
Network Target Groups: Metrics: HealthyHostCount, UnHealthyHostCountApplication Target Groups: Metrics: HealthyHostCount, UnHealthyHostCount
ECS Clusters:
Metrics: CPUUtilization, MemoryUtilization, CPUReservation, MemoryReservation, NetworkRxBytes, NetworkTxBytes, StorageWriteBytes, StorageReadBytes
Note
Enable Container Insights to monitor Network and Storage metrics.
ECS Service:
Metrics: CPUUtilization, MemoryUtilization, NetworkRxBytes, NetworkTxBytes, StorageWriteBytes, StorageReadBytes
Note
Enable Container Insights to monitor Network and Storage
metrics
ElastiCache Cluster:
Metrics: CacheClusterStatus
ElastiCacheCluster: Metrics: EngineCPUUtilization, CPUUtilization, DatabaseMemoryUsagePercentage, SwapUsage, MemoryFragmentationRatio, ReplicationLag, CurrConnections, Evictions, NetworkBytesIn, NetworkBytesOut, CacheNodeStatus
The Amazon CloudWatch datasource type also allows for the collection of any other CloudWatch metric.
Besides CloudWatch metrics, the following metrics will also be collected every 5 minutes by default.
Subnets: Metrics: Available IP Addresses count
Monitoring large cloud may require contacting AWS support with a request to increase the CloudWatch API requests limit. An appropriate events will be created in Zenoss in case the limit for CloudWatch requests has been exceeded.
CloudWatch datasources utilize multithreading for better performance. It
is possible to increase speed by setting twistedthreadpoolsize
the
value in the configuration of zenpython
daemon. Please note that
setting a higher value will result also in bigger memory usage.
Collection interval may be changed using the
zAWSCloudWatchCollectionInterval
property. By default, it is set to
300 seconds. This will affect most Amazon CloudWatch datasources and may
help in reducing monitoring costs. It doesn't change the interval of
datapoints on the graphs, but only changes the frequency Zenoss performs
API calls to CloudWatch.
Note
By default, CloudFormation
, ECS
and ElastiCache
modeler sub-plugins are not enabled. Users need to add them manually and
initiate the modeling for those components to be modeled and
monitored. For AutoScalingGroup monitoring to become working, the user
needs to enable the monitoring on the AWS console for the specific
AutoScalingGroups.
SQS Queue Messages Monitoring
Users may configure Zenoss to consume specific SQS Queues, parse and convert messages to Zenoss events.
SQS Events generated might be delayed in their creation due to Amazon's use of short polling by default.
If configured to not delete messages (listen), events will be sent only for messages created after the previous monitoring cycle. This prevents flooding Zenoss Events console with historical SQS messages.
CloudFormation Events Monitoring
The monitoring plugin collects CloudFormation Events for each CF Stack and shows them as Zenoss Events at the same time. Also, it updates the status of the CF Stack or CF Resource component it belongs.
Standard Zenoss Event Fields:
- device (set to EC2Account)
- component (CF Stack)
- summary
- severity
- eventClassKey (set to CFStackEvents)
- eventKey (for de-duplication and auto-clear fingerprinting)
Additional Fields:
- aws.cf.event_id
- aws.cf.logical_resource_id
- aws.cf.physical_resource_id
- aws.cf.resource_properties
- aws.cf.resource_status
- aws.cf.resource_status_reason
- aws.cf.resource_type
- aws.cf.stack_id
- aws.cf.stack_name
CREATE_FAILED
and DELETE_FAILED
events have CRITICAL severity, all
others INFO one.
By default, all generated events are mapped to /AWS/CloudFormation
event class.
Note
Once the event is sent, it will not be sent again. If the user clears the event, it will not reappear again.
In case zAWSCloudFormationEventsAutoClear
zProperty set to True for
each CREATE_COMPLETE
and DELETE_COMPLETE
corresponding auto-clear
event will be generated to clear previous CRITICAL ones.
ElastiCache Events monitoring
The monitoring plugin collects ElastiCache Events for each ElastiCache Cluster and Node and shows them as Zenoss Events at the same time.
Standard Zenoss Event Fields:
- device (set to EC2Account)
- component (ElastiCache Cluster/ElastiCache Node)
- summary
- severity
- eventClassKey (set to
ElastiCacheEvents
) - eventKey (for de-duplication and auto-clear fingerprinting)
Additional Fields:
- aws.elasticache.sourcetype (replication-group/cache-cluster)
CREATE_FAILED
and DELETE_FAILED
events have CRITICAL severity, all
others INFO one.
By default, all generated events are mapped to `/AWS/ElastiCache` event class.
Note
Once the event is sent, it will not be sent again. If the user clears the event, it will not reappear again.
Zenoss Notifications with SES
Notifications for events now have the option to be sent by email using Amazon SES.
In addition to the standard email notification fields, you will need to fill out the following additional fields.
- AWS Account Name
- AWS Region
- AWS Access key
- AWS Secret key
The senders' email and the email of the subscribers must be verified within SES for the target region.
Soft Limits Monitoring
The following resource counts subject to the soft limits will be collected every 5 minutes and when any of these metrics approaches a soft limit threshold, a Zenoss event will be triggered.
Regions: Soft Limit Metrics: Elastic IPs count, Instances count, Subnets count, VPC Security Groups count, VPC Security Groups Rules max, Volumes count
The thresholds are set to the default limit values. If you changed this limit for your account, you should manually change the Max threshold value using the following steps:
- Select the AWS device on Infrastructure page.
- Navigate to Regions component and select region.
- Click Display and select
Templates
. - Press Create Local Copy button (The template will be changed for the current region only).
- On the Thresholds panel choose the resource count to be changed.
- Double-click on the resources count and change the value in the Maximum Value field.
Guest Device Discovery
You can optionally configure each monitored AWS account to attempt to discover and monitor the guest Linux or Windows operating systems running within each EC2 instance when specific Tags are present. This requires that your Zenoss system has the network and server access it needs to monitor the guest operating system. VPC and non-VPC modes are supported.
The guest operating system devices' life-cycle is managed along with the instance. For example, the guest operating system device is set to a decommissioned production state when the EC2 instance is stopped, and the guest operating system device is deleted when the EC2 instance is destroyed.
Service Impact
When combined with the Zenoss Service Dynamics product, this ZenPack adds built-in service impact capability for services running on AWS. The following service impact relationships are automatically added. These will be included in any services that contain one or more of the explicitly mentioned entities.
Service Impact Relationships
- Account access failure impacts all regions.
- Region failure affects all VPCs, ElastiCache Clusters, ECS Clusters and zones in the affected region.
- VPC failure affects all related subnets.
- Zone failure affects all related subnets, instances, RDS Instances, Auto Scaling Groups, Load Balancers, ECS Container Instances and Volumes.
- Subnet failure affects all instances on the affected subnet.
- Subnet failure affects all functions on the affected subnet.
- Subnet failure affects ECS Container Instances and ECS Scheduled Tasks on the affected subnet.
- Volume failure affects any attached instance.
- Instance failure affects the guest operating system device and Target Groups instance belongs to.
- SQSQueue, VPNGateway, or EC2ElasticIP failure affects any related region.
- S3Bucket failure affects related account.
- Each component affects the corresponding CF Resource if it has any related region.
- CF Resource failure affects CF Stack.
- Auto Scaling Group affects related EC2 Instances.
- Target Group affects related Load Balancers.
- ECS Cluster affects ECS Services, ECS Container Instances and ECS Schedule Rules.
- ECS Container Instance affects ECS Tasks.
- ECS Schedule Rule affects ECS Scheduled Tasks.
- ECS Service affects ECS Tasks.
- ECS Tasks affect ECS Containers.
- ElastiCache Cluster affects ElastiCache Nodes.
Tag Filters
The ZenPack now provides a way to group and collect AWS components on an account based on AWS Tags. You can define a tag filter by navigating to your AWS account device and selecting "Add AWS Tag Filter" from the "+" menu in the lower left corner of the screen. On the dialog that pops up, give your Tag Filter a name, and select the tag you want to track. You can combine multiple tags with the AND and OR operators. You can also generate a Component Group based on the Tag Filter. Click Submit when finished.
The Tag Filter will be visible in the the navigation bar area, and the "AWS Tag Filters" section. This will allow you to view all components of any type matched by the filter, along with their graphs.
In addition, you can use this Tag Filter to view billing information for the group of components in the Expenses Analysis section (see Expense Analysis).
The AWS Tag Filters use a special monitoring template, TagFilter, which is not visible in the device-level monitoring template section, but is visible if you go to Advanced > Monitoring Templates. From here, you can add modify the template, should you need to do so.
Estimated charges monitoring
To turn on monitoring of charges for Amazon services one should enable EstimatedCharges monitoring template for AWS device. This will add graphs with billing information to the device overview page and on the Expenses Analysis page.
Account Billing Overview
To control spendings limit zAWSBillingCostThreshold
zProperty should
be used. It is set to 1000 by default. This property sets threshold for
bullet-like billing graph to turn red and used in "Billing Cost"
threshold as well. Event is generated if spending goes over its value.
Billing graphs shows estimated charges for whole account and detailed charges per service. Top 10 services displayed on pie chart.
This ZenPack uses linear interpolation to predict total per month
charges and this information is displayed on the device overview page
and on the Total Estimated Charge
graph as new a datapoint in
Expense Analysis
.
Expense Analysis
You can track AWS usage charges for a given tag or tag group, and group by specific services. In order to set this up, you must create a Tag Filter to match the tag or tags in which you are interested. And then you must configure detailed billing reports in your AWS account. See Configuring Charges Per Tags Monitoring for details.
Expenses Analysis
Cloudwatch API Cost
This ZenPack uses the Amazon Cloudwatch API to collect metric data. The first 1,000,000 calls to this API each month are free, and then additional calls are charged at a rate of $0.01 per 1,000 calls. For specific pricing questions, see AWS Cloudwatch Pricing.
A report is provided (Reports > AWS Reports > Monitoring Costs) to provide a detailed breakdown of API calls and estimated cost per monitoring template on each monitored EC2 Account.
CloudFormation Stacks Blueprints
CloudFormation Stacks Blueprints provides a graphical representation of all Stacks templates. The same way as it's done in AWS Console.
Stacks Blueprints
At start, only stacks are shown. Double click on the node expands stacks and shows its resources. Also, buttons for quick expanding and collapsing all visible stacks are available.
The set of visible stacks can be narrowed down by regions and stack name filters. The stack name filter sets the fragment that needs to be present in the stack's name. After setting filters Refresh button should be pressed to apply changes.
Each node in the stack is a resource defined in the template. The first row of text specifies the name of the resource defined in the template, the second one is a type of resource and the last is the id of deployed AWS entity.
By default diagram only shows resources that were deployed, to show all resources Show Undeployed Resources checkbox can be used.
Links represent dependencies between resources (e.g. EC2 Instances refer to Security Groups).
There also are separate blueprints for each CF Stack component.
Configuring ECS Schedule Rules discovery
zAWSECSEventBuses
contains names of AWS Event Bridge Event Buses which
are used for ECS Schedule Rules discovery.
By default Event Bus with the name default
is used. If you use a
custom Event Bus for ECS Schedule Rules, you will need to add its name
to zAWSECSEventBuses
manually.
Usage
Adding AWS Accounts
Use the following steps to start monitoring EC2 using the Zenoss web interface.
- Navigate to the Infrastructure page.
- Choose Add EC2 Account from the add device button.
- Enter your AWS account id, account name, access key and secret key.
- Optionally choose a collector other than the default localhost.
- Click Add.
Alternatively, you can use zenbatchload
to add accounts from the command
line. To do this, you must create a file with contents similar to the
following. Replace all values in angle brackets with your values minus
the brackets. Multiple accounts can be added under the same
/Device/AWS/EC2
section.
/Devices/AWS/EC2 loader='ec2account', loader_arg_keys=['accountid', 'devicename', 'accesskey', 'secretkey', 'devicePath', 'collector'] <devicename> accountid='accountid', devicename='devicename', accesskey='accesskey', secretkey='secretkey', devicePath='/Devices/AWS/EC2', collector='localhost'
You can then load the account(s) with the following command:
zenbatchload <filename>
Configuring filter for modeler plugin
Use zAWSRegionToModel
property to narrow components modeled. By
default it has an empty value, so all EC2 regions and its child
components will be discovered. Specify the EC2 region name, or multiple
names separated by a comma in it. This will be used as a filter and may
help with large AWS accounts.
Some regions (such as 'ap-east-1' (Hong Kong)) may be disabled by default, on the AWS console. In this case, a message about skipping it will be shown while modeling.
Configuring Guest Device Discovery
Use the following steps to configure instance guest device discovery. Guest device discovery must be configured individually for each EC2 account.
- Navigate to one of the EC2 accounts.
- Click the edit link beside Device Class for Discovered Linux Instances
- Choose the device class for Linux and/or Windows instances.
- Navigate to the Configuration Properties panel and in the
zAWSDiscover
property specify the instances' tags and values (e.g.<tag:value>;
). - Verify that appropriate SSH, SNMP or Windows credentials are configured for the chosen device class(es).
- To choose private or public IP address will be used for creating
guest devices, and change the
zAWSGuestUsePublicIPs
property. - To specify device classes for guest devices based on AWS tags from the
instance, set the tag
key=value
in thezAWSGuestDeviceClassTags
. For example: in EC2 Console it is a tagapp=1
, then useapp=1
and/Server/Linux
inzAWSGuestDeviceClassTags
property edit window. - Like specifying guest device classes from AWS tags, Zenoss Group &
System membership can be configured via
zAWSGuestDeviceGroupingTags
. The tag definition works exactly as it does inzAWSGuestDeviceClassTags
. Note: you will need to restart zproxy service in Control Center, to have a proper UI window for this zProperty after installation. - Remodel the EC2 account by choosing Model Device from its menu.
zAWSGuestDeviceClassTags Property
Please note: zAWSDiscover
defines a filter for guest devices discovery
and used before zAWSGuestDeviceClassTags
. A list of tags in
zAWSGuestDeviceClassTags
used from top to bottom, so first matching
key=value will apply. Also, property zAWSGuestDeviceClassTags
takes
precedence over Device Overview configured classes.
zAWSDiscover
also supports complex tags in the format
<some:long:key:value>;
Multiply colons are allowed in the
key
name only (key:key:value
).
If your instances are VPC instances and are in a different VPC than the Zenoss server that's monitoring the EC2 account, you must add a Collector tag containing VPC with the value set to the name of the Zenoss collector to which discovered guest devices should be assigned.
Example:
- If
zAWSDiscover
was filled with the valueTest:test;
after modeling all the devices with the tagTest:test
will be discovered - If
zAWSDiscover
was filled with the valueTest1:test1;Test2:test2
after modeling all the devices with either of the tag will be discovered
Configuring Remote Collector for Guest Devices
You can optionally configure an alternate remote collector for the devices created from AWS Instances with the following configuration properties:
zAWSGuestCollector
This property allows you to specify the name of the collector all
discovered devices for this AWS device will use.
zAWSResetGuestCollector
Setting this property to false on a guest device (not an EC2
Account) will tell AWS not to change the collector if you have set
it manually.
Configuring closing events on guest device deletion
New zAWSCloseGuestEvents
property added. If set to true, all open events
for guest device will be closed when the guest device is deleted.
Find Missing Guest Devices
Guest devices should be discovered automatically during modeling. However, if an error occurs during modeling or some other unexpected event, it is possible for guest devices to be skipped. If some guest devices appear to be missing, you can force the discovery process to be repeated.
In the Zenoss UI, navigate to your AWS EC2 Account device, and find the gear icon menu in the bottom left corner of the window. Under this menu, click the option labeled "Find Missing Guest Devices." This will schedule a job for immediate execution, which will clear the guest ID cache and run the discovery process for each instance. Existing guest devices will remain, but any devices previously missed will be detected. You can monitor the progress of this job in the Jobs section of the UI, under the Advanced Tab.
Reasons a Guest Device Fails to be Discovered
Several criteria must be met in order for a guest device to be discovered by the AWS ZenPack. Those requirements are as follows:
- The instance must contain a tag listed in the
zAWSDiscover
configuration property. - Guest device classes must be defined. See the "Device Class for Discovered Linux Instances" and "Device Class for Discovered Windows Instances" fields on the Device Overview page.
- The guest must have a valid collector, either from the EC2
Instance's VPC, from the
zAWSGuestCollector
, or the default collector for the AWS account device. - The guest must have a valid
manageIP
, either the EC2 Instance's private IP, public IP, or its DNS name. - The EC2 Instance
guest
property must be set. This should be set automatically. If you believe it is set improperly, use the Find Missing Guest Devices feature described above. - The EC2 Instance
_has_guest
must be false. This should be set automatically. If you believe it is set improperly, use the Find Missing Guest Devices feature described above. - The guest device ID must not be previously cached in the AWS account's guest device ID cache. This should be handled automatically. If you believe it is set improperly, use the Find Missing Guest Devices feature described above.
If all the criteria above are met by the EC2 Instance, and an existing device with an ID or title matching the EC2 Instance's ID exists, or an existing device has a matching IP address, the EC2 Instance will be linked to that existing device.
If no existing device matches the EC2 Instance, but the criteria above are met, a new device will be created in the Linux or Windows device class configured for the account.
Note that guest device creation is triggered during modeling, but is queued as a job to be run later. Thus a guest device will not show up until after modeling has been completed, and the corresponding scheduled job has been completed.
If a device link appears to be missing, double-check the criteria above, and run the Find Missing Guest Devices task described in the preceding section.
When creating guest devices a job should be scheduled for each guest device to be created. If a job was created for the guest device, but the guest device was not created, you can check the job output in the Jobs section of Zenoss.
If a job was not created, you can try running the modeler in debug mode to see why guest device creation was skipped.
Configuring Instances Remodeling
You can optionally configure your monitored AWS account, so that the newly added or recently dropped instances are automatically reflected on Zenoss UI during monitoring:
- Navigate to the Configuration Properties panel.
- Enable the
zAWSRemodelEnabled
property.
If zAWSRemodelEnabled
is false, only the instance state will be
updated on existing instances. If set to true, then all instance
properties will be updated on existing instances, and new instances will
be added to the model.
Configuring Auto Change of the Production State for EC2 Instances
You can disable auto change of the production state for EC2 Instances, for this purpose you have to:
- Click on the Infrastructure tab.
- Select discovered EC2 Instances or the appropriate device classes, in case you want to change the behavior for a group of underlying EC2 instances.
- Navigate to the Configuration Properties panel.
- Change the
zAWSAutoChangeProdState
property (default is true).
By default, the production state is changed to 'Production' (1000) for
running EC2 instances, and to 'Decommissioned' (-1) for stopped ones.
These states may be customized by specifying the desired production
state IDs (numbers) in zAWSAutoChangeProdStateRunning
and
zAWSAutoChangeProdStateStopped
.
If the user changes the production state for some guest device manually, this state will be used for this guest device when the EC2 instance switches to a running state.
PEM file
Use the following steps to specify the PEM file to the region for use in auto-discovering instance guest operating systems:
- Navigate to the Configuration Properties panel.
- Set region name and path to PEM file in the appropriate fields of
zAWSRegionPEM
property (see image below).
zAWSRegionPEM Property
Disable AWS Snapshot Monitoring
In some cases, you may have a large quantity of AWS Snapshots in your
environment, which can slow down the performance of the modeler. If you
do not need to model them, you can disable the collection of snapshots
by setting the zAWSEnableSnapshotCollection
property to false. This
will prevent the modeler from collecting and modeling snapshots in the
future. It will also cause current snapshot components to be removed
from Zenoss the next time the model is updated.
If you have already modeled your AWS snapshots, and the count is high,
removing them can cause the modeler to timeout. If this occurs, you can
remove them manually by running the included dmd script
delete_all_snapshot_components
from the zope container.
Note: The delete_all_snapshot_component
script will delete all
AWS snapshot components from all AWS devices without prompting for
confirmation. If you have multiple AWS devices and only want to delete
snapshots from some devices, use zendmd
.
Configuring Charges Per Tags Monitoring
If you use tag filters to organize your modeled AWS components, you may also want to enable monitoring charges per tag filter added to Zenoss. This will require configuration on both AWS and Zenoss sides.
To process Cost and Usage reports AWS Athena service is used, so please expect some extra costs for the service usage.
Configuration on the AWS side (you may use a different account to
collect billing data from the account being used for monitoring, by
using zAWSBillingAccessKey
and zAWSBillingSecretKey
zProperties):
- Activate User-Defined Cost Allocation Tags according to AWS documentation. Choose tags you use to filter components.
- Turn on the AWS Cost and Usage report according to AWS documentation. There is no need to enable Redshift or QuickSight manifests.
- Grant read permissions for S3 bucket reports to be delivered to the AWS account configured in Zenoss.
- Grant AWS user used in Zenoss access to use AWS Athena service. Please see the details in the documentation.
Note: It can take up to 24 hours for AWS to start delivering reports to your S3 bucket.
For configuration on Zenoss side set the following zProperties:
zAWSBillingReportS3Bucket
: S3 Bucket name of Cost and Usage reports being delivered (e.g.aws-billing-master
).zAWSBillingReportPrefix
: Report path prefix prepended to reports. Can be empty.zAWSBillingReportName
: Report namezAWSBillingAthenaResultsS3Bucket
: S3 Bucket AWS Athena will use to store query results. For details please check AWS documentation.zAWSBillingAthenaRegion
: Region which will be used to run AWS Athena. To avoid extra charges for cross-region data transfer, it's recommended to use the same region as S3 Bucket with Cost and Usage reports.
If Cost and Usage reports are stored on a separate account,
zAWSBillingAccessKey
and zAWSBillingSecretKey
zProperties should be
set to access and secret keys of this account. If these properties are
empty, access and a secret key from the device will be used.
If a tag is used for Tag Filter but is missing in Cost and Usage reports, billing data will not be collected for such Tag Filter and a corresponding Info event with a list of missing tags will be generated.
Configuring HTTP Proxies
If necessary, this zenpack can query AWS through an HTTP proxy. This is
configured in the usual way, by setting the *_proxy
environment
variables. Because of this, the setting is global for a particular
Zenoss process. It is therefore important to be aware that, for
instance, enabling proxying for zenpython may cause it to be used for
other service monitoring beyond just AWS.
To configure these environment variables, edit the service definitions
(via serviced service edit
or the Control Center UI) for the
zenpython, zenmodeler, and zenjobs containers as follows:
Change
"Environment": null,
to:
"Environment": [ "http_proxy=http://[proxy host]:[proxy port]", "https_proxy=http://[proxy host]:[proxy port]", "no_proxy=localhost" ],
Note that both http_proxy
and https_proxy
values must begin with
http://
. The no_proxy
variable is required so that communication
with other zenoss services is not impacted.
Note: Do not add this to the zope container.
Configuring SQS Queue Messages Monitoring
To control SQS Queue Messages Monitoring zAWSSQSQueues
zProperty
should be used. It defines a list of queue name RegEx patterns and
Zenoss event generation configuration.
zAWSSQSQueues Property
- Queue Name RegEx: RegEx to look for any match in the SQS queue name. (Note: empty expression matches all queue names.)
- Parser: Parser name to process messages
- Event Class Key: Optional field with Event Class Key for generated events
- Delete: Defines whether to delete the processed message from the queue
For each SQS queue, the list of patterns is checked from top to bottom, and the first queue name match will define a configuration. In case of no match, the queue will be not listened to.
Messages from the queue will be consumed, and parsed with one of the following message types:
simple
- Uses message body as subject/message for Zenoss events. The severity will be Info.
sns_topic
- There is a possibility to attach SQS Queues to SNS Topics. The parser Zenoss will decode the message and use the "Subject" and "Message" fields to set corresponding Zenoss event fields. Event severity will be Info.
zenoss_json
-
Zenoss will expect the message body to be in JSON format, decode and treat it as an already formed Zenoss event. The next fields will be extracted:
- message (at least one of summary or message should be present)
- summary
- eventKey (optional)
- severity (optional, will be Info as default)
- rcvtime (optional, if omitted current timestamp will be used)
- eventClassKey (optional, if omitted configured will be used)
- eventClass (optional)
For all parsers the generated events will have the next fields:
- device (set to EC2Account)
- component (SQS Queue)
If the event class or event class key is not defined in the event,
AWSSQSMessages
event class key (mapped to/AWS/SQSMessage
) will be used.
Configuring and Testing CloudWatch Datasources
You can get namespaces and metrics available on the selected device, for
this purpose you have to: 1. In the CloudWatch datasource edit window
click on the dropdown button in Namespace
or Metric Name
input
fields. 2. In the new window select Device
, then Region
and a list
of available Namespaces and Metrics for that region will appear.
CloudWatch Datasource Helper
Before testing datasource (using the button Test
in datasource edit
window) you may set a valid device name in Device Name
input field.
Adding Custom AWS CloudWatch Datasources
Sometimes you may need to monitor some AWS CloudWatch metrics which are
not defined in AWS ZP. In case AWS ZP already models components you need
to add metric you may just update provided template (create a new one
with-addition
suffix to avoid template overriding during the AWS ZP
upgrade) and add a new datasource with corresponding graphs. You will
need an Amazon CloudWatch datasource with Namespace and
Metric Name
fields populated.
If there is no component you want to monitor you may create a graph on the device level. But you will need a separate datasource for each component instance.
Here is an example of how to add monitoring of AWS ECS:
- Create a separate monitoring template in Advanced > Monitoring
Templates tab under
/AWS/EC2
device class. - Create a new data source for the template created above using the Amazon CloudWatch type. Add a datapoint with the same name as datasource.
- Edit datasource created above and
populate
Namespace
andMetric Name
fields. Change theRegion
field and set the actual region where your ECS cluster is located. InDimension
put the string in formatServiceName=<service_name>;ClusterName=<cluster_name>
(e.g.ServiceName=web;ClusterName=test-ec2-cluster
). - You can add the datapoints you created to a graph for visualization.
- Open your AWS device bind template you just created.
- You may see your graphs on the Graphs page.
Enabling incremental modeling for ECS Components
To enable incremental modeling for ECS Components you will need to select any Region component, switch to the Templates in the drop-down menu and enable the ECSWatch datasource. This datasource is disabled by default. The collection interval can be changed by using the Cycletime property of the ECSWatch datasource.
ECS incremental modeling
Changing maximum retries while making API call to AWS
zAWSMaxCallRetries
with a default value of 4 is responsible for the
number of maximum retries per call to AWS API. You can increase it if
you have throttling issues with AWS API. Setting this property to high
values may cause too long calls and delays.
Using zAWSUseNewIds
property
This property is applied for AutoScalingGroups, LambdaFunctions, LoadBalancers, and TargetGroups. After changing this property and remodeling - device historical data for that components will be lost. By default, this property is enabled for a fresh install for AWS ZP 5.1.0 and disabled for an upgrade to AWS ZP 5.1.0.
This property uses AWS ARN as id
for AutoScalingGroups,
LambdaFunctions, LoadBalancers and TargetGroups to handle a case, when
some component was created on the AWS, deleted and recreated with the
same name. On the AWS side - these are two different components. On
Zenoss's side, this was the same component for AutoScalingGroups,
LambdaFunctions, LoadBalancers and TargetGroups. Setting
zAWSUseNewIds
to true and remodeling will fix this issue.
Configuring retries delay for Lambda Functions, ECS Services and Load Balancers
Sometimes throttling error Rate exceeded
may happen during
modeling Lambda Functions, ECS Services and Load Balancers while getting
tags info and ECS List of Services. Properties zAWSLambdaFunctionsRetriesDelay
,
zAWSECSRetriesDelay
and zAWSLoadBalancersRetriesDelay
set base delay in seconds for retries.
Also, zAWSLoadBalancersMaxARNsCount
and
zAWSECSListServicesMaxResults
were added for
API calls count configuring. These zProperties should be used to reduce
or increase API call count to prevent throttling and timeout
exceptions.
Configuring exponential backoff and retries delay for SQS Queues
Sometimes
connecting cancelled error ConnectingCancelledError
may happen during
collection of SQS Queues messages. zProperty zAWSConnectingCancelledRetriesDelay
set base delay in seconds for retries. These zProperty should be used to
prevent connecting cancelled errors and timeout exceptions for SQS
Queues.
Additional SQS Queues discovery
If you have more than 1000 SQS Queues in one region, only 1000 will be
discovered. To handle this case you may use the zAWSSQSQueuesPrefix
property. Add prefixes to specify
SQS Queues you want to model additionally. Only strings should be used,
regexes are not supported. SQS Queues that are collected using
zAWSSQSQueuesPrefix
are also limited
to 1000 Queues per prefix.
Filter EC2 Instances by name
You may filter
EC2 instances by name with zAWSEC2Allowed
and zAWSEC2Denied
properties.
They contain a list of regexes to allow/deny EC2 instances modeling. At
first we apply zAWSEC2Allowed
propery, if it
is empty - we modell all EC2 instances. Then we skip instances which
match zAWSEC2Denied
regexes.
Overwrite default AWS events transformation
Default event mapping is the following:
if getattr(evt, 'eventKey', None) in ('|StatusCheckFailed_Instance_StatusCheckFailed_Instance|StatusCheckFailed_Instance', '|StatusCheckFailed_System_StatusCheckFailed_System|StatusCheckFailed_System'):
evt.eventClass = '/Status'
if evt.severity == 0:
evt.summary = evt.message = 'Status Check is ok'
else:
evt.summary = evt.message = 'Status Check has failed'
If you want to overwrite the default event transformation you should do the following:
- Go to the `Events Console` tab;
- Then go to the `Events Classes` tab;
- Select the `AWS` events tab;
- Chose `Mapping Instances` from the drop-down if not already chosen;
- Double-click on `AWS Default Mapping`;
- Select `Transforms` tab.
After that, you can write and apply your own AWS event transformation for specific events.
Note: Do not forget to click the Save Transform button.
AWS default events transformation
CloudFormation Stacks and Resources discovery and monitoring on highly scalable environment
If you have a large amount of CloudFormation Stacks and Resources on your EC2 Account you may face "timeout" events for data sources responsible for data collection for these components. This cause no status and events collection for CloudFormation components. In order to avoid "timeout" events and collect all data correctly, you should increase the "cycletime" property for events, the data source for CF Stacks, and "CFResourceState" for CF Resources on appropriate monitoring templates.
Note: In case when you have around 2000 CloudFormation Stacks the appropriate cycletime for events and CFResourceState data sources is 1200 seconds.
Sometimes throttling error "Rate exceeded" may happen during modeling
and monitoring CloudFormation Stacks and
Resources. zProperty zAWSCloudFormationRetriesDelay
set base delay
in seconds for retries.
This zProperty should be used to prevent throttling errors and timeout
exceptions for CloudFormation Stacks and Resources.
Note: Starting from the 5.2.0 version the attributes Stack
Template
and Stack Policy
are no longer discovered during
device modeling. These components were moved to a separate
CFStackDetails
data source under CF Stack components and will be
discovered after 12 hours from the initial device modeling. The
CloudFormation components are collecting with a Zenoss' normal
remodeling interval which defaults to every 12 hours.
Installed Items
Installing this ZenPack will add the following items to your Zenoss system.
Device Classes
- /AWS
- /AWS/EC2
Configuration Properties
- zAWSDiscover
- zAWSRegionPEM
- zAWSRemodelEnabled
- zAWSAutoChangeProdState
- zAWSAutoChangeProdStateRunning
- zAWSAutoChangeProdStateStopped
- zAWSGuestCollectorTroubleshooting
- zAWSResetGuestCollector
- zAWSGuestUsePublicIPs
- zAWSRegionToModel
- zAWSCloudWatchSSL
- zAWSCloudWatchMaxParallel
- zAWSCloudWatchMaxRetries
- zAWSBillingCostThreshold
- zAWSCloudFormationEventsAutoClear
- zAWSCloudFormationRetriesDelay
- zAWSEnableSnapshotCollection
- zAWSEnabledPlugins
- zAWSGuestDeviceTitleTag
- zAWSGuestDeviceClassTags
- zAWSBillingAccessKey
- zAWSBillingSecretKey
- zAWSBillingReportS3Bucket
- zAWSBillingReportPrefix
- zAWSBillingReportName
- zAWSBillingAthenaResultsS3Bucket
- zAWSBillingAthenaRegion
- zAWSCloudWatchCollectionInterval
- zAWSSQSQueues
- zAWSCloseGuestEvents
- zAWSLambdaFunctionsRetriesDelay
- zAWSMaxCallRetries
- zAWSECSEventBuses
- zAWSUseNewIds
- zAWSSQSQueuesPrefix
- zAWSGuestDeviceGroupingTags
- zAWSEC2Allowed
- zAWSEC2Denied
- zAWSLoadBalancersRetriesDelay
- zAWSECSRetriesDelay
- zAWSECSListServicesMaxResults
- zAWSLoadBalancersMaxARNsCount
- zAWSConnectingCancelledRetriesDelay
Modeler Plugins
- aws.Base
Datasource Types
- Amazon CloudWatch
- AWSDataSource
- Tag Filter Billing Report
- ECSWatchDataSource
Monitoring Templates
- EstimatedCharges (in /AWS/EC2)
- EC2Region (in /AWS/EC2)
- EC2Instance (in /AWS/EC2)
- EC2Instance-Detailed (in /AWS/EC2)
- EC2Volume (in /AWS/EC2)
- EC2Volume-IOPS (in /AWS/EC2)
- EC2Image (in /AWS/EC2)
- EC2VPC (in /AWS/EC2)
- EC2VPCSubnet (in /AWS/EC2)
- EC2Snapshot (in /AWS/EC2)
- EC2Zone (in /AWS/EC2)
- S3Bucket (in /AWS/EC2)
- SQSQueue (in /AWS/EC2)
- EC2ReservedInstance (in /AWS/EC2)
- VPNGateway (in /AWS/EC2)
- RDSInstance (in /AWS/EC2)
- CFStack (in /AWS/EC2)
- TagFilter (in /AWS/EC2)
- AutoScalingGroup (in /AWS/EC2)
- ApplicationLoadBalancer (in /AWS/EC2)
- NetworkLoadBalancer (in /AWS/EC2)
- LambdaFunction (in /AWS/EC2)
- ApplicationTargetGroup (in /AWS/EC2)
- NetworkTargetGroup (in /AWS/EC2)
- TargetGroup (in /AWS/EC2)
- ECSCluster (in /AWS/EC2)
- ECSContainerInstance (in /AWS/EC2)
- ECSService (in /AWS/EC2)
- ECSTask (in /AWS/EC2)
- ECSContainer (in /AWS/EC2)
- ElastiCacheCluster (in /AWS/EC2)
- ElastiCacheNode (in /AWS/EC2)
Device Types
- EC2Account (in /AWS/EC2)
Component Types
- EC2Region (on EC2Account)
- EC2VPC (on EC2Region)
- EC2VPCSubnet (on EC2Region)
- EC2Zone (on EC2Region)
- EC2Instance (on EC2Region)
- EC2Volume (on EC2Region)
- EC2Image (on EC2Region)
- EC2Snapshot (on EC2Region)
- SQSQueue (on EC2Region)
- VPNGateway (on EC2Region)
- EC2ReservedInstance (on EC2Region)
- S3Bucket (on EC2Account)
- Elastic IP (on EC2Region)
- RDSInstance (on EC2Region)
- RDSSecurityGroup (on EC2Region)
- CFStack (on EC2Region)
- CFResource (on CFStack)
- AutoScalingGroup (on EC2Region)
- LoadBalancer (on EC2Region)
- TargetGroup (on EC2Region)
- LambdaFunction (on EC2Region)
- ECSCluster (on EC2Region)
- ECSContainerInstance (on ECSCluster)
- ECSService (on ECSCluster)
- ECSTask (on ECSService)
- ECSContainer (on ECSTask)
- ECSScheduleRule (on ECSCluster)
- ECSScheduledTask (on ECSScheduleRule)
- ElastiCacheCluster (on EC2Region)
- ElastiCacheNode (on ElastiCacheCluster)
Event Classes
- /AWS/SQSMessage
- /AWS/Suggestion
- /AWS/CloudFormation
- /AWS/ElastiCache
Event Class Mapping
- AWS Default Mapping (on /AWS)
Reports
- /AWS Reports/Monitoring Costs
Required Daemons
Type | Name |
---|---|
Modeler | zenmodeler |
zenpython | |
Performance Collector | zenpython |
IAM Permissions
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "autoscaling:DescribeAutoScalingGroups", "cloudformation:Describe*", "cloudformation:GetStackPolicy", "cloudformation:GetTemplate", "cloudformation:ListStackResources", "cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics", "ec2:Describe*", "ecs:Describe*", "ecs:List*", "events:ListRules", "events:ListTargetsByRule", "elasticloadbalancing:Describe*", "lambda:ListAliases", "lambda:ListFunctions", "lambda:ListTags", "rds:DescribeDBInstances", "rds:DescribeDBSecurityGroups", "rds:ListTagsForResource", "s3:GetBucketLocation", "s3:ListAllMyBuckets", "ses:SendEmail", "sqs:DeleteMessageBatch", "sqs:GetQueueAttributes", "sqs:GetQueueUrl", "sqs:ListQueues", "sqs:ReceiveMessage", "sts:GetCallerIdentity", "elasticache:DescribeReplicationGroups", "elasticache:DescribeCacheClusters", "elasticache:DescribeEvents" ], "Resource": "*" } ] }
Troubleshooting
If modeling is not working and the following message is present in the modeling log:
No suitable AWS Modeler Server instance found. Please check in zenpython log if it's running.
Check if monitoring templateEstimatedCharges
is available on device
level with AWSModelRunPlugin
and AWSQueueProcessingPlugin
datasources inside it. If not, add them from Advanced - Monitoring
Templates - EstimatedCharges. Then restart zenpython and look into
zenpython.log for the message - AWS Modeler Server is up.
It should
appear before regular collecting process starts.
During monitoring AWS Account such error events might be created:
During processing ... datapoints in ... an error occured: SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. An error occurred (SignatureDoesNotMatch) when calling the GetQueueUrl operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details. An error occurred (AuthFailure) when calling the DescribeReservedInstances operation: AWS was not able to validate the provided access credentials.
These errors mean that Zenoss could not connect to AWS API due to the wrong access token. It might be caused by:
-
Wrong AWS credentials. Please check EC2 Access and Secret Keys.
-
Wrong time on collector host. Please adjust the system clock on collector hosts. Consider using NTP daemon to automatically adjust the host's clock.
The next error occurs during gathering billing data for tags and means
that zAWSBillingAccessKey
and zAWSBillingSecretKey
need to be
checked:
Could not fetch billing data. Check your zAWSBilling* properties: An error occurred (SignatureDoesNotMatch) when calling the GetObject operation: The request signature we calculated does not match the signature you provided. Check your key and signing method.
Upgrade
The AWS Zenpack of versions 2.0.0 / 2.1.0 can be upgraded. To upgrade the ZenPack, install the latest version over the existing one. There is no action for the user to migrate the data. The performance data and events of the old ZenPack are retained as per the retention policy settings.
During the upgrade from version 2.x to 3.0.0 and above all performance data for S3 Buckets will be lost.
When upgrading from 3.x to 4.x, tags are structured differently. Devices must be remodeled to handle tags properly.
In the case of using a local copy of EstimatedCharges
template on the
AWS device, after upgrading to 5.x and above, the bound local template
needs to be synced with ZenPack's one manually. There are new
datasources AWSModelRunPlugin
and AWSQueueProcessingPlugin
need to
be copied to the local template.
In the 5.0.0 version modeling CloudFormation components was enabled by
default. This was fixed in version 5.0.1. If the new device was added in
version 5.0.0, recheck zAWSEnabledPlugins
for enabling/disabling
CloudFormation modeling.
Limitations
After upgrading to 4.1.1 performance data for LoadBalancers and Target Groups components, (which were introduced in 4.1.0) will be lost.
Starting from version 4.1.0 AWS ZenPack is no longer supported on Zenoss 4.x.
When upgrading from versions 4.0.2, you may see errors regarding
default_value
. Those errors will be cleared automatically within the
next monitoring cycles. If those events are not cleared, close them
manually.
In the current version of Zenpack monitoring of large AWS accounts (e.g. >1000 EC2 instances and volumes) may cause performance issues:
- The limit for datapoints processed by
zenpython
daemon may be exceeded. This will result in gaps in graphs. - The monitoring cycle may not fit into default value of 5 minutes. This will result in some points on graphs to be not aligned by 5 minutes interval.
- Having more than one AWS account added to Zenoss may lead to the issues described above.
zenpython
may consume more memory than allocated in Control Center for its service. That will require increasing RAM Requested parameter on Control Center UI.
It is possible to reduce the number of datapoints collected by disabling monitoring templates you don't need.
Known Issues
ZPS-1533
- "TypeError" flare may be shown when attempting to add a device after upgrading from an older version of the ZenPack on Zenoss 5.x
- If this error is encountered, restarting the zproxy service (by restarting) top-level "Zenoss.resmgr" application in Control Center. It is not necessary to restart the child services.
ZPS-7015
- if incremental modeling for ECS
is turn on, there may be warning in
zenpython
log that ECSWatchDataSourcePlugin is blocked for some amount of time in collect. This doesn't affect incremental modelling, unless the blocking time is longer than 30 seconds. - if the ECSWatch datasource is
blocked, it will remain permanently blocked until its name is
removed from
/var/zenoss/zenpython.blocked
on Zenoss Cloud and Zenoss 6. The zenpython service must be restarted after manual modifications to this file.
ZPS-7541
- Spaces are not trimmed in
zAWSGuestDeviceClassTags
andzAWSGuestDeviceGroupingTags
properties.
Changes
5.2.1
- Make AWS ZenPack compatible with Ubuntu-based CZ 7.2 (ZPS-8852)
- Tested with Zenoss Resource Manager 6.7.0, Zenoss Cloud and Service Impact 5.6.0
5.2.0
- Add support for Amazon ElastiCache Clusters and Nodes (ZPS-8295)
- Add the possibility to control which EC2 Instances to model based on naming rule/filter (ZPS-7487)
- Made ECS Service discovering scalable for large EC2 Accounts (ZPS-7852)
- Made SQS Queues events collection scalable for large EC2 Accounts (ZPS-8357)
- Made CloudWatch Namespace collection on UI scalable for large EC2 Account (ZPS-8055)
- Made CloudFormation Stacks and Resources discovering scalable for large EC2 Accounts (ZPS-8440)
- Fixed selected region modeling for Lambda Functions (ZPS-8170)
- Tested with Zenoss Resource Manager 6.7.0, Zenoss Cloud, and Service Impact 5.5.5
5.1.1
- Define Group & System membership by Tag in Guest Device Discovery (ZPS-7228)
- Tested with Zenoss Resource Manager 6.5.0, Zenoss Cloud and Service Impact 5.5.3
5.1.0
- Add support (with optional incremental modeling) for AWS ECS
- Make possible modeling more than 1000 of SQS Queues per one region (ZPS-7094)
- Support complex AWS tags in zAWSDiscover (ZPS-7033)
- Add new datapoints to AWS ApplicationLoadBalancer Monitoring Template (ZPS-6964)
- Add zAWSUseNewIds property to switch to new IDs for some components (ZPS-6911)
- Fix AWS Zenpack Lambda Rate Exceeded Error (ZPS-6856)
- Add new AWS Target Group Monitoring Templates (ZPS-7219)
- Add new datapoint for billing predicted charges (ZPS-7002)
- Tested with Zenoss Resource Manager 6.4.1, Zenoss Resource Manager 6.5.0, Zenoss Cloud and Service Impact 5.5.2
5.0.1
- Fix Lambda functions modeling, add retries to handle Rate Exceeded Error (ZPS-6639)
- Fix Load Balancers discovery (ZPS-6641)
- Add possibility to close guest device events while deleting guest device (ZPS-6643)
- Tested with Zenoss Resource Manager 6.4.1, Zenoss Cloud and Service Impact 5.5.1
5.0.0
- Move modeling from zenmodeler to zenpython (ZPS-6117)
- Support AWS Serverless technologies (ZPS-5717)
- Fix endpoints generating heavy invalidations load (ZPS-5880)
- Fix cloudwatch CreateDatasource "NameSpaces" field doesn't load (ZPS-5914)
- Add eu-north-1 region modeling (ZPS-5818)
- Fix tagsCatalog indexing (ZPS-5998)
- Tested with Zenoss Resource Manager 6.4.1, Zenoss Cloud and Service Impact 5.5.1
4.1.1
- Added aws.AutoScalingGroup, aws.LoadBalancers modeler plugins (ZPS-5088)
- Fix conflict in Dynamic View groups (ZPS-5187)
- Fix linking of ELBs and target Groups and case when these entities have same names on Amazon side (ZPS-5251)
- Make CloudWatch datasource helper to be created on-demand (ZPS-5562)
- Remove a restriction of 20 ELBs modeled in one account (ZPS-5117)
- Added links to Azure and Kubernetes instances (ZPS-5080)
- Tested with Zenoss Resource Manager 6.3.2, Zenoss Cloud and Service Impact 5.3.4
4.1.0
- Add possibility to use Amazon CloudWatch datasource on SQS Queues (ZPS-3955)
- Update zAWSAutoChangeProdState to not change the production state of a guest device during a maintenance window (ZPS-3489)
- Fix guest device productionState is not always updated with zAWSAutoChangeProdState enabled (ZPS-3477)
- Upgrade to botocore 1.7.84 / boto3 1.10.84
- Update component group when linked tag filter is changed (ZPS-4309)
- Re-design SQS-based event monitoring (ZPS-3061)
- Added zAWSGuestDeviceClassTags property to specify guest device classes mapped from EC2 instance tags (ZPS-5005)
- Auto Scaling Groups modeling and monitoring (ZPS-4474)
- Added "Treat missing data as" field to Amazon CloudWatch datasource to set the default value for metric when no data returned from AWS
- Application and Network Load Balancers modeling and monitoring (ZPS-38)
- Add link from EC2 Instance if it is part of Kubernetes cluster to Kubernetes node. Requires Kubernetes ZP version greater then 1.0.1 (ZPS-4777)
- Tested with Zenoss Resource Manager 5.3.3, Zenoss Resource Manager 6.3.1, Zenoss Cloud and Service Impact 5.3.4
4.0.2
- Handle CloudFormation templates where a stack output has no description (ZPS-3181)
- Upgrade to botocore 1.8.41 / boto3 1.5.27
- Fix type of the ec2secretkey property (ZEN-29852)
- AWS Prediction charges time is 'undefined' on overview page (ZEN-30367)
- Styling updates and fixes for Zenoss Cloud
- Change event class for Billing Cost threshold (ZPS-3838)
- Handle Cloud Formation Stack Outputs witout a 'Description' field (ZPS-3181)
- Updated documentation with IAM permissions required to model SQS successfully (ZPS-3268)
- Tested with Zenoss Cloud, Zenoss Resource Manager 6.2.0, 5.3.3 and Service Impact 5.3.1
4.0.1
- Avoid duplicate events created from SQS messages by querying based on timestamp. (ZPS-2364)
- Update botocore endpoint list to reflect new regions and AWS services (ZPS-3037)
- Tested with Zenoss Resource Manager 4.2.5 RPS 743, Zenoss Resource Manager 5.3.3, Zenoss Resource Manager 6.1.0 and Service Impact 5.2.3.
4.0.0
- Allow filtering of components by AWS tags.
- Optionally populate Component Groups based on tag filters
- Monitoring billing information aggregated by tags
- Added zAWSCloudWatchCollectionInterval (default to 300) to simplify configuration of default collection interval for all Amazon CloudWatch datasources
- Fixed incorrectly scaled percentage values in the Volume 'Time' graph. (ZPS-2247)
- Improve the managing of guest device production states when zAWSAutoChangeProdState is enabled. When an instance is restarted, restore its previous production state. (ZPS-1865)
- Add support for CloudFormation YAML templates (ZPS-2201)
- Converted to use ZPL and updated to Boto v3
- SSL error fixed (ZPS-1739)
- Added report "Monitoring Costs" to check estimated charges for AWS devices monitoring
- Tested with Zenoss Resource Manager 5.3.2, Zenoss Resource Manager 4.2.5 RPS 743 and Service Impact 5.1.7
3.0.3
- Specify dmd to use for device facade in unit test (ZEN-28777)
- Internal-only release. No changes to production code, only unit tests
3.0.2
- Fixed crochet requirement for unit tests, to allow platform build tests to run (ZEN-28777)
- Internal-only release. No changes to production code, only unit tests
3.0.1
- Fixed SSL error in S3 modeling when using proxy
- Added zAWSEnableSnapshotCollection (default to false) to allow disabling collection of Snapshots, in order to improve modeling performance
- Added gear menu option and job to find missing guest devices
- Added zAWSGuestDeviceTitleTag (default to empty) to allow guest device titles to be populated based on AWS tag from instance
- Moved guest device deletion to scheduled job to improve modeling performance and reduce database conflicts due to long transactions
3.0.0
- CloudFormation and RDS support
- Estimated charges monitoring
- Add support for GovCloud (us-gov-west-1) region
- Migrate S3 Bucket monitoring to use AWS CloudWatch
2.4.6
- Fix broken AWS monitoring when a proxy is being used (ZPS-1260)
2.4.5
- Update boto version shipped with the ZenPack to support new "eu-west-2" region.
- Updated AmazonCloudWatchDataSource to use txboto.
- Usage of AmazonCloudWatchDataSource on device level is now allowed.
2.4.4
- Update boto version shipped with the ZenPack to support new "us-east-2" region.
2.4.3
- Fix Region and S3 Buckets graphs inconsistencies (ZEN-17242)
- Fix ZenPack failing on model [New Region in Mumbai] (ZEN-23892)
- AWS ZenPack is able to collect and consume data from demo environment (ZEN-24089)
- Proper handling for ConnectionLost, TimeoutError and other exceptions (ZEN-23901)
- Fix EC2RegionPlugin's traceback events (ZEN-23174)
- Fix S3 bucket lookup / get_bucket broken for eu-central-1 (ZEN-23044)
- Fix S3BucketPlugin's traceback events when S3 bucket's region is EU (ZEN-23236)
- Account ID field is added to 'Add EC2 Account' dialog (ZEN-21880)
- Add
zAWSAutoChangeProdState
property to have more control over EC2 Instance's production state (ZEN-23427)
2.4.2
- Fix intermittent graph gaps (ZEN-22337)
2.4.1
- Fix errors encountered during monitoring of Reserved Instances (ZEN-22379)
2.4.0
- Update boto version shipped with the ZenPack to support new "ap-northeast-2" region.
- Improve HTTP errors and warnings.
- Added
zAWSCloudWatchMaxParallel
property to configure number of concurrent cloudwatch calls. - Make the number of retries for cloudwatch calls configurable
(
zAWSCloudWatchMaxRetries
property). - Allow modeler to set it Region explicitlty, and ignore unmodeled buckets.
- Added path reporter for EC2Snapshots
2.3.1
- Ignore reserved instances with a null id. (ZEN-17556).
- Added
zAWSRegionToModel
property to tell RM what to model (ZEN-17374) - Improved
zAWSRemodelEnabled
andzAWSResetGuestCollector
properties
2.3.0
- Add ability for instances into VPC to use public IP address for guest device
- Add parallel processing for CloudWatch datasources using multithreading. For large AWS installation it can be boosted by setting bigger value for "twistedthreadpoolsize" setting of PythonCollector.
2.2.2
- Add support for Zenoss 5x.
- Add ability for user to specify an alternate remote collector for discovered devices.
- Update boto version shipped with the ZenPack to support signature v4.
2.2.1
- Add support for SQS Messages, S3 Buckets, Reserved Instances, Elastic IPs, Images, VPN Gateways, Snapshots.
- Discover instances via Layer 3 when specific Tags are present on the instance.
- Add ability for user to upload PEM file to region for use in auto-discovering instance guest operating systems.
- Add ability for user to enable reflecting new instances on Zenoss UI during monitoring.
- Support multiple IP addresses per instance and add instance type details.
- Monitor AWS Soft Limits and VPC Subnet available IP address count.
- Update component statuses during monitoring.
2.1.0
- Support CloudWatch metrics with multiple indexes.
- Add "Amazon Email Host" notification type for SES notifications.
2.0.0
- Add support for regions, zones, VPCs, subnets and volumes.
- Add support for custom CloudWatch metrics.
- Complete rewrite.