Alerts v10

PEM continually monitors registered servers. It compares performance metrics against predefined and user-specified thresholds that specify good or acceptable performance for each statistic. Any deviation from an acceptable threshold value triggers an alert. An alert is a system-defined or user-defined set of conditions that PEM compares to the system statistics. Alerts tell you about conditions on registered servers that require your attention.

Viewing the alerts via Global dashboard

When a system statistic deviates from the boundaries specified for that statistic, the alert triggers. The alert displays a high (red), medium (orange), or low (yellow) severity warning in the left-most column of the Alert Status table on the Global Overview dashboard.

The Alert Status table

The PEM server includes a number of predefined alerts that are actively monitoring your servers. The alert definition might make details available about the cause of the alert. Select the down arrow to the right of the severity warning to open a dialog box that has details about the condition that triggered the alert.

Alert details

PEM also provides an interface that lets you create customized alerts. Each alert uses metrics defined on an alert template. An alert template defines how the server evaluates the statistics for a resource or metric. The PEM server includes predefined alert templates, and you can create custom alert templates.

Viewing the alerts via Alerts dashboard

Use the Dashboards menu (on the Monitoring tab) to open the Alerts dashboard. The Alerts dashboard shows a summary of the active alerts and the status of each alert.

The Alerts Dashboard

The Alerts dashboard header shows the date and time that the dashboard was last updated and the number of current alerts.

The Alerts Overview section shows a visual representation of the active alerts and a count of the current high, medium, and low alerts. The vertical bar on the left of the graph provides the count of the alerts displayed in each column. Hover over a bar to display the alert count for the selected alert severity in the upper-right corner of the graph.

The Alert Details table provides a list of the alerts that are currently triggered. The entries appear in order from high severity to low severity. Each entry includes information that lets you identify the alert and recognize the condition that triggered the alert. Select an alert to review detailed information about the alert definition.

The Alert Errors table shows configuration-related errors, such as accidentally disabling a required probe or improperly configuring an alert parameter. You can use the information provided in the Error Message column to identify and resolve the conflict that's causing the error.

Customizing the Alerts dashboard

You can customize tables and charts that appear on the Alerts dashboard. To customize a table or chart, select Settings in the upper-right corner.

Use fields on the Personalize Chart Configuration dialog box to provide your display preferences:

  • Use the Auto Refresh field to specify the number of seconds between updates of the data displayed in the table or chart.
  • Use the Download as field to indicate whether to download a chart as a JPEG image or as a PNG image.
  • Use the Colours selectors to specify the colors to use on a chart.
  • Set the Show Acknowledged Alerts switch to Yes if you want the table to display alerts that you acknowledged with a check box in the Ack'ed column. Set it to No to hide any acknowledged alerts. Acknowledged alerts are purged from the table content only when the time specified in the alert definition passes.

To save your customizations, select Save (a checkmark) in the upper-right corner. To delete any previous changes and revert to the default values, select Delete. Use the Save and Delete menus to specify whether to apply your preferences to all dashboards or to a selected server or database.

Managing alerts

Use the PEM client's Manage Alerts tab to define, copy, or manage alerts. To open the Manage Alerts tab, select Management > Manage Alerts.

The Manage Alerts tab

Use the Quick Links toolbar to open dialog boxes and tabs for managing alerts:

  • Select Copy Alerts to open the Copy Alert Configuration dialog box and copy an alert definition.
  • Select Alert Templates to open the Alert Templates tab and modify or create an alert template.
  • Select Email Templates to open the Email Template dialog box and modify the default email template to customize an email notification.
  • Select Email Groups to open the Email Groups tab and modify or create an email group.
  • Select Webhooks to open the Webhooks tab and create or manage webhook endpoints.
  • Select Server Configurations to open the Server Configuration dialog box and review or modify server configuration settings.
  • Select Help to open the PEM online help in a new tab.

Use the table in the Alerts section of the Manage Alerts tab to create new alerts or manage existing alerts.

Alert templates

An alert template is a prototype that defines the properties of an alert. An alert instructs the server to compare the current state of the monitored object to a threshold specified in the alert template to determine if a situation requires administrative attention.

You can use the Alert Templates tab to define a custom alert template or view the definitions of existing alert templates. To open the Alert Templates tab, select Management > Manage Alerts. From the Manage Alerts tab, on the Quick Links toolbar, select Alert Templates.

Use the Show System Template list to filter the alert templates that are displayed in the Alert Templates table. From the list, select a level of the PEM hierarchy to view all of the templates for that level.

Defining a new alert template

To define a new alert template, from the Show System Template list, select None. Then click the plus sign (+) in the upper-right corner of the alert template table. The alert template editor opens.

Use fields on the General tab to specify general information about the template:

  • Use the Template name field to specify a name for the new alert template.

  • Use the Description field to provide a description of the alert template.

  • Use the Target type list to select the type of object that is the focus of the alert.

  • Use the Applies to server list to specify the server type (EDB Postgres Advanced Server or PostgreSQL) to which to apply the alert. You can specify a single server type or ALL.

  • Use the History retention field to specify the number of days to store the result of the alert execution on the PEM server.

  • Use the Threshold unit field to specify the unit type of the threshold value.

  • Use the fields in the Auto create box to have PEM use the template to generate alerts automatically. If you enable this option, PEM creates an alert when a new server or agent, as specified by the Target type list, is added and deletes that alert when the target object is dropped.

    • Move the Auto create? slider to Yes to have PEM create alerts based on the template. If you modify an existing alert template by changing the Auto create? slider to Yes, PEM creates alerts on the existing agents and servers. If you change the slider from Yes to No, the default threshold values in existing alerts are erased, and you can't recover them.
    • Use the Operator list to select the operator for PEM to use when evaluating the current system values:

    Select a greater-than sign (>) to trigger the alert when the system values are greater than the values entered in the Threshold values fields.

    Select a less-than sign (<) to trigger the alert when the system values are less than the values entered in the Threshold values fields.

  • Use the threshold fields to specify the values for PEM to compare to the system values to determine whether to raise an alert. You must specify values for all three thresholds (Low, Medium, and High).

  • Use the Check frequency field to specify the default number of minutes between alert executions. This value specifies how often the server invokes the SQL code specified in the definition and compares the result to the threshold value specified in the template.
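The interaction between the operator and the three thresholds can be pictured with a short Python sketch. This is only an illustration, not PEM's implementation: the function name classify_severity and the rule that the most severe crossed threshold wins are assumptions made for the example.

```python
from typing import Optional


def classify_severity(value: float, operator: str,
                      low: float, medium: float, high: float) -> Optional[str]:
    """Return the most severe level whose threshold the value crosses.

    Hypothetical helper: assumes thresholds are checked from High down to Low
    and that the first crossed threshold decides the severity.
    """
    if operator == ">":
        def crossed(threshold: float) -> bool:
            return value > threshold
    elif operator == "<":
        def crossed(threshold: float) -> bool:
            return value < threshold
    else:
        raise ValueError(f"unsupported operator: {operator!r}")

    # Check the most severe level first so that it takes precedence.
    for level, threshold in (("High", high), ("Medium", medium), ("Low", low)):
        if crossed(threshold):
            return level
    return None  # No threshold crossed, so no alert is raised.


# With ">", thresholds normally increase from Low to High.
print(classify_severity(85, ">", low=70, medium=80, high=90))
# With "<", thresholds normally decrease from Low to High.
print(classify_severity(5, "<", low=20, medium=10, high=2))
```

Both calls fall between the Medium and High thresholds, so under these assumptions each would be reported at medium severity.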

Use the fields on the Probe Dependency tab to specify the names of probes referred to in the SQL query specified on the SQL tab:

  • Use the Probes list to select from a list of the available probes.

    • To add the probe to the list of probes used by the alert template, select a probe name and select Add.
    • To remove a probe from the selected probes list, select the probe name and select Delete.
  • Use the Parameters tab to define the parameters to use in the SQL code specified on the SQL tab. Select the plus sign (+). Then:

    • Use the Name field to specify the parameter name.

    • Use the Data type list to specify the type of parameter.

    • Use the Unit field to specify the type of unit specified by the parameter.

  • Use the Code field on the SQL tab to provide the text of the SQL query for the server to invoke when executing the alert. The SQL query provides the result against which to compare the threshold value. If the alert result deviates from the specified threshold value, an alert is raised.

In the query, reference parameters defined on the Parameters tab sequentially by using the variable param_x. The x indicates the position of the parameter definition in the parameter list. For example, param_1 refers to the first parameter in the parameter list, param_2 refers to the second parameter in the parameter list, and so on.

The query can also include the following variables:

Variable description | Variable name
agent identifier | '${agent_id}'
server identifier | '${server_id}'
database name | '${database_name}'
schema name | '${schema_name}'
table | '${object_name}'
index | '${object_name}'
sequence | '${object_name}'
function name | '${object_name}'

  • Use the Detailed Information SQL field to provide a SQL query to invoke if the alert is triggered. The result set of the query might be displayed as part of the detailed alert information on the Alerts dashboard or Global Overview dashboard.

Note

If the specified query depends on one or more probes from different levels in the PEM hierarchy (server, database, schema, and so on), and a probe becomes disabled, any resulting alerts are displayed as follows:

  • If the alert definition and the probe referenced by the query are from the same level in the PEM hierarchy, the server displays any alerts that reference the alert template on the Alert Errors table of the Global Alert dashboard.
  • If the alert definition and the probe referenced by the query are from different levels of the PEM hierarchy, the server displays any triggered alerts that reference the alert template on the Alert Details table of the hierarchy on which the alert was defined.
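To make the param_x and '${...}' placeholders used in the template SQL more concrete, the following Python sketch fills them into a query string. It is a rough illustration only: PEM performs its own substitution, and the function render_alert_sql, the table name my_probe_table, and the column names are hypothetical.

```python
import re
from typing import Mapping, Sequence


def render_alert_sql(sql: str, params: Sequence[object],
                     variables: Mapping[str, object]) -> str:
    """Replace param_N and ${name} placeholders in a template query.

    Hypothetical helper for illustration; not part of PEM.
    """
    def replace_param(match: re.Match) -> str:
        position = int(match.group(1))  # param_1 is the first parameter
        if not 1 <= position <= len(params):
            raise ValueError(f"no parameter defined at position {position}")
        return str(params[position - 1])

    def replace_variable(match: re.Match) -> str:
        return str(variables[match.group(1)])

    # \b keeps param_1 from matching inside param_10.
    sql = re.sub(r"\bparam_(\d+)\b", replace_param, sql)
    return re.sub(r"\$\{(\w+)\}", replace_variable, sql)


template_sql = (
    "SELECT count(*) FROM my_probe_table "  # hypothetical table name
    "WHERE server_id = '${server_id}' AND duration > param_1"
)
print(render_alert_sql(template_sql, params=[30], variables={"server_id": 7}))
```

The quotes around '${server_id}' are part of the query text, matching the way the variable names are listed in the table, so only the placeholder inside them is replaced.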

To save the alert template definition and add the template name to the Alert Templates list, select Save. After saving a custom alert template, you can use the Alerting dialog box to define an alert based on the template.

Exporting or importing alert templates

To export an alert template:

  1. Select any alert template from the Alert Templates tab.
  2. Select Export in the upper-right corner of the table.
  3. Select Save File.
  4. To generate the JSON file, select OK.

To import an alert template:

  1. On the Alert Templates tab, select Import in the upper-right corner.

  2. To select the JSON file to import, select Browse and choose the file. Then select Import.

  3. After selecting the file to import, you can select the following check boxes:

    • Skip existing: Skip the alert template if it already exists.

    • Skip existing dependent probe: The alert templates depend on probes. Select this check box to skip the dependent probe if it already exists.

    If both check boxes are selected and the alert template already exists, PEM skips importing the alert template.

    If you clear the Skip existing check box, select the Skip existing dependent probe check box, and the alert template already exists, the alert template imports successfully.

    If both check boxes are cleared and the alert template doesn't exist, the alert template imports successfully.
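The three documented outcomes can be summarized as a small decision function. The Python sketch below is only a reading of the cases listed above, not PEM's import code: the names ImportPlan and plan_import are hypothetical, and the behavior for combinations the text doesn't describe (for example, both check boxes cleared while the template already exists) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportPlan:
    import_template: bool
    import_probe: bool


def plan_import(template_exists: bool, probe_exists: bool,
                skip_existing: bool, skip_existing_probe: bool) -> ImportPlan:
    """Decide what an import would bring in (hypothetical sketch).

    Assumption: an item is skipped only when its check box is selected
    and the item already exists; otherwise the import proceeds.
    """
    return ImportPlan(
        import_template=not (skip_existing and template_exists),
        import_probe=not (skip_existing_probe and probe_exists),
    )


# Both check boxes selected, template already exists: template is skipped.
print(plan_import(True, True, skip_existing=True, skip_existing_probe=True))
# Skip existing cleared, probe check box selected, template exists: template imports.
print(plan_import(True, True, skip_existing=False, skip_existing_probe=True))
# Both check boxes cleared, template doesn't exist: template imports.
print(plan_import(False, False, skip_existing=False, skip_existing_probe=False))
```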

Modifying or deleting an alert template

To view the definition of an existing template (including PEM predefined alert templates), use the Show System Template list to select the type of object monitored. When you select the object type, the Alert Templates table displays the alert templates that correspond with that object type.

Select a template name in the list, and select Edit at the left end of the row to review the template definition.

Use the Alert Templates dialog box to view detailed information about the alert template:

  • The General tab displays general information.
  • The Probe Dependency tab lists the names of probes that provide data for the template.
  • The Parameters tab lists the names of any parameters referred to in the SQL code.
  • The SQL tab displays the SQL code that defines the behavior of the alert.

To delete an alert template, select the template name in the Alert Templates table and select Delete, located in the upper-right corner of the table. The alert history persists for the time specified in the History retention field in the template definition.

Predefined alert templates – reference

An alert definition contains a system-defined or user-defined set of conditions that PEM compares to the system statistics. If the statistics deviate from the boundaries specified for that statistic, the alert triggers, and the PEM client displays a warning on the Alerts Overview page and optionally sends a notification to a monitoring user.

The tables that follow list the system-defined alert templates that you can use to create an alert. This list is subject to change and can vary by system.

Templates applicable on agent

Template name | Description | Probe dependency
Load Average (1 minute) | 1-minute system load average | load_average
Load Average (5 minutes) | 5-minute system load average | load_average
Load Average (15 minutes) | 15-minute system load average | load_average
Load Average per CPU Core (1 minutes) | 1-minute system load average per CPU core | load_average
Load Average per CPU Core (5 minutes) | 5-minute system load average per CPU core | load_average
Load Average per CPU Core (15 minutes) | 15-minute system load average per CPU core | load_average
CPU utilization | Average CPU consumption | cpu_usage
Number of CPUs running higher than a threshold | Number of CPUs running at greater than K% utilization | cpu_usage
Free memory percentage | Free memory as a percent of total system memory | memory_usage
Memory used percentage | Percentage of memory used | memory_usage
Swap consumption | Swap space consumed (in megabytes) | memory_usage
Swap consumption percentage | Percentage of swap area consumed | memory_usage
Disk Consumption | Disk space consumed (in megabytes) | disk_space
Disk consumption percentage | Percentage of disk consumed | disk_space
Disk Available | Disk space available (in megabytes) | disk_space
Disk busy percentage | Percentage of disk busy | disk_busy_info
Most used disk percentage | Percentage used of the most utilized disk on the system | disk_space
Total table bloat on host | The total space wasted by tables on a host, in MB | table_bloat, settings
Highest table bloat on host | The most space wasted by a table on a host, in MB | table_bloat, settings
Average table bloat on host | The average space wasted by tables on host, in MB | table_bloat, settings
Table size on host | The size of tables on host, in MB | table_size
Database size on host | The size of databases on host, in MB | database_size
Number of ERRORS in the logfile on agent N in last X hours | The number of ERRORS in the logfile on agent N in last X hours | N/A
Number of ERRORS in the audit logfile on agent N in last X hours | The number of ERRORS in the audit logfile on agent N in last X hours | N/A
Number of WARNINGS in the logfile on agent N in last X hours | The number of WARNINGS in the logfile on agent N in last X hours | N/A
Number of WARNINGS in the audit logfile on agent N in last X hours | The number of WARNINGS in the audit logfile on agent N in last X hours | N/A
Number of WARNINGS or ERRORS in the logfile on agent N in last X hours | The number of WARNINGS or ERRORS in the logfile on agent N in last X hours | N/A
Number of WARNINGS or ERRORS in audit the logfile on agent N in last X hours | The number of WARNINGS or ERRORS in the audit logfile on agent N in last X hours | N/A
Package version mismatch | Check for package version mismatch as per catalog | N/A
Total materialized view bloat on host | The total space wasted by materialized views on a host, in MB | mview_bloat, settings
Highest materialized view bloat on host | The most space wasted by a materialized view on a host, in MB | mview_bloat, settings
Average materialized view bloat on host | The average space wasted by materialized views on host, in MB | mview_bloat, settings
Materialized view size on host | The size of materialized views on host, in MB | mview_size
Agent Down | Specified agent is currently down | N/A

Templates applicable on server

Template name | Description | Probe dependency
Total table bloat in server | The total space wasted by tables in server, in MB | table_bloat, settings
Largest table (by multiple of unbloated size) | Largest table in server, calculated as a multiple of its own estimated unbloated size; exclude tables smaller than N MB | table_bloat, settings
Highest table bloat in server | The most space wasted by a table in server, in MB | table_bloat, settings
Average table bloat in server | The average space wasted by tables in server, in MB | table_bloat, settings
Table size in server | The size of tables in server, in MB | table_size
Database size in server | The size of databases in server, in MB | database_size
Number of WAL files | Total number of Write Ahead Log files | number_of_wal_files
Number of prepared transactions | Number of transactions in prepared state | number_of_prepared_transactions
Total connections | Total number of connections in the server | session_info
Total connections as percentage of max_connections | Total number of connections in the server as a percentage of maximum connections allowed on server | session_info, settings
Unused, non-superuser connections | Number of unused, non-superuser connections on the server | session_info, user_info, settings
Unused, non-superuser connections as percentage of max_connections | Number of unused, non-superuser connections on the server as a percentage of max_connections | session_info, user_info, settings
Ungranted locks | Number of ungranted locks in server | blocked_session_info
Percentage of buffers written by backends | The percentage of buffers written by backends vs. the total buffers written | background_writer_statistics
Percentage of buffers written by checkpoint | The percentage of buffers written by the checkpoints vs. the total buffers written | background_writer_statistics
Buffers written per second | Number of buffers written per second, over the last two probe cycles | background_writer_statistics
Buffers allocated per second | Number of buffers allocated per second, over the last two probe cycles | background_writer_statistics
Connections in idle state | Number of connections in server that are in idle state | session_info
Connections in idle-in-transaction state | Number of connections in server that are in idle-in-transaction state | session_info
Connections in idle-in-transaction state, as percentage of max_connections | Number of connections in server that are in idle-in-transaction state, as a percentage of maximum connections allowed on server | session_info, settings
Long-running idle connections | Number of connections in the server that have been idle for more than N seconds | session_info
Long-running idle connections and idle transactions | Number of connections in the server that have been idle or idle-in-transaction for more than N seconds | session_info
Long-running idle transactions | Number of connections in the server that have been idle in transaction for more than N seconds | session_info
Long-running transactions | Number of transactions in server that have been running for more than N seconds | session_info
Long-running queries | Number of queries in server that have been running for more than N seconds | session_info
Long-running vacuums | Number of vacuum operations in server that have been running for more than N seconds | session_info
Long-running autovacuums | Number of autovacuum operations in server that have been running for more than N seconds | session_info
Committed transactions percentage | Percentage of transactions in the server that committed vs. that rolled back over last N minutes | database_statistics
Shared buffers hit percentage | Percentage of block read requests in the server that were satisfied by shared buffers, over last N minutes | database_statistics
Tuples inserted | Tuples inserted into server over last N minutes | database_statistics
InfiniteCache buffers hit percentage | Percentage of block read requests in the server that were satisfied by InfiniteCache, over last N minutes | database_statistics
Tuples fetched | Tuples fetched from server over last N minutes | database_statistics
Tuples returned | Tuples returned from server over last N minutes | database_statistics
Dead Tuples | Number of estimated dead tuples in server | table_statistics
Tuples updated | Tuples updated in server over last N minutes | database_statistics
Tuples deleted | Tuples deleted from server over last N minutes | database_statistics
Tuples hot updated | Tuples hot updated in server, over last N minutes | table_statistics
Sequential Scans | Number of full table scans in server, over last N minutes | table_statistics
Index Scans | Number of index scans in server, over last N minutes | table_statistics
Hot update percentage | Percentage of hot updates in the server over last N minutes | table_statistics
Live Tuples | Number of estimated live tuples in server | table_statistics
Dead tuples percentage | Percentage of estimated dead tuples in server | table_statistics
Last Vacuum | Hours since last vacuum on the server | table_statistics
Last AutoVacuum | Hours since last autovacuum on the server | table_statistics
Last Analyze | Hours since last analyze on the server | table_statistics
Last AutoAnalyze | Hours since last autoanalyze on the server | table_statistics
Percentage of buffers written by backends over the last N minutes | The percentage of buffers written by backends vs. the total buffers written over the last N minutes | background_writer_statistics
Table Count | Total number of tables in server | oc_table
Function Count | Total number of functions in server | oc_function
Sequence Count | Total number of sequences in server | oc_sequence
Number of users expiring in N days | Number of users whose accounts are expiring in N days | user_info
Number of users whose password expiring in N days | Number of users whose passwords have expired or are expiring in N days | user_info
Index size as a percentage of table size | Size of the indexes in server, as a percentage of their tables' size | index_size, oc_index, table_size
Largest index by table-size percentage | Largest index in server, calculated as percentage of its table's size | index_size, oc_index, table_size
Number of ERRORS in the logfile on server M in the last X hours | The number of ERRORS in the logfile on server M in last X hours | N/A
Number of WARNINGS in the logfile on server M in the last X hours | The number of WARNINGS in the logfile on server M in the last X hours | N/A
Number of WARNINGS or ERRORS in the logfile on server M in the last X hours | The number of WARNINGS or ERRORS in the logfile on server M in the last X hours | N/A
Number of attacks detected in the last N minutes | The number of SQL injection attacks that occurred in the last N minutes | sql_protect
Number of attacks detected in the last N minutes by username | The number of SQL injection attacks that occurred in the last N minutes by username | sql_protect
Number of replica servers lag behind the primary by write location | Streaming Replication: number of replica servers lagging behind the primary by write location | streaming_replication
Number of replica servers lag behind the primary by flush location | Streaming Replication: number of replica servers lagging behind the primary by flush location | streaming_replication
Number of replica servers lag behind the primary by replay location | Streaming Replication: number of replica servers lagging behind the primary by replay location | streaming_replication
Replica server lag behind the primary by write location | Streaming Replication: replica server lag behind the primary by write location in MB | streaming_replication
Replica server lag behind the primary by flush location | Streaming Replication: replica server lag behind the primary by flush location in MB | streaming_replication
Replica server lag behind the primary by replay location | Streaming Replication: replica server lag behind the primary by replay location in MB | streaming_replication
Replica server lag behind the primary by size (MB) | Streaming Replication: replica server lag behind the primary by size in MB | streaming_replication
Replica server lag behind the primary by WAL segments | Streaming Replication: replica server lag behind the primary by WAL segments | streaming_replication
Replica server lag behind the primary by WAL pages | Streaming Replication: replica server lag behind the primary by WAL pages | streaming_replication
Total materialized view bloat in server | The total space wasted by materialized views in server, in MB | mview_bloat, settings
Largest materialized view (by multiple of unbloated size) | Largest materialized view in server, calculated as a multiple of its own estimated unbloated size; exclude materialized views smaller than N MB | mview_bloat, settings
Highest materialized view bloat in server | The most space wasted by a materialized view in server, in MB | mview_bloat, settings
Average materialized view bloat in server | The average space wasted by materialized views in server, in MB | mview_bloat, settings
Materialized view size in server | The size of materialized views in server, in MB | mview_size
View Count | Total number of views in server | oc_views
Materialized View Count | Total number of materialized views in server | oc_views
Audit config mismatch | Check for audit config parameter mismatch | audit_configuration
Server Down | Specified server is currently inaccessible | N/A
Number of WAL archives pending | Streaming Replication: number of WAL files pending to be replayed at replica | wal_archive_status
Number of minutes lag of replica server from primary server | Streaming Replication: number of minutes replica node is lagging behind the primary node | streaming_replication_lag_time
Log config mismatch | Check for log config parameter mismatch | log_configuration
PGD Group Raft Consensus | PGD group Raft consensus not working | bdr_monitor_group_raft
PGD Group Raft Leader ID not matching | PGD group Raft leader ID not matching | bdr_group_raft_details
PGD Group versions check | PGD/pglogical version mismatched in PGD group | bdr_monitor_group_raft
PGD worker error detected | PGD worker error reported for PGD node |
Transaction ID exhaustion (wraparound) | Check for transaction ID exhaustion (wraparound) |
Inactive replication slots | Check for slots that are inactive for a particular server |
Conflicting replication slots | Check for slots that are conflicting for a particular server |
Multixact ID exhaustion (wraparound) | Check for multixact ID exhaustion (wraparound) |

Templates applicable on database

Template name | Description | Probe dependency
Total table bloat in database | The total space wasted by tables in database, in MB | table_bloat, settings
Largest table (by multiple of unbloated size) | Largest table in database, calculated as a multiple of its own estimated unbloated size; exclude tables smaller than N MB | table_bloat, settings
Highest table bloat in database | The most space wasted by a table in database, in MB | table_bloat, settings
Average table bloat in database | The average space wasted by tables in database, in MB | table_bloat, settings
Table size in database | The size of tables in database, in MB | table_size
Database size | The size of the database, in MB | database_size
Total connections | Total number of connections in the database | session_info
Total connections as percentage of max_connections | Total number of connections in the database as a percentage of maximum connections allowed on server | session_info, settings
Ungranted locks | Number of ungranted locks in database | blocked_session_info
Connections in idle state | Number of connections in database that are in idle state | session_info
Connections in idle-in-transaction state | Number of connections in database that are in idle-in-transaction state | session_info
Connections in idle-in-transaction state, as percentage of max_connections | Number of connections in database that are in idle-in-transaction state, as a percentage of maximum connections allowed on server | session_info, settings
Long-running idle connections | Number of connections in the database that have been idle for more than N seconds | session_info
Long-running idle connections and idle transactions | Number of connections in the database that have been idle or idle-in-transaction for more than N seconds | session_info
Long-running idle transactions | Number of connections in the database that have been idle in transaction for more than N seconds | session_info
Long-running transactions | Number of transactions in database that have been running for more than N seconds | session_info
Long-running queries | Number of queries in database that have been running for more than N seconds | session_info
Long-running vacuums | Number of vacuum operations in database that have been running for more than N seconds | session_info
Long-running autovacuums | Number of autovacuum operations in database that have been running for more than N seconds | session_info
Committed transactions percentage | Percentage of transactions in the database that committed vs. that rolled back over last N minutes | database_statistics
Shared buffers hit percentage | Percentage of block read requests in the database that were satisfied by shared buffers, over last N minutes | database_statistics
InfiniteCache buffers hit percentage | Percentage of block read requests in the database that were satisfied by InfiniteCache, over last N minutes | database_statistics
Tuples fetched | Tuples fetched from database over last N minutes | database_statistics
Tuples returned | Tuples returned from database over last N minutes | database_statistics
Tuples inserted | Tuples inserted into database over last N minutes | database_statistics
Tuples updated | Tuples updated in database over last N minutes | database_statistics
Tuples deleted | Tuples deleted from database over last N minutes | database_statistics
Tuples hot updated | Tuples hot updated in database, over last N minutes | table_statistics
Sequential Scans | Number of full table scans in database, over last N minutes | table_statistics
Index Scans | Number of index scans in database, over last N minutes | table_statistics
Hot update percentage | Percentage of hot updates in the database over last N minutes | table_statistics
Live Tuples | Number of estimated live tuples in database | table_statistics
Dead Tuples | Number of estimated dead tuples in database | table_statistics
Dead tuples percentage | Percentage of estimated dead tuples in database | table_statistics
Last Vacuum | Hours since last vacuum on the database | table_statistics
Last AutoVacuum | Hours since last autovacuum on the database | table_statistics
Last Analyze | Hours since last analyze on the database | table_statistics
Last AutoAnalyze | Hours since last autoanalyze on the database | table_statistics
Table Count | Total number of tables in database | oc_table
Function Count | Total number of functions in database | oc_function
Sequence Count | Total number of sequences in database | oc_sequence
Index size as a percentage of table size | Size of the indexes in database, as a percentage of their tables' size | table_size
Largest index by table-size percentage | Largest index in database, calculated as percentage of its table's size | index_size, oc_index, table_size
Database Frozen XID | The age (in transactions before the current transaction) of the database's frozen transaction ID | database_frozenxid
Number of attacks detected in the last N minutes | The number of SQL injection attacks that occurred in the last N minutes | sql_protect
Number of attacks detected in the last N minutes by username | The number of SQL injection attacks that occurred in the last N minutes by username | sql_protect
Queries that have been cancelled due to dropped tablespaces | Streaming Replication: number of queries that have been cancelled due to dropped tablespaces | streaming_replication_db_conflicts
Queries that have been cancelled due to lock timeouts | Streaming Replication: number of queries that have been cancelled due to lock timeouts | streaming_replication_db_conflicts
Queries that have been cancelled due to old snapshots | Streaming Replication: number of queries that have been cancelled due to old snapshots | streaming_replication_db_conflicts
Queries that have been cancelled due to pinned buffers | Streaming Replication: number of queries that have been cancelled due to pinned buffers | streaming_replication_db_conflicts
Queries that have been cancelled due to deadlocks | Streaming Replication: number of queries that have been cancelled due to deadlocks | streaming_replication_db_conflicts
Total events lagging in all slony clusters | Slony Replication: total events lagging in all slony clusters | slony_cluster
Events lagging in one slony cluster | Slony Replication: events lagging in one slony cluster | slony_cluster
Lag time (minutes) in one slony cluster | Slony Replication: lag time (minutes) in one slony cluster | slony_cluster
Total rows lagging in xdb single primary replication | xDB Replication: Total rows lagging in xdb single primary replication | xdb_smr_mmr_replication
Total rows lagging in xdb multi primary replication | xDB Replication: Total rows lagging in xdb multi primary replication | xdb_smr_mmr_replication
Total materialized view bloat in database | The total space wasted by materialized views in database, in MB | mview_bloat, settings
Largest materialized view (by multiple of unbloated size)Largest materialized view in database, calculated as a multiple of its estimated unbloated size; exclude materialized views smaller than N MBmview_bloat, settings
Highest materialized view bloat in databaseThe most space wasted by a materialized view in database, in MBmview_bloat, settings
Average materialized view bloat in databaseThe average space wasted by materialized views in database, in MBmview_bloat, settings
Materialized view size in databaseThe size of materialized view in database, in MBmview_size
View CountTotal number of views in databaseoc_views
Materialized View CountTotal number of materialized views in databaseoc_views

Templates applicable on schema

Template name | Description | Probe dependency
Total table bloat in schema | The total space wasted by tables in schema, in MB | table_bloat, settings
Largest table (by multiple of unbloated size) | Largest table in schema, calculated as a multiple of its own estimated unbloated size; exclude tables smaller than N MB | table_bloat, settings
Highest table bloat in schema | The most space wasted by a table in schema, in MB | table_bloat, settings
Average table bloat in schema | The average space wasted by tables in schema, in MB | table_bloat, settings
Table size in schema | The size of tables in schema, in MB | table_size
Tuples inserted | Tuples inserted in schema over last N minutes | table_statistics
Tuples updated | Tuples updated in schema over last N minutes | table_statistics
Tuples deleted | Tuples deleted from schema over last N minutes | table_statistics
Tuples hot updated | Tuples hot updated in schema, over last N minutes | table_statistics
Sequential Scans | Number of full table scans in schema, over last N minutes | table_statistics
Index Scans | Number of index scans in schema, over last N minutes | table_statistics
Hot update percentage | Percentage of hot updates in the schema over last N minutes | table_statistics
Live Tuples | Number of estimated live tuples in schema | table_statistics
Dead Tuples | Number of estimated dead tuples in schema | table_statistics
Dead tuples percentage | Percentage of estimated dead tuples in schema | table_statistics
Last Vacuum | Hours since last vacuum on the schema | table_statistics
Last AutoVacuum | Hours since last autovacuum on the schema | table_statistics
Last Analyze | Hours since last analyze on the schema | table_statistics
Last AutoAnalyze | Hours since last autoanalyze on the schema | table_statistics
Table Count | Total number of tables in schema | oc_table
Function Count | Total number of functions in schema | oc_function
Sequence Count | Total number of sequences in schema | oc_sequence
Index size as a percentage of table size | Size of the indexes in schema, as a percentage of their tables' size | table_size
Largest index by table-size percentage | Largest index in schema, calculated as percentage of its table's size | index_size, oc_index, table_size
Materialized view bloat | Space wasted by the materialized view, in MB | mview_bloat, settings
Total materialized view bloat in schema | The total space wasted by materialized views in schema, in MB | mview_bloat, settings
Materialized view size as a multiple of unbloated size | Size of the materialized view as a multiple of estimated unbloated size | mview_bloat
Largest materialized view (by multiple of unbloated size) | Largest materialized view in schema, calculated as a multiple of its own estimated unbloated size; exclude materialized views smaller than N MB | mview_bloat, settings
Highest materialized view bloat in schema | The most space wasted by a materialized view in schema, in MB | mview_bloat, settings
Average materialized view bloat in schema | The average space wasted by materialized views in schema, in MB | mview_bloat, settings
Materialized view size | The size of materialized view, in MB | mview_size
Materialized view size in schema | The size of materialized views in schema, in MB | mview_size
View Count | Total number of views in schema | oc_views
Materialized View Count | Total number of materialized views in schema | oc_views
Materialized View Frozen XID | The age (in transactions before the current transaction) of the materialized view's frozen transaction ID | mview_frozenxid

Templates applicable on table

Template name | Description | Probe dependency
Table bloat | Space wasted by the table, in MB | table_bloat, settings
Table size | The size of table, in MB | table_size
Table size as a multiple of unbloated size | Size of the table as a multiple of estimated unbloated size | table_bloat
Tuples inserted | Tuples inserted in table over last N minutes | table_statistics
Tuples updated | Tuples updated in table over last N minutes | table_statistics
Tuples deleted | Tuples deleted from table over last N minutes | table_statistics
Tuples hot updated | Tuples hot updated in table, over last N minutes | table_statistics
Sequential Scans | Number of full table scans on table, over last N minutes | table_statistics
Index Scans | Number of index scans on table, over last N minutes | table_statistics
Hot update percentage | Percentage of hot updates in the table over last N minutes | table_statistics
Live Tuples | Number of estimated live tuples in table | table_statistics
Dead Tuples | Number of estimated dead tuples in table | table_statistics
Dead tuples percentage | Percentage of estimated dead tuples in table | table_statistics
Last Vacuum | Hours since last vacuum on the table | table_statistics
Last AutoVacuum | Hours since last autovacuum on the table | table_statistics
Last Analyze | Hours since last analyze on the table | table_statistics
Last AutoAnalyze | Hours since last autoanalyze on the table | table_statistics
Row Count | Estimated number of rows in a table | table_statistics
Index size as a percentage of table size | Size of the indexes on table, as a percentage of table's size | table_size
Table Frozen XID | The age (in transactions before the current transaction) of the table's frozen transaction ID | table_frozenxid

Global templates

Template name | Description | Probe dependency
Agents Down | Number of agents that haven't reported in recently | N/A
Servers Down | Number of servers that are currently inaccessible | N/A
Alert Errors | Number of alerts in an error state | N/A

Audit log alerting

PEM provides alert templates that let you use the Alerting dialog to create an alert that triggers when an ERROR or WARNING statement is written to a log file for a specific server or agent. To open the Alerting dialog, select the server or agent in the PEM client Object browser tree control, and select Management > Alerting.

To create an alert to notify you of error or warning messages in the log file for a specific server, create an alert that uses one of the following alert templates:

  • Number of ERRORS in the logfile on server M in last X hours

  • Number of WARNINGS in the logfile on server M in last X hours

  • Number of ERRORS or WARNINGS in the logfile on server M in last X hours

To create an alert to notify you of error or warning messages for a specific agent, create an alert that uses one of the following alert templates. This functionality is supported only on EDB Postgres Advanced Server.

  • Number of ERRORS in the logfile on agent M in last X hours

  • Number of WARNINGS in the logfile on agent M in last X hours

  • Number of ERRORS or WARNINGS in the logfile on agent M in last X hours

Defining a new alert

Use the PEM client Manage Alerts tab to define, copy, or manage alerts. To open the Manage Alerts tab, select Management > Manage Alerts.

The Manage Alerts tab displays a table of alerts that are defined on the object currently selected in the PEM client tree. You can use the Alerts table to modify an existing alert or to create a new alert.

The Manage Alerts tab

To open the alert editor and create an alert, select the plus sign (+) in the upper-right of the table. The editor opens.

Use the fields on the General tab to provide information about the alert:

  • Enter the name of the alert in the Name field.
  • Use the Template list to select a template for the alert. An alert template is a function that uses one or more metrics or parameters to generate a value to which PEM compares user-specified alert boundaries. If the value returned by the template function falls within the boundary of a user-defined alert, as specified by the Operator and Threshold values fields, PEM:
    • Raises an alert
    • Adds a notice to the Alerts overview display
    • Performs any actions specified on the template
  • Use the Enable? switch to specify if the alert is enabled (Yes) or disabled (No).
  • Use the Interval box to specify how often PEM checks whether the alert conditions are satisfied. Use the Minutes selector to specify an interval value. Use the Default switch to set or reset the Minutes value to the default (recommended) value for the selected template.
  • Use the History retention box to specify the number of days that PEM stores data collected by the alert. Use the Days selector to specify the number of days to store the data. Use the Default switch to set or reset the Days value to the default value (30 days).
  • Use controls in the Threshold values box to define the triggering criteria for the alert. When the system value is greater than or less than the value specified in a Threshold values field (as specified with the Operator), PEM raises a Low, Medium, or High level alert.
  • Use the Operator list to select the operator for PEM to use when evaluating the current system values:
    • Select a greater-than sign (>) to trigger the alert when the system values are greater than the values entered in the Threshold values fields.
    • Select a less-than sign (<) to trigger the alert when the system values are less than the values entered in the Threshold values fields.
  • Use the Threshold fields to specify the values for PEM to compare to the system values to determine whether to raise an alert. You must specify values for all three thresholds (Low, Medium, and High).
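
The way the Operator and the three thresholds combine can be sketched as follows. This is a minimal illustration only, not PEM's internal implementation: the `alert_level` function is hypothetical, it handles integer values only, and it assumes that the most severe threshold is checked first.

```shell
#!/bin/bash
# Hypothetical helper that illustrates the comparison only.
# Usage: alert_level VALUE OPERATOR LOW MEDIUM HIGH   (integer values)
alert_level() {
    value=$1 op=$2 low=$3 medium=$4 high=$5
    case "$op" in
        '>')
            # Alert when the system value exceeds a threshold
            if [ "$value" -gt "$high" ]; then echo High
            elif [ "$value" -gt "$medium" ]; then echo Medium
            elif [ "$value" -gt "$low" ]; then echo Low
            else echo None
            fi ;;
        '<')
            # Alert when the system value drops below a threshold
            if [ "$value" -lt "$high" ]; then echo High
            elif [ "$value" -lt "$medium" ]; then echo Medium
            elif [ "$value" -lt "$low" ]; then echo Low
            else echo None
            fi ;;
        *)
            echo "unsupported operator: $op" >&2
            return 1 ;;
    esac
}

alert_level 85 '>' 70 80 90    # prints Medium
alert_level 5 '<' 20 10 3      # prints Medium
```

With the greater-than operator, the thresholds normally increase from Low to High. With the less-than operator, they decrease, because a smaller system value indicates a more severe condition.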

The Parameter Options table contains a list of parameters that are required by the selected template. The table displays both predefined parameters and parameters for which you must specify a value. You must specify a value for any parameter that displays a prompt in the Value column.

PEM can send a notification or execute a script if an alert is triggered or if an alert is cleared. Use the Notification tab to specify how PEM behaves if an alert is raised.

Use the Email notification box to specify the email group to receive an email notification if the alert is triggered at the specified level. Use the Email Groups tab to create an email group that contains the addresses of the users to notify when an alert is triggered. To access the Email Groups tab, select Email Groups located in the Quick Links menu of the Manage Alerts tab.

  • To instruct PEM to send an email when a specific alert level is reached, set the slider next to an alert level to Yes. Use the list to select the predefined user or group to notify.

You must configure the PEM server to use an SMTP server to deliver email before PEM can send email notifications.

Use the Webhook notification box to specify one or more endpoints to notify if the alert is triggered at the specified level. Use the Webhooks tab to create a webhook endpoint to receive the notifications when an alert is triggered. To access the Webhooks tab, select Webhooks located in the Quick Links menu of the Manage Alerts tab.

  • Set Enable? to Yes to send the alert notifications to the webhook endpoint.
  • Set Override default configuration? to Yes to customize the alert levels as required. After you set it to Yes, all the alert levels become available to configure.
  • Use the list to select a predefined endpoint to send a notification to for Low alerts?, Medium alerts?, High alerts?, and Cleared alerts?.

Use the Trap notification options to configure trap notifications for this alert:

  • Set Send trap to Yes to send SNMP trap notifications when the state of this alert changes.
  • Set SNMP Ver to v1, v2, or v3 to identify the SNMP version.
  • Use the Low alert, Med alert, and High alert sliders to select the levels of alert to trigger the trap. For example, if you set the slider next to High alert to Yes, PEM sends a notification when an alert with a high-severity level is triggered.

You must configure the PEM server to send notifications to an SNMP trap/notification receiver before notifications can be sent. For sending SNMP v3 traps, pemAgent uses the User-based Security Model (USM), which is in charge of authenticating, encrypting, and decrypting SNMP packets.

While sending SNMP v3 traps, the agent creates the snmp_boot_counter file. This file is created in the location mentioned by the batch_script_dir parameter in agent.cfg. If this parameter isn't configured or if the directory isn't accessible due to authentication restrictions, then the file is created in the operating system temporary directory. If that's also not possible, then the file is created in your home directory.
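
The fallback order for the snmp_boot_counter file can be sketched as follows. This illustration only mirrors the order described above and isn't pemAgent's actual code. The `boot_counter_dir` function and the example path are hypothetical, and the operating system temporary directory is assumed to be `$TMPDIR` or `/tmp` on Linux.

```shell
#!/bin/bash
# Illustration only: mirrors the documented fallback order.
# Usage: boot_counter_dir BATCH_SCRIPT_DIR
boot_counter_dir() {
    for dir in "$1" "${TMPDIR:-/tmp}" "$HOME"; do
        # Skip unset entries and directories that don't exist or aren't writable
        if [ -n "$dir" ] && [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# The batch_script_dir value here is a hypothetical example path
echo "snmp_boot_counter would be created in: $(boot_counter_dir /var/lib/pem/scripts)"
```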

Use the Nagios notification box to instruct the PEM server to notify Nagios network-alerting software when the alert is triggered or cleared. For more details, see Using PEM with Nagios.

  • Set the Submit passive service check result to Nagios switch to Yes to notify Nagios when the alert is triggered or cleared.

Use the Script execution box to optionally define a script that executes if an alert is triggered and to specify details about the script execution.

  • Set the Execute script slider to Yes to instruct PEM to execute the provided script if an alert is triggered.

  • Set the Execute on alert cleared slider to Yes to instruct PEM to execute the provided script when the situation that triggered the alert is resolved.

  • Use the Execute script on options to indicate for the script to execute on the PEM server or the monitored server.

  • In the Code field, provide the script for PEM to execute. You can provide a batch/shell script. In the script, you can use placeholders for the following:

    %AlertName% The name of the triggered alert.

    %ObjectName% The name of the server or agent on which the alert was triggered.

    %ThresholdValue% The threshold value reached by the metric when the alert triggered.

    %CurrentValue% The current value of the metric that triggered the alert.

    %CurrentState% The current state of the alert.

    %OldState% The previous state of the alert.

    %AlertRaisedTime% The time that the alert was raised or the most recent time that the alert state was changed.

    To invoke a script on a Linux system, you must modify the entry for the batch_script_user parameter of the agent.cfg file and specify the user to use to run the script. You can either specify a non-root user or root for this parameter. If you don't specify a user or the specified user doesn't exist, then the script doesn't execute. Restart the agent after modifying the file.

    To invoke a script on a Windows system, set the registry entry for AllowBatchJobSteps to true and restart the PEM agent. PEM registry entries are located in HKEY_LOCAL_MACHINE\Software\EnterpriseDB\PEM\agent.
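
A script for the Code field that uses the placeholders listed above might look like the following minimal sketch. It assumes that PEM replaces each placeholder with its value before the script runs. The log file path and the `PEM_ALERT_LOG` variable are hypothetical and aren't PEM settings.

```shell
#!/bin/bash
# Sketch of a script for the Code field. PEM is assumed to replace each
# %...% placeholder with its value before the script runs.

# Hypothetical log location; PEM_ALERT_LOG is not a PEM setting.
LOG_FILE="${PEM_ALERT_LOG:-/tmp/pem_alert_script.log}"

# Quote every placeholder so that values containing spaces stay intact.
alert_name="%AlertName%"
object_name="%ObjectName%"
old_state="%OldState%"
new_state="%CurrentState%"
current_value="%CurrentValue%"
threshold="%ThresholdValue%"
raised_at="%AlertRaisedTime%"

# Append one line for each state change.
echo "$raised_at alert=$alert_name object=$object_name state=$old_state->$new_state value=$current_value threshold=$threshold" >> "$LOG_FILE"
```

The user that runs the script (see batch_script_user) needs write access to the chosen log location.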

After you define the alert attributes, select Edit to close the alert definition editor and then Save in the upper-right corner of the Alerts table.

To discard your changes, select Refresh. A message prompts you to confirm that you want to discard the changes.

Note

Suppose you need to use the alert configuration placeholder values in an external script. You can do so either by passing them as the command-line arguments or exporting them as environment variables. The external script must have proper execution permissions.

  • You can run the script with any of the placeholders as command-line arguments.

    For example:

    #!/bin/bash
    
    bash <path_to_script>/script.sh "%AlertName%" "%AlertLevel%" "%AlertDetails%"
  • You can define the environment variables for any of the placeholders and then use those environment variables in the script.

    For example:

    #!/bin/bash
    
    export AlertName="%AlertName%"
    export AlertState="%AlertState%"
    
    bash <path_to_script>/script.sh
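
On the receiving side, the external script can read the values either way. The following is a minimal sketch of a hypothetical script.sh. It assumes that the caller passes the alert name as the first argument and the alert state or level as the second, or exports the AlertName and AlertState variables as in the example above. The `describe_alert` function name is illustrative.

```shell
#!/bin/bash
# Hypothetical script.sh: prefers positional arguments and falls back to the
# exported environment variables when an argument is missing.
describe_alert() {
    name="${1:-${AlertName:-unknown}}"
    state="${2:-${AlertState:-unknown}}"
    echo "PEM alert '$name' is in state '$state'"
}

describe_alert "$@"
```

Quoting each placeholder separately in the calling script keeps a value that contains spaces, such as an alert name, in a single argument.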

Modifying an alert

Use the Alerts table to manage an existing alert or create a new alert. Select an object in the PEM client tree to view the alerts that monitor that object.

You can modify some properties of an alert in the Alerts table:

  • The Alert name column displays the name of the alert. To change the alert name, replace the name in the table and select Save.
  • The Alert template column displays the name of the alert template that specifies properties used by the alert. You can use the list to change the alert template associated with an alert.
  • Use the Alert enable? switch to specify if an alert is enabled (Yes) or disabled (No).
  • Use the Interval column to specify how often PEM checks whether the alert conditions are satisfied. Set the Default switch to No and specify an alternate value, in minutes. Or set the Default switch to Yes to reset the value to its default setting. By default, PEM checks the status of each alert once every minute.
  • Use the History retention field to specify the number of days that PEM stores data collected by the alert. Set the Default switch to No and specify an alternative value in days. Or set the Default switch to Yes to reset the value to its default setting. By default, PEM stores historical data for 30 days.

After modifying an alert, select Save (located in the upper-right corner of the table) to preserve your changes.

To modify other alert attributes, select Edit to the left of an alert name to open an editor. The editor provides access to the complete alert definition.

Use the Alert Details dialog box to modify the definition of the selected alert. After you modify the alert definition, select Save.

Deleting an alert

To mark an alert for deletion, select the alert name in the Alerts table. Then select Delete to the left of the name. The alert remains in the list in red strike-through font.

Delete is a toggle. You can undo the deletion by selecting it a second time. To permanently delete the alert definition, select Save.

Copying an alert

To speed up the deployment of alerts in the PEM system, you can copy alert definitions from one object to one or more target objects.

To copy alerts from an object, select the object in the PEM client tree on the main PEM window. Then, select Management > Copy Alerts. On the Manage Alerts tab, from the Quick Links toolbar, select Copy Alerts.

The Copy Alert Configuration dialog box copies all alerts from the object selected in the PEM client tree to the objects selected on the dialog box. Expand the tree to select nodes to specify as the target objects. The tree displays a red warning indicator next to the source object.

To copy alerts to multiple objects at once, select a parent node of the targets. For example, to copy the alerts from one table to all tables in a schema, select the check box next to the schema. PEM copies alerts only to targets that are the same type as the source object.

Select Ignore duplicates to prevent PEM from updating any existing alerts on the target objects with the same name as those being copied.

Select Replace duplicates to replace existing alerts with alerts of the same name from the source object.

Select Delete Existing Alerts to delete all the alerts from the target object and copy all the alerts from the source object to the target object.

Select Configure Alerts to copy the alerts from the source object to all objects of the same type in or under those objects selected on the Copy Alert Configuration dialog box.

Scheduling an alert blackout

You can use the Management > Schedule Alert Blackout option to schedule an alert blackout for your Postgres servers and PEM agents during maintenance. Alerts aren't raised during a defined blackout period.

To schedule an alert blackout, select Management > Schedule Alert Blackout.

In the Schedule Alert Blackout dialog box, use the tabs to define the blackout period for servers and agents. On the Server tab, to add a row, select the plus sign (+) at the top-right corner.

Use the Server tab to provide information about an alert blackout period. After you save the blackout period, you can't edit it.

  • Use the Start time field to provide the date and time to start the alert blackout.
  • Use the Duration field to provide the interval for which you want to black out the alerts.
  • Use the Servers field to provide the server name for which you want to black out the alerts. You can also select multiple servers to black out the alerts for all of those servers.

After providing details, select Save. For the scheduled interval, alerts for that server don't appear on the Alerts dashboard.

You can also schedule a blackout period for PEM agents using the Agent tab on the dialog box. To add a row, on the Agent tab, select the plus sign (+) at the top-right corner.

Use the Agent tab to provide the information about an alert blackout period. After you save the blackout period, you can't edit it.

  • Use the Start time field to provide the date and time to start the alert blackout.
  • Use the Duration field to provide the interval for which you want to black out the alerts.
  • Use the Agents field to provide the agent name for which you want to black out the alerts. All server-level alerts for the servers bound to that agent black out.

After providing details, select Save. For the scheduled interval, alerts for that PEM agent don't appear on the Alerts dashboard.

You can select Clone from the top-right corner of the dialog box to clone the scheduling of an alert blackout. To create the cloned copy of all the selected servers or agents, select the servers or agents you want to clone, and then select Clone. You can edit newly created schedules as needed, and then select Save.

Select Delete from the top-right corner of the dialog box to remove a scheduled alert blackout. Select the servers or agents and then select Delete.

Select a server for which you want to delete the scheduled alert blackout, and then select Delete. The server prompts for confirmation before deleting that row.

You can select Reset to reset the details on the Alert Blackout dialog box to the default settings. Saved blackouts aren't affected.

You can view the scheduled alert blackout details from the event_history table in the pem schema once the schedule is executed. For more information, see Monitoring event history.