Exadata Monitoring Configuration:
A Setup Guide for Our Customers
When technical assistance is required with your Exadata appliance, primary support is obtained by opening a ticket via the Natrinsic Helpdesk Ticketing System. (Please refer to our guide on How to Create a Ticket for more details.) Natrinsic also provides an event-driven, 24x7 monitoring service that augments primary support. By following the instructions in this document you can create or modify events on your Exadata appliance(s). Once monitoring is enabled, and provided your support contract includes system monitoring, detection of a covered event will automatically open a ticket, which will be investigated by the appropriate Natrinsic Support team.
If you would like to comment or ask questions about this guide, registered users may leave a comment below or email support@natrinsic.com with the subject line "General Support Inquiry". In the body of your email, provide your company name, your position, and the system identifier of at least one system for which you have a current support contract.
Prerequisites
Certain prerequisites must be met for the supplied scripts to work. Each of the checks below must pass, or the monitoring system will not work as designed.
Two group files must be present for each Exadata: one listing all compute and cell nodes, and one listing the cell nodes alone. In this document they are referred to as follows:
all_group - the file containing the "short" hostname of each compute and cell node
cell_group - the file containing the "short" hostname of each cell node
These files may already exist in /root or /opt/oracle.SupportTools. If not, they must be created by the customer.
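If you need to create them, the files are plain text with one hostname per line. A minimal sketch using hypothetical hostnames (compute1, compute2, cell1, cell2 - substitute your own):

```shell
# Create the two group files, one "short" hostname per line.
# The hostnames here are examples only; replace them with your own.

# all_group: every compute and cell node
printf '%s\n' compute1 compute2 cell1 cell2 > all_group

# cell_group: the cell nodes only
printf '%s\n' cell1 cell2 > cell_group

# Sanity check: every cell_group entry should also appear in all_group.
grep -F -x -f cell_group all_group
```

The final grep prints each cell_group entry it finds in all_group; a missing line indicates the two files are out of step.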
It must be possible to send email (SMTP) from the compute node that runs the supplied dcli script.
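Before depending on mail notifications, it is worth confirming that an SMTP server will accept connections from the compute node. A hedged sketch using bash's /dev/tcp pseudo-device (the host mail.example.com is a placeholder; substitute your local mail server):

```shell
# Report whether a host accepts TCP connections on the given port.
# smtp_check <host> <port>  -> "<host>:<port> reachable" or "... unreachable"
smtp_check() {
    local host=$1 port=$2
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${host}:${port} reachable"
    else
        echo "${host}:${port} unreachable"
    fi
}

# Placeholder host; substitute your local mail server:
smtp_check mail.example.com 25
```

This only proves TCP reachability on the SMTP port; the validate mail steps later in this guide exercise actual delivery.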
In addition, the root user must have user equivalence (that is, passwordless ssh) configured for every compute and cell node.
The following simple test, which must be performed on every compute and cell node, shows whether user equivalence is currently set up. This example tests two cell nodes:
[root@compute1 ~]# ssh cell1 date
Tue Mar 22 03:21:48 PDT 2011
[root@compute1 ~]# ssh cell2 date
Tue Mar 22 03:21:53 PDT 2011
Or
[root@compute1 ~]# dcli -k -g cell_group
Error: Neither RSA nor DSA keys have been generated for current user.
Run 'ssh-keygen -t dsa' to generate an ssh key pair.
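The per-node ssh test can also be scripted so no host is missed. A hedged sketch: it loops over a group file (all_group from the prerequisites) and uses ssh's BatchMode option so a missing key produces a failure instead of a password prompt:

```shell
# Check user equivalence for every host listed in a group file.
# check_equiv <group_file>  -> one "ok" / "NOT set up" line per host
check_equiv() {
    while read -r host; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: user equivalence NOT set up"
        fi
    done < "$1"
}

# Usage (assumes all_group exists in the current directory):
if [ -f all_group ]; then
    check_equiv all_group
fi
```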
If either command prompts for a password, or you get the dcli error output shown above, user equivalence is not set up and must be configured as follows:
[root@compute1 ~]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
e6:25:1f:2f:22:a9:5c:ec:e4:98:64:67:91:60:ce:9d root@compute1.example.com
This creates the key pair on the local node; its public key must now be exchanged with the other nodes. Note: remove the current node from the all_group file before running the following command, and add it back afterwards. To set this up for all the nodes:
[root@compute1 ~]# dcli -k -g all_group
The authenticity of host 'cell1 (127.0.0.1)' can't be established.
RSA key fingerprint is 99:86:a5:3f:f1:98:75:53:e8:92:fc:7d:fd:4d:aa:45.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'cell1' (RSA) to the list of known hosts.
root@cell1's password:
The authenticity of host 'cell2 (192.168.56.103)' can't be established.
RSA key fingerprint is 99:86:a5:3f:f1:98:75:53:e8:92:fc:7d:fd:4d:aa:45.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'cell2,192.168.56.103' (RSA) to the list of known hosts.
root@cell2's password:
cell1: ssh key added
cell2: ssh key added
You should now retest, as shown above, to confirm that user equivalence works:
[root@compute1 ~]# dcli -g cell_group "cellcli -e list cell"
cell1: cell1 online
cell2: cell2 online
To enable monitoring, prepare each system as follows:
# On Every Compute node
$ dbmcli
DBMCLI> alter dbserver smtpFrom='Exadata - <db server name>'
DBMCLI> alter dbserver smtpFromAddr='<your email address>'
DBMCLI> alter dbserver smtpToAddr='ess_monitor@natrinsic.com'
DBMCLI> alter dbserver smtpServer='<your local mail server>'
DBMCLI> alter dbserver notificationPolicy='critical,warning,clear'
DBMCLI> alter dbserver notificationMethod='mail,snmp'
DBMCLI> alter dbserver validate mail
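Rather than logging in to every compute node, the same settings can be pushed from one node with dcli. A hedged sketch: it assumes a file of compute-node names (here called db_group, which this guide does not otherwise define) and only prints the dcli command so it can be reviewed first. smtpFrom is left out because it embeds each node's own name and is better set per node as shown above.

```shell
# Placeholders: substitute your own values.
MAILSERVER="mail.example.com"   # your local mail server
FROMADDR="admin@example.com"    # your email address

# One ALTER DBSERVER statement with comma-separated attributes; dcli would
# run it on every host listed in db_group (hypothetical compute-node file).
CMD="dbmcli -e \"alter dbserver smtpFromAddr='${FROMADDR}', smtpToAddr='ess_monitor@natrinsic.com', smtpServer='${MAILSERVER}', notificationPolicy='critical,warning,clear', notificationMethod='mail,snmp'\""

# Printed for review; drop the leading 'echo' to execute for real:
echo dcli -g db_group "$CMD"
```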
# On every Storage Cell
$ cellcli
CELLCLI> alter cell smtpFrom='Exadata - <cell server name>'
CELLCLI> alter cell smtpFromAddr='<your email address>'
CELLCLI> alter cell smtpToAddr='ess_monitor@natrinsic.com'
CELLCLI> alter cell smtpServer='<your local mail server>'
CELLCLI> alter cell notificationPolicy='critical,warning,clear'
CELLCLI> alter cell notificationMethod='mail,snmp'
CELLCLI> alter cell validate mail
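Once each cell is configured, the settings can be verified and mail validated across all cells in one pass with dcli and the cell_group file from the prerequisites. A hedged sketch that prints the commands for review rather than executing them (drop the leading echo to run them for real):

```shell
# List the notification attributes on every cell, then trigger a test mail
# from each one. Printed for review; remove 'echo' to execute for real.
CHECK="cellcli -e list cell attributes name,smtpServer,notificationPolicy,notificationMethod"
VALIDATE="cellcli -e alter cell validate mail"

echo dcli -g cell_group "$CHECK"
echo dcli -g cell_group "$VALIDATE"
```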