Question: I have installed PowerHA, now what?
Answer: Before PowerHA can manage and keep your application highly available, you need to tell PowerHA about your cluster and the application. There are 4 steps:
Step 1) Define the nodes that will keep your application highly available
The local node (the one where you are configuring PowerHA) is assumed to be one of the cluster nodes and you must give PowerHA the name of the other nodes that make up the cluster. Just enter a hostname or IP address for each node.
Step 2) Define the application you want to keep highly available
There are 3 things you need to tell PowerHA about the application: a name for the application, the script PowerHA should run to start it, and the script PowerHA should run to stop it.
Step 3) Verify and synchronize the cluster
PowerHA will discover all the networks and disks connected to the nodes. A verification step will ensure that the cluster configuration will be able to keep the application highly available. When successful the configuration will be copied to the rest of the nodes in the cluster.
Step 4) Manage the application
When you start PowerHA it will begin managing the application and keeping it highly available. You can also use the maintenance facilities provided by PowerHA to move the application between nodes for maintenance purposes.
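On PowerHA 7.1 and later, the four steps above can also be performed from the clmgr command line instead of SMIT. The sketch below is illustrative only: the cluster name, node names, application name, and script paths are hypothetical placeholders, not values from this document.

```shell
# Sketch only: illustrative clmgr commands for PowerHA 7.1 and later.
# Cluster name, node names, and script paths below are hypothetical.

# Step 1: define the cluster and the nodes that will keep the application available
clmgr add cluster demo_cluster NODES=nodeA,nodeB

# Step 2: describe the application: a name plus its start and stop scripts
clmgr add application_controller demo_app \
    STARTSCRIPT=/usr/local/ha/start_app.sh \
    STOPSCRIPT=/usr/local/ha/stop_app.sh

# Step 3: verify and synchronize the configuration to all cluster nodes
clmgr sync cluster

# Step 4: start cluster services so PowerHA begins managing the application
clmgr start cluster
```

In practice the application controller would also be placed in a resource group along with its service address and volume groups; the SMIT Assistant walks through the same steps interactively.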
To see just how easy it is to configure PowerHA, look for Using the SMIT Assistant in Chapter 11 of the Installation Guide.
Question: Why does PowerHA require so many subnets for IP address takeover?
Answer: PowerHA (using RSCT) determines adapter state by sending heartbeats across a specific network interface—as long as heartbeat messages can be sent through an interface, the interface is considered alive. Prior to AIX V5, AIX did not allow more than one interface to own a subnet route but in AIX V5.1 multiple interfaces can have a route to the same subnet. This is sometimes referred to as multipath routing or route striping and when this situation exists, AIX will multiplex outgoing packets destined for a particular subnet across all interfaces with a route to that subnet. This interferes with RSCT's ability to reliably send heartbeats to a specific interface. Therefore the subnetting rules for boot, service and persistent labels are such that there will never be a duplicate subnet route created by the placement of these addresses.
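One way to spot the multipath-routing situation described above is to count how many interfaces hold a route to the same destination subnet in netstat -rn output. The sketch below runs against illustrative sample lines, not output from a real system; on a live node you would pipe the real netstat -rn output through the same awk filter.

```shell
# Count how many interfaces claim a route to each destination subnet.
# The sample lines below stand in for (simplified) `netstat -rn` output;
# the subnets and interface names are illustrative.
sample_routes="10.1.1.0 en0
10.1.1.0 en1
10.1.2.0 en2"

# Any destination appearing more than once indicates multipath routing,
# which interferes with per-interface heartbeating.
echo "$sample_routes" | awk '{count[$1]++} END {for (d in count) if (count[d] > 1) print d}'
```

Here the filter prints 10.1.1.0, because both en0 and en1 hold a route to that subnet; PowerHA's subnetting rules for boot, service, and persistent labels exist precisely to prevent this.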
PowerHA V5 includes a new feature whereby you may be able to avoid some of the subnet requirements by configuring PowerHA to use a different set of IP alias addresses for heartbeat. With this feature you provide a base or starting address and PowerHA calculates a set of addresses in proper subnets—when cluster services are active, PowerHA adds these addresses as IP alias addresses to the interfaces and then uses these alias addresses exclusively for heartbeat traffic. You can then assign your "regular" boot, service and persistent labels in any subnet, but be careful: although this feature avoids multipath routing for heartbeat, multipath routing may adversely affect your application. Heartbeat via IP Aliasing is discussed in Chapter 2 of the Concepts and Facilities Guide and Chapter 3 of the Administration and Troubleshooting Guide. View the online documentation for PowerHA.
Question: Does PowerHA have any limits?
Answer: The functional limits for PowerHA (e.g. number of nodes and networks) can be found in Chapter 1 of the Planning and Installation Guide. View the online documentation for PowerHA.
Question: How can I avoid the name server as a single point of failure?
Answer: 1) Make the nodes look at /etc/hosts before the name server by creating an /etc/netsvc.conf file with the following entry:
hosts = local, bind
This tells AIX to look at /etc/hosts first and then the name server.
2) Remove /etc/resolv.conf (or rename it to save it for later use) so that name resolution uses /etc/hosts first.
For information on updating the /etc/hosts file and name server configuration, refer to the Installation Guide. View the online documentation for PowerHA.
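The change in step 1 can be scripted. The sketch below writes the entry to a temporary file for illustration; on a real cluster node the target would be /etc/netsvc.conf, which requires root authority.

```shell
# Write the lookup-order entry to a scratch file; on a cluster node the
# target would be /etc/netsvc.conf (root required).
netsvc=$(mktemp)
echo "hosts = local, bind" > "$netsvc"

# "local" = consult /etc/hosts first; "bind" = fall back to the name server.
cat "$netsvc"
```

With this in place, cluster nodes keep resolving each other's names from /etc/hosts even when the name server is down.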
Question: What is a config_too_long event?
Answer: The config_too_long event is an informational event run by PowerHA whenever a cluster event runs for longer than a preset time. This can occur when an event script hangs or fails, or when an event legitimately takes a long time to complete (for example, processing a large number of disks or file systems).
If the config_too_long event is run, you should check the hacmp.out file to determine the cause and whether manual intervention is required. For more information on recovery after an event failure, refer to Recover from PowerHA Script Failure in Chapter 18 of the Administration and Troubleshooting Guide.
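A quick way to find the event behind a config_too_long is to compare the EVENT START and EVENT COMPLETED lines in hacmp.out: a START with no matching COMPLETED is the event that is still running (or hung). The excerpt below is an illustrative stand-in for the real log, which is typically found in /var/hacmp/log/hacmp.out (or /tmp/hacmp.out on older releases); the timestamps and event arguments are hypothetical.

```shell
# Illustrative hacmp.out excerpt; real logs live in /var/hacmp/log/hacmp.out
# (or /tmp/hacmp.out on older releases). Timestamps and nodes are made up.
excerpt="Jan 10 10:00:01 EVENT START: node_up nodeA
Jan 10 10:00:09 EVENT COMPLETED: node_up nodeA 0
Jan 10 10:05:00 EVENT START: rg_move nodeA 1"

# An EVENT START with no matching EVENT COMPLETED is the event that is
# still running (or hung) and triggered config_too_long.
echo "$excerpt" | grep -c "EVENT START"
echo "$excerpt" | grep -c "EVENT COMPLETED"
```

Here the counts differ (2 starts, 1 completion), pointing at rg_move as the event to investigate.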
Question: Do all cluster nodes need to be at the same version of PowerHA and AIX operating system?
Answer: No, though there are some restrictions when running mixed mode clusters.
Mixed levels of AIX on cluster nodes do not cause problems for PowerHA as long as the level of AIX is adequate to support the level of PowerHA being run on that node. All cluster operations are supported in such an environment. The PowerHA install and update packaging will enforce the minimum level of AIX required on each system.
As a matter of practicality, it is recommended that all nodes be at the same levels of operating system and PowerHA whenever possible. Keeping the operating system, PowerHA, and the application at the same level on all nodes will make the administration of the cluster easier and less error prone, and will go a long way towards reducing the frustration of the administrators. The Planning Guide has advice for effectively managing different installation and migration scenarios.
Question: Why do I need multiple heartbeat networks?
Answer: Cluster health management involves each node communicating with all other cluster member nodes. One of the key health management messages exchanged is the heartbeat. These heartbeats are the foundation by which the surviving nodes detect the failure of another node in the cluster. A node is declared dead by the cluster if it does not send a heartbeat within the "Node failure detection time". Redundancy in the communication channels between the nodes helps ensure that failure of a communication link does not result in false node failure indications.
Note that node failure detection will cause PowerHA to initiate a takeover on the standby node per policy. However, in the case of a false node failure detection, since the primary node is still alive, the takeover can result in both nodes having the same IP address, and in both nodes trying to own and access the shared disks. This situation is sometimes referred to as "split brain" or a "partitioned cluster". In these circumstances, data corruption could occur.
PowerHA therefore strongly recommends that there be as many communication links between nodes as possible. For PowerHA v7 clusters, heartbeat communication is possible over Ethernet networks, the SAN network, and the repository disk. For PowerHA v6 clusters with more than two nodes, the most reliable configuration includes two non-IP networks on each node. The distance limitations on non-IP links, particularly RS-232, have often made this requirement difficult to meet. For such clusters, PowerHA disk heartbeating should be strongly considered.
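On a PowerHA v7 node, the CAA lscluster command can be used to see which of these heartbeat paths are actually in place. The commands below are a sketch for reference only; they require a configured CAA cluster and root authority, and their output varies by system.

```shell
# Sketch: inspect heartbeat communication paths on a PowerHA v7 node.
# Requires a configured CAA cluster and root authority.

# List network interfaces known to CAA and their state
lscluster -i

# List cluster disks, including the repository disk used for heartbeat
lscluster -d

# Show node and cluster state as seen from this node
lscluster -m
```

Checking that every expected path (Ethernet, SAN, repository disk) shows as up is a good step before relying on the cluster to survive link failures.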
Question: Can I put different types of processors, communications adapters, or disk subsystems in the same cluster?
Answer: In general, yes, as long as the individual components are supported by PowerHA. Note that there are some combinations which may not be reasonable or desirable. For example, putting two Ethernet adapters that run at different speeds on the same network will generally force all adapters on the network to run at the speed of the slower one. Likewise, having a low-powered processor back up a high-powered processor may result in unacceptable performance should PowerHA have to run the application on the lower-powered one. (But see the questions on dynamic LPAR and CUoD for a way of dealing with this.) As long as AIX and the hardware support the interconnections, PowerHA will support them as well.
Question: What kinds of applications are best suited for a high availability environment?
Answer: PowerHA detects failures in the cluster then moves or restarts resources in order to keep the application highly available. For an application to work well in a high availability environment, the application itself must be capable of being managed (start, stop, restart) programmatically (no user intervention required) and must have no "hard coded" dependencies on specific resources. For example, if the application relies on the hostname of the server (and cannot dynamically accept a change in hostname), then it is practically impossible to restart the application on a backup server after a failure.
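The "managed programmatically" requirement usually means wrapping the application in small start and stop scripts that take no user input and report status via exit codes. The sketch below is minimal and generic: the daemon command ("sleep 60") and PID-file handling are placeholders standing in for a real application.

```shell
# Minimal sketch of PowerHA-friendly start/stop wrappers.
# "sleep 60" stands in for a real application daemon.
APP_CMD="sleep 60"
PIDFILE=$(mktemp -u)

start_app() {
    # Start in the background and record the PID; no user interaction.
    $APP_CMD &
    echo $! > "$PIDFILE"
}

stop_app() {
    # Stop cleanly using the recorded PID, then clean up.
    kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
}

start_app
# kill -0 tests that the process exists without signalling it
kill -0 "$(cat "$PIDFILE")" && echo "running"
stop_app
```

Scripts in this shape can be restarted on any node, which is exactly what PowerHA needs after a failover; hard-coded hostnames or interactive prompts would break that.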
Question: Can I use Etherchannel with PowerHA?
Answer: See Using Etherchannel with PowerHA.
Question: Can I use an existing Enhanced Concurrent Mode volume group for disk heartbeat? Or do I need to define a new one?
Answer: To achieve the highest levels of availability under the widest range of failure scenarios, the best practice would be to configure one disk heartbeat connection per physical disk enclosure (or LUN).
The heartbeat operation itself involves reading and writing messages from a non-data area of the shared disk. Although the space used for heartbeat messages does not decrease the space available for the application (it is in the reserved area of the disk), there is some overhead when the disk seeks back and forth between the reserved area and the application data area.
If you configure the disk heartbeat path using the same disk and vg as is used by the application, the best practice is to select a disk which does not have frequently accessed or performance-critical application data: although the disk heartbeat overhead is small (2-4 seeks/sec), it could potentially impact application performance or, conversely, heavy application access could cause the disk heartbeat connection to appear to go up and down.
Ultimately the decision of which disk and volume group to use for heartbeat depends on what makes sense for your shared disk environment and management procedures. For example, using a separate vg just for heartbeat isolates the heartbeat from the application data, but adds another volume group that has to be maintained (during upgrades, changes, etc) and consumes another LUN.
If you decide on a separate vg for heartbeat, it does not need to be included in a PowerHA resource group. However, the CSPOC utilities use a resource group's node list as the set of nodes on which to perform operations: including the vg in a resource group with just the (sub)set of nodes connected to the disk will let you take advantage of the CSPOC functions. You can also define and use a disk which is not part of any volume group, though such a setup would have to be manually configured and maintained.
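Setting up a dedicated heartbeat volume group on a PowerHA v6 cluster might look like the sketch below. The disk name hdisk5 and volume group name are hypothetical, and dhb_read is the RSCT utility for testing a disk heartbeat path.

```shell
# Sketch only: AIX/PowerHA v6 commands; hdisk5 and hb_vg are hypothetical.

# Create an enhanced concurrent-capable volume group just for heartbeat
mkvg -n -C -y hb_vg hdisk5

# Test the disk heartbeat path: run receive mode on one node...
/usr/sbin/rsct/bin/dhb_read -p hdisk5 -r
# ...and transmit mode on the other node; a success message from both
# sides confirms the disks can carry heartbeat traffic
/usr/sbin/rsct/bin/dhb_read -p hdisk5 -t
```

After the path tests cleanly, the disk is added to a diskhb network in the PowerHA topology so cluster services can use it for heartbeating.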