Management Module failover overview

There are two types of Management Module (MM) failover:

Controlled failover: The user triggers this type of failover by rebooting the Active MM or running the redundancy switchover command.
Uncontrolled failover: This type of failover is triggered by unexpected events like a crash on the Active MM or hot removal of the Active MM.

In a dual MM chassis, the Standby MM detects the failover in one of the following ways:

A mailbox interrupt is received from the Active MM to indicate takeover. This interrupt can come for controlled or uncontrolled failover (except for a hot removal).
Hot removal of the Active MM is detected.
The Standby MM detects heartbeat loss for more than 10 seconds.
NOTE:
If the Active MM is not responding and neither of the first two methods detects it, the heartbeat-loss method will catch it.
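The three detection paths can be pictured as a simple priority check. The following is a minimal sketch only, not the actual daemon logic: the function name, argument names, and the order of evaluation are assumptions made for illustration. Only the 10-second heartbeat threshold comes from the text above.

```python
from typing import Optional

# From the text: heartbeat loss for more than 10 seconds.
HEARTBEAT_TIMEOUT_SECONDS = 10


def detect_failover(mailbox_interrupt: bool,
                    active_removed: bool,
                    seconds_since_heartbeat: float) -> Optional[str]:
    """Return the detection method that fired, or None if no failover is seen.

    Hypothetical helper: the evaluation order is an assumption. The text only
    says that heartbeat loss catches what the first two methods miss.
    """
    if mailbox_interrupt:
        return "mailbox_interrupt"
    if active_removed:
        return "hot_removal"
    if seconds_since_heartbeat > HEARTBEAT_TIMEOUT_SECONDS:
        return "heartbeat_loss"
    return None
```

For example, an Active MM that hangs without sending a mailbox interrupt and without being removed would, under this sketch, only be reported once the heartbeat gap exceeds 10 seconds.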

Failover requirements:

The Standby MM must be present to trigger a failover. An Unassigned MM will never trigger a failover.
The Redundant Management Daemon (hpe-rdntmgmtd) is responsible for triggering failover from the Standby MM.
When a failover is triggered, the Standby MM takes over and becomes Active, and the old Active MM is rebooted.

Standby recover requirements:

The Active MM must be present to trigger a recover.
The Redundant Management Daemon (hpe-rdntmgmtd) is responsible for triggering recover from the Active MM.
When a recover is triggered, the Active MM reboots the nonresponsive Standby MM. A failover or a recover occurs under any of the following conditions:

Condition: Heartbeat lost from Active MM:

The failover monitor thread on the Standby MM will increment the heartbeat failed count.
The hpe-rdntmgmtd daemon on the Standby MM will:
- Detect the failover condition when the heartbeat fail count increases past the maximum of 10, and trigger failover.
- Initiate reboot of the Active MM.
The old Active MM will join as a standby after reboot.

Condition: Heartbeat lost from Standby MM:

The recover monitor thread on the Active MM will increment the heartbeat failed count.
The hpe-rdntmgmtd daemon on the Active MM will:
- Detect the recover condition when the heartbeat fail count increases past the maximum of 7, and trigger recover.
- Initiate reboot of the Standby MM.
The Standby MM will rejoin as a standby after reboot.
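The two heartbeat conditions above follow the same pattern with different limits (10 on the Standby MM for failover, 7 on the Active MM for recover). A minimal sketch of such a counter is shown below. The class and method names are hypothetical, and resetting the count when a heartbeat arrives is an assumption; the text does not describe how hpe-rdntmgmtd clears the count.

```python
class HeartbeatMonitor:
    """Hypothetical fail counter modeled on the monitor threads in the text."""

    def __init__(self, max_failures: int) -> None:
        self.max_failures = max_failures
        self.failed_count = 0

    def record(self, heartbeat_received: bool) -> bool:
        """Update the fail count; return True once it passes the maximum."""
        if heartbeat_received:
            # Assumption: a successful heartbeat clears the count.
            self.failed_count = 0
        else:
            self.failed_count += 1
        # "Increasing past the maximum" is read here as strictly greater than.
        return self.failed_count > self.max_failures


# Limits taken from the text: 10 for failover, 7 for recover.
failover_monitor = HeartbeatMonitor(max_failures=10)  # runs on the Standby MM
recover_monitor = HeartbeatMonitor(max_failures=7)    # runs on the Active MM
```

Under this reading, the failover monitor triggers on the 11th consecutive missed heartbeat and the recover monitor on the 8th.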

Condition: Planned reboot of Active MM:

A planned reboot on the Active MM will send a failover command to the Standby MM.
The hpe-rdntmgmtd daemon on the Standby MM will:
- Process this command and perform a failover immediately instead of waiting for the failover monitor to detect it using heartbeats.
- Initiate reboot of the Active MM.
The old Active MM will join as a standby after reboot.

Condition: Removal of Active MM:

Removal of the Active MM from Slot 1 triggers the hpe-rdntmgmtd daemon on the Standby MM to initiate failover immediately instead of waiting for the failover monitor to detect it using heartbeats.
The old Active MM will join as a standby after reboot.

Condition: Crash on Active MM:

A crash on the Active MM is handled by the crash handler, which sends a failover command to the Standby MM.
The hpe-rdntmgmtd daemon on the Standby MM will:
- Process this command and perform failover immediately instead of waiting for the failover monitor to detect it using heartbeats.
- Initiate reboot of the Active MM.
The old Active MM will join as a standby after reboot.

Condition: redundancy switchover command:

The user executes the redundancy switchover command on the Active MM.
This command sends a takeover signal to the Standby MM and reboots the Active MM.
The hpe-rdntmgmtd daemon on the Standby MM processes this takeover signal and performs failover immediately.
The old Active MM will join as a standby after reboot.
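The failover conditions above differ mainly in how the Standby MM learns of the event and in whether the Standby-side daemon is the one that initiates the reboot of the old Active MM. The table-driven sketch below summarizes that reading of the text. All names are hypothetical and it does not reflect the real hpe-rdntmgmtd implementation.

```python
from enum import Enum


class Trigger(Enum):
    HEARTBEAT_LOST = "heartbeat lost from Active MM"
    PLANNED_REBOOT = "planned reboot of Active MM"
    ACTIVE_REMOVED = "removal of Active MM"
    ACTIVE_CRASH = "crash on Active MM"
    SWITCHOVER_COMMAND = "redundancy switchover command"


# (how the Standby MM is notified, whether the Standby daemon initiates the
# reboot of the old Active MM), as read from the condition descriptions.
STANDBY_RESPONSE = {
    Trigger.HEARTBEAT_LOST: ("heartbeat fail count", True),
    Trigger.PLANNED_REBOOT: ("failover command", True),
    Trigger.ACTIVE_REMOVED: ("hot removal detection", False),
    Trigger.ACTIVE_CRASH: ("failover command from crash handler", True),
    # The switchover command itself reboots the Active MM.
    Trigger.SWITCHOVER_COMMAND: ("takeover signal", False),
}


def standby_steps(trigger: Trigger) -> list:
    """List the Standby-side steps for a given failover trigger."""
    notification, reboots_active = STANDBY_RESPONSE[trigger]
    steps = [f"detect via {notification}", "take over as Active"]
    if reboots_active:
        steps.append("initiate reboot of old Active MM")
    return steps
```

Only the heartbeat path waits for a counter to expire; every other trigger in the table is described in the text as causing an immediate failover.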

Why did my second MM not take over after the Active MM failed?

This happens if the second MM is not Standby-Ready.

NOTE:

The second MM must be elected as Standby and in a ready state before failover. If not, a double fault occurs and the second MM will not take over.
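The eligibility rule in the note can be expressed as a single check. This is a minimal sketch under stated assumptions: the function name, the role strings, and the returned labels are hypothetical and are not values reported by the switch.

```python
def failover_outcome(second_mm_role: str, second_mm_ready: bool) -> str:
    """Return what happens when the Active MM fails.

    Hypothetical helper: the second MM takes over only if it has been elected
    Standby and is in a ready state; otherwise a double fault occurs.
    """
    if second_mm_role == "standby" and second_mm_ready:
        return "takeover"
    return "double_fault"
```

For instance, an Unassigned MM, or a Standby MM that has not yet reached the ready state, yields a double fault in this sketch, which matches the statement that such an MM will not take over.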