
HPE MSA 1050 Storage - Troubleshooting

Troubleshooting

These procedures are intended to be used only during initial configuration, for the purpose of verifying that hardware setup is successful. They are not intended to be used as troubleshooting procedures for configured systems using production data and I/O.

USB CLI port connection

MSA 1050 controllers feature a CLI port employing a Mini-USB Type B form factor. If you encounter problems communicating with the port after cabling your computer to the USB device, you may need to either download a device driver (Windows) or set appropriate parameters via an operating system command (Linux).

Fault isolation methodology

MSA 1050 controllers provide many ways to isolate faults. This section presents the basic methodology used to locate faults within a storage system, and to identify the associated Field Replaceable Units (FRUs) affected.

Use the SMU to configure and provision the system upon completing the hardware installation. As part of this process, configure and enable event notification so the system will notify you when a problem occurs that is at or above the configured severity. With event notification configured and enabled, you can follow the recommended actions in the notification message to resolve the problem, as further discussed in the options presented below.

Basic steps

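The basic fault isolation steps are listed below:

  • Gather fault information.

  • Determine where the fault is occurring.

  • Review the event logs.

  • Isolate the fault.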

Options available for performing basic steps

When performing fault isolation and troubleshooting steps, select the option or options that best suit your site environment. The four options described below are not mutually exclusive. You can use the SMU to check the health icons/values for the system and its components to ensure that everything is okay, or to drill down to a problem component. If you discover a problem, both the SMU and the CLI provide recommended-action text online. Options for performing basic steps are listed according to frequency of use:

  • Use the SMU.

  • Use the CLI.

  • Monitor event notification.

  • View the enclosure LEDs.

Use the SMU

The SMU uses health icons to show OK, Degraded, Fault, or Unknown status for the system and its components, which lets you monitor their health. If any component has a problem, the system health will be Degraded, Fault, or Unknown. Use the SMU to drill down to find each component that has a problem, and follow actions in the recommendation field for the component to resolve the problem.

Use the CLI

As an alternative to using the SMU, you can run the show system command in the CLI to view the health of the system and its components. If any component has a problem, the system health will be Degraded, Fault, or Unknown, and those components will be listed as unhealthy components. Follow the recommended actions in the component health recommendation field to resolve the problem.

Monitor event notification

With event notification configured and enabled, you can view event logs to monitor the health of the system and its components. If a message tells you to check whether an event has been logged, or to view information about an event in the log, you can do so using either the SMU or the CLI. Using the SMU, you would view the event log and then click on the event message to see detail about that event. Using the CLI, you would run the show events detail command (with additional parameters to filter the output) to see the detail for an event.

View the enclosure LEDs

You can view the LEDs on the hardware to identify component status. If a problem prevents access to either the SMU or the CLI, this is the only option available. However, monitoring/management is often done at a management console using storage management interfaces, rather than relying on line-of-sight to LEDs of racked hardware components.

Performing basic steps

You can use any of the available options in performing the basic steps comprising the fault isolation methodology.

Gather fault information

When a fault occurs, it is important to gather as much information as possible. Doing so will help you determine the correct action needed to remedy the fault.

Begin by reviewing the reported fault:

  • Is the fault related to an internal data path or an external data path?

  • Is the fault related to a hardware component such as a disk drive module, controller module, or power supply? By isolating the fault to one of the components within the storage system, you will be able to determine the necessary action more quickly.

Determine where the fault is occurring

Once you have an understanding of the reported fault, review the enclosure LEDs. The enclosure LEDs are designed to alert users of any system faults, and might be what alerted you to a fault in the first place.

When a fault occurs, the Fault ID status LED on the enclosure right ear illuminates. Check the LEDs on the back of the enclosure to narrow the fault to a FRU, connection, or both. The LEDs also help you identify the location of a FRU reporting a fault.

Use the SMU to verify any faults found while viewing the LEDs. The SMU is also a good tool to use in determining where the fault is occurring if the LEDs cannot be viewed due to the location of the system. The SMU provides you with a visual representation of the system and where the fault is occurring. It can also provide more detailed information about FRUs, data, and faults.

Review the event logs

The event logs record all system events. Each event has a numeric code that identifies the type of event that occurred, and has one of the following severities:

  • Critical. A failure occurred that may cause a controller to shut down. Correct the problem immediately.

  • Error. A failure occurred that may affect data integrity or system stability. Correct the problem as soon as possible.

  • Warning. A problem occurred that may affect system stability, but not data integrity. Evaluate the problem and correct it if necessary.

  • Informational. A configuration or state change occurred, or a problem occurred that the system corrected. No immediate action is required.

For information about specific events, see the Event Descriptions Reference Guide.

It is very important to review the logs, not only to identify the fault, but also to search for events that might have caused the fault to occur. For example, a host could lose connectivity to a disk group if a user changes channel settings without taking the storage resources assigned to it into consideration. In addition, the type of fault can help you isolate the problem to either hardware or software.
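
The severity ordering above, together with the "at or above the configured severity" rule used for event notification, can be sketched in a few lines of Python. This is only an illustration: the numeric ranks, the Event record, and the event codes in the usage note are assumptions made for the example and are not taken from the MSA firmware or the Event Descriptions Reference Guide.

```python
from enum import IntEnum
from typing import Iterable, List, NamedTuple


class Severity(IntEnum):
    """Severities from the list above; the numeric ranks are an assumption."""
    INFORMATIONAL = 0
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


class Event(NamedTuple):
    """Hypothetical record: a numeric event code plus its severity."""
    code: int
    severity: Severity


def at_or_above(events: Iterable[Event], minimum: Severity) -> List[Event]:
    """Keep only events whose severity is at or above the configured minimum."""
    return [event for event in events if event.severity >= minimum]
```

For example, with made-up codes 101 (Informational), 102 (Error), and 103 (Critical), a configured minimum of Severity.ERROR keeps only events 102 and 103.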

Isolate the fault

Occasionally it might become necessary to isolate a fault. This is particularly true with data paths, due to the number of components comprising the data path. For example, if a host-side data error occurs, it could be caused by any of the components in the data path: controller module, cable, connectors, switch, or data host.

If the enclosure does not initialize

It may take up to two minutes for the enclosures to initialize. If the enclosure does not initialize:

  • Perform a rescan.

  • Power cycle the system.

  • Make sure the power cord is properly connected, and check the power source that it is connected to.

  • Check the event log for errors.
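
If initialization is being checked from a script rather than by watching the LEDs, the two-minute allowance can be expressed as a bounded polling loop. The sketch below is generic Python: the check callable is hypothetical (this document does not define a programmatic initialization check), and the 5-second polling interval is an arbitrary choice.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout_s: float = 120.0,
               interval_s: float = 5.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or timeout_s has elapsed.

    Returns True if check() succeeded in time, and False otherwise, in which
    case the manual steps above (rescan, power cycle, power cord, event log)
    are the next thing to try.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The clock and sleep parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are sufficient.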

Correcting enclosure IDs

When installing a system with drive enclosures attached, the enclosure IDs might not agree with the physical cabling order. This is because the controller might have been previously attached to some of the same enclosures during factory testing, and it attempts to preserve the previous enclosure IDs if possible. To correct this condition, make sure that both controllers are up, and perform a rescan using the SMU or the CLI. This will reorder the enclosures, but it can take up to two minutes for the enclosure IDs to be corrected.

To perform a rescan using the CLI, type the following command: rescan

To rescan using the SMU:

  1. Verify that both controllers are operating normally.

  2. Do one of the following:

    • Point to the System tab and select Rescan Disk Channels.

    • In the System topic, select Action, Rescan Disk Channels.

  3. Click Rescan.

Stopping I/O

When troubleshooting disk drive and connectivity faults, stop I/O to the affected disk groups from all hosts and remote systems as a data protection precaution. As an additional data protection precaution, conduct regularly scheduled backups of your data.

NOTE: Stopping I/O to a disk group is a host-side task, and falls outside the scope of this document.

When on-site, you can verify there is no I/O activity by briefly monitoring the system LEDs. When accessing the storage system remotely, this is not possible. Remotely, you can use the show disk-group-statistics CLI command to determine if input and output has stopped. Perform these steps:

  1. Using the CLI, run the show disk-group-statistics command. The Reads and Writes outputs show the number of these operations that have occurred since the statistic was last reset, or since the controller was restarted. Record the numbers displayed.

  2. Run the show disk-group-statistics command a second time. This provides a specific window of time (the interval between requesting the statistics) to determine if data is being written to or read from the disk group. Record the numbers displayed.

  3. To determine if any reads or writes occurred during this interval, subtract the set of numbers you recorded in step 1 from the numbers you recorded in step 2.

    • If the resulting difference is zero, then I/O has stopped.

    • If the resulting difference is not zero, a host is still reading from or writing to this disk group. Continue to stop I/O from hosts, and repeat step 1 and step 2 until the difference in step 3 is zero.
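
The subtraction in step 3 can be sketched as follows. This is a minimal Python illustration that assumes you have already copied the Reads and Writes values from the two runs by hand. The IoSample type and function names are hypothetical, and the sketch does not parse CLI output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoSample:
    """Reads and Writes counters recorded from one run of the command."""
    reads: int
    writes: int


def io_delta(first: IoSample, second: IoSample) -> IoSample:
    """Step 3: subtract the step 1 numbers from the step 2 numbers."""
    return IoSample(second.reads - first.reads, second.writes - first.writes)


def io_has_stopped(first: IoSample, second: IoSample) -> bool:
    """I/O has stopped only when both differences are zero.

    Any non-zero difference (including a negative one, which would suggest
    the statistics were reset between the two runs) means the two runs
    should be repeated.
    """
    delta = io_delta(first, second)
    return delta.reads == 0 and delta.writes == 0
```

For instance, samples of (100 reads, 50 writes) followed by (130 reads, 55 writes) give a difference of (30, 5), so a host is still accessing the disk group.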

Diagnostic steps

This section describes possible reasons and actions to take when an LED indicates a fault condition during initial system setup.

NOTE: Once event notification is configured and enabled using the SMU, you can view event logs to monitor the health of the system and its components using the GUI.

In addition to monitoring LEDs via line-of-sight observation of racked hardware components when performing diagnostic steps, you can also monitor the health of the system and its components using the management interfaces. Be mindful of this when reviewing the Actions column in the diagnostics tables, and when reviewing the step procedures provided in this chapter.

Is the enclosure front panel Fault/Service Required LED amber?

Diagnostics LED status: Front panel Fault/Service Required

Answer: No
Possible reasons: System functioning properly.
Actions: No action required.

Answer: Yes
Possible reasons: A fault condition exists or has occurred. If installing an I/O module FRU, the module has not gone online and likely failed its self-test.
Actions:
  • Check the LEDs on the back of the controller enclosure to narrow the fault to a FRU, connection, or both.
  • Check the event log for specific information regarding the fault. Follow any recommended actions.
  • If installing an IOM FRU, try removing and reinstalling the new IOM, and check the event log for errors.
  • If the above actions do not resolve the fault, isolate the fault, and contact an authorized service provider for assistance. Replacement may be necessary.

Is the enclosure rear panel FRU OK LED off?

Diagnostics LED status: Rear panel FRU OK

Answer: No (blinking)
Possible reasons: System functioning properly. System is booting.
Actions: No action required. Wait for system to boot.

Answer: Yes
Possible reasons: The controller module is not powered on. The controller module has failed.
Actions:
  • Check that the controller module is fully inserted and latched in place, and that the enclosure is powered on.
  • Check the event log for specific information regarding the failure.

Is the enclosure rear panel Fault/Service Required LED amber?

Diagnostics LED status: Rear panel Fault/Service Required

Answer: No
Possible reasons: System functioning properly.
Actions: No action required.

Answer: Yes (blinking)
Possible reasons: One of the following errors occurred:
  • Hardware-controlled power-up error
  • Cache flush error
  • Cache self-refresh error
Actions:
  • Restart this controller from the other controller using the SMU or the CLI.
  • If the above action does not resolve the fault, remove the controller and reinsert it.
  • If the above action does not resolve the fault, contact an authorized service provider for assistance. It may be necessary to replace the controller.

Are both disk drive module LEDs off (Online/Activity and Fault/UID)?

Diagnostics LED status: Front panel disks Online/Activity and Fault/UID

Answer: Yes
Possible reasons:
  • There is no power.
  • The disk is offline.
  • The disk is not configured.
Actions: Check that the disk drive is fully inserted and latched in place, and that the enclosure is powered on.

Is the disk drive module Fault/UID LED blinking amber?

Diagnostics LED status: Front panel disks Fault/UID

Answer: No, but the Online/Activity LED is blinking.
Possible reasons: The disk drive is rebuilding.
Actions: No action required.
CAUTION: Do not remove a disk drive that is reconstructing. Removing a reconstructing disk drive might terminate the current operation and cause data loss.

Answer: Yes, and the Online/Activity LED is off.
Possible reasons: The disk drive is offline. A predictive failure alert may have been received for this device.
Actions:
  • Check the event log for specific information regarding the fault.
  • Isolate the fault.
  • Contact an authorized service provider for assistance.

Answer: Yes, and the Online/Activity LED is blinking.
Possible reasons: The disk drive is active, but a predictive failure alert may have been received for this device.
Actions:
  • Check the event log for specific information regarding the fault.
  • Isolate the fault.
  • Contact an authorized service provider for assistance.

Is a connected host port Host Link Status LED off?

Diagnostics LED status: Rear panel Host Link Status

Answer: No
Possible reasons: System functioning properly.
Actions: No action required.

Answer: Yes
Possible reasons: The link is down.
Actions:
  • Check cable connections and reseat if necessary.
  • Inspect cables for damage. Replace cable if necessary.
  • Swap cables to determine if fault is caused by a defective cable. Replace cable if necessary.
  • Verify that the switch, if any, is operating properly. If possible, test with another port.
  • Verify that the HBA is fully seated, and that the PCI slot is powered on and operational.
  • In the SMU, review event logs for indicators of a specific fault in a host data path component. Follow any recommended actions.
  • Contact an authorized service provider for assistance.

Is a connected port Expansion Port Status LED off?

Diagnostics LED status: Rear panel Expansion Port Status

Answer: No
Possible reasons: System functioning properly.
Actions: No action required.

Answer: Yes
Possible reasons: The link is down.
Actions:
  • Check cable connections and reseat if necessary.
  • Inspect cables for damage. Replace cable if necessary.
  • Swap cables to determine if fault is caused by a defective cable. Replace cable if necessary.
  • Verify that the switch, if any, is operating properly. If possible, test with another port.
  • Verify that the HBA is fully seated, and that the PCI slot is powered on and operational.
  • In the SMU, review event logs for indicators of a specific fault in a host data path component. Follow any recommended actions.
  • Contact an authorized service provider for assistance.

Is the power supply Input Power Source LED off?

Diagnostics LED status: Rear panel power supply Input Power Source

Answer: No
Possible reasons: System functioning properly.
Actions: No action required.

Answer: Yes
Possible reasons: The power supply is not receiving adequate power.
Actions:
  • Verify that the power cord is properly connected and check the power source to which it connects.
  • Check that the power supply FRU is firmly locked into position.
  • In the SMU, check the event log for specific information regarding the fault. Follow any recommended actions.
  • If the above action does not resolve the fault, isolate the fault, and contact an authorized service provider for assistance.

Is a connected port Network Port Link Status LED off?

Diagnostics LED status: Rear panel Network Port Link Status

Answer: No
Possible reasons: System functioning properly.
Actions: No action required.

Answer: Yes
Possible reasons: The link is down.
Actions: Use standard networking troubleshooting procedures to isolate faults on the network.

Is the power supply Voltage/Fan Fault/Service Required LED amber?

Diagnostics LED status: Rear panel power supply Voltage/Fan Fault/Service Required

Answer: No
Possible reasons: System functioning properly.
Actions: No action required.

Answer: Yes
Possible reasons: The power supply unit or a fan is operating at an unacceptable voltage/RPM level, or has failed.
Actions: When isolating faults in the power supply, remember that the fans in both modules receive power through a common bus on the midplane, so if a power supply unit fails, the fans continue to operate normally.
  • Check that the power supply FRU is firmly locked into position.
  • Check that the power cable is connected to a power source.
  • Check that the power cable is connected to the power supply module.

Controller failure

Cache memory is flushed to CompactFlash in the case of a controller failure or power loss. During the write to CompactFlash, only the components needed to write the cache to the CompactFlash are powered by the supercapacitor. This process typically takes 60 seconds per 1 GB of cache. After the cache is copied to CompactFlash, the remaining power left in the supercapacitor is used to refresh the cache memory. While the cache is being maintained by the supercapacitor, the Cache Status LED flashes at a rate of 1/10 second on and 9/10 second off.
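
As a rough worked example of the two figures above, the Python sketch below estimates the flush time from the "60 seconds per 1 GB" rule of thumb and derives the period and duty cycle of the 1/10 on, 9/10 off flash pattern. The function names are hypothetical, and the 4 GB value in the usage note is only an illustrative input, not a statement about the cache size of any MSA 1050 controller.

```python
FLUSH_SECONDS_PER_GB = 60.0  # "typically 60 seconds per 1 GB of cache"


def estimated_flush_seconds(cache_gb: float) -> float:
    """Estimate the typical time to copy the cache to CompactFlash."""
    if cache_gb < 0:
        raise ValueError("cache size cannot be negative")
    return cache_gb * FLUSH_SECONDS_PER_GB


def led_period_seconds(on_seconds: float, off_seconds: float) -> float:
    """Length of one complete on/off cycle of the LED."""
    return on_seconds + off_seconds


def led_duty_cycle(on_seconds: float, off_seconds: float) -> float:
    """Fraction of each cycle during which the LED is lit."""
    return on_seconds / led_period_seconds(on_seconds, off_seconds)
```

With an illustrative 4 GB of cache, the estimate is 240 seconds. The 1/10 on, 9/10 off pattern has a 1-second period (1 Hz) with the LED lit for 10% of each cycle.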

NOTE: Transportable cache only applies to single-controller configurations. In dual-controller configurations, there is no need to transport cache from a failed controller to a replacement controller because the cache is duplicated between the peer controllers (subject to the volume cache optimization setting).

If the controller has failed or does not start, is the Cache Status LED on/blinking?

Diagnostics LED status: Rear panel Cache Status

Answer: No, the Cache Status LED is off, and the controller does not boot.
Actions: If valid data is thought to be in flash, see Transporting cache below; otherwise, replace the controller module.

Answer: No, the Cache Status LED is off, and the controller boots.
Actions: The system has flushed data to disks. If the problem persists, replace the controller module.

Answer: Yes, at a strobe 1:10 rate (1 Hz), and the controller does not boot.
Actions: See Transporting cache below.

Answer: Yes, at a strobe 1:10 rate (1 Hz), and the controller boots.
Actions: The system is flushing data to CompactFlash. If the problem persists, replace the controller module.

Answer: Yes, at a blink 1:1 rate (1 Hz), and the controller does not boot.
Actions: See Transporting cache below.

Answer: Yes, at a blink 1:1 rate (1 Hz), and the controller boots.
Actions: The system is in self-refresh mode. If the problem persists, replace the controller module.
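
The Cache Status table above is effectively a lookup keyed on two observations: the LED pattern and whether the controller boots. A minimal Python sketch of that lookup follows. The CacheLed enum and the function name are hypothetical, and the action strings are paraphrased from the table.

```python
from enum import Enum
from typing import Dict, Tuple


class CacheLed(Enum):
    """Observed Cache Status LED pattern."""
    OFF = "off"
    STROBE_1_10 = "strobe 1:10 rate"
    BLINK_1_1 = "blink 1:1 rate"


# Key: (LED pattern, controller boots?) -> action from the table above.
ACTIONS: Dict[Tuple[CacheLed, bool], str] = {
    (CacheLed.OFF, False): "If valid data is thought to be in flash, see "
                           "Transporting cache; otherwise, replace the controller module.",
    (CacheLed.OFF, True): "The system has flushed data to disks. "
                          "If the problem persists, replace the controller module.",
    (CacheLed.STROBE_1_10, False): "See Transporting cache.",
    (CacheLed.STROBE_1_10, True): "The system is flushing data to CompactFlash. "
                                  "If the problem persists, replace the controller module.",
    (CacheLed.BLINK_1_1, False): "See Transporting cache.",
    (CacheLed.BLINK_1_1, True): "The system is in self-refresh mode. "
                                "If the problem persists, replace the controller module.",
}


def recommended_action(led: CacheLed, controller_boots: bool) -> str:
    """Return the table's action for the observed LED pattern and boot state."""
    return ACTIONS[(led, controller_boots)]
```

Keeping the table as data rather than nested conditionals makes it easy to confirm that all six rows are covered.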

Transporting cache

To preserve the existing data stored in the CompactFlash, you must transport the CompactFlash from the failed controller to a replacement controller using the procedure outlined in HPE MSA Controller Module Replacement Instructions shipped with the replacement controller module. Failure to use this procedure will result in the loss of data stored in the cache module.

CAUTION: Remove the controller module only after the copy process is complete, which is indicated by the Cache Status LED being off, or blinking at 1:10 rate.

Isolating a host-side connection fault

During normal operation, when a controller module host port is connected to a data host, the port's host link status/link activity LED is green. If there is I/O activity, the LED blinks green. If data hosts are having trouble accessing the storage system, and you cannot locate a specific fault or cannot access the event logs, use the following procedure. This procedure requires scheduled downtime.

NOTE: Do not perform more than one step at a time. Changing more than one variable at a time can complicate the troubleshooting process.

Host-side connection troubleshooting featuring host ports with Small Form-factor Pluggable (SFP) transceivers

The procedure below applies to MSA 1050 controller enclosures employing SFP transceiver connectors in 8 Gb FC, 10 GbE iSCSI, or 1 Gb iSCSI host interface ports. In the following procedure, "SFP and host cable" is used to refer to FC or iSCSI controller ports used for I/O or replication.

NOTE: When experiencing difficulty diagnosing performance problems, consider swapping out one SFP at a time to see if performance improves.
  1. Halt all I/O to the storage system.

  2. Check the host link status/link activity LED. If there is activity, halt all applications that access the storage system.

  3. Check the Cache Status LED to verify that the controller cached data is flushed to the disk drives.

    • Solid - Cache contains data yet to be written to the disk.

    • Blinking - Cache data is being written to CompactFlash.

    • Flashing at 1/10 second on and 9/10 second off - Cache is being refreshed by the supercapacitor.

    • Off - Cache is clean (No unwritten data).

  4. Remove the SFP and host cable and inspect for damage.

  5. Reseat the SFP and host cable. Is the host link status/link activity LED on?

    • Yes - Monitor the status to ensure that there is no intermittent error present. If the fault occurs again, clean the connections to ensure that a dirty connector is not interfering with the data path.

    • No - Proceed to the next step.

  6. Move the SFP and host cable to a port with a known good link status. This step isolates the problem to the external data path (SFP, host cable, and host-side devices) or to the controller module port. Is the host link status/link activity LED on?

    • Yes - You now know that the SFP, host cable, and host-side devices are functioning properly. Return the SFP and cable to the original port. If the link status/link activity LED remains off, you have isolated the fault to the controller module port. Replace the controller module.

    • No - Proceed to the next step.

  7. Swap the SFP with a known good one. Is the host link status/link activity LED on?

    • Yes - You have isolated the fault to the SFP. Replace the SFP.

    • No - Proceed to the next step.

  8. Re-insert the original SFP and swap the cable with a known good one. Is the host link status/link activity LED on?

    • Yes - You have isolated the fault to the cable. Replace the cable.

    • No - Proceed to the next step.

  9. Verify that the switch, if any, is operating properly. If possible, test with another port.

  10. Verify that the HBA is fully seated, and that the PCI slot is powered on and operational.

  11. Replace the HBA with a known good HBA, or move the host-side cable and SFP to a known good HBA. Is the host link status/link activity LED on?

    • Yes - You have isolated the fault to the HBA. Replace the HBA.

    • No - It is likely that the controller module needs to be replaced.

  12. Move the cable and SFP back to their original port. Is the host link status/link activity LED on?

    • Yes - Monitor the connection for a period of time. It may be an intermittent problem, which can occur with damaged SFPs, cables, and HBAs.

    • No - The controller module port has failed. Replace the controller module.
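
The swap-one-component-at-a-time logic of steps 5 through 11 can be summarized as an ordered list of actions, each paired with the conclusion that follows if the host link status/link activity LED turns on after that action. The Python sketch below is only a way to visualize the flow. The names are hypothetical, the observations are supplied by hand, and the verification-only steps (switch and HBA seating) are omitted because they have no LED question attached.

```python
from typing import Dict, List, Tuple

# (action taken, conclusion if the LED is on afterwards), in procedure order.
SFP_STEPS: List[Tuple[str, str]] = [
    ("reseat SFP and host cable",
     "Monitor for an intermittent error; clean the connections if the fault recurs."),
    ("move SFP and host cable to a known good port",
     "External data path is good. Return SFP and cable to the original port; "
     "if the LED stays off, replace the controller module."),
    ("swap SFP with a known good SFP", "Replace the SFP."),
    ("swap cable with a known good cable", "Replace the cable."),
    ("swap HBA with a known good HBA", "Replace the HBA."),
]

NO_LED_CONCLUSION = "It is likely that the controller module needs to be replaced."


def first_conclusion(led_on_after: Dict[str, bool]) -> str:
    """Return the conclusion for the first action after which the LED was on.

    led_on_after maps an action name to the observed LED state; actions that
    were not performed, or after which the LED stayed off, may be omitted.
    """
    for action, conclusion in SFP_STEPS:
        if led_on_after.get(action, False):
            return conclusion
    return NO_LED_CONCLUSION
```

Because the list is walked in order and stops at the first success, it mirrors the rule that only one variable is changed at a time.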

Host-side connection troubleshooting featuring SAS host ports

The procedure below applies to MSA 1050 SAS controller enclosures employing 12 Gb SFF-8644 connectors in the HD mini-SAS host ports used for I/O.

  1. Halt all I/O to the storage system (see Stopping I/O).

  2. Check the host activity LED. If there is activity, halt all applications that access the storage system.

  3. Check the Cache Status LED to verify that the controller cached data is flushed to the disk drives.
    • Solid - Cache contains data yet to be written to the disk.

    • Blinking - Cache data is being written to CompactFlash.

    • Flashing at 1/10 second on and 9/10 second off - Cache is being refreshed by the supercapacitor.

    • Off - Cache is clean (No unwritten data).

  4. Reseat the host cable and inspect for damage. Is the host link status LED on?

    • Yes - Monitor the status to ensure that there is no intermittent error present. If the fault occurs again, clean the connections to ensure that a dirty connector is not interfering with the data path.

    • No - Proceed to the next step.

  5. Move the host cable to a port with a known good link status. This step isolates the problem to the external data path (host cable and host-side devices) or to the controller module port. Is the host link status LED on?

    • Yes - You now know that the host cable and host-side devices are functioning properly. Return the cable to the original port. If the link status LED remains off, you have isolated the fault to the controller module port. Replace the controller module.

    • No - Proceed to the next step.

  6. Verify that the switch, if any, is operating properly. If possible, test with another port.

  7. Verify that the HBA is fully seated, and that the PCI slot is powered on and operational.

  8. Replace the HBA with a known good HBA, or move the host-side cable to a known good HBA. Is the host link status LED on?

    • Yes - You have isolated the fault to the HBA. Replace the HBA.

    • No - It is likely that the controller module needs to be replaced.

  9. Move the host cable back to its original port. Is the host link status LED on?
    • No - The controller module port has failed. Replace the controller module.

    • Yes - Monitor the connection for a period of time. It may be an intermittent problem, which can occur with damaged cables and HBAs.

Isolating a controller module expansion port connection fault

During normal operation, when a controller module expansion port is connected to a drive enclosure, the expansion port status LED is green. If the connected port's expansion port LED is off, the link is down. Use the following procedure to isolate the fault. This procedure requires scheduled downtime.

NOTE: Do not perform more than one step at a time. Changing more than one variable at a time can complicate the troubleshooting process.
  1. Halt all I/O to the storage system.

  2. Check the host activity LED. If there is activity, halt all applications that access the storage system.

  3. Check the Cache Status LED to verify that the controller cached data is flushed to the disk drives.

    • Solid - Cache contains data yet to be written to the disk.

    • Blinking - Cache data is being written to CompactFlash.

    • Flashing at 1/10 second on and 9/10 second off - Cache is being refreshed by the supercapacitor.

    • Off - Cache is clean (No unwritten data).

  4. Reseat the expansion cable, and inspect it for damage. Is the expansion port status LED on?

    • Yes - Monitor the status to ensure there is no intermittent error present. If the fault occurs again, clean the connections to ensure that a dirty connector is not interfering with the data path.

    • No - Proceed to the next step.

  5. Move the expansion cable to a port on the controller enclosure with a known good link status. This step isolates the problem to the expansion cable or to the controller module expansion port. Is the expansion port status LED on?

    • Yes - You now know that the expansion cable is good. Return the cable to the original port. If the expansion port status LED remains off, you have isolated the fault to the controller module expansion port. Replace the controller module.

    • No - Proceed to the next step.

  6. Move the expansion cable back to the original port on the controller enclosure.

  7. Move the expansion cable on the drive enclosure to a known good expansion port on the drive enclosure. Is the expansion port status LED on?

    • Yes - You have isolated the problem to the drive enclosure port. Replace the expansion module.

    • No - Proceed to the next step.

  8. Replace the cable with a known good cable, ensuring the cable is attached to the original ports used by the previous cable. Is the host link status LED on?
    • Yes - Replace the original cable. The fault has been isolated.

    • No - It is likely that the controller module must be replaced.

Isolating Remote Snap replication faults

Remote Snap replication is a licensed disaster-recovery feature that performs asynchronous replication of block-level data from a volume in a primary storage system to a volume in a secondary system. Remote Snap creates an internal snapshot of the primary volume, and copies changes to the data since the last replication to the secondary system via iSCSI or FC links. The primary volume exists in a primary pool in the primary storage system. Replication can be completed using either the SMU or CLI.

Replication setup and verification

After storage systems and hosts are cabled for replication, you can use the SMU to prepare to use the Remote Snap feature. Optionally, you can use SSH to access the IP address of the controller module and access the Remote Snap feature using the CLI.

NOTE: Refer to the following references for more information about replication setup:
  • See HPE Remote Snap technical white paper for replication best practices: MSA Remote Snap Software.
  • See HPE MSA 1050/2050 SMU Reference Guide for procedures to set up and manage replications.
  • See HPE MSA 1050/2050 CLI Reference Guide for replication commands and syntax.
  • See HPE MSA Event Descriptions Reference Guide for replication event reporting.

Basic information for enabling the MSA 1050 FC or iSCSI controller enclosures for replication supplements the troubleshooting procedures that follow.

  • Familiarize yourself with replication content provided in the SMU Reference Guide.

  • For best practices concerning replication-related tasks, see the technical white paper.

  • To replicate an existing volume to a pool on the peer, ensure that both the primary and secondary systems have the Remote Snap license installed, and follow these steps:

    • Find the port address on the secondary system: using the CLI, run the show ports command on the secondary system.

    • Verify that ports on the secondary system can be reached from the primary system using either method below:

      • Run the query peer-connection CLI command on the primary system, using a port address obtained from the output of the show ports command above.

      • In the SMU Replications topic, select Action, Query Peer Connection.

    • Ensure a pool exists on the secondary system.

    • Create a peer connection. To create a peer connection, use the create peer-connection CLI command or, in the SMU Replications topic, select Action, Create Peer Connection.

    • Create a replication set. To create a replication set, use the create replication-set CLI command or, in the SMU Replications topic, select Action, Create Replication Set.

    • Replicate. To initiate replication, use the replicate CLI command or, in the SMU Replications topic, select Action, Replicate.

  • For descriptions of replication-related events, see the Event Descriptions Reference Guide.
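
The order of the replication setup steps matters, so it can help to track them as a simple checklist. The Python sketch below records only what the list above states: the step order, the CLI command names, and the system on which a command is run where that is specified. No command parameters are shown because this document does not give them; see the CLI Reference Guide for syntax. The Step type and next_step helper are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set


class Step(NamedTuple):
    name: str
    cli_command: Optional[str]  # command name only; None if no CLI command is named
    run_on: Optional[str]       # "primary"/"secondary"; None if not stated above


SETUP_STEPS: List[Step] = [
    Step("find port address", "show ports", "secondary"),
    Step("verify ports are reachable", "query peer-connection", "primary"),
    Step("ensure a pool exists", None, "secondary"),
    Step("create peer connection", "create peer-connection", None),
    Step("create replication set", "create replication-set", None),
    Step("replicate", "replicate", None),
]


def next_step(completed: Set[str]) -> Optional[Step]:
    """Return the first step not yet completed, or None when all are done."""
    for step in SETUP_STEPS:
        if step.name not in completed:
            return step
    return None
```

Calling next_step with an empty set returns the show ports step. Once every step name is in the set, it returns None.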

Diagnostic steps for replication setup

The tables in this subsection show menu navigation for replication using the SMU.

NOTE: Remote Snap must be licensed on all systems configured for replication, and the controller module firmware must be compatible on all systems licensed for replication.

Can you successfully use the Remote Snap feature?

Diagnostics for replication setup: Using Remote Snap feature

Answer: Yes
Possible reason: System functioning properly.
Action: No action required.

Answer: No
Possible reason: Remote Snap is not licensed on each controller enclosure used for replication.
Actions: Verify licensing of the optional feature per system:
  • In the Home topic in the SMU, select Action, Install License.

  • The License Settings panel opens and displays information about each licensed feature.

  • If the Replication feature is not enabled, obtain and install a valid license for Remote Snap.

  • For more information on licensing, see the "Installing a license" chapter in the SMU Reference Guide.

Answer: No
Possible reason: Compatible firmware revision supporting Remote Snap is not running on each system used for replication.
Actions: Perform the following actions on each system used for replication:
  • In the System topic, select Action, Update Firmware. The Update Firmware panel opens. The Update Controller Modules tab shows firmware versions installed in each controller.

  • If necessary, update the controller module firmware to ensure compatibility with other systems.

  • For more information on compatible firmware, see the "Updating firmware" chapter in the SMU Reference Guide.

Answer: No
Possible reason: Invalid cabling connection. (If multiple controller enclosures are used, check the cabling for each system.)
Actions: Verify controller enclosure cabling.
  • Verify use of proper cables.

  • Verify proper cabling paths for host connections.

  • Verify that cabling paths between replication ports and switches are visible to one another.

  • Verify that cable connections are securely fastened.

  • Inspect cables for damage and replace if necessary.

Answer: No
Possible reason: A system does not have a pool configured.
Action: Configure each system to have a storage pool.
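
The diagnostic table above amounts to four checks per system. The sketch below collects them in one place; the SystemState fields, the CHECKS list, and the diagnose function are hypothetical names, and the values are assumed to be entered by hand after checking each system in the SMU. Nothing here queries the array.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    """Hand-entered results of the four checks from the table (hypothetical model)."""
    name: str
    remote_snap_licensed: bool
    firmware_compatible: bool
    cabling_verified: bool
    has_pool: bool


# (field name, possible reason, action) in the order used by the table above.
CHECKS = [
    ("remote_snap_licensed", "Remote Snap is not licensed",
     "Obtain and install a valid license for Remote Snap."),
    ("firmware_compatible", "Compatible firmware is not running",
     "Update the controller module firmware."),
    ("cabling_verified", "Invalid cabling connection",
     "Verify controller enclosure cabling."),
    ("has_pool", "No pool configured",
     "Configure a storage pool."),
]


def diagnose(systems):
    """Return (system name, possible reason, action) for every failed check."""
    findings = []
    for system in systems:
        for field_name, reason, action in CHECKS:
            if not getattr(system, field_name):
                findings.append((system.name, reason, action))
    return findings


if __name__ == "__main__":
    primary = SystemState("primary", True, True, True, True)
    secondary = SystemState("secondary", True, True, True, False)
    for finding in diagnose([primary, secondary]):
        print(finding)
```

An empty result corresponds to the "Yes" row of the table: no action required.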

Can you replicate a volume?

NOTE: Remote Snap must be licensed on all systems configured for replication, and the controller module firmware must be compatible on all systems licensed for replication.
Diagnostics for replication setup: Replicating a volume

Answer: Yes
Possible reason: System functioning properly.
Action: No action required.

Answer: No
Possible reason: Remote Snap is not licensed on each controller enclosure used for replication.
Action: See the actions described in "Can you successfully use the Remote Snap feature?" above.

Answer: No
Possible reason: Nonexistent replication set.
Actions:
  • Determine existence of primary or secondary volumes.

  • If a replication set has not been successfully created, use the Replications topic: select Action, Create Replication Set to create one.

  • Review event logs (in the footer, click the events panel and select Show Event List) for indicators of a specific fault in a replication data path component. Follow any recommended actions.

Answer: No
Possible reason: Network error occurred during in-progress replication.
Actions:
  • Review event logs for indicators of a specific fault in a replication data path component. Follow any recommended actions.

  • Click in the Volumes topic, then click a volume name in the volumes list. Click the Replication Sets tab to display replications and associated metadata.

  • Replications that enter the suspended state can be resumed manually (see the SMU Reference Guide for additional information).

Answer: No
Possible reason: Communication link is down.
Action: Review event logs for indicators of a specific fault in a host or replication data path component.

Has a replication run successfully?

Diagnostics for replication setup: Checking for a successful replication

Answer: Yes
Possible reason: System functioning properly.
Action: No action required.

Answer: No
Possible reason: Last Successful Run shows N/A.
Actions:
  • In the Volumes topic, click the volume that is a member of the replication set.

  • Select the Replication Sets table.

  • Check the Last Successful Run information.

  • If a replication has not run successfully, use the SMU to replicate as described in the section about working in the Replications topic within the SMU Reference Guide.

Answer: No
Possible reason: Communication link is down.
Action: Review event logs for indicators of a specific fault in a host or replication data path component.

Resolving voltage and temperature warnings

  1. Check that all of the fans are working by making sure the Voltage/Fan Fault/Service Required LED on each power supply is off, or by using the SMU to check enclosure health status.

    • In the lower corner of the footer, a health status icon indicates the overall health status of the enclosure. For more information, point to the System tab and select View System to see the System panel. You can select from Front, Rear, and Table views on the System panel. If you point to a component, its associated metadata and health status display onscreen. See Options available for performing basic steps for a description of health status icons and alternatives for monitoring enclosure health.

  2. Make sure that all modules are fully seated in their slots with latches locked.

  3. Make sure that no slots are left open for more than two minutes. If you need to replace a module, leave the old module in place until you have the replacement, or use a blank module to fill the slot. Leaving a slot open negatively affects the airflow and can cause the enclosure to overheat.

  4. Make sure there is proper air flow, and no cables or other obstructions are blocking the front or rear of the array.

  5. Try replacing each power supply module one at a time.

  6. Replace the controller modules one at a time.

  7. Replace SFPs one at a time (MSA 1050 FC or iSCSI storage systems).

Sensor locations

The storage system monitors conditions at different points within each enclosure to alert you to problems. Power, cooling fan, temperature, and voltage sensors are located at key points in the enclosure. In each controller module and expansion module, the Enclosure Management Processor (EMP) monitors the status of these sensors to perform SCSI Enclosure Services (SES) functions. The following sections describe each element and its sensors.

Power supply sensors

Each enclosure has two fully redundant power supplies with load-sharing capabilities. The power supply sensors described in the following table monitor the voltage, current, temperature, and fans in each power supply. If the power supply sensors report a voltage that is under or over the threshold, check the input voltage.

Power supply sensor descriptions

| Description | Event/Fault ID LED Condition |
| --- | --- |
| Power supply 1 | Voltage, current, temperature, or fan fault |
| Power supply 2 | Voltage, current, temperature, or fan fault |

Cooling fan sensors

Each power supply includes two fans. The normal range for fan speed is 4,000 to 6,000 RPM. When a fan speed drops below 4,000 RPM, the EMP considers it a failure and posts an alarm in the storage system event log. The following table lists the description, location, and alarm condition for each fan. If the fan speed remains under the 4,000 RPM threshold, the internal enclosure temperature may continue to rise. Replace the power supply reporting the fault.

Cooling fan sensor descriptions

| Description | Location | Event/Fault ID LED Condition |
| --- | --- | --- |
| Fan 1 | Power supply 1 | Below 4,000 RPM |
| Fan 2 | Power supply 1 | Below 4,000 RPM |
| Fan 3 | Power supply 2 | Below 4,000 RPM |
| Fan 4 | Power supply 2 | Below 4,000 RPM |

During a shutdown, the cooling fans do not shut off. This allows the enclosure to continue cooling.
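
The 4,000 RPM failure threshold and the fan-to-power-supply mapping described above can be expressed as a short check. This is a minimal sketch: the function names and the shape of the readings dictionary are hypothetical, and the "above normal range" label for speeds over 6,000 RPM is an assumption, since the text only defines behavior below 4,000 RPM.

```python
FAN_MIN_RPM = 4000  # below this, the EMP treats the fan as failed (from the text)
FAN_MAX_RPM = 6000  # upper end of the stated normal range

# Fan-to-power-supply mapping from the cooling fan sensor table.
FAN_LOCATION = {
    1: "Power supply 1",
    2: "Power supply 1",
    3: "Power supply 2",
    4: "Power supply 2",
}


def fan_status(rpm):
    """Classify a fan speed reading against the stated normal range."""
    if rpm < FAN_MIN_RPM:
        return "failed"
    if rpm <= FAN_MAX_RPM:
        return "normal"
    # Assumption: the text does not say how speeds above 6,000 RPM are handled.
    return "above normal range"


def supplies_to_replace(readings):
    """Given {fan number: RPM}, return the power supplies that contain a failed fan."""
    failed = {
        FAN_LOCATION[fan]
        for fan, rpm in readings.items()
        if fan_status(rpm) == "failed"
    }
    return sorted(failed)


if __name__ == "__main__":
    print(supplies_to_replace({1: 5200, 2: 3900, 3: 4000, 4: 6000}))
```

Reporting the power supply rather than the fan mirrors the guidance above, which is to replace the power supply reporting the fault.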

Temperature sensors

Extreme high and low temperatures can cause significant damage if they go unnoticed. When a temperature fault is reported, it must be remedied as quickly as possible to avoid system damage. This can be done by warming or cooling the installation location.

Controller platform temperature sensor descriptions

| Description | Normal Operating Range | Warning Operating Range | Critical Operating Range | Shutdown Values |
| --- | --- | --- | --- | --- |
| CPU temperature (internal digital thermal sensor) | 2°C to 98°C | 0°C to 1°C, 99°C to 104°C | None | ≤ 0°C or ≥ 104°C |
| SAS2008 internal digital sensor | 3°C to 112°C | 0°C to 2°C, 113°C to 115°C | None | ≤ 0°C or ≥ 115°C |
| Supercapacitor pack thermistor | 0°C to 50°C | None | None | None |
| Onboard temperature 1 | 0°C to 70°C | None | None | None |
| Onboard temperature 2 | 0°C to 70°C | None | None | None |
| Onboard temperature 3 | 0°C to 70°C | None | None | None |

When a power supply sensor goes out of range, the Fault/ID LED illuminates amber and an event is logged.

Power supply temperature sensor descriptions

| Description | Normal Operating Range |
| --- | --- |
| Power Supply 1 temperature | -10°C to 80°C |
| Power Supply 2 temperature | -10°C to 80°C |
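
The normal and warning ranges in the temperature tables above can be captured in a lookup table and checked with one function. This is a sketch under stated assumptions: the dictionary layout and the classify function are hypothetical, readings are assumed to be whole degrees Celsius as in the tables, and shutdown handling is deliberately left out.

```python
# Ranges copied from the temperature sensor tables, in whole degrees Celsius.
# "normal" is an inclusive (low, high) pair; "warning" is a list of such pairs.
SENSOR_RANGES = {
    "CPU temperature": {"normal": (2, 98), "warning": [(0, 1), (99, 104)]},
    "SAS2008 internal digital sensor": {"normal": (3, 112), "warning": [(0, 2), (113, 115)]},
    "Supercapacitor pack thermistor": {"normal": (0, 50), "warning": []},
    "Onboard temperature 1": {"normal": (0, 70), "warning": []},
    "Onboard temperature 2": {"normal": (0, 70), "warning": []},
    "Onboard temperature 3": {"normal": (0, 70), "warning": []},
    "Power Supply 1 temperature": {"normal": (-10, 80), "warning": []},
    "Power Supply 2 temperature": {"normal": (-10, 80), "warning": []},
}


def classify(sensor, temp_c):
    """Return 'normal', 'warning', or 'outside listed ranges' for a reading."""
    ranges = SENSOR_RANGES[sensor]
    low, high = ranges["normal"]
    if low <= temp_c <= high:
        return "normal"
    if any(w_low <= temp_c <= w_high for w_low, w_high in ranges["warning"]):
        return "warning"
    return "outside listed ranges"


if __name__ == "__main__":
    print(classify("CPU temperature", 100))
    print(classify("Onboard temperature 2", 75))
```

Keeping the ranges in data rather than in branching logic makes it straightforward to compare the sketch against the tables line by line.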

Power supply module voltage sensors

Power supply voltage sensors ensure that the enclosure power supply voltage is within normal ranges. There are three voltage sensors per power supply.
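
A reading from one of the three voltage sensors can be compared against the limits listed in the voltage sensor descriptions table in this subsection. The sketch below assumes the readings are already available as numbers; the rail labels, the VOLTAGE_LIMITS dictionary, and the voltage_fault function are hypothetical names, and a reading exactly at a limit is treated as in range.

```python
# Lower and upper limits per rail, in volts, from the voltage sensor table.
# A reading below the first value or above the second is out of range.
VOLTAGE_LIMITS = {
    "12 V": (11.00, 13.00),
    "5 V": (4.00, 6.00),
    "3.3 V": (3.00, 3.80),
}


def voltage_fault(rail, measured):
    """Return 'under', 'over', or None when the reading is within limits."""
    low, high = VOLTAGE_LIMITS[rail]
    if measured < low:
        return "under"
    if measured > high:
        return "over"
    return None


if __name__ == "__main__":
    for rail, reading in [("12 V", 10.9), ("5 V", 5.0), ("3.3 V", 3.81)]:
        print(rail, reading, voltage_fault(rail, reading))
```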

Voltage sensor descriptions

| Description | Fault Condition (outside normal operating range) |
| --- | --- |
| Power supply 1 voltage, 12 V | Below 11.00 V or above 13.00 V |
| Power supply 1 voltage, 5 V | Below 4.00 V or above 6.00 V |
| Power supply 1 voltage, 3.3 V | Below 3.00 V or above 3.80 V |

Legal Disclaimer: Products sold prior to the November 1, 2015 separation of Hewlett-Packard Company into Hewlett Packard Enterprise Company and HP Inc. may have older product names and model numbers that differ from current models.

Document title: HPE MSA 1050 Storage - Troubleshooting
Document ID: emr_na-a00035740en_us-2