HPE ProLiant Gen10 Servers - Troubleshooting Hardware Problems

Hardware issues

Procedures for all ProLiant servers.

The procedures in this section are comprehensive and include steps about or references to hardware features that may not be supported by the server you are troubleshooting.

CAUTION: Before removing or replacing any processors, be sure to follow the guidelines provided in Processor troubleshooting guidelines. Failure to follow the recommended guidelines can cause damage to the system board, requiring replacement of the system board.

Power issues

Server does not power on.

Symptom

The server does not power on.

Action

  1. Review the Server Health Summary.

  2. See the Power-on issues flowchart.

Power source issues

Symptom

The server does not power on.

Cause

  • The server is not powered on.

  • Components or cables might not be properly connected or seated.

  • The grounded power outlet is not working.

  • The power cord is not functional.

  • The power strip is not functional.

  • The circuit breaker is in the off position.

  • The line voltage is insufficient for the load.

  • Sufficient power is not allocated to support the server.

Action

  1. Press the Power On/Standby button to be sure it is on. If the server has a Power On/Standby button that returns to its original position after being pressed, be sure you press the button firmly.

  2. Be sure no loose connections exist. For more information, see Resolving loose connections.

  3. Verify that the power LEDs on the power supplies are illuminated.

  4. Plug another device into the grounded power outlet to be sure the outlet works. Also, be sure the power source meets applicable standards.

  5. Replace the power cord with a known functional power cord to be sure it is not faulty.

  6. Replace the power strip with a known functional power strip to be sure it is not faulty.

  7. Be sure the proper circuit breaker is in the On position.

  8. Have a qualified electrician check the line voltage to be sure it meets the required specifications.

  9. If enclosure dynamic power capping or enclosure power limit is enabled on supported servers, be sure there is sufficient power allocation to support the server.

Power supply issues

Symptom

The power supply is not functioning or not working properly.

Cause

  • The power supply might not be fully seated.

  • AC power is unavailable.

  • The power supply failed.

  • The power supply is in standby mode.

  • The power supply has exceeded the current limit.

  • The power supply is not supported on the server.

  • The power is not sufficient for the hardware installed.

  • Redundant power supplies are configured but the power supplies are not compatible.

Action

  1. Be sure no loose connections exist. For more information, see Resolving loose connections.

  2. If the power supplies have LEDs, be sure they indicate that each power supply is working properly:

    1. If the LEDs indicate an issue with a power supply (red, amber, or off), then check the power source.

    2. If the power source is working properly, then replace the power supply.

  3. Be sure the system has enough power, particularly if you recently added hardware, such as drives. Remove the newly added component. If the issue is no longer present, then additional power supplies are required.

  4. If running a redundant configuration, be sure that all of the power supplies in the system have the same spare part number and are supported by the server.

  5. For further troubleshooting, see the Power-on issues flowchart.
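The redundancy check in step 4 can be sketched in a few lines of Python. This is a minimal illustration, not an HPE tool: the `PowerSupply` class, the `spare_part_number` field, and the placeholder part numbers are assumptions made for the example, and the values would be read from the power supply labels or the server documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerSupply:
    bay: int
    spare_part_number: str  # hypothetical field; taken from the PSU label


def redundant_supply_problems(supplies, supported_parts):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    part_numbers = {psu.spare_part_number for psu in supplies}
    # All supplies in a redundant configuration must share one spare part number.
    if len(part_numbers) > 1:
        problems.append(f"mixed spare part numbers: {sorted(part_numbers)}")
    # Each supply must also be supported by the server.
    for psu in supplies:
        if psu.spare_part_number not in supported_parts:
            problems.append(f"bay {psu.bay}: {psu.spare_part_number} is not supported")
    return problems


# Placeholder part numbers, for illustration only.
print(redundant_supply_problems(
    [PowerSupply(1, "PSU-A"), PowerSupply(2, "PSU-B")], {"PSU-A"}))
```

An empty result means both conditions of step 4 hold. Any entry in the list points at the bay or the mismatch to correct.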

Insufficient power supply configuration

Symptom

The power supply configuration for the server is insufficient to meet the power requirements for the server.

Cause

The current power supply configuration is not sufficient to operate the server.

Action

  1. Verify that the power supplies support the power requirements for the server configuration.

  2. Verify that all power supplies are supported for this server.

  3. If the power supplies have LEDs, check the LEDs to see if they indicate an issue:

    1. If the LEDs display red, replace the power supply.

    2. If the LEDs display amber, then the power supply is in standby mode. Press and hold the Power On/Standby button on the server.

    3. If the LEDs are off, check the power source. Then, power on the server again.

    4. If the power source is working properly, and the LEDs remain off, replace the power supply.

  4. Verify that the system has enough power, particularly if you recently added hardware, such as drives. To verify that the system has enough power, remove the newly added component. If the issue is no longer present, then additional power supplies are required.
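The LED checks in step 3 form a small decision table, sketched below in Python. The function name and the returned strings are illustrative assumptions. The mapping itself follows the steps above, and any LED state not covered by those steps is reported as such rather than guessed at.

```python
def power_supply_led_action(led_color, power_source_ok=False):
    """Map a power supply LED state to the action described in the procedure.

    power_source_ok is only consulted when the LED is off.
    """
    color = led_color.strip().lower()
    if color == "red":
        return "Replace the power supply."
    if color == "amber":
        return "Power supply is in standby: press and hold the Power On/Standby button."
    if color == "off":
        if power_source_ok:
            # Power source verified, LED still off.
            return "Replace the power supply."
        return "Check the power source, then power on the server again."
    return "LED state not covered by this procedure; see the server documentation."


print(power_supply_led_action("amber"))
```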

UPS issue

Symptom

The UPS is not working properly.

Cause

  • The UPS switch is not in the ON position.

  • The UPS batteries are not charged to the proper level.

  • The UPS software is not up-to-date.

  • The UPS power cord is not connected.

  • The UPS power cord is not the correct type for the UPS and the country in which the server is located.

Action

  1. Be sure the UPS batteries are charged to the proper level for operation.

  2. Be sure the UPS power switch is in the ON position.

  3. Be sure the UPS software is updated to the latest version.

  4. Be sure the power cord is the correct type for the UPS and the country in which the server is located.

  5. Be sure the line cord is connected.

  6. Be sure each circuit breaker is in the ON position, or replace the fuse if needed. If this occurs repeatedly, contact an authorized service provider.

  7. Check the UPS LEDs to be sure a battery or site wiring issue has not occurred.

  8. If the UPS sleep mode is initiated, disable sleep mode for proper operation. The UPS sleep mode can be turned off through the configuration mode on the front panel.

  9. Change the battery to be sure damage was not caused by excessive heat, particularly if a recent air conditioning outage has occurred.

Low battery warning is displayed

Symptom

A low battery warning is displayed on the UPS.

Cause

  • The batteries need to be charged.

  • The batteries are failing to hold a charge.

  • The batteries are faulty.

Action

  1. Plug the UPS into an AC grounded outlet for at least 24 hours to charge the batteries.

  2. Test the batteries.

  3. Replace the batteries if necessary.

  4. Be sure the alarm is set appropriately by changing the amount of time given before a low battery warning.

One or more LEDs on the UPS is red

Symptom

One or more of the UPS LEDs is red.

Cause

  • Unsupported hardware.

  • Incomplete population of a memory bank.

  • Connection of the data cable, but not the power cable, of a new device.

Action

  1. Be sure the hardware being installed is a supported option on the server.

  2. Be sure that the new hardware is installed properly.

    To be sure that all requirements are met, see the device, server, and OS documentation.

  3. Be sure no memory, I/O, or interrupt conflicts exist.

  4. Be sure that no loose connections exist.

  5. Be sure that all cables are connected to the correct locations and are the correct lengths.

  6. Be sure that other components were not accidentally unseated during the installation of the new hardware component.

  7. Be sure all necessary software updates, such as device drivers, ROM updates, and patches, are installed and current. Be sure that the correct version for the hardware is installed.

General hardware issues

New hardware issues

Symptom

The hardware is not functioning normally.

Cause

  • Unsupported hardware.

  • Incomplete population of a memory bank.

  • Connection of the data cable, but not the power cable, of a new device.

Action

  1. Be sure the hardware being installed is a supported option on the server.

  2. Be sure the issue is not caused by a change to the hardware release.

  3. Be sure the new hardware is installed properly.

  4. Be sure no memory, I/O, or interrupt conflicts exist.

  5. Be sure no loose connections exist. For more information, see Resolving loose connections.

  6. Be sure all cables are connected to the correct locations and are the correct lengths.

  7. Be sure other components were not accidentally unseated during the installation of the new hardware component.

  8. Be sure all necessary software updates, such as device drivers, ROM updates, and patches, are installed and current, and the correct version for the hardware is installed. For example, if you are using a Smart Array controller, you need the latest Smart Array Controller device driver. Uninstall any incorrect drivers before installing the correct drivers.

  9. After installing or replacing boards or other options, verify that the system recognizes all changes to the hardware in the BIOS/Platform Configuration (RBSU) or in the options setup in UEFI System Utilities. If the new hardware is not configured properly, you may receive a POST error message indicating a configuration error.

  10. Be sure all switch settings are set correctly.

  11. Be sure all boards are properly installed in the server.

  12. Uninstall the new hardware.

Unknown issue

Symptom

The server is not functioning properly, but the specific cause is unknown.

Action

  1. Check the server LEDs to see if any statuses indicate the source of the issue.

  2. Power down and disconnect power to the server. Remove all power sources to the server.

  3. Be sure no loose connections exist.

  4. Following the guidelines and cautionary information in the server documentation, reduce the server to the minimum hardware configuration by removing all cards or devices that are not necessary to power on the server. Keep the monitor connected to view the server power-on process.

  5. Reconnect power, and then power on the system.

    • CAUTION: Only authorized technicians trained by Hewlett Packard Enterprise should attempt to remove the system board. If you believe the system board requires replacement, contact Hewlett Packard Enterprise technical support before proceeding.

    • Before removing or replacing any processors, be sure to follow the guidelines provided in Processor troubleshooting guidelines. Failure to follow the recommended guidelines can cause damage to the system board, requiring replacement of the system board.

    • If the system fails in this minimum configuration, one of the primary components has failed. If you have already verified that the processor, power supply, and memory are working before getting to this point, replace the system board. If not, be sure each of those components is working.

    • If the system boots and video is working, add each component back to the server one at a time, restarting the server after each component is added to determine if that component is the cause of the issue. When adding each component back to the server, be sure to disconnect power to the server and follow the guidelines and cautionary information in the server documentation.
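The add-back procedure above is a linear search for the first component that breaks the boot. A minimal Python sketch follows, assuming a hypothetical `boots_with` callback that stands in for the manual step of disconnecting power, installing the component, and powering on the server. The component names are placeholders.

```python
def find_faulty_component(components, boots_with):
    """Add components back one at a time; return the first that stops the boot.

    boots_with(installed) is a hypothetical callback representing a manual
    power-on test with the given components installed. Returns None if the
    server still boots with every component installed.
    """
    installed = []
    for component in components:
        installed.append(component)
        if not boots_with(list(installed)):
            return component
    return None


# Simulated test: the server fails whenever the (placeholder) NIC is installed.
culprit = find_faulty_component(
    ["RAID card", "NIC", "GPU"],
    lambda installed: "NIC" not in installed,
)
print(culprit)  # NIC
```

Adding one component per restart keeps the result unambiguous: the failure can only be caused by the component added last, or by its interaction with the components already installed.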

Third-party device issues

Symptom

  • A third-party device is not recognized by the server or the OS.

  • A third-party device is not operating as expected.

Cause

  • The device is not supported on the server or OS.

  • The device is not installed properly.

Action

  1. Verify that the server and operating system support the device.

  2. Verify that the latest device drivers are installed.

  3. Verify that the device is installed properly. For information on which PCIe technology is supported and on the slot PCIe bus width, see the server documentation.

Testing the device

Procedure

  1. Uninstall the device. If the server works when the device is removed, then one of the following issues exists:
    • An issue exists with the device.

    • The server does not support the device.

    • The device conflicts with another device.

  2. If there is only one device on a bus, verify that the bus works by installing a different device on the bus.

  3. To determine if the device is working, install the device:

    1. In a PCIe slot on a different bus.

    2. In the same slot in another working server of the same or similar design.

    Restart the server each time the device is reinstalled. If the board works in any of these slots, either the original slot is bad or the board was not properly seated. Reinstall the board into the original slot to verify.

  4. If you are testing a board (or a device that connects to a board):

    1. Test the board with all other boards removed.

    2. Test the server with only that board removed.

  5. Clear the NVRAM.

  6. Verify that the PCIe device or graphics controller does not need additional power to operate.

Internal system issues

Drive issues (hard drives and solid state drives)

Drives are failed

Symptom

The drives are failed.

Action

  1. Be sure no loose connections exist. For more information, see Resolving loose connections.

  2. Check to see if an update is available for any of the following:

    • Smart Array Controller firmware.

    • Dynamic Smart Array driver.

    • Host bus adapter firmware.

    • Expander backplane SEP firmware.

    • System ROM.

  3. Be sure the drive or backplane is cabled properly.

  4. Be sure the drive data cable is working by replacing it with a known functional cable.

  5. Be sure drive blanks are installed properly when the server is operating. Without them, drives may overheat and cause sluggish response or drive failure.

  6. Run HPE SSA and check the status of the failed drive.

  7. Be sure the replacement drives within an array are the same size or larger.

  8. Be sure the replacement drives within an array are the same drive type, such as SAS, SATA, or SSD.

  9. Power cycle the server.

If the drive shows up, check to see if the drive firmware needs to be updated.
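Steps 7 and 8 amount to two comparisons between the original drive and its replacement. A minimal Python sketch, in which the `Drive` class and its field names are assumptions made for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    capacity_gb: int
    drive_type: str  # "SAS", "SATA", or "SSD", as listed in the steps above


def replacement_problems(original, replacement):
    """Return the reasons a replacement drive does not fit the array, if any."""
    problems = []
    if replacement.capacity_gb < original.capacity_gb:
        problems.append("replacement is smaller than the original drive")
    if replacement.drive_type.upper() != original.drive_type.upper():
        problems.append("replacement is a different drive type")
    return problems


print(replacement_problems(Drive(600, "SAS"), Drive(300, "SATA")))
```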

Drives are not recognized

Symptom

Drives are not recognized.

Action

  1. Be sure that no power issues exist.

  2. Be sure that no loose connections exist.

  3. Check for available updates on any of the following components:

    • Smart Array Controller firmware.

    • HPE Smart Array S100i SR Gen10 driver.

    • HBA firmware.

    • Expander backplane SEP firmware.

    • System ROM.

  4. Be sure that the drive or backplane is cabled properly.

  5. Check the drive LEDs to be sure that they indicate normal function.

  6. Be sure that the drive is supported.

  7. Power cycle the server. If the drive appears, check to see if the drive firmware requires an update.

  8. Be sure that the drive bay is not defective by installing the hard drive in another bay.

  9. When the drive is a replacement drive on an array controller, be sure that the drive is the same type as the original drive and of the same or larger capacity.

  10. When using an array controller, be sure that the drive is configured in an array. Run HPE SSA.

  11. Be sure that the correct controller drivers are installed and that the controller supports the hard drives being installed.

  12. If a storage enclosure is used, be sure that the storage enclosure is powered on.

  13. If a SAS switch is used, be sure that disks are zoned to the server using the Virtual SAS Manager.

  14. If the HPE Smart Array S100i SR Gen10 is installed on the server, be sure that RAID mode is enabled.

Data is inaccessible

Symptom

The data on the drives is inaccessible.

Cause

  • The files are corrupt.

  • Viruses exist on the server.

  • A TPM is installed but not properly enabled on the server.

Action

  1. Be sure the files are not corrupt. Run the repair utility for the operating system.

  2. Be sure no viruses exist on the server. Run a current version of a virus scan utility.

  3. When a TPM is installed and is being used with BitLocker, be sure the TPM is enabled.

  4. When a TPM is installed, be sure that the TPM is configured for a mode that is compatible with the OS running on the server. Verify that the OS supports the version of TPM installed and configured on the server.

  5. When a TPM 2.0 is installed, verify that the server is configured for UEFI boot mode.

  6. When migrating encrypted data to a new server, be sure to follow the recovery procedures in the operating system documentation.
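The TPM checks in steps 3 through 5 can be expressed as a short validation routine. This Python sketch is illustrative only. The function name and parameters are assumptions, and the set of TPM versions an operating system supports must come from the OS documentation.

```python
def tpm_problems(tpm_version, tpm_enabled, boot_mode, os_supported_tpm_versions):
    """Return configuration problems that could make encrypted data inaccessible."""
    problems = []
    if not tpm_enabled:
        problems.append("TPM is installed but not enabled")
    if tpm_version not in os_supported_tpm_versions:
        problems.append(f"OS does not support TPM {tpm_version}")
    # Step 5: a TPM 2.0 requires the server to be in UEFI boot mode.
    if tpm_version == "2.0" and boot_mode.upper() != "UEFI":
        problems.append("TPM 2.0 requires UEFI boot mode")
    return problems


print(tpm_problems("2.0", True, "Legacy", {"1.2", "2.0"}))
```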

Server response time is slower than usual

Symptom

The server response time is slower than usual.

Cause

  • The drive is full.
  • Operating system encryption technology is causing a decrease in performance.
  • A recovery operation is pending on the logical drive.

Action

  1. Be sure the drive is not full. If needed, increase the amount of free space on the drive. Hewlett Packard Enterprise recommends that drives have a minimum of 15 percent free space.

  2. Review information about the operating system encryption technology, which can cause a decrease in server performance.

  3. Use HPE SSA to verify that a recovery operation is not pending on the logical drive.
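The 15 percent free-space recommendation in step 1 is easy to check from the operating system. The sketch below uses Python's standard `shutil.disk_usage`. The function names and the path passed in are assumptions made for the example.

```python
import shutil

MIN_FREE_FRACTION = 0.15  # the 15 percent minimum recommended above


def free_fraction(total_bytes, free_bytes):
    """Fraction of the volume that is free, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def has_enough_free_space(path, minimum=MIN_FREE_FRACTION):
    """True if the volume containing path meets the free-space minimum."""
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) >= minimum


# "." is only an example; pass the mount point or drive letter to check.
print(has_enough_free_space("."))
```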

HPE SmartDrive icons or LEDs illuminate errors for the wrong drive or an error message is displayed in POST, HPE SSA, or HPE SSADUCLI

Symptom

  • HPE SmartDrive icons or LEDs illuminate indicating an error.

  • An error message is displayed in POST, HPE SSA, or HPE SSADUCLI.

Action

Verify that the cabling from the drive backplane to the system board is correct.

SSD Smart Wear error

Symptom

A POST message or an IML message is received.

Cause

The device is approaching the maximum usage limit for writes to the device.

Action

Replace the device.

512e Physical drive support

HPE Smart Storage Administrator is able to detect and correct performance issues caused by non-optimal logical drive alignment for 512e physical drives.

The following scenarios indicate drive support is needed:

  • Multiple logical drives exist in a single array.

  • An array consists of one or more 512e physical drives.

  • At least one of the logical drives in the array is not aligned on a native block boundary. For current 512e drives, the native block boundary is 4K.

As a response, HPE SSA will display a warning indicating the logical drive is not optimally aligned and that performance of the logical drive will not be optimal. Additionally, the array will present a "Re-align Logical Drive" button if all of the following conditions are met:

  • There is enough free space in the array to move the logical drive to be aligned to the native 4K boundary.

  • The controller is capable of performing the transformation (requires a cache module with a fully charged battery or capacitor connected).

  • The controller does NOT have SmartCache enabled.
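The alignment rule reduces to simple arithmetic: a logical drive is aligned when its starting byte offset is a multiple of the 4K native block. The Python sketch below illustrates that rule and the three conditions for the re-align option. It is not how HPE SSA is implemented, and the function names and the use of a starting LBA in 512-byte sectors are assumptions made for the example.

```python
NATIVE_BLOCK_BYTES = 4096   # native block size of current 512e drives
LOGICAL_SECTOR_BYTES = 512  # sector size a 512e drive presents to the host


def is_aligned(start_lba):
    """True if a logical drive starting at start_lba sits on a 4K boundary."""
    return (start_lba * LOGICAL_SECTOR_BYTES) % NATIVE_BLOCK_BYTES == 0


def can_offer_realign(start_lba, free_space_ok, cache_backed, smartcache_enabled):
    """Combine the misalignment test with the three conditions listed above."""
    return (
        not is_aligned(start_lba)
        and free_space_ok
        and cache_backed
        and not smartcache_enabled
    )


print(is_aligned(2048))  # True: 2048 * 512 bytes is a multiple of 4096
print(is_aligned(63))    # False: 63 * 512 bytes is not a multiple of 4096
```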

Diagnosing array problems

Diagnostic tools

To troubleshoot array problems and generate feedback about arrays, use the following diagnostic tools:

  • Event Notification Service

    This utility reports array events to the Microsoft Windows system event log and IML. You can obtain the utility from the SmartStart CD or the Hewlett Packard Enterprise website. When prompted for product information, enter the server model name.

  • HPE Insight Diagnostics

    Insight Diagnostics is a tool that displays information about the system hardware configuration and performs tests on the system and its components, including drives if they are connected to Smart Array controllers. This utility is available on the Hewlett Packard Enterprise website.

  • POST messages

    Smart Array controllers produce diagnostic error messages (POST messages) at reboot. Many POST messages suggest corrective actions.

  • HPE Smart Storage Administrator

  • HPE Smart Storage Administrator Diagnostics Utility CLI

    This standalone diagnostic utility provides configuration and error information about array controllers, storage enclosures, drive cages, logical drives, physical drives, and tape drives. For any supported SSDs, the utility provides current usage level and remaining expected lifetime.

Storage controller issues

General controller issues

Symptom

  • The controller is not visible during the POST process.

  • The controller shows errors during the POST process.

Cause

  • The hardware is physically damaged.

  • The controller is not supported on the server.

  • The controller is not seated properly.

  • The controller is faulty.

  • The firmware is outdated.

Action

  1. Verify that the controller is supported for the server.

  2. Verify that the controller is not physically damaged.

  3. If the controller is recognized by the system BIOS, then reseat the controller.

  4. Run controller diagnostics and follow the steps displayed.

  5. Update the firmware.

  6. Download the Active Health System log and use the AHSV to read, diagnose, and resolve issues.

  7. If you are unable to resolve the issue, submit a case to Hewlett Packard Enterprise technical support through AHSV.

  8. Replace the controller.

Controllers are no longer redundant

Symptom

  • The controller shows errors during the POST process.
  • The cache is disabled.

Cause

  • The hardware on one or more controllers is physically damaged.

  • One or more controllers are not supported on the server.

  • The controllers are not compatible for redundant operation.

  • One or more controllers are not installed properly.

  • The firmware on one or more controllers is outdated or not compatible.

  • The HPE Smart Storage Battery is not installed.

  • The HPE Smart Storage Battery is not connected to the system board properly.

  • The cache module cable is not connected to the PCIe riser board (for controllers installed on a PCIe riser board).

Action

  1. Verify that the controllers are supported for the server.

  2. Verify that both controllers are installed or seated properly.

  3. Verify that the controllers are compatible controller models.

  4. Verify that the controller firmware versions are compatible and current.

  5. Verify that the controller cache sizes are compatible.

  6. Verify that the HPE Smart Storage Battery is installed and connected properly.

  7. Verify that all controller cabling is connected properly.

  8. If the issue persists, download the Active Health System log.
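Steps 3 through 5 compare attributes of the two controllers. The Python sketch below treats "compatible" as "identical", which is a stricter assumption than the procedure states. Check the controller documentation for the combinations that are actually supported. The `Controller` class, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controller:
    model: str
    firmware: str
    cache_mb: int


def redundancy_mismatches(a, b):
    """Return the attribute names that differ between two controllers.

    Assumption: equality stands in for compatibility in this sketch.
    """
    mismatches = []
    if a.model != b.model:
        mismatches.append("model")
    if a.firmware != b.firmware:
        mismatches.append("firmware")
    if a.cache_mb != b.cache_mb:
        mismatches.append("cache size")
    return mismatches


# Placeholder values, for illustration only.
print(redundancy_mismatches(
    Controller("model-x", "1.00", 2048), Controller("model-x", "1.10", 2048)))
```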

Data located on drives accessed in RAID mode is not compatible with data accessed from non-RAID mode

Symptom

Data located on drives accessed in RAID mode is not compatible with data accessed from non-RAID mode, and data located on drives accessed in non-RAID mode is not compatible with data accessed from RAID mode.

Action

Hewlett Packard Enterprise recommends that you access drive data only when the same RAID or non-RAID mode is enabled. Back up and restore the data on the drives.

The Smart Array controller does not show logical drives after moving drives to a new server or JBOD

Symptom

The Smart Array controller does not show logical drives after moving drives to a new server or JBOD.

Cause

A drive migration issue occurred.

Action

Be sure to follow all drive roaming rules when migrating drives.

Drive roaming

Drive roaming lets you move disk drives and arrays while maintaining data availability. You can move one or more disk drives in a configured logical drive to a different bay position as long as the new bay position is accessible by the same controller. In addition, you can move a complete array from one controller to another, even if controllers are in different servers. The logical drive status must be good before you move physical drives to a new bay position.

Drive roaming is an offline feature. There is no method for removing an array while the server is online and then moving it to a new physical location.

Data failure or disk errors on a server with a 10SFF drive backplane or a 12LFF drive backplane

Symptom

Data failure or disk errors occur on a server with a 10SFF or 12LFF drive backplane.

Cause

The drive backplane is not cabled properly to the controller.

Action

Be sure that the drive backplane ports are connected to only one controller. Only one cable is required to connect the backplane to the controller. The second port on the backplane is cabled to the controller to provide additional bandwidth.

HPE Smart Array S100i SR Gen10 drives are not found when RAID mode is disabled

Symptom

The HPE Smart Array S100i SR Gen10 drives are not found when RAID mode is disabled.

Cause

When an HPE Smart Array S100i SR Gen10 is enabled on a server and RAID mode is disabled in the UEFI System Utilities, then the drives are listed as AHCI drives or HPE H220i drives and the RAID controller is not found in POST or device manager. When RAID mode is enabled, the drives appear as HPE Smart Array S100i SR Gen10 drives.

Action

  1. To access the UEFI System Utilities, press the F9 key during the startup process.

  2. From the System Utilities screen, select System Configuration > BIOS/Platform Configuration (RBSU) > System Options > SATA Controller Options > Embedded SATA Configuration > Enable Dynamic Smart Array RAID Support.

  3. Save your setting.

  4. Reboot the server.

HPE Smart Array S100i SR Gen10 drives are not recognized

Symptom

When installing an OS, the OS installation does not recognize the HPE Smart Array S100i SR Gen10 drives.

Action

  1. Manually install the HPE Smart Array S100i SR Gen10 drivers.

Fan and thermal issues

General fan issues

Symptom

Cause

  • The fans are not seated properly.

  • The fan configuration does not meet the functional requirements of the server.

  • The server is not ventilated properly.

  • One or more required fans are not installed.

  • Required fan blanks are not installed.

  • Error messages are displayed during POST or in the IML.

  • One or more fans are not functioning.

Action

  1. Be sure the fans are properly seated and working:

    1. Follow the procedures and warnings in the server documentation for removing the access panels and accessing and replacing fans.

    2. Unseat, and then reseat, each fan according to the proper procedures.

    3. Replace the access panels, and then attempt to restart the server.

  2. Be sure the fan configuration meets the functional requirements of the server.

  3. Be sure no ventilation issues exist. If the server is operated for an extended period of time with the access panel removed, airflow might be impeded, causing thermal damage to components.

  4. Be sure no POST error messages are displayed while booting the server that indicate temperature violation or fan failure information. For the temperature requirements for the server, see the server documentation.

  5. Use iLO or an optional IML viewer to access the IML to see if any event list error messages relating to fans are listed.

  6. In the iLO web interface, navigate to the Information > System Information page and verify the following information:

    1. Click the Fans tab and verify the fan status and fan speed.

    2. Click the Temperatures tab and verify the temperature readings for each location. If a hot spot is located, then check the airflow path for blockage by cables and other material.

  7. Replace any required non-functioning fans and restart the server.

  8. Be sure all fan slots have fans or blanks installed.

  9. Verify the fan airflow path is not blocked by cables or other material.

  10. For HPE BladeSystem c-Class enclosure fan issues, review the fan section of Onboard Administrator SHOW ALL and the FAN FRU low-level firmware.
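The fan and temperature review in step 6 can also be scripted against exported data. The Python sketch below scans a dictionary shaped like a Redfish Thermal resource. It assumes the field names of the DMTF Redfish Thermal schema (`Fans`, `Temperatures`, `Status.Health`, `ReadingCelsius`, `UpperThresholdCritical`), and the sample payload is invented for illustration rather than captured from a server. Fetching the data from iLO is left out.

```python
def thermal_findings(thermal):
    """List unhealthy fans and sensors at or above their critical threshold.

    thermal is a dict shaped like a Redfish Thermal resource (an assumption;
    verify the field names against the data your management processor returns).
    """
    findings = []
    for fan in thermal.get("Fans", []):
        health = fan.get("Status", {}).get("Health")
        if health not in (None, "OK"):
            findings.append(f"fan {fan.get('Name', '?')}: health {health}")
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        limit = sensor.get("UpperThresholdCritical")
        # Missing or zero values are skipped: they carry no usable threshold.
        if reading and limit and reading >= limit:
            findings.append(
                f"sensor {sensor.get('Name', '?')}: {reading} C at or above critical limit {limit} C"
            )
    return findings


# Invented sample data, for illustration only.
sample = {
    "Fans": [
        {"Name": "Fan 1", "Status": {"Health": "OK"}},
        {"Name": "Fan 2", "Status": {"Health": "Critical"}},
    ],
    "Temperatures": [
        {"Name": "Inlet", "ReadingCelsius": 22, "UpperThresholdCritical": 42},
        {"Name": "CPU 1", "ReadingCelsius": 75, "UpperThresholdCritical": 70},
    ],
}
for line in thermal_findings(sample):
    print(line)
```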

Fans running at a higher than expected speed

Symptom

The fans are running at a higher speed than expected.

Cause

  • An air baffle or blank is missing or not installed properly and causing a disruption of the airflow.

  • The processor heatsink is not installed as indicated in the server documentation.

  • A supported fan is not installed in the server.

Action

  1. Update the server to the latest firmware versions, such as iLO firmware, system BIOS, and option firmware.

  2. Verify that all air baffles and required blanks, such as drive blanks, processor heatsink blanks, and power supply blanks, are installed.

  3. Verify that the correct processor heatsink is installed.

  4. Verify that the correct fan is installed, if the system supports both standard fans and performance fans.

Excessive fan noise (high speeds)

Symptom

Fans are operating at high speeds with excessive noise.

Cause

Fans can generate noise if running at a high speed (as expected) or when at low speed if there is an issue with the fan.

Action

  1. In the iLO web interface, navigate to the Information > System Information page.

  2. Click the Fans tab.

  3. Verify the fan status and fan speed. Fan speeds greater than 60% are expected to be loud.

  4. If the fan is running at a higher speed than expected, see Fans running at a higher than expected speed.

Excessive fan noise (low speeds)

Symptom

Abnormal/rattling noise observed at low fan speeds might indicate an issue with the fan.

Action

Replace the fan.

Hot-plug fan issues

Action

  1. Check the LEDs to be sure the hot-plug fans are working.

    NOTE: For servers with redundant fans, backup fans may spin up periodically to test functionality. This is part of normal redundant fan operation.

  2. Verify that there are no POST error messages displayed. If a POST error message is displayed, complete the steps needed to resolve the error.

  3. Verify that the hot-plug fan meets the requirements for the server.

HPE BladeSystem c-Class enclosure fans are operating at a high speed

Symptom

All fans in an HPE BladeSystem c-Class enclosure are operating at a high speed while fans in the other enclosures are operating at normal speed.

Action

If all fan LEDs are solid green but the fans in the chassis are operating at a higher speed than normal, then access the following information from the Onboard Administrator or iLO:

  • Review the FAN section in OA SHOW ALL to locate the fan zone that is consuming more FAN speed.

  • Verify the virtual FAN value for the affected servers within the FAN zone that indicates which server is consuming a lot of FAN speed.

  • Possible indicators could be either an outdated ROMBIOS or iLO firmware, or a server repeatedly going through POST or reboot.

If a single fan is operating at approximately 80% and the issue is resolved after resetting the Onboard Administrator, then upgrade to Onboard Administrator firmware 3.60 or later to resolve the issue.

Memory issues

General memory issues

Symptom

A DIMM error occurred or a DIMM failed.

Cause

  • The memory does not meet server requirements.

  • A DIMM has failed.

  • Third-party memory is installed on the server.

  • The DIMM is not properly seated.

Action

  • Isolate and minimize the memory configuration. Use care when handling DIMMs.

  • Be sure that the DIMMs meet the server requirements and are installed as required by the server. Some servers might require that memory channels are populated fully or that all memory within a memory channel is of the same size, type, and speed.

  • Check any server LEDs that correspond to memory slots.

  • Remove any third-party memory.

  • Update the system ROM to the latest version.

  • Reseat the DIMM.

  • Replace the DIMM.
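The requirement that all memory within a channel be of the same size, type, and speed can be checked mechanically once the installed DIMMs are listed. A minimal Python sketch follows, in which the `Dimm` class, its field names, and the sample values are assumptions made for the example. Whether a given server enforces this rule depends on the server documentation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimm:
    channel: str
    size_gb: int
    dimm_type: str
    speed_mts: int


def mixed_channels(dimms):
    """Return the channels that contain DIMMs of differing size, type, or speed."""
    kinds_by_channel = defaultdict(set)
    for dimm in dimms:
        kinds_by_channel[dimm.channel].add((dimm.size_gb, dimm.dimm_type, dimm.speed_mts))
    return sorted(ch for ch, kinds in kinds_by_channel.items() if len(kinds) > 1)


# Placeholder inventory, for illustration only.
inventory = [
    Dimm("A", 16, "RDIMM", 2666),
    Dimm("A", 32, "RDIMM", 2666),
    Dimm("B", 16, "RDIMM", 2666),
]
print(mixed_channels(inventory))  # ['A']
```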

Isolating and minimizing the memory configuration

When troubleshooting memory issues, sometimes it is necessary to isolate DIMMs in a minimum configuration to determine which DIMM failed.

Prerequisites

Use care when handling DIMMs.

Procedure

  1. If you are unsure which DIMM has failed, test each channel of DIMMs by removing all other DIMMs.

  2. Isolate the failed DIMM by switching each DIMM in a channel with a known working DIMM.

Server is out of memory

Symptom

  • The server is out of memory.

  • A POST error message or an IML message is displayed.

Cause

  • The memory is not configured properly.

  • An OS error is indicated.

Action

  1. Be sure the memory is configured properly.

  2. Be sure no operating system errors are indicated.

  3. Update the system ROM to the latest version.

DIMM configuration errors

Symptom

A POST error message or an IML message is displayed.

Cause

  • The DIMM configuration does not support the Advanced Memory Protection setting configured for the server.

  • The memory channel was not populated in the correct order.

  • An unsupported DIMM is installed in the server.

  • The corresponding processor is not installed.

Action

  • Verify that the DIMMs are installed according to the DIMM population guidelines.

  • Verify that the Advanced Memory Protection setting is configured, and the DIMMs are installed, according to the DIMM population guidelines.

  • Verify that the DIMMs are supported on the server.

  • Be sure that the associated processor is installed for all DIMMs on the server.

  • Update the system ROM to the latest version.

Server fails to recognize existing memory

Symptom

The server does not recognize existing memory.

Cause

  • The server does not support the processor installed in the server.

  • The associated processor is not installed for all DIMMs in the server.

  • The memory is not configured properly.

  • The DIMM is degraded.

  • The DIMM is not installed or seated properly.

Action

  1. Verify that the server supports the associated processor installed for the DIMM.

  2. Verify that the associated processor is installed for all DIMMs in the server.

  3. Verify that the memory is configured properly.

  4. Reseat the memory.

  5. Replace all degraded DIMMs.

  6. Update the system ROM.

Server fails to recognize new memory

Symptom

The server does not recognize new memory installed on the server.

Cause

  • The memory is not supported on this server.

  • The memory is not installed according to the server requirements.

  • The memory limits are exceeded for the server.

  • The processor is not supported on the server.

  • The memory is not installed or seated properly.

Action

  1. Be sure the memory is the correct type for the server.

  2. Be sure the memory is installed according to the server requirements.

  3. Be sure not to exceed the memory limits of the server or operating system.

  4. Be sure the server supports the number of processor cores. Some server models support only 32 cores and this might reduce the amount of memory that is visible.

  5. Be sure no Event List error messages are displayed in the IML.

  6. Be sure the memory is seated properly.

  7. Be sure no conflicts are occurring with existing memory. Run the server setup utility.

  8. Test the memory by installing the memory into a known working server. Be sure the memory meets the requirements of the new server on which you are testing the memory.

  9. Update the system ROM to the latest version.

  10. Replace the memory.

Uncorrectable memory error

Symptom

  • A POST error message or an IML message is displayed.

  • Stop error or blue screen (Windows).

  • Purple diagnostic screen (VMware ESXi).

  • Linux kernel panic.

  • A system "hang".

  • A system "freeze".

  • An ASR (Automatic Server Recovery) event occurs.

  • Server restarts or powers down unexpectedly.

  • Parity errors occur.

Cause

  • The DIMM is not installed or seated properly.

  • The DIMM has failed.

Action

  1. Reseat the DIMM.

  2. Update the system ROM to the latest version.

  3. If the issue still exists, then replace the DIMM.

Correctable memory error threshold exceeded

Symptom

  • Performance is degraded.
  • The memory LED is amber.
  • ECC errors occur with no other symptoms.

Cause

  • The DIMM is not installed or seated properly.

  • The DIMM has failed.

Action

  1. Update the system ROM to the latest version.

  2. Replace the DIMM.

Processor issues

Troubleshooting the processor

Symptom

A POST error message or an IML message is received.

Cause

  • One or more processors are not supported by the server.

  • The processor configuration is not supported by the server.

  • The server ROM is not current.

  • A processor is not seated properly.

  • A processor has failed.

Action

  1. Be sure each processor is supported by the server and is installed as directed in the server documentation. The processor socket requires very specific installation steps and only supported processors should be installed.

  2. Be sure the server ROM is current.

  3. Be sure you are not mixing processor stepping, core speeds, or cache sizes if this is not supported on the server.

  4. If the server has only one processor installed, reseat the processor. If the issue is resolved after you restart the server, the processor was not installed properly.

  5. If the server has only one processor installed, replace it with a known functional processor. If the issue is resolved after you restart the server, the original processor failed.

  6. If the server has multiple processors installed, test each processor:

    1. Remove all but one processor from the server. Replace each with a processor terminator board or blank, if applicable to the server.

    2. Replace the remaining processor with a known functional processor. If the issue is resolved after you restart the server, a fault exists with one or more of the original processors. Install each processor one by one, restarting each time, to find the faulty processor or processors. At each step, be sure the server supports the processor configurations.
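
The multi-processor isolation in the last step is a one-at-a-time elimination, and can be sketched as the loop below. This is only an illustration of the logic under stated assumptions: `boots_ok` is a hypothetical callback that stands in for the manual work of installing the listed processors, restarting the server, and checking for POST or IML errors, and the sketch assumes each original processor is tested alone in a supported configuration.

```python
def find_faulty_processors(original_processors, boots_ok):
    """Return the processors that fail when installed and tested on their own.

    boots_ok(installed) is a hypothetical stand-in for the manual step:
    install the given processors, restart, and report whether the issue is gone.
    """
    faulty = []
    for processor in original_processors:
        # One restart per processor, as the procedure requires.
        if not boots_ok([processor]):
            faulty.append(processor)
    return faulty


# Simulated run: pretend the processor labeled "CPU2" is the defective one.
def simulated_boot(installed):
    return "CPU2" not in installed


print(find_faulty_processors(["CPU1", "CPU2"], simulated_boot))  # ['CPU2']
```

Testing every processor, rather than stopping at the first failure, matches the note that more than one of the original processors may be faulty.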

Uncorrectable machine check exception

Symptom

A POST error message or an IML message is received indicating an uncorrectable machine check exception.

Action

CAUTION: Before removing or replacing any processors, be sure to follow the guidelines provided in Processor troubleshooting guidelines. Failure to follow the recommended guidelines can cause damage to the system board, requiring replacement of the system board.

Replace the processor.

MicroSD card issues

The system does not boot from the microSD card

Symptom

The system is not booting from the drive.

Cause

  • The drive boot order is not set to boot from the microSD card.

  • The microSD card is not detected by iLO.

  • The microSD card is not seated properly.

Action

  1. Be sure the drive boot order in the UEFI System Utilities is set so that the server boots from the microSD card.

  2. Use the iLO web interface to verify that the microSD card is detected by iLO.

  3. Remove all power from the server. Reseat the microSD card, and then power on the server.

USB drive key issues

System does not boot from the USB drive key

Symptom

The system does not boot from the USB drive key.

Cause

  • The USB drive key is not enabled in the UEFI System Utilities.

  • The drive boot order is not set to boot from the USB drive key.

  • The USB drive key is not seated properly.

Action

  1. Be sure that USB is enabled in the UEFI System Utilities.

  2. Be sure the drive boot order in the UEFI System Utilities is set so that the server boots from the USB drive key.

  3. Reseat the USB drive key.

  4. Move the USB drive key to a different USB port, if available.

Graphics and video adapter issues

Troubleshooting general graphics and video adapter issues.

Cause

  • The graphics or video adapter is not supported on the server.

  • The power is insufficient to support the graphics or video adapter.

  • The graphics or video adapter is not installed or seated properly.

Action

  • Use only cards listed as a supported option for the server.

  • Be sure that the power supplies installed in the server provide adequate power to support the server configuration. Some high-power graphics adapters require specific cabling, fans, or auxiliary power.

  • Be sure the adapter is seated properly.

External device issues

Video issues

The screen is blank for more than 60 seconds after you power up the server.

Symptom

The screen is blank for more than 60 seconds after the server is powered up.

Cause

  • The monitor is not receiving power.

  • The monitor is not cabled properly.

  • The monitor cables are not connected properly.

  • The power is not sufficient for a PCIe device or graphics controller installed on the server.

  • A video expansion board is installed, but is not powered or configured properly.

  • A video expansion board installed on the server is not supported.

  • The video driver is not current.

Action

  1. Be sure the monitor power cord is plugged into a working grounded (earthed) AC outlet.

  2. Power up the monitor and be sure the monitor light is on, indicating that the monitor is receiving power.

  3. Be sure the monitor is cabled to the intended server or KVM connection.

  4. Be sure no loose connections exist by verifying the following connections:

    • For rack-mounted servers, check the cables to the KVM switch and be sure the switch is correctly set for the server. You might need to connect the monitor directly to the server to be sure the KVM switch has not failed.

    • For tower model servers, check the cable connection from the monitor to the server, and then from the server to the power outlet.

    • For blades, verify that the SUV cable is connected to the VGA cable on the monitor and to the connector on the front of the blade.

  5. Press any key, or enter the password, and wait for a few moments for the screen to activate to be sure the energy saver feature is not in effect.

  6. Verify that a PCIe device or graphics controller does not need additional power to operate.

  7. Be sure a video expansion board has not been added to replace onboard video, making it seem like the video is not working. Disconnect the video cable from the onboard video, and then reconnect it to the video jack on the expansion board.

    NOTE: All servers automatically bypass onboard video when a video expansion board is present.

  8. Press any key, or enter the password, and wait for a few moments for the screen to activate to be sure the power-on password feature is not in effect. The power-on password is also enabled if a key symbol is displayed on the screen when POST completes. If you do not have access to the password, you must disable the power-on password by using the Password Disable switch on the system board.

  9. If the video expansion board is installed in a PCI hot-plug slot, be sure the slot has power by checking the power LED on the slot, if applicable.

  10. Be sure the server and the OS support the video expansion board.

  11. Be sure the video driver is current. For driver requirements, see the third-party video adapter documentation.

Monitor does not function properly with energy saver features.

Symptom

The monitor does not function properly with energy saver features.

Cause

The monitor does not support energy saver features.

Action

Verify that the monitor supports energy saver features. If the monitor does not support energy saver features, disable the features.

Video colors are wrong

Symptom

The video colors are displayed wrong on the monitor.

Cause

  • The video cable is not connected securely to the correct port.

  • The monitor and KVM switch are not compatible with the video output of the server.

  • The video cable is damaged.

Action

  • Be sure the 15-pin VGA cable is securely connected to the correct VGA port on the server and to the monitor.

  • Be sure the monitor and any KVM switch are compatible with the VGA output of the server.

  • Be sure that the VGA cable is not damaged. Replace the cable with a known working cable.

Slow-moving horizontal lines are displayed

Symptom

Slow-moving horizontal lines are displayed on the monitor.

Cause

Magnetic field interference is occurring.

Action

Move the monitor away from other monitors or power transformers.

Mouse and keyboard issues

Action

  1. Verify that all cables and cords are securely and properly connected. Check the following:

    • If you are using a KVM switching device, verify that the server is properly connected to the switch.

    • If you have rack-mounted servers, check the cables to the switch box to verify that the switch is correctly set for the server.

    • If you have tower model servers, check the cable connection from the input device to the server.

  2. If you are using a KVM switching device, verify that all cables and connectors are the proper length and are supported by the switch.

  3. Be sure the current drivers for the operating system are installed.

  4. Replace the driver with a known functioning driver to verify that the device driver is not corrupted.

  5. Restart the system. Check whether the input device functions correctly after the server restarts.

  6. Replace the device with a known working equivalent device (a similar mouse or keyboard):
    • If the issue still occurs with the new mouse or keyboard, the connector port on the system I/O board is defective. Replace the board.

    • If the issue no longer occurs, the original input device is defective. Replace the device.

  7. Be sure the keyboard or mouse is connected to the correct port. Determine whether the keyboard lights flash at POST or the NumLock LED illuminates. If not, change port connections.

  8. Clean the keyboard or mouse.

Expansion board issues

System requests recovery method during expansion board replacement

Symptom

The system requests a recovery method during expansion board replacement on a BitLocker-encrypted server.

Action

When replacing an expansion board on a BitLocker-encrypted server, always disable BitLocker before replacing the expansion board. If BitLocker is not disabled, the system requests the recovery method selected when BitLocker was configured. Failure to provide the correct recovery password or passwords results in loss of access to all encrypted data.

Be sure to enable BitLocker after the installation is complete.

Network controller or FlexibleLOM issues

Network controller or FlexibleLOM is installed but not working

Symptom

The network controller or the FlexibleLOM is not working.

Action

  1. Check the network controller or FlexibleLOM LEDs to see if any statuses indicate the source of the issue.

  2. Review the IML for error messages that could indicate the issue.

  3. Be sure no loose connections exist.

  4. Be sure the correct cable type is used for the network speed or that the correct SFP or DAC cable is used. For dual-port 10 Gb networking devices, both SFP ports should have the same media (for example, DAC cable or equivalent SFP+ module). Mixing different types of SFP (SR/LR) on a single device is not supported.

  5. Be sure the network cable is working by replacing it with a known functional cable.

  6. Be sure a software issue has not caused the failure.

  7. Be sure the server and operating system support the controller.

  8. Be sure the controller is enabled in the UEFI System Utilities.

  9. Be sure the server ROM is up to date.

  10. Be sure the controller drivers are up to date.

  11. Be sure a valid IP address is assigned to the controller and that the configuration settings are correct.
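
The last check, that the controller has a valid IP address and consistent settings, can be partly automated once the values are known. The sketch below uses Python's standard `ipaddress` module; the function name `check_ip_config` and the sample addresses are illustrative assumptions, and the checks shown (parseable address, not link-local, gateway inside the subnet) cover only a few of the ways a configuration can be wrong.

```python
import ipaddress


def check_ip_config(address_with_prefix, gateway):
    """Return a list of problems found in a static IP configuration (empty if none)."""
    try:
        iface = ipaddress.ip_interface(address_with_prefix)
    except ValueError as exc:
        return [f"invalid address: {exc}"]
    try:
        gw = ipaddress.ip_address(gateway)
    except ValueError as exc:
        return [f"invalid gateway: {exc}"]

    problems = []
    if iface.ip.is_link_local:
        # Self-assigned addresses often appear when no DHCP server answered.
        problems.append("address is link-local, which often means DHCP did not answer")
    if gw not in iface.network:
        problems.append(f"gateway {gw} is outside {iface.network}")
    return problems


print(check_ip_config("192.168.10.25/24", "192.168.10.1"))  # no problems found
print(check_ip_config("192.168.10.25/24", "192.168.20.1"))  # gateway outside the subnet
```

An empty result only means these particular checks passed; duplicate addresses, VLAN mismatches, and DNS settings still need to be verified separately.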

Network controller or FlexibleLOM has stopped working

Symptom

The network controller or FlexibleLOM stopped working.

Action

  1. Check the network controller or FlexibleLOM LEDs to see if any statuses indicate the source of the issue.

  2. Be sure the correct network driver is installed for the controller and that the driver file is not corrupted. Reinstall the driver.

  3. Be sure no loose connections exist.

  4. Be sure the network cable is working by replacing it with a known functional cable.

  5. Be sure the network controller or FlexibleLOM is not damaged.

Network controller or FlexibleLOM stopped working when an expansion board was added

Symptom

The network controller or FlexibleLOM stopped working when an expansion board was added to the server.

Cause

  • The network controller or FlexibleLOM is not seated or connected properly.

  • The network controller or FlexibleLOM is not supported by the OS or the server.

  • Installation of the network controller or FlexibleLOM changes the server configuration.

  • The network controller or FlexibleLOM drivers are out of date.

  • The driver parameters do not match the configuration of the network controller.

Action

  1. Be sure no loose connections exist.

  2. Be sure the server and operating system support the controller.

  3. Be sure the new expansion board has not changed the server configuration, requiring reinstallation of the network driver:

    1. Uninstall the network controller driver for the malfunctioning controller in the operating system.

    2. Restart the server and run the appropriate option in the UEFI System Utilities. Be sure the server recognizes the controller and that resources are available for the controller.

    3. Restart the server, and then reinstall the network driver.

  4. Be sure the correct drivers are installed.

  5. Be sure that the driver parameters match the configuration of the network controller.

Network interconnect blade issues

Symptom

The network interconnect blade has issues.

Cause

The network interconnect blade is not properly seated or connected.

Action

Be sure the network interconnect blades are properly seated and connected.

HPE Smart Storage Battery issues

HPE Smart Storage Battery might lose charge when shelved for long periods of time

Symptom

Any server configured with an HPE Smart Storage Battery for HPE Smart Array Controllers might display a POST error message stating that the cache module or the HPE Smart Storage Battery failed.

Cause

The HPE Smart Storage Battery discharged to a threshold where it is permanently disabled and must be replaced.

Action

  1. Verify the HPE Smart Storage Battery status in iLO.

  2. Download the Active Health System Log.

  3. Submit a support case through the Active Health System Viewer (AHSV).

HPE Smart Storage Battery configuration error

Symptom

Any HPE ProLiant server configured with an HPE Smart Storage Battery for HPE Smart Array Controllers receives a POST error message or an IML message.

Cause

The number of controllers exceeds the installed HPE Smart Storage Battery capacity.

Action

  1. Do one of the following:

    • Ensure that the HPE Smart Storage Battery is fully charged. It may take up to 120 minutes in a powered server or enclosure for the HPE Smart Storage Battery to charge to support the number of controllers in the server or enclosure.

    • If the charge level is insufficient to support the controllers installed in the server, the HPE Smart Storage Battery output might not be enabled while the battery is charging. It might take up to 120 minutes in a powered server or enclosure for the battery to charge fully.

    • Verify the HPE Smart Storage Battery status in iLO.

    • Remove some of the devices using the HPE Smart Storage Battery. HPE Smart Array Controllers use the HPE Smart Storage Battery.

HPE Smart Storage Battery failure

Symptom

Any HPE ProLiant server configured with an HPE Smart Storage Battery for HPE Smart Array Controllers receives a POST error message or an IML message indicating a cache module failure or an HPE Smart Storage Battery failure.

Cause

  • Communication with the HPE Smart Storage Battery failed.

  • The HPE Smart Storage Battery output is not enabled.

Action

  • Verify that the HPE Smart Storage Battery is installed and cabled properly.

  • Verify the HPE Smart Storage Battery status in iLO.

  • Update the system ROM.

  • If the issue persists, download the Active Health System Log and send it to a support professional to help resolve the issue.

Cable issues

Drive errors, retries, timeouts, and unwarranted drive failures occur when using an older Mini-SAS cable

Symptom

Errors, retries, timeouts, and unwarranted drive failures occur when using an older Mini-SAS cable.

Cause

The Mini-SAS cable might be reaching its life expectancy.

Action

The Mini-SAS connector life expectancy is 250 connect/disconnect cycles (for external, internal, and cable Mini-SAS connectors). If using an older cable that could be near the life expectancy, replace the Mini-SAS cable.
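
If connect/disconnect cycles are logged for each cable, the 250-cycle figure can be turned into a simple replacement check. The sketch below is illustrative only: the function name, the idea of a per-cable cycle log, and the 90% safety margin are assumptions rather than HPE guidance.

```python
MINI_SAS_RATED_CYCLES = 250  # connect/disconnect life expectancy stated above


def cable_needs_replacement(logged_cycles, margin=0.9):
    """Return True when logged cycles reach the chosen fraction of the rated life.

    margin is a hypothetical safety factor; 0.9 flags a cable at 225 cycles.
    """
    if logged_cycles < 0:
        raise ValueError("cycle count cannot be negative")
    return logged_cycles >= MINI_SAS_RATED_CYCLES * margin


print(cable_needs_replacement(100))  # False
print(cable_needs_replacement(240))  # True
```

A cable with an unknown history cannot be evaluated this way, so in that case replacing it with a known new cable remains the direct test.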

USB device not recognized, an error message is displayed, or the device does not power on when connected to an SUV cable

Symptom

  • The USB device is not recognized when connected to an SUV cable.

  • An error message is displayed.

  • The device does not power on when connected to an SUV cable.

Cause

The USB connectors on the SUV cable do not support devices that require a power source greater than 500 mA.

Action

Remove the USB device and do one of the following:

  • Attach a USB device that requires a power source less than 500 mA.

  • Attach an externally powered USB hub to the SUV cable and connect the USB device to the hub.
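
The choice between the two actions depends only on the current the device draws compared with the 500 mA limit. A minimal sketch of that decision follows; the function name and the returned strings are hypothetical, and the device's current draw is assumed to come from its label or data sheet.

```python
SUV_USB_LIMIT_MA = 500  # limit for USB connectors on the SUV cable, as stated above


def suv_connection_advice(device_current_ma):
    """Suggest how to attach a USB device, given its required current in mA."""
    if device_current_ma <= 0:
        raise ValueError("required current must be positive")
    if device_current_ma > SUV_USB_LIMIT_MA:
        return "attach through an externally powered USB hub"
    return "attach directly to the SUV cable"


print(suv_connection_advice(900))  # attach through an externally powered USB hub
print(suv_connection_advice(100))  # attach directly to the SUV cable
```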

Legal Disclaimer: Products sold prior to the November 1, 2015 separation of Hewlett-Packard Company into Hewlett Packard Enterprise Company and HP Inc. may have older product names and model numbers that differ from current models.

Document title: HPE ProLiant Gen10 Servers - Troubleshooting Hardware Problems
Document ID: emr_na-a00029456en_us-3