
HPE Smart Array Controllers - Drive Procedures

Identifying the status of an HPE SmartDrive

HPE SmartDrives are the latest Hewlett Packard Enterprise drive technology, and they are supported beginning with ProLiant Gen8 servers and server blades. The SmartDrive is not supported on earlier-generation servers and server blades. Identify a SmartDrive by its carrier, shown in the following illustration.

When a drive is configured as a part of an array and connected to a powered-up controller, the drive LEDs indicate the condition of the drive.

Locate

  • Solid blue: The drive is being identified by a host application.

  • Flashing blue: The drive carrier firmware is being updated or requires an update.

Activity ring

  • Rotating green: Drive activity.

  • Off: No drive activity.

Do not remove

  • Solid white: Do not remove the drive. Removing the drive causes one or more of the logical drives to fail.

  • Off: Removing the drive does not cause a logical drive to fail.

Drive status

  • Solid green: The drive is a member of one or more logical drives.

  • Flashing green: The drive is rebuilding or performing a RAID migration, strip size migration, capacity expansion, or logical drive extension, or is erasing.

  • Flashing amber/green: The drive is a member of one or more logical drives and predicts the drive will fail.

  • Flashing amber: The drive is not configured and predicts the drive will fail.

  • Solid amber: The drive has failed.

  • Off: The drive is not configured by a RAID controller.

The blue Locate LED is behind the release lever and is visible when illuminated.


Recognizing drive failure

If any of the following occurs, the drive has failed:

  • The drive status LED illuminates amber.

  • When failed drives are located inside the server or storage system and the drive LEDs are not visible, the Health LED on the front of the server or server blade illuminates. This LED also illuminates when other problems occur, such as when a fan fails, a redundant power supply fails, or the system overheats.

  • A POST message lists failed drives when the system is restarted, as long as the controller detects at least one functional drive.

  • HPE SSA lists all failed drives, and represents failed drives with a distinctive icon.

  • Systems Insight Manager can detect failed drives remotely across a network. For more information about Systems Insight Manager, see the documentation on the Insight Management DVD or on the Hewlett Packard Enterprise website.


  • The System Management Homepage (SMH) indicates that a drive has failed.

  • On servers with Windows operating systems, the Event Notification Service posts an event to the server IML and the Microsoft Windows system event log.

  • On servers with Linux operating systems, Linux agents log the event, create an IML entry, and update /var/log/messages.
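The Linux logging path above can be watched with ordinary text tools. As an illustrative sketch (not an HPE-supplied utility), the following scans a syslog-style file such as /var/log/messages for lines that hint at a drive failure; the keyword list is an assumption, since the exact message text depends on the agent version.

```python
# Illustrative sketch: scan a syslog-style file for drive-failure hints.
# The keyword list is an assumption, not the exact strings the HPE Linux
# agents emit; adjust it to match the messages seen on your system.

DEFAULT_KEYWORDS = ("drive failed", "drive failure",
                    "physical drive", "predictive failure")

def find_drive_failure_events(log_path, keywords=DEFAULT_KEYWORDS):
    """Return log lines containing any of the keywords (case-insensitive)."""
    hits = []
    with open(log_path, "r", errors="replace") as log:
        for line in log:
            lowered = line.lower()
            if any(kw in lowered for kw in keywords):
                hits.append(line.rstrip("\n"))
    return hits
```

Run it against /var/log/messages (or a rotated copy) after a suspected failure to collect the relevant entries for a support case.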

CAUTION: Sometimes, a drive that has previously been failed by the controller may seem to be operational after the system is power-cycled or (for a hot-pluggable drive) after the drive has been removed and reinserted. However, continued use of such marginal drives may eventually result in data loss. Replace the marginal drive as soon as possible.


Effects of a hard drive failure on logical drives

When a drive fails, all logical drives that are in the same array are affected. Each logical drive in an array might be using a different fault-tolerance method, so each logical drive can be affected differently.

  • RAID 0 configurations do not tolerate drive failure. If any physical drive in the array fails, all RAID 0 logical drives in the same array also fail.

  • RAID 1 and RAID 10 configurations tolerate multiple drive failures if no failed drives are mirrored to one another.

  • RAID 5 configurations tolerate one drive failure.

  • RAID 50 configurations tolerate one failed drive in each parity group.

  • RAID 6 configurations tolerate two failed drives at a given time.

  • RAID 60 configurations tolerate two failed drives in each parity group.

  • RAID 1 (ADM) and RAID 10 (ADM) configurations tolerate multiple drive failures if no more than two drives, mirrored to one another, fail.
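The parity-based rules above reduce to a small lookup. This is an illustrative sketch only: the mirrored levels (RAID 1/10 and the ADM variants) depend on which specific drives fail, so they cannot be expressed as a single count and are omitted here.

```python
# Illustrative sketch of the parity-based tolerance rules listed above.
# Mirrored levels (RAID 1/10 and their ADM variants) depend on *which*
# drives fail, so they are intentionally left out of this simple count.

FAILURES_TOLERATED_PER_PARITY_GROUP = {
    "RAID 0": 0,   # no fault tolerance
    "RAID 5": 1,
    "RAID 6": 2,
    "RAID 50": 1,  # per parity group
    "RAID 60": 2,  # per parity group
}

def max_tolerated_failures(level, parity_groups=1):
    """Total failed drives tolerated, assuming failures land in distinct groups."""
    per_group = FAILURES_TOLERATED_PER_PARITY_GROUP[level]
    if level in ("RAID 50", "RAID 60"):
        return per_group * parity_groups
    return per_group
```

For example, a RAID 50 array with three parity groups tolerates up to three failed drives, but only one per group.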


Compromised fault tolerance

CAUTION: When fault tolerance is compromised, data loss can occur. However, it may be possible to recover the data.

If more drives fail than the fault-tolerance method can manage, fault tolerance is compromised, and the logical drive fails. If this failure occurs, the operating system rejects all requests and indicates unrecoverable errors.

For example, fault tolerance might be compromised when a drive in an array fails while another drive in the array is being rebuilt.

Compromised fault tolerance can also be caused by problems unrelated to drives. In such cases, replacing the physical drives is not required.


Recovering from compromised fault tolerance

If fault tolerance is compromised, inserting replacement drives does not improve the condition of the logical volume. Instead, if the screen displays unrecoverable error messages, perform the following procedure to recover data:

  1. Power down the entire system, and then power it back up. In some cases, a marginal drive will work again for long enough to enable you to make copies of important files. If a 1779 POST message is displayed, do the following:

    1. Press the F2 key and select Device Health Status.

    2. Select 1779 from the list of errors.

    3. Use the actions on the submenu to re-enable the logical volumes.

      Remember that data loss has probably occurred and any data on the logical volume is suspect.

  2. Make copies of important data, if possible.

  3. Replace any failed drives.

  4. After you have replaced the failed drives, fault tolerance may again be compromised. If so, cycle the power again. If the 1779 POST message is displayed:

    1. Press the F2 key and select 1779 from the list of errors. Then, use the actions on the submenu to re-enable the logical drives.

    2. Recreate the partitions.

    3. Restore all data from backup.

To minimize the risk of data loss that is caused by compromised fault tolerance, make frequent backups of all logical volumes.


Moving drives and arrays

You can move drives to other ID positions on the same array controller. You can also move a complete array from one controller to another, even if the controllers are on different servers.

Before moving drives, you must meet the following conditions:

  • If moving the drives to a different server, be sure the new server has enough empty bays to accommodate all the drives simultaneously.

  • The array does not have failed or missing drives.

  • No spare drive in the array is acting as a replacement for a failed drive.

  • The controller is not performing capacity expansion, capacity extension, or RAID or strip size migration.

  • The controller is using the latest firmware version.

  • The server is powered down.

Before you move an array to another controller, you must meet the following conditions:

CAUTION: If the number of physical or logical drives exceeds the limit for the controller model and firmware version, then the controller may recognize an unpredictable subset of the drives, possibly resulting in failed arrays and data loss.

  • If the other controller is already connected to one or more arrays of configured logical drives, the total number of logical drives on the controller after the drives have been moved must not exceed the number of logical drives that the controller supports. This number depends on the controller model and on the controller firmware version.

  • The total number of physical drives on the other controller after the drives have been moved must not exceed the maximum number of supported physical drives for that controller model and firmware version.
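The two count limits above can be checked with simple arithmetic before a move. A minimal sketch, assuming you already know the destination controller's limits; the limit values used in the example are hypothetical, not taken from any particular controller:

```python
# Illustrative sketch: verify the two count limits before moving an array.
# The controller limits passed in are hypothetical examples; consult the
# documentation for your actual controller model and firmware version.

def move_is_within_limits(existing_logical, moved_logical,
                          existing_physical, moved_physical,
                          max_logical, max_physical):
    """True if the destination controller stays within both limits after the move."""
    return (existing_logical + moved_logical <= max_logical and
            existing_physical + moved_physical <= max_physical)
```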

  • All drives in the array must be moved at the same time.

When all the conditions have been met, move the drives:

  1. Back up all data before removing any drives or changing configuration. This step is required if you are moving data-containing drives from a controller that does not have a cache module.

  2. Power down the system.

  3. Move the drives.

  4. Power up the system.

  5. Observe the POST messages:

    • If a 1785 POST message appears, the drive array did not configure properly. Continue with step 6.

    • If a 1724 or 1727 POST message appears, drive positions were changed successfully and the configuration was updated. Continue with step 7.

  6. If the array did not configure properly, do the following:

    1. Power down the system immediately to prevent data loss.

    2. Return the drives to their original locations.

    3. Restore the data from backup, if necessary.

  7. Verify the new drive configuration by running HPE SSA.


Replacing drives

The most common reason for replacing a drive is that it has failed. However, another reason is to gradually increase the storage capacity of the entire system.

For systems that support hot-pluggable drives, if you replace a failed drive that belongs to a fault-tolerant configuration while the system power is on, all drive activity in the array pauses for 1 or 2 seconds while the new drive is initializing. When the drive is ready, data recovery to the replacement drive begins automatically.

If you replace a drive belonging to a fault-tolerant configuration while the system power is off, a POST message appears when the system is next powered up. This message prompts you to press the F1 key to start automatic data recovery. If you do not enable automatic data recovery, the logical volume remains in a ready-to-recover condition and the same POST message appears whenever the system is restarted.

Before replacing drives

  • Open Systems Insight Manager, and inspect the Error Counter window for each physical drive in the same array to confirm that no other drives have any errors.


  • Be sure that the array has a current, valid backup.

  • Confirm that the replacement drive is of the same type as the degraded drive (either SAS or SATA and either hard drive or solid state drive).

  • Use replacement drives that have a capacity equal to or larger than the capacity of the smallest drive in the array. The controller immediately fails drives that have insufficient capacity.
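The type and capacity rules above can be sketched as a simple pre-replacement check. The (interface, media, capacity) tuples below are a made-up representation for illustration, not a real tooling API:

```python
# Illustrative sketch of the replacement-drive rules above. The
# (interface, media, capacity_gb) tuples are a made-up representation.

def replacement_is_valid(replacement, array_drives):
    """Check type match and minimum capacity against the array members."""
    interface, media, capacity_gb = replacement
    smallest = min(cap for _, _, cap in array_drives)
    return (all(i == interface and m == media for i, m, _ in array_drives)
            and capacity_gb >= smallest)
```

For example, a 450 GB SAS drive is rejected as a replacement in an array whose smallest member is 600 GB, matching the controller behavior described above.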

In systems that use external data storage, be sure that the server is the first unit to be powered down and the last unit to be powered up. Taking this precaution ensures that the system does not erroneously mark the drives as failed when the server is powered up.

In some situations, you can replace more than one drive at a time without data loss. For example:

  • In RAID 1 configurations, drives are mirrored in pairs. You can replace two drives simultaneously if they are not mirrored to other removed or failed drives.

  • In RAID 10 configurations, drives are mirrored in pairs. You can replace several drives simultaneously if they are not mirrored to other removed or failed drives.

  • In RAID 50 configurations, drives are arranged in parity groups. You can replace several drives simultaneously if the drives belong to different parity groups. If two drives belong to the same parity group, replace those drives one at a time.

  • In RAID 6 configurations, you can replace any two drives simultaneously.

  • In RAID 60 configurations, drives are arranged in parity groups. You can replace several drives simultaneously if no more than two of the drives being replaced belong to the same parity group.

  • In RAID 1 (ADM) and RAID 10 (ADM) configurations, drives are mirrored in sets of three. You can replace up to two drives per set simultaneously.

To remove more drives from an array than the fault-tolerance method can support, follow the previous guidelines for removing several drives simultaneously, and then wait until the rebuild is complete (as indicated by the drive LEDs) before removing additional drives.
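For the parity-group levels (RAID 50 and RAID 60), the guidelines above amount to counting planned removals per parity group. A sketch, assuming you know each drive's parity-group assignment; the per-group limit is 1 for RAID 50 and 2 for RAID 60:

```python
# Illustrative sketch: can this batch of drives be replaced at once?
# `group_of` maps each drive ID to its parity group; `per_group_limit`
# is 1 for RAID 50 and 2 for RAID 60, per the guidelines above.
from collections import Counter

def safe_to_replace_together(drives, group_of, per_group_limit):
    counts = Counter(group_of[d] for d in drives)
    return all(n <= per_group_limit for n in counts.values())
```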


Automatic data recovery (rebuild)

When you replace a drive in an array, the controller uses the fault-tolerance information on the remaining drives in the array to reconstruct the missing data (the data that was originally on the replaced drive) and then writes the data to the replacement drive. This process is called automatic data recovery, or rebuild. If fault tolerance is compromised, the controller cannot reconstruct the data, and the data is likely lost permanently.

If another drive in the array fails while fault tolerance is unavailable during the rebuild, a fatal system error can occur, and all data on the array can be lost. However, failure of another drive does not always lead to a fatal system error in the following exceptional cases:

  • Failure after activation of a spare drive

  • Failure of a drive that is not mirrored to any other failed drives in the following configurations:

    • RAID 1

    • RAID 10

    • RAID 1 (ADM)

    • RAID 10 (ADM)

  • Failure of a second drive in a RAID 50 or RAID 60 configuration if the two failed drives are in different parity groups.

  • Failure of a second drive in a RAID 6 configuration.

Time required for a rebuild

The time required for a rebuild varies, depending on several factors:

  • The priority that the rebuild is given over normal I/O operations (you can change the priority setting by using HPE SSA)

  • The amount of I/O activity during the rebuild operation

  • The average bandwidth capability (MBps) of the drives

  • The availability of drive cache

  • The brand, model, and age of the drives

  • The amount of unused capacity on the drives

  • For RAID 5 and RAID 6, the number of drives in the array

  • The strip size of the logical volume

CAUTION: Because data rebuild proceeds at a rate of 200 GB per 15 minutes, the system could be unprotected against drive failure for an extended period during data recovery or a drive capacity upgrade. When possible, perform rebuild operations only during periods of minimal system activity.
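At the quoted rate of 200 GB per 15 minutes, the exposure window grows linearly with drive capacity. A quick estimate, using only the document's nominal rate and ignoring the load-dependent factors listed above:

```python
# Quick estimate of rebuild duration at the document's nominal rate of
# 200 GB per 15 minutes. Real rebuilds vary with the factors listed above.

NOMINAL_GB_PER_MINUTE = 200 / 15  # roughly 13.3 GB per minute

def rebuild_minutes(capacity_gb):
    return capacity_gb / NOMINAL_GB_PER_MINUTE

# e.g. a 4 TB (4000 GB) drive: 4000 / (200/15) = 300 minutes, i.e. 5 hours
```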

When automatic data recovery has finished, the drive status LED changes from flashing green to solid green.

If the drive status LED on the replacement drive changes to flashing or solid amber, the rebuild process has terminated abnormally.

If an abnormal termination of a rebuild occurs, identify the cause and the appropriate corrective steps in "Abnormal termination of a rebuild."

Abnormal termination of a rebuild

If the activity LED on the replacement drive permanently ceases to be illuminated even while other drives in the array are active, the rebuild process has terminated abnormally. There are three possible causes of abnormal termination of a rebuild:

  • If none of the drives in the array have an illuminated amber drive status LED, one of the drives in the array has experienced an uncorrectable read error.

  • If the replacement drive has an illuminated amber drive status LED, the replacement drive has failed.

  • If one of the other drives in the array has an illuminated amber drive status LED, the drive with the illuminated amber LED has now failed.

Each of these situations requires a different remedial action.

Case 1: An uncorrectable read error has occurred.

  1. Back up as much data as possible from the logical drive.

    CAUTION: Do not remove the drive that has the media error. Doing so causes the logical drive to fail.

  2. Restore data from backup. Writing data to the location of the unreadable sector often eliminates the error.

  3. Remove and reinsert the replacement drive. This action restarts the rebuild process.

If the rebuild process still terminates abnormally:

  1. Delete and recreate the logical drive.

  2. Restore data from backup.

Case 2: The replacement drive has failed.

Verify that the replacement drive is of the correct capacity and is a supported model. If these factors are not the cause of the problem, use a different drive as the replacement.

Case 3: Another drive in the array has failed.

A drive that has recently failed can sometimes be made temporarily operational again by cycling the server power.

  1. Power down the server.

  2. Remove the replacement physical drive (the one undergoing a rebuild), and reinstall the drive that it is replacing.

  3. Power up the server.

If the newly failed drive seems to be operational again:

  1. Back up any unsaved data.

  2. Remove the drive that was originally to be replaced, and reinsert the replacement physical drive. The rebuild process automatically restarts.

  3. When the rebuild process has finished, replace the newly failed drive.

However, if the newly failed drive has not recovered:

  1. Remove the drive that was originally to be replaced, and reinsert the replacement physical drive.

  2. Replace the newly failed drive.

  3. Restore data from backup.


Upgrading drive capacity

To upgrade drive capacity:

  1. Back up all user data.

  2. Delete the existing drive configuration.

  3. Remove the existing unconfigured drives, and then install the new unconfigured drives.

  4. Create configurations on the new drives.

  5. Restore user data.

You can use the extra capacity to either create new logical drives or extend existing logical drives.


Adding drives

You can add drives to a system at any time, if you do not exceed the maximum number of drives that the controller supports. You can then either build a new array from the added drives or use the extra storage capacity to expand the capacity of an existing array.

If the drives that you intend to add to the system are already configured into logical drives, you must meet certain conditions before adding drives to the system. For more information, see "Moving drives and arrays." When you have successfully added the drives, reset the server so that the controller can recognize the logical drives.

To perform an array capacity expansion, use HPE SSA. If the system uses hot-pluggable drives and HPE SSA runs in the same environment as the normal server applications, you can expand array capacity without shutting down the operating system. For more information, see the HPE Smart Storage Administrator User Guide on the Hewlett Packard Enterprise website.


The expansion process is illustrated in the following figure, in which the original array (containing data) is shown with a dashed border, and the newly added drives (containing no data) are shown unshaded. The array controller adds the new drives to the array and redistributes the original logical drives over the enlarged array one logical drive at a time. This process liberates some storage capacity on each physical drive in the array. Each logical drive keeps the same fault-tolerance method in the enlarged array that it had in the smaller array.

When the expansion process has finished, you can use the liberated storage capacity on the enlarged array to create new logical drives. Alternatively, you can use HPE SSA to enlarge (extend) one of the original logical drives.


Legal Disclaimer: Products sold prior to the November 1, 2015 separation of Hewlett-Packard Company into Hewlett Packard Enterprise Company and HP Inc. may have older product names and model numbers that differ from current models.

Document title: HPE Smart Array Controllers - Drive Procedures
Document ID: emr_na-a00020470en_us-3