
Objective

How to handle the VOLUME_ALARM_SNAPSHOT_FAILURE alarm when it is raised in an HPE Ezmeral Data Fabric cluster.

Environment


MapR Core

Steps

The VOLUME_ALARM_SNAPSHOT_FAILURE alarm indicates that a snapshot operation failed in an HPE Ezmeral Data Fabric cluster. The alarm is raised separately for each volume whose snapshot operation failed.

To troubleshoot the cause of the alarm, use the following steps:
  1. Review the CLDB log ($MAPR_HOME/logs/cldb.log) for entries logged around the time the alarm was raised. For example:
    2016-12-05 18:53:55,866 ERROR SnapshotProcessor [RPC-141]: SnapshotCreateDone: VolId: 143680873 SnapshotId: 256041306 Create snapshot failed with status: 61 from fileserver
    2016-12-05 19:00:14,620 ERROR SnapshotProcessor [RPC-86]: SnapshotCreateDone: VolId: 143680873 SnapshotId: 256041307 Create snapshot failed with status: 61 from fileserver
    2016-12-05 19:00:16,122 WARN Alarms [ACR-8]: Alarm raised: VOLUME_ALARM_SNAPSHOT_FAILURE:143680873:VOLUME_ALARM; Cluster: my.cluster.com; Volume: uservol; Message: Failed to create snapshot 2016-12-05.19-00-00
    2016-12-05 20:00:01,876 ERROR SnapshotProcessor [RPC-49]: SnapshotCreateDone: VolId: 143680873 SnapshotId: 256041308 Create snapshot failed with status: 61 from fileserver
    2016-12-05 20:00:02,264 WARN Alarms [ACR-8]: Alarm raised: VOLUME_ALARM_SNAPSHOT_FAILURE:143680873:VOLUME_ALARM; Cluster: my.cluster.com; Volume: uservol; Message: Failed to create snapshot 2016-12-05.20-00-00
    2016-12-05 21:00:02,187 ERROR SnapshotProcessor [RPC-85]: SnapshotCreateDone: VolId: 143680873 SnapshotId: 256041309 Create snapshot failed with status: 61 from fileserver
    2016-12-05 21:00:03,649 WARN Alarms [ACR-33]: Alarm raised: VOLUME_ALARM_SNAPSHOT_FAILURE:143680873:VOLUME_ALARM; Cluster: my.cluster.com; Volume: uservol; Message: Failed to create snapshot 2016-12-05.21-00-00
    2016-12-05 21:56:11,641 ERROR SnapshotProcessor [RPC-41]: SnapshotCreateDone: VolId: 143680873 SnapshotId: 256041310 Create snapshot failed with status: 61 from fileserver
    2016-12-05 22:00:01,843 ERROR SnapshotProcessor [RPC-93]: SnapshotCreateDone: VolId: 143680873 SnapshotId: 256041311 Create snapshot failed with status: 61 from fileserver
    2016-12-05 22:00:04,562 WARN Alarms [ACR-16]: Alarm raised: VOLUME_ALARM_SNAPSHOT_FAILURE:143680873:VOLUME_ALARM; Cluster: my.cluster.com; Volume: uservol; Message: Failed to create snapshot 2016-12-05.22-00-00
  2. The CLDB log reports the cause of the failure, which could be a node failure, incorrect permissions, insufficient disk space, or a node-specific issue. In the example below, the first failure (VolId 116919575) is due to insufficient permissions.
    2016-06-14 17:00:00,055 INFO CLDBServer [RPC-thread-47]: Allocating WorkUnit type : VOLUME_CREATE_SNAPSHOT for container 0 with sequence number 0 to 10.112.255.21:5660-
    2016-06-14 17:00:00,058 ERROR CLDBServer [RPC-thread-45]: SnapshotCreate: from 10.112.255.21:5660 VolId: 116919575 SnapName: 2016-06-14.17-00-00 caller does not have sufficient permissions
    2016-06-14 17:00:02,061 WARN Alarms [RPC-thread-17]: Alarm raised: VOLUME_ALARM_SNAPSHOT_FAILURE; Cluster: my.cluster.com; Volume: dbvol; Message: Failed to create snapshot 2016-06-14.17-00-00
    2016-06-15 06:06:38,627 ERROR com.mapr.fs.cldb.CLDBServer [RPC-10]: SnapshotCreateDone: VolId: 189897519 SnapshotId: 256000131 Create snapshot failed with status: 119 from fileserver
    2016-06-15 06:06:39,157 WARN com.mapr.fs.cldb.alarms.Alarms [ACR-290]: Alarm raised: VOLUME_ALARM_SNAPSHOT_FAILURE; Cluster: my.cluster.com; Volume: uservol; Message: Failed to create snapshot 2016-06-15.06-00-00
  3. Review the MFS logs ($MAPR_HOME/logs/mfs.log-3) on the node that hosts the primary replica of the volume's name container. Look for errors related to the snapshot operation or to the creation of snapshot containers.
    • You can identify the name container and its primary node with the following command:
      maprcli dump volumeinfo -volumename <name> -json
  4. If the failed snapshot was a scheduled snapshot triggered in the background, try initiating a manual snapshot of the same volume and review the result in the CLDB log.
  5. Based on the symptoms identified above, review the known snapshot issues on the Support Portal to determine whether the root cause is a software defect and whether a fix is available.
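
When the CLDB log is large, it can help to summarize the snapshot failures before reading individual entries. The following Python sketch is an illustration only: the regular expressions are modeled on the sample log lines shown in steps 1 and 2 and may need adjusting for other releases, and the function names summarize and summarize_file are hypothetical, not part of any HPE tool.

```python
import re
from collections import Counter

# Patterns modeled on the sample CLDB log lines in this article (assumption:
# other releases may word these messages differently).
FAILED = re.compile(
    r"SnapshotCreateDone: VolId: (?P<vol_id>\d+) SnapshotId: (?P<snap_id>\d+) "
    r"Create snapshot failed with status: (?P<status>\d+)"
)
ALARM = re.compile(
    r"Alarm raised: VOLUME_ALARM_SNAPSHOT_FAILURE[^;]*; Cluster: (?P<cluster>[^;]+); "
    r"Volume: (?P<volume>[^;]+); Message: (?P<message>.*)"
)


def summarize(lines):
    """Count failed snapshot attempts per (volume ID, status) and alarms per volume name."""
    failures = Counter()
    alarms = Counter()
    for line in lines:
        failed = FAILED.search(line)
        if failed:
            failures[(failed.group("vol_id"), failed.group("status"))] += 1
        alarm = ALARM.search(line)
        if alarm:
            alarms[alarm.group("volume")] += 1
    return failures, alarms


def summarize_file(path):
    """Run summarize() over a log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return summarize(handle)
```

Calling summarize_file with the path to cldb.log returns two counters: one keyed by volume ID and status code, and one keyed by volume name. A volume that fails repeatedly with the same status points to a persistent condition rather than a one-off event.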

If the root cause of the alarm cannot be identified from the above logs and diagnostics, or if the alarm appears to be caused by a software defect that requires a fix, open a support case with the HPE Ezmeral Data Fabric Support team via the Support Portal. Provide all logs and diagnostic data collected in the steps above.
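
To attach the logs to a support case, it may be convenient to bundle them into one archive first. The sketch below is a generic illustration and not an HPE-provided collection procedure: the function name bundle_logs and the default file patterns are assumptions, and the Support team may ask for a different collection method.

```python
import tarfile
from pathlib import Path


def bundle_logs(log_dir, out_path, patterns=("cldb.log*", "mfs.log*")):
    """Write files in log_dir matching the glob patterns to a gzip-compressed tar archive.

    The default patterns are an assumption based on the log names used in
    this article. Returns the names of the files that were added.
    """
    log_dir = Path(log_dir)
    added = []
    with tarfile.open(out_path, "w:gz") as archive:
        for pattern in patterns:
            for log_file in sorted(log_dir.glob(pattern)):
                if log_file.is_file():
                    # Store only the file name so the archive has no absolute paths.
                    archive.add(log_file, arcname=log_file.name)
                    added.append(log_file.name)
    return added
```

Run it once on a CLDB node and once on the node hosting the name container, passing that node's $MAPR_HOME/logs directory as log_dir and a distinct out_path for each node.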

https://docs.datafabric.hpe.com/62/ReferenceGuide/VolumeAlarms-SnapshotFailure.html