Recovery from failures on GlusterFS

These are the kinds of failures that result from server crashes, disconnected peers or botched upgrades, and how to recover from them.

There is a basic Perl script, gluster-heal.pl, to automatically repair self-heal failures.

You will also need the attribute fixer tool gluster-xattr-clear.sh.

A Good Recovery

First, let's take a look at what non-errors look like in a Gluster recovery. A machine failed, a disk died, the network disconnected - whatever the cause, the brick(s) are now back online and we need to get them in sync. You may need to run a rebalance, but before that step I like to use the find trick to re-stat everything and trigger self-heals.

I do this one depth at a time, tailing the log file while executing the find. You will see the self-heal triggered (shown below). I slowly work down the depth of the tree, pausing a bit between runs to give the Gluster servers time to settle. I've had a find of the whole tree hang and cause all kinds of problems; slow and steady.

tail -f /var/log/glusterfs.log &
find /mnt/gluster -maxdepth 1 -noleaf
find /mnt/gluster -maxdepth 2 -noleaf
find /mnt/gluster -maxdepth N -noleaf
    no missing files - /gfs_root/d1234/d4321. proceeding to metadata check
    background  gfid self-heal completed on /gfs_root/d1234/d4321
    background  gfid self-heal triggered. path: /gfs_root/d1234/d4321
    background  gfid self-heal triggered. path: /gfs_root/d1234/d4321
    no missing files - /gfs_root/d1234/d4321. proceeding to metadata check
    no missing files - /gfs_root/d1234/d4321. proceeding to metadata check
    background  gfid self-heal completed on /gfs_root/d1234/d4321
    background  gfid self-heal completed on /gfs_root/d1234/d4321
    found anomalies in /gfs_root/d1234/d4321. holes=1 overlaps=0

The log is showing some issues with directories in this volume. There will also be lots of spew in the brick-specific logs on the Gluster servers.
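
If you find yourself doing this often, the depth-by-depth walk is easy to script. Below is a minimal sketch; the mount point, maximum depth and pause length are assumptions you will want to adjust for your environment.

#!/bin/bash
# Walk the Gluster mount one depth at a time, pausing between runs
# so the servers have time to settle (MNT, MAX_DEPTH and PAUSE are assumptions)
MNT=/mnt/gluster
MAX_DEPTH=6
PAUSE=60

for d in $(seq 1 "$MAX_DEPTH"); do
    echo "== find to depth $d =="
    find "$MNT" -maxdepth "$d" -noleaf >/dev/null
    sleep "$PAUSE"
done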

Finding Issues

The Gluster team suggests using find to crawl the Gluster mount point and re-stat all the files; then the Gluster magic happens and the system self-heals. This is not always the case.

When using find on a suspect Gluster volume, it's best to start shallow and work your way down; this helps identify sticking points before they become too serious. Set the volume to at least the ERROR log level, then tail/grep the client log file for errors while walking the Gluster mount point. The output is shown indented.

gluster volume set VOLUME diagnostics.client-log-level ERROR
tail -f /var/log/gluster* |grep ' E ' &
find /mnt/gluster -maxdepth 1
find /mnt/gluster -maxdepth 2
find /mnt/gluster -maxdepth 3
    background  entry self-heal failed on /file/12345/789/2013/fe/0a
    background  entry self-heal failed on /file/12345/789/2013/fd/9f
    background  entry self-heal failed on /file/12345/789/2013/f3/32

So now we have to doctor this directory up. Unmount the client, stop the volume, examine the bricks for this file and rsync if necessary. Now, clear the xattrs, start the volume, remount and start the find process again.
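
As a rough outline the whole cycle looks something like the following; the volume name, mount point, brick path and suspect directory are placeholders, and where you rsync from depends on where your authoritative copy lives.

# on the client
umount /mnt/gluster

# on a Gluster server
gluster volume stop VOLUME

# examine the bricks, rsync an authoritative copy into place if needed,
# then clear the xattrs on the affected paths
/opt/edoceo/gluster-xattr-clear.sh /opt/gluster/brick/suspect/dir

gluster volume start VOLUME

# back on the client
mount -t glusterfs SERVER:/VOLUME /mnt/gluster
find /mnt/gluster -maxdepth 1 -noleaf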

background entry self-heal failed on and Conflicting entries for

Messages in this family look like:

background entry self-heal failed on
background meta-data data entry missing-entry gfid self-heal failed
Conflicting entries for
Skipping entry self-heal because of gfid absence

For these I make sure that I can find an authoritative copy of the directories/files in question. Then I use rsync to replicate that over to another server in the GlusterFS pool. Once that is finished, I clear the xattrs, restart the volume, re-mount and run a find.

VOLUME: path PATH on subvolume SUBVOLUME No such file or directory

This can happen if there are not enough copies of a file to make the replica work properly - for example the file only exists on one brick in a distribute-replicate system when it should be on at least two. The error message is telling you where GlusterFS is looking for the files, in REPLICATE_VOLUME_A:

[afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] REPLICATE_VOLUME_A: path PATH on subvolume SUBVOLUME_B => -1 (No such file or directory)

Examine the GlusterFS configuration of this volume, generally stored in /etc/glusterd/vols/NAME/NAME-fuse.vol. There you will find the REPLICATE_VOLUME_A definition and its sub-volume SUBVOLUME_B; manually sync files to that location, wipe the xattrs and then find again.
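
For reference, the relevant pieces of a fuse volfile look roughly like this; the translator names follow the generic REPLICATE_VOLUME_A/SUBVOLUME_B used above, and the host and brick path are placeholders.

volume SUBVOLUME_B
    type protocol/client
    option remote-host SERVER_B
    option remote-subvolume /opt/gluster/brick
end-volume

volume REPLICATE_VOLUME_A
    type cluster/replicate
    subvolumes SUBVOLUME_A SUBVOLUME_B
end-volume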

Identify Failing Files

These file-not-found errors will show up $replica_count times in the logs. The example below has failures on two replica sets; the Gluster volume is named 'gfsb'.

 E [afr_sh_common_lookup_resp_handler] 0-gfsb-replicate-2: path /img/d1234/d5678/ab/08 on subvolume gfsb-client-5 => -1 (No such file or directory)
 E [afr_sh_common_lookup_resp_handler] 0-gfsb-replicate-2: path /img/d1234/d5678/ab/12 on subvolume gfsb-client-5 => -1 (No such file or directory)
 E [afr_sh_common_lookup_resp_handler] 0-gfsb-replicate-1: path /img/d1234/d5678/ab/08 on subvolume gfsb-client-2 => -1 (No such file or directory)
 E [afr_sh_common_lookup_resp_handler] 0-gfsb-replicate-1: path /img/d1234/d5678/ab/12 on subvolume gfsb-client-2 => -1 (No such file or directory)

Examining the volfile we find that gfsb-replicate-2 has two sub-volumes, gfsb-client-4 and gfsb-client-5, and that the other troubled replica, gfsb-replicate-1, contains sub-volume gfsb-client-2. gfsb-client-5 points to gfsm4:/opt/gluster/brick and gfsb-client-2 points to gfsm3:/opt/gluster/brick. So the resolution here is to rsync from a known good path to the paths in question, repeating for each missing path (here both .../ab/08 and .../ab/12).

rsync -av /path/to/good/img/d1234/d5678/ab/08/ gfsm4:/opt/gluster/brick/img/d1234/d5678/ab/08/
rsync -av /path/to/good/img/d1234/d5678/ab/08/ gfsm3:/opt/gluster/brick/img/d1234/d5678/ab/08/

Now blank out the xattrs on the servers (make sure the volume is stopped!).

gfsm4: ~ # /opt/edoceo/gluster-xattr-clear.sh /opt/gluster/brick/img/d1234/d5678/ab/08/
gfsm3: ~ # /opt/edoceo/gluster-xattr-clear.sh /opt/gluster/brick/img/d1234/d5678/ab/08/
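
If you don't have gluster-xattr-clear.sh handy, getfattr/setfattr can do the same job. This is a minimal sketch that assumes the trusted.afr.* changelog attributes are what need clearing - inspect with getfattr first and adapt the brick path.

# run on the Gluster server, with the volume stopped
DIR=/opt/gluster/brick/img/d1234/d5678/ab/08

# dump all extended attributes so you can see what is set
getfattr -m . -d -e hex "$DIR"

# remove the AFR changelog xattrs (the exact names vary per volume/client)
for attr in $(getfattr -m 'trusted\.afr\.' "$DIR" 2>/dev/null | grep '^trusted.afr'); do
    setfattr -x "$attr" "$DIR"
done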

Now, restart the volume, remount from a client and run find . -noleaf on that.
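
Once things look healthy again you may want to drop the client log level back down; INFO is the usual default.

gluster volume set VOLUME diagnostics.client-log-level INFO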


See Also

http://comments.gmane.org/gmane.comp.file-systems.gluster.user/8982
http://blog.gocept.com/2011/06/27/no-luck-with-glusterfs/