GlusterFS

Installing

rpm -Uvh ~haychoi/files/glusterfs-3.4.1-1/*.rpm
service glusterd start
chkconfig --levels 345 glusterd on

UPGRADED to Gluster 3.4.0-1 on July 22, 2013

UPGRADED to Gluster 3.4.1-1 on Sept 30, 2013
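
To confirm the installed version and that the management daemon is running (a quick sanity check, matching the EL6 service/chkconfig style above):

 gluster --version
 service glusterd status
 chkconfig --list glusterd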

Adding peers

gluster
gluster> peer probe fp13.compbio.washington.edu
gluster> peer probe fp17.compbio.washington.edu
gluster> peer probe fp16.compbio.washington.edu
gluster> peer probe fp22.compbio.washington.edu
gluster> peer status
Number of Peers: 4
 Hostname: fp13.compbio.washington.edu
 Uuid: 509f4cc6-aeb6-4c1e-89eb-5e0a483e85a9
 State: Peer in Cluster (Connected)
 Hostname: fp17.compbio.washington.edu
 Uuid: 999c2054-8485-4164-ae91-022f35bf36b9
 State: Peer in Cluster (Connected)
 Hostname: fp16.compbio.washington.edu
 Uuid: 2ade1b2b-cb31-44f0-825b-778ff1253434
 State: Peer in Cluster (Connected)
 Hostname: fp22.compbio.washington.edu
 Uuid: 896349aa-7bbc-4df8-a8de-397437f9ac20
 State: Peer in Cluster (Connected)
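
If a host was probed by mistake and holds no bricks, it can be dropped again (hostname below is just an example):

 gluster> peer detach fp22.compbio.washington.edu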

Create Volume(s)

 gluster> volume create VolumeName replica 4 fp13:/glusterfs fp16:/glusterfs fp17:/glusterfs fp22:/glusterfs
 Creation of volume VolumeName has been successful. Please start the volume to access data.
 gluster> volume start VolumeName
 Starting volume VolumeName has been successful
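
After starting, the volume layout and per-brick status can be checked with the standard CLI commands:

 gluster> volume info VolumeName
 gluster> volume status VolumeName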

Gluster Options

 # gluster volume set VolumeName performance.io-thread-count 64
 # gluster volume set VolumeName performance.cache-size ???
 # gluster volume set VolumeName performance.write-behind-window-size ???
 # gluster volume set VolumeName cluster.quorum-type auto
 # gluster volume set VolumeName cluster.self-heal-window-size 256
 # gluster volume set VolumeName diagnostics.client-log-level WARNING
 # gluster volume set VolumeName diagnostics.brick-log-level INFO
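
Changed options appear under "Options Reconfigured" in volume info, and an option can be returned to its default with volume reset (option name below is one of the examples above):

 # gluster volume info VolumeName
 # gluster volume reset VolumeName performance.io-thread-count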

Expanding Volume(s)

 gluster> volume add-brick VolumeName replica 3 fp2:/glusterfs fp17:/glusterfs fp20:/glusterfs
 Add Brick successful
 
 gluster> volume rebalance VolumeName fix-layout start
 Starting rebalance on volume VolumeName has been successful
 
 gluster> volume rebalance VolumeName start
 Starting rebalance on volume VolumeName has been successful
 gluster> volume rebalance VolumeName status
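
Shrinking follows the same pattern in reverse; for a replicated volume a whole replica set is removed at once, and data is migrated off before the final commit (bricks below are just the ones from the add-brick example):

 gluster> volume remove-brick VolumeName fp2:/glusterfs fp17:/glusterfs fp20:/glusterfs start
 gluster> volume remove-brick VolumeName fp2:/glusterfs fp17:/glusterfs fp20:/glusterfs status
 gluster> volume remove-brick VolumeName fp2:/glusterfs fp17:/glusterfs fp20:/glusterfs commit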

Mounting MAXG

via NFS (unreliable as of 2013-10-11):

 mount -t nfs -o vers=3,tcp maxg.compbio.washington.edu:/gv0 /maxg

via mount.glusterfs:

 On glusterfs server nodes, mount.glusterfs $HOSTNAME:/gv0 /maxg
 On non-glusterfs server nodes, mount.glusterfs maxg:/gv0 /maxg
 maxg.compbio.washington.edu is a DNS round-robin (RR) record pointing to all gluster servers.
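
For a persistent client mount, an /etc/fstab entry along these lines would also work (not used here; mounts are currently established manually, see the Compbio Plan section):

 maxg.compbio.washington.edu:/gv0  /maxg  glusterfs  defaults,_netdev  0 0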

Formerly part of a gluster volume?

 gluster> volume create VolumeName replica 2 fp13:/glusterfs fp16:/glusterfs fp17:/glusterfs fp22:/glusterfs
 /glusterfs or a prefix of it is already part of a volume
 Fix with:
 setfattr -x trusted.glusterfs.volume-id $brick_path
 setfattr -x trusted.gfid $brick_path
 rm -rf $brick_path/.glusterfs
 service glusterd restart
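
Before removing anything, the leftover gluster xattrs on the brick can be inspected (assuming $brick_path is the brick directory, e.g. /glusterfs):

 getfattr -m . -d -e hex $brick_path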

Offline Volume

 If a host goes down, the Gluster volume will halt for 42 seconds (network.ping-timeout) before continuing with the remaining online host:/bricks.
 # gluster volume set VolumeName network.ping-timeout 42 (default)
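
The stall can be shortened by lowering the timeout; the value below is an arbitrary example, and very low values risk spurious disconnects:

 # gluster volume set VolumeName network.ping-timeout 10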

Self Heal

 Self-Heal proactively runs every 10 minutes in Gluster 3.3+.
 To trigger self-heal on necessary files: 
    # gluster volume heal VolumeName
 For self-healing all files:
    # gluster volume heal VolumeName full
 List files that require healing:
    # gluster volume heal VolumeName info
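 In 3.3/3.4 the heal command also reports recently healed, failed, and split-brain entries per brick:
     # gluster volume heal VolumeName info healed
     # gluster volume heal VolumeName info heal-failed
     # gluster volume heal VolumeName info split-brain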

Compbio Plan

 Triple-replicate distributed volume across 21 bricks/hosts of 3 TB each.
 replicate 0 : fp1  fp21  fp22 : client-0 client-1 client-2
 replicate 1 : fp2  fp17  fp20 : client-3 client-4 client-5
 replicate 2 : fp3  fp16  fp19 : client-6 client-7 client-8
 replicate 3 : fp4  fp13  fp18 : client-9 client-10 client-11 
 replicate 4 : fp5  fp14  fp9  : client-12 client-13 client-14  
 replicate 5 : fp6  fp15  fp10 : client-15 client-16 client-17
 replicate 6 : fp11 fp12  mv2  : client-18 client-19 client-20 
 See: /var/lib/glusterd/vols/gv0 for gluster-client:fp correlation and replicate:clients relationship
 2013 July 31
 [root@mv1 ~]# gluster volume info gv0
 Volume Name: gv0
 Type: Distributed-Replicate
 Volume ID: 0dcf2127-d3af-4766-804e-af0b5fd64218
 Status: Started
 Number of Bricks: 7 x 3 = 21
 Transport-type: tcp
 Bricks:
 Brick1: fp1:/glusterfs
 Brick2: fp21:/glusterfs
 Brick3: fp22:/glusterfs
 Brick4: fp2:/glusterfs
 Brick5: fp17:/glusterfs
 Brick6: fp20:/glusterfs
 Brick7: fp3:/glusterfs
 Brick8: fp16:/glusterfs
 Brick9: fp19:/glusterfs
 Brick10: fp4:/glusterfs
 Brick11: fp13:/glusterfs
 Brick12: fp18:/glusterfs
 Brick13: fp5:/glusterfs
 Brick14: fp14:/glusterfs
 Brick15: fp9:/glusterfs
 Brick16: fp6:/glusterfs
 Brick17: fp15:/glusterfs
 Brick18: fp10:/glusterfs
 Brick19: fp11:/glusterfs
 Brick20: fp12:/glusterfs
 Brick21: mv2:/glusterfs
 Options Reconfigured:
 cluster.min-free-disk: 1%
 auth.allow: 10.20.*.*
 performance.io-thread-count: 8
 cluster.quorum-type: auto
 cluster.self-heal-window-size: 256
 diagnostics.latency-measurement: on
 diagnostics.count-fop-hits: on
 diagnostics.brick-log-level: CRITICAL
 diagnostics.client-log-level: CRITICAL
 Mounts are established manually (for now...) after a machine is online: ~haychoi/bin/localmount.cando.maxg
 Since nodes are shared (processing, storage, etc.), make sure gluster runs at a higher priority via a root cronjob:
   */5 * * * *   /root/bin/gluster-renice > /dev/null 2>&1
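
The /root/bin/gluster-renice script is not reproduced here; a minimal sketch of the idea (the priority value and process-name patterns are assumptions, not the actual script) might be:

 #!/bin/bash
 # Raise the scheduling priority of the gluster daemons above normal user jobs.
 renice -n -5 -p $(pgrep -d' ' -f 'glusterd|glusterfsd|glustershd') > /dev/null 2>&1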

Compbio Clients

These clients are NOT Gluster servers; they mount from either mv{1,2} or maxg.compbio.washington.edu:/gv0. maxg.compbio.washington.edu is a DNS round-robin (RR) record.

 abyss, aeon, cosmos, fp8, fp167, time, zen

Clients only need the following gluster RPMs installed:

 [root@aeon gluster-3.4.1-1]# rpm -Uvh glusterfs-3.4.1-1.el6.x86_64.rpm glusterfs-fuse-3.4.1-1.el6.x86_64.rpm glusterfs-libs-3.4.1-1.el6.x86_64.rpm
 warning: glusterfs-3.4.1-1.el6.x86_64.rpm: Header V4 RSA/SHA1 Signature, key ID 89ccae8b: NOKEY
 Preparing...                ########################################### [100%]
  1:glusterfs-libs         ########################################### [ 33%]
  2:glusterfs              ########################################### [ 67%]
  3:glusterfs-fuse         ########################################### [100%]
 [root@aeon gluster-3.4.1-1]#
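
Once the RPMs are in place, a client mounts the volume the same way as under Mounting MAXG, e.g.:

 mount -t glusterfs maxg.compbio.washington.edu:/gv0 /maxg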

Useful Commands

List all Volume Options:

 gluster volume set help

Change Log Verbosity

 gluster volume set gv0 diagnostics.client-log-level {DEBUG|ERROR|CRITICAL|INFO|WARNING|NONE}
 gluster volume set gv0 diagnostics.brick-log-level {DEBUG|ERROR|CRITICAL|INFO|WARNING|NONE}

Peer Info:

 gluster peer status (does NOT show localhost)

Determining where file resides:

 attr -l 100011.pdb
 getfattr -m . -d 99999.pdb
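
On a FUSE mount, the virtual trusted.glusterfs.pathinfo xattr reports the backing brick(s) for a file directly (filename is just an example):

 getfattr -n trusted.glusterfs.pathinfo 99999.pdb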

Unable to remove an apparently empty directory?

 [root@mv2 maxg]# rm -rf packages/
 rm: cannot remove `packages/probis_srf/human_srf': Directory not empty
 [root@mv2 maxg]# ls -la packages/probis_srf/human_srf
 total 17
 drwxrwxr-x 2 root root    42 Apr 26 15:10 .
 drwxrwxr-x 7 root root 12681 Apr 25 15:24 ..
 [root@mv2 maxg]# 
 See /var/log/glusterfs/[mount].log:
 [2013-04-26 15:06:16.357099] W [client3_1-fops.c:327:client3_1_mkdir_cbk] 0-gv0-client-8: 
 remote operation failed: File exists. Path: 
 /packages/probis_srf/human_srf/(2492b967-0337-47c6-b177-110bbb5ebd06)
 Correlate 0-gv0-client-8 to its brick and remove brick:/path/to/non-empty-directory on that host.
 You will likely have to remove the leftover directories/files from all hosts in the replica set.
 [root@mv2 maxg]# rm -rf packages/packages/probis_srf/human_srf
 [root@mv2 maxg]#
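
One way to map an index like 0-gv0-client-8 back to a host is to read the generated client volfile under /var/lib/glusterd (the exact filename can vary slightly by version):

 grep -A4 'volume gv0-client-8' /var/lib/glusterd/vols/gv0/gv0-fuse.vol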

Removing directory with large number of files/directories

From: linuxnote.net

Each directory was created with 1,000,000 zero-length files.

 Command					Elapsed	System Time	%CPU	Context switches (Vol/Invol)
 rsync -a --delete empty/ a			10.60	1.31		95	106/22
 find b/ -type f -delete			28.51	14.46		52	14849/11
 find c/ -type f | xargs -L 100 rm		41.69	20.60		54	37048/15074
 find d/ -type f | xargs -L 100 -P 100 rm	34.32	27.82		89	929897/21720
 rm -rf f					31.29	14.80		47	15134/11

GlusterFS Server Daemons

 glusterd    = management daemon
 glusterfsd  = per brick daemon
 glustershd  = self-heal daemon
 glusterfs   = client mount (FUSE) process; also runs the built-in NFS server
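
On a server node, all of these can be confirmed from the process list (glustershd shows up under the glusterfs process name):

 pgrep -l gluster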

Jobs hung waiting for Gluster

 If jobs are waiting on an unresponsive gluster mount (kill -9 <JOBID> doesn't work):
   Search for the maxg mount process and kill it:
   [haychoi@fp1 bin]$ ps -ef | grep max[g]
   root      5777     1  1 Sep30 ?        00:27:51 /usr/sbin/glusterfs --volfile-id=/gv0 --volfile-server=mv2 /maxg
   [haychoi@fp1 bin]$ kill 5777
