CTDB 之 50.samba monitor

| 分类 CTDB  | 标签 CTDB 

前言

上一篇文章介绍了CTDB eventscript调用的过程,这一篇我们重点介绍50.samba这个脚本的monitor事件。事实上, 在正常运行过程中,CTDB每过一段时间(默认15秒),就会检查是否正常,具体做法即执行脚本的monitor事件:

2017/05/08 10:45:33.986913 [519718]: server/eventscript.c:762 Starting eventscript monitor 
2017/05/08 10:45:33.987442 [eventscript-00.ctdb-monitor:134206]: Executing event script /etc/ctdb/events.d/00.ctdb monitor 
2017/05/08 10:45:33.994023 [eventscript-01.reclock-monitor:134220]: Executing event script /etc/ctdb/events.d/01.reclock monitor 
2017/05/08 10:45:34.022542 [eventscript-10.interface-monitor:134244]: Executing event script /etc/ctdb/events.d/10.interface monitor 
2017/05/08 10:45:34.067580 [eventscript-11.natgw-monitor:134295]: Executing event script /etc/ctdb/events.d/11.natgw monitor 
2017/05/08 10:45:34.075430 [eventscript-11.routing-monitor:134306]: Executing event script /etc/ctdb/events.d/11.routing monitor 
2017/05/08 10:45:34.087329 [eventscript-13.per_ip_routing-monitor:134312]: Executing event script /etc/ctdb/events.d/13.per_ip_routing monitor 
2017/05/08 10:45:34.096299 [eventscript-20.multipathd-monitor:134331]: Executing event script /etc/ctdb/events.d/20.multipathd monitor 
2017/05/08 10:45:34.096684 [eventscript-31.clamd-monitor:134332]: Executing event script /etc/ctdb/events.d/31.clamd monitor 
2017/05/08 10:45:34.097064 [eventscript-40.vsftpd-monitor:134333]: Executing event script /etc/ctdb/events.d/40.vsftpd monitor 
2017/05/08 10:45:34.103765 [eventscript-40.fs_use-monitor:134349]: Executing event script /etc/ctdb/events.d/40.fs_use monitor 
2017/05/08 10:45:34.104252 [eventscript-41.httpd-monitor:134350]: Executing event script /etc/ctdb/events.d/41.httpd monitor 
2017/05/08 10:45:34.115300 [eventscript-50.samba-monitor:134361]: Executing event script /etc/ctdb/events.d/50.samba monitor 
2017/05/08 10:45:34.160322 [eventscript-60.ganesha-monitor:134423]: Executing event script /etc/ctdb/events.d/60.ganesha monitor 
2017/05/08 10:45:34.160648 [eventscript-60.nfs-monitor:134424]: Executing event script /etc/ctdb/events.d/60.nfs monitor 
2017/05/08 10:45:34.170403 [eventscript-62.cnfs-monitor:134434]: Executing event script /etc/ctdb/events.d/62.cnfs monitor 
2017/05/08 10:45:34.176847 [eventscript-70.iscsi-monitor:134440]: Executing event script /etc/ctdb/events.d/70.iscsi monitor 
2017/05/08 10:45:34.181991 [eventscript-91.lvs-monitor:134444]: Executing event script /etc/ctdb/events.d/91.lvs monitor 
2017/05/08 10:45:34.188665 [519718]: server/eventscript.c:485 Eventscript monitor  finished with state 0

最近一轮执行的结果可以通过ctdb scriptstatus查询:

root@controller01:/var/log/ctdb# ctdb scriptstatus
17 scripts were executed last monitor cycle
00.ctdb              Status:OK    Duration:0.014 Mon May  8 11:51:26 2017
01.reclock           Status:OK    Duration:0.068 Mon May  8 11:51:26 2017
10.interface         Status:OK    Duration:0.106 Mon May  8 11:51:26 2017
11.natgw             Status:OK    Duration:0.024 Mon May  8 11:51:26 2017
11.routing           Status:OK    Duration:0.007 Mon May  8 11:51:26 2017
13.per_ip_routing    Status:OK    Duration:0.015 Mon May  8 11:51:26 2017
20.multipathd        Status:DISABLED    
31.clamd             Status:DISABLED    
40.vsftpd            Status:OK    Duration:0.013 Mon May  8 11:51:26 2017
40.fs_use            Status:DISABLED    
41.httpd             Status:OK    Duration:0.015 Mon May  8 11:51:26 2017
50.samba             Status:OK    Duration:0.101 Mon May  8 11:51:26 2017
60.ganesha           Status:DISABLED    
60.nfs               Status:OK    Duration:0.011 Mon May  8 11:51:26 2017
62.cnfs              Status:OK    Duration:0.009 Mon May  8 11:51:26 2017
70.iscsi             Status:OK    Duration:0.023 Mon May  8 11:51:26 2017
91.lvs               Status:OK    Duration:0.034 Mon May  8 11:51:26 2017

本文通过50.samba为例,介绍ctdb 如何检查samba 运行状态

50.samba monitor 之准备工作

脚本最开始,是这么写的:

. $CTDB_BASE/functions
   
detect_init_style

case $CTDB_INIT_STYLE in
    suse)
        CTDB_SERVICE_SMB=${CTDB_SERVICE_SMB:-smb}
        CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-nmb}
        CTDB_SERVICE_WINBIND=${CTDB_SERVICE_WINBIND:-winbind}
        ;;      
    debian)
        CTDB_SERVICE_SMB=${CTDB_SERVICE_SMB:-samba}
        CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-""}
        CTDB_SERVICE_WINBIND=${CTDB_SERVICE_WINBIND:-winbind}
        ;;      
    *)  
        # should not happen, but for now use redhat style as default:
        CTDB_SERVICE_SMB=${CTDB_SERVICE_SMB:-smb}
        CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-""}
        CTDB_SERVICE_WINBIND=${CTDB_SERVICE_WINBIND:-winbind}
        ;;      
esac

service_name="samba"

loadconfig

ctdb_setup_service_state_dir

首先是引入了functions,很多公用的shell函数都都实现在functions。下面我们看下准备阶段做了哪些事情:

. $CTDB_BASE/functions

要想手动执行 50.samba monitor, 需要首先声明 CTDB_BASE环境变量,对于我们Ubuntu,应该如此执行:

CTDB_BASE=/etc/ctdb bash -x /etc/ctdb/events.d/50.samba moinitor

我们单步执行,看下functions 做了哪些准备

+ . /etc/ctdb/functions
++ PATH=/bin:/usr/bin:/usr/sbin:/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
++ '[' -z '' ']'
++ '[' -d /var/lib/ctdb ']'
++ export CTDB_VARDIR=/var/lib/ctdb
++ CTDB_VARDIR=/var/lib/ctdb
++ '[' -z '' ']'
++ export CTDB_ETCDIR=/etc
++ CTDB_ETCDIR=/etc
++ ctdb_status_dir=/var/lib/ctdb/status
++ ctdb_fail_dir=/var/lib/ctdb/failcount
++ ctdb_managed_dir=/var/lib/ctdb/managed_history
++ tickledir=/var/lib/ctdb/state/tickles
++ mkdir -p /var/lib/ctdb/state/tickles
++ '[' -n '' -a -x '' ']'
++ '[' -x /etc/ctdb/rc.local ']'
++ '[' -d /etc/ctdb/rc.local.d ']'
++ ctdb_set_current_debuglevel
++ '[' -z '' ']'
++ _f=/var/lib/ctdb/eventscript_debuglevel
++ '[' '' = create -o '!' -r /var/lib/ctdb/eventscript_debuglevel ']'
++ . /var/lib/ctdb/eventscript_debuglevel
+++ export CTDB_CURRENT_DEBUGLEVEL=0
+++ CTDB_CURRENT_DEBUGLEVEL=0
++ script_name=50.samba
++ service_name=50.samba
++ service_fail_limit=1
++ event_name=monitor

基本来讲,就是export了几个目录

 CTDB_VARDIR=/var/lib/ctdb
 CTDB_ETCDIR=/etc
 ctdb_status_dir=/var/lib/ctdb/status
 ctdb_fail_dir=/var/lib/ctdb/failcount
 ctdb_managed_dir=/var/lib/ctdb/managed_history
 tickledir=/var/lib/ctdb/state/tickles

这些目录基本都是ctdb的work directory,有状态目录,有失败目录,

root@controller02:/var/lib/ctdb/state# ll
total 36
drwxr-xr-x 9 root root 4096 May  8 10:44 ./
drwxrwxrwx 6 root root 4096 May  8 10:53 ../
drwxr-xr-x 2 root root 4096 May  4 08:03 ctdb/
drwxr-xr-x 3 root root 4096 May  4 08:03 gpfs/
drwxr-xr-x 3 root root 4096 May  8 10:44 interface_modify/
drwxr-xr-x 2 root root 4096 May  4 08:03 nfs/
drwxr-xr-x 2 root root 4096 May  4 08:03 per_ip_routing/
drwxr-xr-x 2 root root 4096 May  8 13:51 samba/
drwxr-xr-x 2 root root 4096 May  4 08:03 tickles/

root@node1:/var/lib/ctdb/failcount# ll
total 8
drwxr-xr-x 2 root root 4096 Apr 11 21:45 ./
drwxrwxrwx 6 root root 4096 Apr 12 21:36 ../
-rw-r--r-- 1 root root    0 May  8 13:55 01.reclock
-rw-r--r-- 1 root root    0 Apr 11 21:45 samba
-rw-r--r-- 1 root root    0 Apr 11 21:45 winbind

root@node1:/var/lib/ctdb/managed_history# ll
total 8
drwxr-xr-x 2 root root 4096 Apr 11 21:45 ./
drwxrwxrwx 6 root root 4096 Apr 12 21:36 ../
-rw-r--r-- 1 root root    0 Apr 11 21:45 samba
-rw-r--r-- 1 root root    0 Apr 11 21:45 winbind

目录下具体文件的含义,我们暂时掠过,后面代码部分详细阐述。

关于function的第二部分是获取 service name和事件号。

++ . /var/lib/ctdb/eventscript_debuglevel
+++ export CTDB_CURRENT_DEBUGLEVEL=0
+++ CTDB_CURRENT_DEBUGLEVEL=0
++ script_name=50.samba
++ service_name=50.samba
++ service_fail_limit=1
++ event_name=monitor

表示,执行的脚本名称是50.samba, service_name是 50.samba,事件为monitor。 代码如下:

ctdb_set_current_debuglevel

script_name="${0##*/}"       # basename
service_name="$script_name"  # default is just the script name      
service_fail_limit=1
event_name="$1"

比较有意思的是 script_name=”${0##*/}” 这一句,这一句的本意是 basename ${0}。这种用法在shell中比较普遍,含义如下:

${VAR##pattern} is the value of variable $VAR with the longest string that matches the pattern removed from the beggining 

loadconfig

第二部分我们简单称之为loadconfig。

detect_init_style

case $CTDB_INIT_STYLE in
    suse)
        CTDB_SERVICE_SMB=${CTDB_SERVICE_SMB:-smb}
        CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-nmb}
        CTDB_SERVICE_WINBIND=${CTDB_SERVICE_WINBIND:-winbind}
        ;;      
    debian)
        CTDB_SERVICE_SMB=${CTDB_SERVICE_SMB:-samba}
        CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-""}
        CTDB_SERVICE_WINBIND=${CTDB_SERVICE_WINBIND:-winbind}
        ;;      
    *)  
        # should not happen, but for now use redhat style as default:
        CTDB_SERVICE_SMB=${CTDB_SERVICE_SMB:-smb}
        CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-""}
        CTDB_SERVICE_WINBIND=${CTDB_SERVICE_WINBIND:-winbind}
        ;;      
esac

service_name="samba"

loadconfig

ctdb_setup_service_state_dir

其中loadconfig是functions中的函数:

_loadconfig() {

    if [ -z "$1" ] ; then
        foo="${service_config:-${service_name}}"
        if [ -n "$foo" ] ; then
            loadconfig "$foo"
        fi
    elif [ "$1" != "ctdb" ] ; then
        loadconfig "ctdb"             
    fi

    if [ -f $CTDB_ETCDIR/sysconfig/$1 ]; then
        . $CTDB_ETCDIR/sysconfig/$1
    elif [ -f $CTDB_ETCDIR/default/$1 ]; then
        . $CTDB_ETCDIR/default/$1
    elif [ -f $CTDB_BASE/sysconfig/$1 ]; then
        . $CTDB_BASE/sysconfig/$1
    fi
}

loadconfig () {
    _loadconfig "$@"
}

这段函数比较简单,就是如果参数不是ctdb,那么先loadconfig ctdb,然后执行加载具体服务的配置文件,对于ubuntu而言,就相当于执行了如下语句:

 . /etc/default/ctdb
 . /etc/default/samba

两者一个是ctdb的默认配置,另一个是samba的默认配置。对于第二者就一句话:

RUN_MODE="daemons"

对于 ctdb的配置有如下:

 CTDB_RECOVERY_LOCK=/var/share/ezfs/ctdb/gwgrp/vs-1000/recovery.lock
 CTDB_PUBLIC_INTERFACE=eth0
 CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
 CTDB_MANAGES_SAMBA=yes
 CTDB_MANAGES_WINBIND=yes
 CTDB_SERVICE_SMB=smbd
 CTDB_SERVICE_NMB=nmbd
 CTDB_DEBUGLEVEL=INFO
 CTDB_SERVICE_AUTOSTARTSTOP=yes
 CTDB_SET_RecoveryBanPeriod=60
 CTDB_SET_RecoveryDropAllIPs=65

至此准备工作全部完成,后面看monitor的正文部分:

50.samba monitor

   monitor)
    # Create a dummy file to track when we need to do periodic cleanup
    # of samba databases
    periodic_cleanup_file="$service_state_dir/periodic_cleanup"
    [ -f "$periodic_cleanup_file" ] || {
        touch "$periodic_cleanup_file"
    }
    [ `find "$periodic_cleanup_file" -mmin +$SAMBA_CLEANUP_PERIOD | wc -l` -eq 1 ] && {
        # Cleanup the databases
            periodic_cleanup
        touch "$periodic_cleanup_file"
    }

    is_ctdb_managed_service "samba" && {
        [ "$CTDB_SAMBA_SKIP_SHARE_CHECK" = "yes" ] || {
            testparm_background_update
            
            testparm_cat | egrep '^WARNING|^ERROR|^Unknown' && {
                testparm_foreground_update
                testparm_cat | egrep '^WARNING|^ERROR|^Unknown' && {
                echo "ERROR: testparm shows smb.conf is not clean"
                exit 1
                }
            }
            
            list_samba_shares |
            ctdb_check_directories_probe || {               
                testparm_foreground_update
                list_samba_shares |
                ctdb_check_directories
            } || exit $?
        }                                                    

任务1 periodic_cleanup

samba monitor的第一个任务是周期性的清理工作,按照设计,每超过10分钟,就要执行

       periodic_cleanup
        touch "$periodic_cleanup_file"

实现的方法是在/var/lib/ctdb/samba/下放一个periodic_cleanup文件,每执行完一遍periodic_cleanup,就touch一下该文件。每次执行50.samba monitor的时候,看下该文件修改时间是否超过10分钟,如果是,则执行下一次的periodic_cleanup,否则,本轮不需要执行periodic_cleanup。

    [ `find "$periodic_cleanup_file" -mmin +$SAMBA_CLEANUP_PERIOD | wc -l` -eq 1 ]

说了这么久,那么period_cleanup到底做的是什么事情呢?

# periodic cleanup function
periodic_cleanup() {     
    # running smbstatus scrubs any dead entries from the connections
    # and sessionid database
    # echo "Running periodic cleanup of samba databases"
    smbstatus -np > /dev/null 2>&1 &
}

说实话,不太清除为什么执行smbstatus -np就能起到清理的作用。smbstatus的作用是展示smb连接的。

任务2 检查smb配置文件和端口

    /*smb是否被CTDB托管,毫无疑问是Yes*/
    is_ctdb_managed_service "samba" && {
        /*是否跳过samba share的检查,我们是no,不跳过*/      
        [ "$CTDB_SAMBA_SKIP_SHARE_CHECK" = "yes" ] || {
            testparm_background_update
            
            testparm_cat | egrep '^WARNING|^ERROR|^Unknown' && {
                testparm_foreground_update
                testparm_cat | egrep '^WARNING|^ERROR|^Unknown' && {
                echo "ERROR: testparm shows smb.conf is not clean"
                exit 1
                }
            }
            
            list_samba_shares |
            ctdb_check_directories_probe || {
                testparm_foreground_update
                list_samba_shares |
                ctdb_check_directories
            } || exit $?
        }

        smb_ports="$CTDB_SAMBA_CHECK_PORTS"
        [ -z "$smb_ports" ] && {
            smb_ports=`testparm_cat --parameter-name="smb ports"`
        }
        ctdb_check_tcp_ports $smb_ports || exit $?
    }

检查配置文件

testparm_background_update 这个函数用来后台执行testparm ,检查/etc/samba/smb.conf , 将有效的配置整理后放入 /var/lib/ctdb/state/samba/smb.conf.cache。

如果10秒内,后台进程无法完成testparm的检查,那么就杀死进程,失败返回,一般不会失败,哪怕有个别参数有问题。

完成这一步后,会通过testparm_cat 检查配置中是否’^WARNING ^ERROR ^Unknown’, 如果存在,那么在前台再执行一遍 testparm,生成新的smb.conf.cache, 然后再次检查,如果依然存在 ‘^WARNING ^ERROR ^Unknown’,那么就失败返回,表示当前事件检查出异常。一般而言这种情况比较罕见,我曾经故意修改smb.conf ,比如将 readonly = yes 改成readonly = fuck,依然50.samba monitor依然返回OK,只不过smb.conf.cache中,readonly = yes就不复存在了。

检查share folder是否存在

            list_samba_shares |
            ctdb_check_directories_probe || {
                testparm_foreground_update
                list_samba_shares |
                ctdb_check_directories
            } || exit $?

首先,list_samba_shares 会列出来所有export出去的shared folder

list_samba_shares ()      
{
    testparm_cat |
    sed -n -e 's@^[[:space:]]*path[[:space:]]*=[[:space:]]@@p' |
    sed -e 's/"//g'
}

因为export出去的samba文件夹,都是这种格式

path = /XXX/YYY 因此,list_samba_shares 就是搜索path,来获得目录列表。 通过 -d来检查目录是否存在,如果目录不存在,则返回1,表示失败。

ctdb_check_directories_probe() {
    while IFS="" read d ; do
        case "$d" in
            *%*)
                continue
                ;;
            *)
                [ -d "${d}/." ] || return 1
        esac             
    done
}

ctdb_check_directories() {
    n="${1:-${service_name}}"
    ctdb_check_directories_probe || {
        echo "ERROR: $n directory \"$d\" not available"
        exit 1
    }
}

检查samba用到的端口

        smb_ports="$CTDB_SAMBA_CHECK_PORTS"
        [ -z "$smb_ports" ] && {
            smb_ports=`testparm_cat --parameter-name="smb ports"`
        }
        ctdb_check_tcp_ports $smb_ports || exit $?   

相当于执行:

testparm -s /var/lib/ctdb/state/samba/smb.conf.cache '--parameter-name=smb ports' 2>/dev/null 
445 139

samba占用的两个端口是445和139:

root@controller03:/var/lib/ctdb/state/samba# netstat -antp |grep smb
tcp        0      0 0.0.0.0:445             0.0.0.0:*               LISTEN      211760/smbd     
tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN      211760/smbd 

检查端口占用执行的是ctdb_check_tcp_ports 函数,该函数的实现在functions中。

ctdb_check_tcp_ports()
{
    if [ -z "$1" ] ; then
        echo "INTERNAL ERROR: ctdb_check_tcp_ports - no ports specified"
        exit 1
    fi

    # Set default value for CTDB_TCP_PORT_CHECKS if unset.
    # If any of these defaults are unsupported then this variable can                                                                                  
    # be overridden in /etc/sysconfig/ctdb or via a file in
    # /etc/ctdb/rc.local.d/.
    : ${CTDB_TCP_PORT_CHECKERS:=ctdb nmap netstat}

    for _c in $CTDB_TCP_PORT_CHECKERS ; do
        ctdb_check_tcp_ports_$_c "$@"
        case "$?" in
            0)
                _ctdb_check_tcp_common
                rm -f "$_ctdb_service_started_file"
                return 0
                ;;
            1)
                _ctdb_check_tcp_common
                if [ ! -f "$_ctdb_service_started_file" ] ; then
                    echo "ERROR: $service_name tcp port $_p is not responding"
                    debug <<EOF
$ctdb_check_tcp_ports_debug
EOF
                else
                    echo "INFO: $service_name tcp port $_p is not responding"
                fi

                return 1
                ;;
            127)
                debug <<EOF
ctdb_check_ports - checker $_c not implemented
output from checker was:
$ctdb_check_tcp_ports_debug
EOF
                ;;
            *)
                
        esac
    done

    echo "INTERNAL ERROR: ctdb_check_ports - no working checkers in CTDB_TCP_PORT_CHECKERS=\"$CTDB_TCP_PORT_CHECKERS\"
       
    return 127
}

functions中实现了三个函数,来判断端口是否已经被占用:

ctdb_check_tcp_ports_ctdb
ctdb_check_tcp_ports_nmap
ctdb_check_tcp_ports_netstat

使用这三个函数来判断端口使用情况。如果端口没有被使用,自如地bind到对应端口,说明samba进程不正常。对于samba而言,使用的是第一个函数:ctdb_check_tcp_ports_ctdb

ctdb_check_tcp_ports_ctdb ()
{
    for _p ; do  # process each function argument (port)
        _cmd="ctdb checktcpport $_p"
        _out=$($_cmd 2>&1)
        _ret=$?
        case "$_ret" in
            0)
                ctdb_check_tcp_ports_debug="\"$_cmd\" was able to bind to port"
                return 1
                ;;
            98)
                # Couldn't bind, something already listening, next port...
                continue
                ;;
            *)
                ctdb_check_tcp_ports_debug="$_cmd (exited with $_ret) with output:
$_out"
                # assume not implemented
                return 127
        esac
    done
    return 0

执行ctdb checktcpport 445 命令,注意,返回98表示无法bind,即端口已经被占用,这说明程序运行的好好的,返回0。 如果ctdb checktcpport 445 返回0,表示,因为没有进程在使用445,因此bind成功,这种情况反倒说明samba进程不存在,因此,ctdb_check_tcp_ports_ctdb返回1。

另外一种情况是ctdb checktcpport 不存在这种命令,就返回127 ,对于ctdb_check_tcp_ports 函数,会接下来尝试 ctdb_check_tcp_ports_nmap 和ctdb_check_tcp_ports_netstat 函数,不赘述了。

如果我们通过 /etc/init.d/smbd stop 将samba停掉:

50.samba             Status:ERROR    Duration:0.136 Mon May  8 17:53:00 2017
   OUTPUT:ERROR: samba tcp port 445 is not responding

任务3 winbind 检查

    check_ctdb_manages_winbind && {
        ctdb_check_command "winbind" "wbinfo -p"
    }

monitor的第三个任务是检查winbind是否健康:

简单地说就是执行wbinfo -p命令,查看是否正常返回。 wbinfo -p的含义是:

-p|--ping
Check whether winbindd(8) is still alive. Prints out either 'succeeded' or 'failed'.

ping 一下winbindd进程,查看进程是否活着。

root@controller01:~# wbinfo -p
Ping to winbindd succeeded
root@controller01:~# echo $?
0

如果通过 /etc/init.d/winbind stop命令,将winbind停掉,我们可以看到如下输出:

50.samba             Status:ERROR    Duration:0.061 Mon May  8 17:57:08 2017
   OUTPUT:ERROR: winbind - wbinfo -p returned error

结尾

至此,05.samba monitor 我们就完整的分析了一遍。在此基础上,也熟悉了部分functions的函数。


上一篇     下一篇