[lustre-discuss] issues suddenly mounting

John White jwhite at lbl.gov
Thu Jul 7 09:40:12 PDT 2016

We’re suddenly seeing a strange problem: our clients can no longer mount their file system over IB.  Mounting via TCP NIDs works fine, but every attempt over IB fails.  Any ideas?  Here’s an attempt with only an ib0 NID defined:

[root@n0236 ~]# cat /etc/modprobe.d/modprobe.local.conf
alias net-pf-27 ib_sdp
alias ib0 ib_ipoib

options lnet networks=o2ib(ib0)
#options ko2iblnd ipif_name=ib0
#options ib_mthca msi_x=0 num_cq=131072

#options qla2xxx ql2xfailover=0

alias pppox-proto-1 off
blacklist l2tp_ppp
[root@n0236 ~]# modprobe lustre
[root@n0236 ~]# lctl ping ib-n0006.lustre@o2ib
12345-0 at lo
12345- at o2ib
12345- at tcp
[root@n0236 ~]# mount -t lustre ib-n0006.lustre@o2ib:/vulcan /clusterfs/vulcan/pscratch
mount.lustre: mount ib-n0006.lustre@o2ib:/vulcan at /clusterfs/vulcan/pscratch failed: Input/output error
Is the MGS running?
[root@n0236 ~]# dmesg
LNet: HW CPU cores: 8, npartitions: 2
alg: No test for crc32 (crc32-table)
alg: No test for adler32 (adler32-zlib)
Lustre: Lustre: Build Version: jenkins-b_ieel2_0_ddn-88-g30a1a9c-PRISTINE-2.6.32-642.el6.x86_64
LNet: Added LNI at o2ib [8/256/0/180]
Lustre: 10362:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1467909196/real 1467909196]  req at ffff8805f019d800 x1539214325842016/t0(0) o503->MGC10.4.200.6 at o2ib@ at o2ib:26/25 lens 272/8416 e 0 to 1 dl 1467909203 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
LustreError: 166-1: MGC10.4.200.6 at o2ib: Connection to MGS (at at o2ib) was lost; in progress operations using this service will fail
LustreError: 15c-8: MGC10.4.200.6 at o2ib: The configuration from log 'vulcan-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 10362:0:(llite_lib.c:1088:ll_fill_super()) Unable to process log: -5
Lustre: Unmounted vulcan-client
Lustre: MGC10.4.200.6 at o2ib: Connection restored to MGS (at at o2ib)
LustreError: 10362:0:(obd_mount.c:1319:lustre_fill_super()) Unable to mount  (-5)
[root@n0236 ~]#
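
For anyone reading the dmesg output above, here is a minimal sketch (not part of the original session) that decodes two numbers from it: the epoch timestamps on the ptlrpc_expire_one_request() line and the -5 return code. It assumes "dl" on that line is the request deadline in seconds since the epoch. The helper names parse_request_times, to_utc and errno_name are made up for illustration, and LOG_FRAGMENT keeps only the two fields being parsed.

```python
import errno
import re
from datetime import datetime, timezone

# The "sent" and "dl" fields copied from the ptlrpc_expire_one_request() line
# in the dmesg output; every other field on that line is left out here.
LOG_FRAGMENT = "[sent 1467909196/real 1467909196] dl 1467909203"


def parse_request_times(text):
    """Return (sent, deadline, deadline - sent), all in seconds.

    Assumes 'dl' is the request deadline as seconds since the epoch.
    """
    sent = int(re.search(r"\[sent (\d+)/real \d+\]", text).group(1))
    deadline = int(re.search(r"\bdl (\d+)\b", text).group(1))
    return sent, deadline, deadline - sent


def to_utc(epoch_seconds):
    """Convert seconds since the epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def errno_name(rc):
    """Map a negative kernel-style return code such as -5 to its errno name."""
    return errno.errorcode[abs(rc)]


if __name__ == "__main__":
    sent, deadline, timeout = parse_request_times(LOG_FRAGMENT)
    print("sent (UTC):", to_utc(sent).isoformat())
    print("deadline - sent:", timeout, "seconds")
    print("rc -5 is", errno_name(-5))
```

Under that assumption, the request to the MGS was given 7 seconds before it was expired, and -5 is EIO, which is the same "Input/output error" that mount.lustre reports.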
