[Lustre-discuss] Failure to communicate with MDS via o2ib

Charles Taylor taylor@hpc.ufl.edu
Tue May 27 07:36:07 PDT 2008


Here it is for one of the other MDSs (10.13.16.24@o2ib).  As you
can see, the IPoIB ping succeeds, but the "lctl ping" fails, as does the
mount.  The last few lines of dmesg are also below.

[root@r5b-s41 ~]# ping 10.13.16.24
PING 10.13.16.24 (10.13.16.24) 56(84) bytes of data.
64 bytes from 10.13.16.24: icmp_seq=0 ttl=64 time=0.168 ms

--- 10.13.16.24 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.168/0.168/0.168/0.000 ms, pipe 2

[root@r5b-s41 ~]# lctl ping 10.13.16.24@o2ib
failed to ping 10.13.16.24@o2ib: Input/output error

[root@r5b-s41 ~]# mount -t lustre 10.13.16.24@o2ib:/ufhpc /ufhpc/scratch
mount.lustre: mount 10.13.16.24@o2ib:/ufhpc at /ufhpc/scratch failed:
Cannot send after transport endpoint shutdown
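
For reference, a rough client-side sanity-check sequence would look something
like the following (all standard lctl subcommands; the NID is just the MDS
address above, and the modprobe.conf location is a guess for this distro):

    # which NIDs did this client actually bring up?
    lctl list_nids
    # was lnet really loaded with networks=o2ib?
    grep lnet /etc/modprobe.conf
    # is the o2ib LND module present?
    lsmod | grep ko2iblnd
    # retry the LNet-level ping to the MDS
    lctl ping 10.13.16.24@o2ib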


dmesg....

LustreError: 12980:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff8102320ee400 x15/t0 o501->MGS@MGC10.13.16.24@o2ib_0:26 lens 136/120 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 15c-8: MGC10.13.16.24@o2ib: The configuration from log 'ufhpc-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 12980:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -108
Lustre: client ffff810232fc3800 umount complete
LustreError: 12980:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-108)
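
Since the IPoIB side looks healthy, one thing worth ruling out is an
asymmetric failure, i.e. whether the MGS/MDS node can ping this client at
the LNet level.  A rough sketch of what to run on 10.13.16.24 (the client
NID is a placeholder; substitute whatever 'lctl list_nids' reports on
r5b-s41):

    # on the MDS/MGS (10.13.16.24)
    lctl list_nids
    lctl ping <client-nid>@o2ib      # the o2ib NID reported on r5b-s41
    lctl --net o2ib conn_list        # any IB connections established at all?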



Thanks,

Charlie Taylor

On May 27, 2008, at 10:13 AM, Isaac Huang wrote:

> On Tue, May 27, 2008 at 09:50:38AM -0400, Charles Taylor wrote:
>>   Whoops, I meant to include the mount-time error message....
>>
>> /etc/init.d/lustre-client start
>> IB HCA detected - will try to sleep until link state becomes ACTIVE
>>  State becomes ACTIVE
>> Loading Lustre lnet module with option networks=o2ib:      [  OK  ]
>> Loading Lustre kernel module:                              [  OK  ]
>> mount -t lustre 10.13.24.40@o2ib:/ufhpc /ufhpc/scratch:
>> mount.lustre: mount 10.13.24.40@o2ib:/ufhpc at /ufhpc/scratch failed:
>> Cannot send after transport endpoint shutdown
>>                                                           [FAILED]
>> Error: Failed to mount 10.13.24.40@o2ib:/ufhpc
>> mount -t lustre 10.13.24.90@o2ib:/crn /crn/scratch:
>> mount.lustre: mount 10.13.24.90@o2ib:/crn at /crn/scratch failed:
>> Cannot send after transport endpoint shutdown
>>                                                           [FAILED]
>> Error: Failed to mount 10.13.24.90@o2ib:/crn
>> mount -t lustre 10.13.24.85@o2ib:/hpcdata /ufhpc/hpcdata:
>> mount.lustre: mount 10.13.24.85@o2ib:/hpcdata at /ufhpc/hpcdata failed:
>> Cannot send after transport endpoint shutdown
>>                                                           [FAILED]
>> Error: Failed to mount 10.13.24.85@o2ib:/hpcdata
>
> Was there any error message in 'dmesg'? Can you try 'lctl ping
> 10.13.24.90@o2ib'? (and 'lctl list_nids' and 'lctl --net o2ib
> peer_list' and 'lctl --net o2ib conn_list').
>
> Isaac
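
Collected in one pass, the diagnostics asked for above amount to roughly the
following; every lctl subcommand here is quoted from the message, and the NID
is the one from the failed /crn mount:

    dmesg | tail -50
    lctl list_nids
    lctl ping 10.13.24.90@o2ib
    lctl --net o2ib peer_list
    lctl --net o2ib conn_list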



