[Lustre-discuss] Failure to communicate with MDS via o2ib

Charles Taylor taylor at hpc.ufl.edu
Tue May 27 07:52:25 PDT 2008


Thanks, but I'm going to withdraw this for now. I was too quick on
the trigger. We are seeing some issues with LID assignment (upon
reboot) for the nodes in question on our subnet manager (SM).

Sorry for the wasted bandwidth.

Charlie Taylor
UF HPC Center


On May 27, 2008, at 10:13 AM, Isaac Huang wrote:

> On Tue, May 27, 2008 at 09:50:38AM -0400, Charles Taylor wrote:
>>   Whoops, I meant to include the mount-time error message....
>>
>> /etc/init.d/lustre-client start
>> IB HCA detected - will try to sleep until link state becomes ACTIVE
>>  State becomes ACTIVE
>> Loading Lustre lnet module with option networks=o2ib:      [  OK  ]
>> Loading Lustre kernel module:                              [  OK  ]
>> mount -t lustre 10.13.24.40@o2ib:/ufhpc /ufhpc/scratch:
>>
>>
>> mount.lustre: mount 10.13.24.40@o2ib:/ufhpc at /ufhpc/scratch failed: Cannot send after transport endpoint shutdown
>>                                                           [FAILED]
>> Error: Failed to mount 10.13.24.40@o2ib:/ufhpc
>> mount -t lustre 10.13.24.90@o2ib:/crn /crn/scratch:  mount.lustre: mount 10.13.24.90@o2ib:/crn at /crn/scratch failed: Cannot send after transport endpoint shutdown
>>                                                           [FAILED]
>> Error: Failed to mount 10.13.24.90@o2ib:/crn
>> mount -t lustre 10.13.24.85@o2ib:/hpcdata /ufhpc/hpcdata:   mount.lustre: mount 10.13.24.85@o2ib:/hpcdata at /ufhpc/hpcdata failed: Cannot send after transport endpoint shutdown
>>                                                           [FAILED]
>> Error: Failed to mount 10.13.24.85@o2ib:/hpcdata
>
> Was there any error message in 'dmesg'? Can you try
> 'lctl ping 10.13.24.90@o2ib'? Also 'lctl list_nids',
> 'lctl --net o2ib peer_list', and 'lctl --net o2ib conn_list'.
>
> Isaac
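
The lctl checks Isaac suggests can be collected in one pass. What
follows is only a minimal sketch, not part of the original thread: the
function names lnet_diag_cmds and lnet_diag and the DRY_RUN variable
are hypothetical, and it assumes the NID is written in the usual
address@network form (for example 10.13.24.90@o2ib). By default it
only prints the commands; set DRY_RUN=0 on a client with lctl
installed to actually run them. Checking 'dmesg' is left as a manual
step.

```shell
#!/bin/sh
# Sketch: gather the LNet diagnostics suggested in the thread for one
# server NID. Function names and DRY_RUN are illustrative assumptions.

# Print the lctl commands, one per line, for the given NID.
lnet_diag_cmds() {
    nid=$1              # e.g. 10.13.24.90@o2ib
    net=${nid#*@}       # network part of the NID, e.g. o2ib
    printf '%s\n' \
        "lctl list_nids" \
        "lctl ping $nid" \
        "lctl --net $net peer_list" \
        "lctl --net $net conn_list"
}

# Print each command under a header; run it only when DRY_RUN=0.
lnet_diag() {
    lnet_diag_cmds "$1" | while IFS= read -r cmd; do
        echo "### $cmd"
        if [ "${DRY_RUN:-1}" = 0 ]; then
            # Word splitting of $cmd is intentional here.
            $cmd 2>&1 || echo "(command failed)"
        fi
    done
}

# Dry run against one of the NIDs from the failed mounts.
lnet_diag "${1:-10.13.24.90@o2ib}"
```

Running it once per server NID (10.13.24.40@o2ib, 10.13.24.90@o2ib,
10.13.24.85@o2ib) and saving the output gives something that can be
pasted into a reply alongside the relevant 'dmesg' lines.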
