[Lustre-discuss] Failure to communicate with MDS via o2ib
Charles Taylor
taylor at hpc.ufl.edu
Tue May 27 07:52:25 PDT 2008
Thanks, but I'm going to withdraw this for now; I was too quick on
the trigger. We are seeing some issues with LID assignment (upon
reboot) on our subnet manager (SM) for the nodes in question.
Sorry for the wasted bandwidth.
Charlie Taylor
UF HPC Center
On May 27, 2008, at 10:13 AM, Isaac Huang wrote:
> On Tue, May 27, 2008 at 09:50:38AM -0400, Charles Taylor wrote:
>> Whoops, I meant to include the mount-time error message....
>>
>> /etc/init.d/lustre-client start
>> IB HCA detected - will try to sleep until link state becomes ACTIVE
>> State becomes ACTIVE
>> Loading Lustre lnet module with option networks=o2ib: [ OK ]
>> Loading Lustre kernel module: [ OK ]
>> mount -t lustre 10.13.24.40@o2ib:/ufhpc /ufhpc/scratch:
>> mount.lustre: mount 10.13.24.40@o2ib:/ufhpc at /ufhpc/scratch failed: Cannot send after transport endpoint shutdown
>> [FAILED]
>> Error: Failed to mount 10.13.24.40@o2ib:/ufhpc
>> mount -t lustre 10.13.24.90@o2ib:/crn /crn/scratch:
>> mount.lustre: mount 10.13.24.90@o2ib:/crn at /crn/scratch failed: Cannot send after transport endpoint shutdown
>> [FAILED]
>> Error: Failed to mount 10.13.24.90@o2ib:/crn
>> mount -t lustre 10.13.24.85@o2ib:/hpcdata /ufhpc/hpcdata:
>> mount.lustre: mount 10.13.24.85@o2ib:/hpcdata at /ufhpc/hpcdata failed: Cannot send after transport endpoint shutdown
>> [FAILED]
>> Error: Failed to mount 10.13.24.85@o2ib:/hpcdata
>
> Was there any error message in 'dmesg'? Can you try
> 'lctl ping 10.13.24.90@o2ib'? (and 'lctl list_nids' and
> 'lctl --net o2ib peer_list' and 'lctl --net o2ib conn_list').
>
> Isaac
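
For anyone who hits the same mount failure, the checks Isaac suggests can be gathered in one pass. The following is only a minimal sketch, not a tested procedure: the function names lnet_diag_cmds and run_lnet_diag are hypothetical, and the NID is assumed to be written with "@" (the list archive renders it as " at "). It uses only the lctl invocations quoted above and runs them only when lctl is installed.

```shell
#!/bin/sh
# Sketch: build and run the LNet checks requested above for one server NID.
# lnet_diag_cmds and run_lnet_diag are hypothetical helper names.

# Print one diagnostic command per line for the given NID.
lnet_diag_cmds() {
    nid="$1"
    net="${nid##*@}"    # network part of the NID, e.g. o2ib
    printf '%s\n' \
        "lctl list_nids" \
        "lctl ping $nid" \
        "lctl --net $net peer_list" \
        "lctl --net $net conn_list"
}

# Run each command, labelling its output; skip execution if lctl is absent.
run_lnet_diag() {
    lnet_diag_cmds "$1" | while IFS= read -r cmd; do
        echo "### $cmd"
        if command -v lctl >/dev/null 2>&1; then
            $cmd 2>&1
        else
            echo "(lctl not found; command not run)"
        fi
    done
}
```

A possible invocation on the client, after the failed mount, would be `run_lnet_diag 10.13.24.90@o2ib`, with `dmesg` checked separately for errors logged around the mount attempt.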