[Lustre-discuss] Singlehomed to multihomed upgrade

Wojciech Turek wjt27 at cam.ac.uk
Thu Jan 8 08:46:39 PST 2009


Hi,

Do you have just one Lustre server acting as both OSS and MDS/MGS?
Can you paste the output of `lctl ping <server_nid>` run on the client?
Does the Ethernet client have only one interface, or are there more?
Did you also set the lnet options (in modprobe.conf) on the clients?
Can you send the output of `lctl list_nids` run on the server(s)?
And also the output of `tunefs.lustre --print /dev/<lustre_target>` run on
the server?
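
If it helps, the checks above could be gathered with something like the
sketch below. It is only an illustration: the script layout, the `run`
helper and the SERVER_NID / TARGET_DEV variables are placeholders I made up,
and it assumes `lctl` and `tunefs.lustre` are in PATH (in your case lctl
seems to live under /usr/local/lustre/sbin).

```shell
#!/bin/sh
# Sketch only: collect the diagnostics asked for above.
# SERVER_NID and TARGET_DEV are hypothetical placeholders; substitute your own.
SERVER_NID="${SERVER_NID:-192.168.0.1@tcp}"  # tcp NID taken from the original post
TARGET_DEV="${TARGET_DEV:-}"                 # Lustre target device; left unset on purpose

run() {
    # Print the command, then run it only if the binary can be found.
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "  (skipped: $1 not found in PATH)"
    fi
}

case "${1:-}" in
client)
    run lctl list_nids            # which NIDs the client itself has
    run lctl ping "$SERVER_NID"   # can the client reach the server over tcp?
    ;;
server)
    run lctl list_nids            # should list both the o2ib and tcp NIDs
    if [ -n "$TARGET_DEV" ]; then
        run tunefs.lustre --print "$TARGET_DEV"
    fi
    ;;
esac
```

Saved as, say, diag.sh, it would be run as `sh diag.sh client` on the
Ethernet client and as `TARGET_DEV=/dev/<lustre_target> sh diag.sh server`
on the server. Without an argument it does nothing.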

Cheers

Wojciech



Lukas Hejtmanek wrote:
> Hello,
>
> I have a setup with a Lustre server and Lustre clients using o2ib. It works.
> I decided to add more clients; unfortunately, the new clients do not have IB
> cards. So I added this option on the server:
> options lnet networks="o2ib,tcp0"
>
> /usr/local/lustre/sbin/lctl list_nids
> 10.0.0.1@o2ib
> 192.168.0.1@tcp
>
> However, a client using tcp complains about:
> mount -t lustre 192.168.0.1@tcp:/spfs /mnt/lustre/
> mount.lustre: mount 192.168.0.1@tcp:/spfs at /mnt/lustre failed: No such file or
> directory
> Is the MGS specification correct?
> Is the filesystem name correct?
> If upgrading, is the copied client log valid? (see upgrade docs)
>
> This is from dmesg:
> LustreError: 15342:0:(events.c:454:ptlrpc_uuid_to_peer()) No NID found for
> 10.0.0.1@o2ib
> LustreError: 15342:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find
> peer 10.0.0.1@o2ib!
> LustreError: 15342:0:(ldlm_lib.c:321:client_obd_setup()) can't add initial
> connection
> LustreError: 17831:0:(connection.c:144:ptlrpc_put_connection()) NULL
> connection
> LustreError: 15342:0:(obd_config.c:336:class_setup()) setup
> spfs-MDT0000-mdc-ffff8801d1d67c00 failed (-2)
> LustreError: 15342:0:(obd_config.c:1074:class_config_llog_handler()) Err -2 on
> cfg command:
> Lustre:    cmd=cf003 0:spfs-MDT0000-mdc  1:spfs-MDT0000_UUID  2:10.0.0.1@o2ib
> LustreError: 15c-8: MGC192.168.0.1@tcp: The configuration from log
> 'spfs-client' failed (-2). This may be the result of communication errors
> between this node and the MGS, a bad configuration, or other errors. See the
> syslog for more information.
> LustreError: 15314:0:(llite_lib.c:1063:ll_fill_super()) Unable to process log:
> -2
> LustreError: 15314:0:(obd_config.c:403:class_cleanup()) Device 2 not setup
> LustreError: 15314:0:(ldlm_request.c:984:ldlm_cli_cancel_req()) Got rc -108
> from cancel RPC: canceling anyway
> LustreError: 15314:0:(ldlm_request.c:1593:ldlm_cli_cancel_list())
> ldlm_cli_cancel_list: -108
> Lustre: client ffff8801d1d67c00 umount complete
> LustreError: 15314:0:(obd_mount.c:1957:lustre_fill_super()) Unable to mount
> (-2)
>
> Is there a way to upgrade the single-homed server to a multihomed one?
> Do I really need to set up a router? How does it work? Is there any slowdown
> due to routing?
>
>   

-- 
Wojciech Turek

Assistant System Manager
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517 



