[Lustre-discuss] Singlehomed to multihomed upgrade

Wojciech Turek wjt27 at cam.ac.uk
Thu Jan 8 08:46:39 PST 2009


Do you have just one Lustre server acting as OSS and MDS/MGS?
Can you paste the output of `lctl ping <server_nid>` run on the client?
Does the Ethernet client have only one interface, or more than one?
Did you also set the lnet option (in modprobe.conf) on the clients?
Can you send the output of `lctl list_nids` run on the server(s)?
And also the output of `tunefs.lustre --print /dev/<lustre_target>` run on
the server?
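
If it is easier, the commands above could be collected with a small script
along these lines. This is only a sketch: the function names
(run_if_present, collect_client, collect_server) are made up for
illustration, and the server NID and target device are arguments you have
to supply yourself. It runs nothing beyond the three commands already
mentioned.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the NID and device are
# placeholders passed in by the caller, not values from this thread.

# Run a command if its tool is installed; otherwise say it was skipped.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "== skipped (not installed): $*"
    fi
}

# On a client: show local NIDs, then ping the server NID given as $1.
collect_client() {
    run_if_present lctl list_nids
    run_if_present lctl ping "$1"
}

# On a server: show local NIDs, then print the stored parameters of the
# Lustre target device given as $1.
collect_server() {
    run_if_present lctl list_nids
    run_if_present tunefs.lustre --print "$1"
}
```

Source the file, then run `collect_client <server_nid>` on the client and
`collect_server /dev/<lustre_target>` on each server, and paste the output.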



Lukas Hejtmanek wrote:
> Hello,
> I have a setup with a Lustre server and Lustre clients using o2ib. It works.
> I decided to add more clients; unfortunately, the new clients do not have IB
> cards. So I added this option on the server:
> options lnet networks="o2ib,tcp0"
> /usr/local/lustre/sbin/lctl list_nids
> at o2ib
> at tcp
> However, a client using tcp complains about:
> mount -t lustre at tcp:/spfs /mnt/lustre/
> mount.lustre: mount at tcp:/spfs at /mnt/lustre failed: No such file or
> directory
> Is the MGS specification correct?
> Is the filesystem name correct?
> If upgrading, is the copied client log valid? (see upgrade docs)
> This is from dmesg:
> LustreError: 15342:0:(events.c:454:ptlrpc_uuid_to_peer()) No NID found for
> at o2ib
> LustreError: 15342:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find
> peer at o2ib!
> LustreError: 15342:0:(ldlm_lib.c:321:client_obd_setup()) can't add initial
> connection
> LustreError: 17831:0:(connection.c:144:ptlrpc_put_connection()) NULL
> connection
> LustreError: 15342:0:(obd_config.c:336:class_setup()) setup
> spfs-MDT0000-mdc-ffff8801d1d67c00 failed (-2)
> LustreError: 15342:0:(obd_config.c:1074:class_config_llog_handler()) Err -2 on
> cfg command:
> Lustre:    cmd=cf003 0:spfs-MDT0000-mdc  1:spfs-MDT0000_UUID  2: at o2ib  
> LustreError: 15c-8: MGC192.168.0.1 at tcp: The configuration from log
> 'spfs-client' failed (-2). This may be the result of communication errors
> between this node and the MGS, a bad configuration, or other errors. See the
> syslog for more information.
> LustreError: 15314:0:(llite_lib.c:1063:ll_fill_super()) Unable to process log:
> -2
> LustreError: 15314:0:(obd_config.c:403:class_cleanup()) Device 2 not setup
> LustreError: 15314:0:(ldlm_request.c:984:ldlm_cli_cancel_req()) Got rc -108
> from cancel RPC: canceling anyway
> LustreError: 15314:0:(ldlm_request.c:1593:ldlm_cli_cancel_list())
> ldlm_cli_cancel_list: -108
> Lustre: client ffff8801d1d67c00 umount complete
> LustreError: 15314:0:(obd_mount.c:1957:lustre_fill_super()) Unable to mount
> (-2)
> Is there a way I can upgrade the singlehomed server to a multihomed server?
> Do I really need to set up a router? How does it work? Is there any slowdown
> due to routing?

Wojciech Turek

Assistant System Manager
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517 
