[Lustre-discuss] Singlehomed to multihomed upgrade

Lukas Hejtmanek xhejtman@ics.muni.cz
Thu Jan 8 09:31:07 PST 2009


On Thu, Jan 08, 2009 at 04:46:39PM +0000, Wojciech Turek wrote:
> Do you have just one lustre server which serves as OSS and MDS/MGS ?

Yes, only one Lustre server; it serves two OSTs and acts as the MDS/MGS.

> Can you paste output from   `lctl ping <server_nid>` run on client?

./lctl ping 192.168.0.1@tcp
12345-0@lo
12345-10.0.0.1@o2ib
12345-192.168.0.1@tcp


> Does the ethernet client has only one interface or is there more?

only one.

> Did you also set lnet option (in modprobe.conf) on the clients?

No, lnet has no options set on the client.
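For reference, a tcp-only client can also carry an explicit lnet line in
/etc/modprobe.conf; a minimal sketch, assuming the client's ethernet
interface is eth0:

options lnet networks="tcp0(eth0)"

As far as I know, with no networks option lnet just autodetects the first
ethernet interface and ends up on tcp0 anyway, so the missing option alone
should not explain the failure.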

> Can you send output from `lctl list_nids` run on server(s)

# /usr/local/lustre/sbin/lctl list_nids
10.0.0.1@o2ib
192.168.0.1@tcp


> And also output from `tunefs.lustre --print /dev/<lustre_target>` run on  
> the server

# /usr/local/lustre/sbin/tunefs.lustre  --print /dev/Scratch_VG/Scratch_1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     spfs-MDT0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x5
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:


   Permanent disk data:
Target:     spfs-MDT0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x5
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:

exiting before disk write.

# /usr/local/lustre/sbin/tunefs.lustre  --print /dev/Scratch_VG/Scratch_2
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     spfs-OST0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib


   Permanent disk data:
Target:     spfs-OST0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib

exiting before disk write.

# /usr/local/lustre/sbin/tunefs.lustre  --print /dev/Scratch_VG/Scratch_3
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     spfs-OST0001
Index:      1
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib


   Permanent disk data:
Target:     spfs-OST0001
Index:      1
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib

exiting before disk write.
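
Note that mgsnode above lists only the o2ib NID, which looks consistent with
the client errors below: the client reaches the MGS over tcp, but the config
log then points it at 10.0.0.1@o2ib, which it cannot reach. If the fix is a
writeconf, I guess it would look something like this (untested sketch; spfs
unmounted everywhere first, and --erase-params seems safe here only because
mgsnode is the sole parameter set):

# regenerate the config logs on the combined MDT/MGS
tunefs.lustre --writeconf /dev/Scratch_VG/Scratch_1
# re-register the OSTs with both MGS NIDs (comma = one multihomed node)
tunefs.lustre --erase-params --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp \
    --writeconf /dev/Scratch_VG/Scratch_2
tunefs.lustre --erase-params --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp \
    --writeconf /dev/Scratch_VG/Scratch_3
# then mount the MDT first, then the OSTs, then the clients

With the server's lnet already on both o2ib and tcp0, the regenerated logs
should record both NIDs and let each client pick the network it can reach.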


>
> Cheers
>
> Wojciech
>
>
>
> Lukas Hejtmanek wrote:
>> Hello,
>>
>> I have a setup with Lustre server and Lustre clients using o2ib. It works.
>> I decided to add more clients; unfortunately the new clients do not have IB
>> cards. So I added this option on the server:
>> options lnet networks="o2ib,tcp0"
>>
>> /usr/local/lustre/sbin/lctl list_nids
>> 10.0.0.1@o2ib
>> 192.168.0.1@tcp
>>
>> However, a client using tcp complains about:
>> mount -t lustre 192.168.0.1@tcp:/spfs /mnt/lustre/
>> mount.lustre: mount 192.168.0.1@tcp:/spfs at /mnt/lustre failed: No such file or
>> directory
>> Is the MGS specification correct?
>> Is the filesystem name correct?
>> If upgrading, is the copied client log valid? (see upgrade docs)
>>
>> This is from dmesg:
>> LustreError: 15342:0:(events.c:454:ptlrpc_uuid_to_peer()) No NID found for
>> 10.0.0.1@o2ib
>> LustreError: 15342:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find
>> peer 10.0.0.1@o2ib!
>> LustreError: 15342:0:(ldlm_lib.c:321:client_obd_setup()) can't add initial
>> connection
>> LustreError: 17831:0:(connection.c:144:ptlrpc_put_connection()) NULL
>> connection
>> LustreError: 15342:0:(obd_config.c:336:class_setup()) setup
>> spfs-MDT0000-mdc-ffff8801d1d67c00 failed (-2)
>> LustreError: 15342:0:(obd_config.c:1074:class_config_llog_handler()) Err -2 on
>> cfg command:
>> Lustre:    cmd=cf003 0:spfs-MDT0000-mdc  1:spfs-MDT0000_UUID  2:10.0.0.1@o2ib
>> LustreError: 15c-8: MGC192.168.0.1@tcp: The configuration from log
>> 'spfs-client' failed (-2). This may be the result of communication errors
>> between this node and the MGS, a bad configuration, or other errors. See the
>> syslog for more information.
>> LustreError: 15314:0:(llite_lib.c:1063:ll_fill_super()) Unable to process log:
>> -2
>> LustreError: 15314:0:(obd_config.c:403:class_cleanup()) Device 2 not setup
>> LustreError: 15314:0:(ldlm_request.c:984:ldlm_cli_cancel_req()) Got rc -108
>> from cancel RPC: canceling anyway
>> LustreError: 15314:0:(ldlm_request.c:1593:ldlm_cli_cancel_list())
>> ldlm_cli_cancel_list: -108
>> Lustre: client ffff8801d1d67c00 umount complete
>> LustreError: 15314:0:(obd_mount.c:1957:lustre_fill_super()) Unable to mount
>> (-2)
>>
>> Is there a way I can upgrade the singlehomed server to a multihomed server?
>> Do I really need to set up a router? How does it work? Is there any slowdown
>> due to routing?
>>
>>   
>
> -- 
> Wojciech Turek
>
> Assistant System Manager
> High Performance Computing Service
> University of Cambridge
> Email: wjt27@cam.ac.uk
> Tel: (+)44 1223 763517 
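
As for the router question quoted above: as I understand it, an LNET router
is just a node sitting on both networks with forwarding enabled, and each
tcp-only client declares a route to the o2ib network through it. A
hypothetical sketch (the router NIDs 192.168.0.254@tcp and 10.0.0.254@o2ib
are made up):

# modprobe.conf on the router node
options lnet networks="o2ib(ib0),tcp0(eth0)" forwarding=enabled
# modprobe.conf on each tcp-only client
options lnet networks="tcp0" routes="o2ib 192.168.0.254@tcp"

Routing adds a hop, so some overhead is expected, but since this server is
already on both networks, a router should not be needed here.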

-- 
Lukáš Hejtmánek


