[Lustre-discuss] Singlehomed to multihomed upgrade
Lukas Hejtmanek
xhejtman@ics.muni.cz
Thu Jan 8 09:31:07 PST 2009
On Thu, Jan 08, 2009 at 04:46:39PM +0000, Wojciech Turek wrote:
> Do you have just one lustre server which serves as OSS and MDS/MGS ?
Yes, only one Lustre server, which serves two OSTs and one combined MDS/MGS.
> Can you paste output from `lctl ping <server_nid>` run on client?
./lctl ping 192.168.0.1@tcp
12345-0@lo
12345-10.0.0.1@o2ib
12345-192.168.0.1@tcp
> Does the ethernet client has only one interface or is there more?
only one.
> Did you also set lnet option (in modprobe.conf) on the clients?
No, lnet has no options set on the client.
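Pinning the client's LNET network explicitly would make the setup unambiguous even without options; a minimal sketch for the tcp-only client (the `eth0` interface name is an assumption, adjust to the client's actual interface):

```shell
# /etc/modprobe.conf (or /etc/modprobe.d/lustre.conf) on the tcp-only client.
# Restrict LNET to the ethernet network on the assumed interface eth0:
options lnet networks="tcp0(eth0)"
```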
> Can you send output from `lctl list_nids` run on server(s)
# /usr/local/lustre/sbin/lctl list_nids
10.0.0.1@o2ib
192.168.0.1@tcp
> And also output from `tunefs.lustre --print /dev/<lustre_target>` run on
> the server
# /usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: spfs-MDT0000
Index: 0
Lustre FS: spfs
Mount type: ldiskfs
Flags: 0x5
(MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:
Permanent disk data:
Target: spfs-MDT0000
Index: 0
Lustre FS: spfs
Mount type: ldiskfs
Flags: 0x5
(MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:
exiting before disk write.
# /usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_2
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: spfs-OST0000
Index: 0
Lustre FS: spfs
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib
Permanent disk data:
Target: spfs-OST0000
Index: 0
Lustre FS: spfs
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib
exiting before disk write.
# /usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_3
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: spfs-OST0001
Index: 1
Lustre FS: spfs
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib
Permanent disk data:
Target: spfs-OST0001
Index: 1
Lustre FS: spfs
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib
exiting before disk write.
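The tunefs output above seems to show the culprit: both OSTs record only mgsnode=10.0.0.1@o2ib, so the client configuration log hands the tcp client an o2ib NID it cannot reach (which matches the "No NID found for 10.0.0.1@o2ib" errors below). If I read the docs right, the usual remedy is to regenerate the configuration logs with both of the server's NIDs listed; a sketch using the device paths above (everything unmounted first; please double-check the flags against the Lustre manual before writing to disk):

```shell
# Run on the server with all targets and clients unmounted.
# Regenerate the config logs on the combined MDT/MGS:
tunefs.lustre --writeconf /dev/Scratch_VG/Scratch_1

# Re-register each OST with both of the MGS's NIDs
# (a comma-separated list means multiple NIDs of the same node,
# not failover partners):
tunefs.lustre --writeconf --erase-params \
    --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp /dev/Scratch_VG/Scratch_2
tunefs.lustre --writeconf --erase-params \
    --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp /dev/Scratch_VG/Scratch_3

# Then remount in order: MDT/MGS first, then the OSTs, then the clients.
```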
>
> Cheers
>
> Wojciech
>
>
>
> Lukas Hejtmanek wrote:
>> Hello,
>>
>> I have a setup with Lustre server and Lustre clients using o2ib. It works.
>> I decided to add more clients; unfortunately, the new clients do not have an
>> IB card. So I added this option on the server:
>> options lnet networks="o2ib,tcp0"
>>
>> /usr/local/lustre/sbin/lctl list_nids
>> 10.0.0.1@o2ib
>> 192.168.0.1@tcp
>>
>> However, a client using tcp complains about:
>> mount -t lustre 192.168.0.1@tcp:/spfs /mnt/lustre/
>> mount.lustre: mount 192.168.0.1@tcp:/spfs at /mnt/lustre failed: No such file or
>> directory
>> Is the MGS specification correct?
>> Is the filesystem name correct?
>> If upgrading, is the copied client log valid? (see upgrade docs)
>>
>> This is from dmesg:
>> LustreError: 15342:0:(events.c:454:ptlrpc_uuid_to_peer()) No NID found for
>> 10.0.0.1@o2ib
>> LustreError: 15342:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find
>> peer 10.0.0.1@o2ib!
>> LustreError: 15342:0:(ldlm_lib.c:321:client_obd_setup()) can't add initial
>> connection
>> LustreError: 17831:0:(connection.c:144:ptlrpc_put_connection()) NULL
>> connection
>> LustreError: 15342:0:(obd_config.c:336:class_setup()) setup
>> spfs-MDT0000-mdc-ffff8801d1d67c00 failed (-2)
>> LustreError: 15342:0:(obd_config.c:1074:class_config_llog_handler()) Err -2 on
>> cfg command:
>> Lustre: cmd=cf003 0:spfs-MDT0000-mdc 1:spfs-MDT0000_UUID
>> 2:10.0.0.1@o2ib LustreError: 15c-8: MGC192.168.0.1@tcp: The
>> configuration from log
>> 'spfs-client' failed (-2). This may be the result of communication errors
>> between this node and the MGS, a bad configuration, or other errors. See the
>> syslog for more information.
>> LustreError: 15314:0:(llite_lib.c:1063:ll_fill_super()) Unable to process log:
>> -2
>> LustreError: 15314:0:(obd_config.c:403:class_cleanup()) Device 2 not setup
>> LustreError: 15314:0:(ldlm_request.c:984:ldlm_cli_cancel_req()) Got rc -108
>> from cancel RPC: canceling anyway
>> LustreError: 15314:0:(ldlm_request.c:1593:ldlm_cli_cancel_list())
>> ldlm_cli_cancel_list: -108
>> Lustre: client ffff8801d1d67c00 umount complete
>> LustreError: 15314:0:(obd_mount.c:1957:lustre_fill_super()) Unable to mount
>> (-2)
>>
>> Is there a way I can upgrade the single-homed server to a multihomed server?
>> Do I really need to set up a router? How does that work? Is there any slowdown
>> due to routing?
>>
>>
>
> --
> Wojciech Turek
>
> Assistant System Manager
> High Performance Computing Service
> University of Cambridge
> Email: wjt27@cam.ac.uk
> Tel: (+)44 1223 763517
--
Lukáš Hejtmánek