[Lustre-discuss] lustre_mgs: operation ... on unconnected MGS

Reto Gantenbein reto.gantenbein at id.unibe.ch
Wed Apr 30 07:33:07 PDT 2008


Hello everybody

I did a clean install of the osts/mgs and now it seems to work without
these errors. It is still unclear to me why they appeared or what
exactly they mean. Simpler error messages would be a nice thing in
lustre, especially for newbies.

Cheers,
Reto
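
For readers trying to interpret the messages quoted below, a small lookup may help. This is only a sketch and not part of Lustre: errno_name and opcode_name are hypothetical helper names. The errno values are the standard Linux ones. The opcode names are an assumption based on the request opcodes in lustre_idl.h for the 1.6 series and should be checked against the headers of the installed version.

```shell
#!/bin/bash
# Hypothetical helpers for reading LustreError lines; not shipped with Lustre.

# Map a (negative) return code from a log line to its Linux errno name.
errno_name() {
    local rc=${1#-}            # strip the leading minus sign, if any
    case "$rc" in
        5)   echo "EIO (Input/output error)" ;;
        107) echo "ENOTCONN (Transport endpoint is not connected)" ;;
        110) echo "ETIMEDOUT (Connection timed out)" ;;
        *)   echo "errno $rc (not in this table)" ;;
    esac
}

# Map a request opcode ("operation NNN", or the oNNN field) to a name.
# ASSUMPTION: values as found in lustre_idl.h; verify against your version.
opcode_name() {
    case "$1" in
        37)  echo "MDS_READPAGE" ;;
        101) echo "LDLM_ENQUEUE" ;;
        400) echo "OBD_PING" ;;
        501) echo "LLOG_ORIGIN_HANDLE_CREATE" ;;
        *)   echo "opcode $1 (not in this table)" ;;
    esac
}

# Decode the values that appear in the quoted log messages.
errno_name -107
for op in 37 101 400 501; do
    printf 'operation %s = %s\n' "$op" "$(opcode_name "$op")"
done
```

Read this way, "processing error (-107)" and "operation 400 on unconnected MDS" would describe the same situation: a request (here apparently a ping) arrived from a client that the server no longer considers connected, which would be consistent with the eviction logged just before it.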



On Apr 29, 2008, at 9:06 PM, Reto Gantenbein wrote:

> Dear lustre users
>
> I set up a lustre file system with 7 osts (fibre-channel raids) and
> an mgs/mdt, which are exported via two nodes. One node has the mgs/mdt
> and 3 osts mounted, the other has 4 osts mounted. The nodes are running
> the lustre-patched 2.6.18 vanilla kernel. The clients are patchless and
> are running the 2.6.22 gentoo kernel. lustre-1.6.4.3 is compiled from
> source under gentoo linux.
>
> The two nodes are called lustre01 and lustre02.
>
> I formatted the mgs/mdt on lustre01 with:
> mkfs.lustre --mgs --mdt --fsname=homefs --failnode=lustre02@tcp
> --reformat /dev/sdb
>
> Then I mounted it and formatted the osts, also on lustre01, with:
> mkfs.lustre --ost --mgsnode=lustre01@tcp --mgsnode=lustre02@tcp
> --fsname=homefs --failnode=lustre02@tcp --index=1 /dev/sdc
>
> and so on...
>
> Is there already a general mistake in this installation setup?
>
> The osts are distributed over both servers to increase bandwidth and
> also for failover reasons. All osts and the mgs are connected to both
> servers, but each is mounted on only one of them.
>
> Now to my problem:
> I mounted the file system from a client with ip 10.1.1.65 and these
> are the messages that appear in the system log:
>
> lustre01 LustreError: 13533:0:(handler.c:148:mds_sendpage()) @@@ bulk
> failed: timeout 0(4096), evicting
> 87fb775c-8f64-5d85-2a95-8fb595e62892@NET_0x200000a010141_UUID
> lustre01 req@ffff81011dc72e00 x2483/t0
> o37->87fb775c-8f64-5d85-2a95-8fb595e62892@NET_0x200000a010141_UUID:-1
> lens 296/296 ref 0 fl Interpret:/0/0 rc 0/0
>
> lustre01 LustreError: 13469:0:(ldlm_lib.c:1442:target_send_reply_msg())
> @@@ processing error (-107)  req@ffff81011d704a00 x2479/t0
> o400-><?>@<?>:-1 lens 128/0 ref 0 fl Interpret:/0/0 rc -107/0
>
> lustre01 LustreError: 13469:0:(handler.c:1499:mds_handle()) operation
> 400 on unconnected MDS from 12345-10.1.1.65@tcp
>
> lustre01 LustreError: 13535:0:(mgs_handler.c:515:mgs_handle())
> lustre_mgs: operation 101 on unconnected MGS
>
> lustre01 LustreError: 13535:0:(mgs_handler.c:515:mgs_handle())
> lustre_mgs: operation 501 on unconnected MGS
>
> I already searched the net for answers, but without much success.
> I cannot find out what these messages mean or where they come from.
>
> Maybe it also helps to show you my device list:
>
> lustre01:
> lctl > device_list
>  0 UP mgs MGS MGS 11
>  1 UP mgc MGC10.1.140.2@tcp 89b4c0f0-c602-0857-c22e-ed232d8ad7aa 5
>  2 UP mdt MDS MDS_uuid 3
>  3 UP lov homefs-mdtlov homefs-mdtlov_UUID 4
>  4 UP mds homefs-MDT0000 homefs-MDT0000_UUID 5
>  5 UP osc homefs-OST0001-osc homefs-mdtlov_UUID 5
>  6 UP osc homefs-OST0004-osc homefs-mdtlov_UUID 5
>  7 UP osc homefs-OST0005-osc homefs-mdtlov_UUID 5
>  8 UP osc homefs-OST0002-osc homefs-mdtlov_UUID 5
>  9 UP osc homefs-OST0003-osc homefs-mdtlov_UUID 5
> 10 UP osc homefs-OST0006-osc homefs-mdtlov_UUID 5
> 11 UP osc homefs-OST0007-osc homefs-mdtlov_UUID 5
> 12 UP mgc MGC10.1.140.1@tcp c8ad2ab0-9eef-b334-37af-85734b53ac94 5
> 13 UP ost OSS OSS_uuid 3
> 14 UP obdfilter homefs-OST0001 homefs-OST0001_UUID 7
> 15 UP obdfilter homefs-OST0004 homefs-OST0004_UUID 7
> 16 UP obdfilter homefs-OST0005 homefs-OST0005_UUID 7
>
> lustre02:
> lctl > device_list
>  0 UP mgc MGC10.1.140.1@tcp 6154baf3-e830-81d9-ff6c-451d107650c1 5
>  1 UP ost OSS OSS_uuid 3
>  2 UP obdfilter homefs-OST0002 homefs-OST0002_UUID 7
>  3 UP obdfilter homefs-OST0003 homefs-OST0003_UUID 7
>  4 UP obdfilter homefs-OST0006 homefs-OST0006_UUID 7
>  5 UP obdfilter homefs-OST0007 homefs-OST0007_UUID 7
>
>
> Can someone give me some hints? What is going wrong here?
>
> Kind regards,
> Reto Gantenbein
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
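
The evicted client UUID in the first quoted log message embeds the client's LNet NID as a hex number (NET_0x200000a010141). As a rough sketch, assuming the usual 64-bit NID layout (network type and network number in the upper 32 bits, IPv4 address in the lower 32 bits) and assuming that network type 2 is the TCP socket LND, it can be decoded like this. decode_nid is a hypothetical helper name, not an lctl command.

```shell
#!/bin/bash
# Hypothetical helper: decode a 64-bit LNet NID given as a hex string.
# ASSUMPTION: layout is [type:16][netnum:16][IPv4 address:32], type 2 = tcp.
decode_nid() {
    local nid=$(( $1 ))                       # bash parses the 0x prefix
    local addr=$(( nid & 0xffffffff ))        # lower 32 bits: IPv4 address
    local net=$(( (nid >> 32) & 0xffffffff )) # upper 32 bits: network
    local lnd=$(( (net >> 16) & 0xffff ))     # network (LND) type
    local num=$(( net & 0xffff ))             # network number
    local name

    case "$lnd" in
        2) name="tcp" ;;
        *) name="lnd$lnd" ;;                  # types not covered by this sketch
    esac
    # Network number 0 is conventionally left implicit (tcp rather than tcp0).
    if [ "$num" -ne 0 ]; then
        name="$name$num"
    fi

    printf '%d.%d.%d.%d@%s\n' \
        $(( (addr >> 24) & 255 )) $(( (addr >> 16) & 255 )) \
        $(( (addr >> 8) & 255 ))  $(( addr & 255 )) "$name"
}

decode_nid 0x200000a010141   # prints 10.1.1.65@tcp
```

Under these assumptions the result matches the client IP 10.1.1.65 mentioned in the post, so the evicted export would belong to that client.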








