[Lustre-discuss] Can someone help?

Aaron Knister aaron at iges.org
Sun Dec 30 06:15:44 PST 2007


Aha! I think I've found the problem. It looks like the filesystem name
on the OSTs is different from the one on the MDT you are trying to use.
On the MDT you have datafs (datafs-MDT0000) and on the OSTs it looks
like you have spfs (spfs-OST0000, spfs-OST0001). I'm guessing you want
the name "datafs", so you need to use tunefs.lustre to change the fs
name on the OSTs. Here's how:

1. Unmount the OSTs
2. Run "tunefs.lustre --fsname=datafs /device/of/ost" for each of
your OSTs
3. Mount everything back up and post the output of "lctl dl" from your
OSSs and MDS.
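
For reference, the three steps above could be scripted roughly as
follows. This is only a sketch under stated assumptions: rename_ost is
a hypothetical helper name, /dev/sdb and /mnt/ost0 are placeholder
device and mount point values that do not come from this thread, and
by default the function only prints the commands instead of running
them.

```shell
# Sketch only: prints the commands by default; set RUN=1 to execute them.
# Arguments: <new fsname> <OST block device> <OST mount point>
rename_ost() {
    fsname=$1
    dev=$2
    mnt=$3
    for cmd in \
        "umount $mnt" \
        "tunefs.lustre --fsname=$fsname $dev" \
        "mount -t lustre $dev $mnt"
    do
        if [ "${RUN:-0}" = "1" ]; then
            # Stop at the first failing step so a bad tunefs is not mounted.
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}
```

Running "rename_ost datafs /dev/sdb /mnt/ost0" on each OSS (with that
node's real device and mount point) prints the unmount, tunefs.lustre
and mount commands in order, so they can be checked before being run
for real.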

-Aaron

P.S. "lctl dl" is a quicker way of running "cat /proc/fs/lustre/devices".
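
Since the mismatch shows up in the target names of the device list, a
small filter can make it easier to spot. The following is a minimal
sketch, assuming target names follow the <fsname>-MDTxxxx and
<fsname>-OSTxxxx pattern seen in this thread; lustre_fsnames is a
hypothetical helper name, not a Lustre tool.

```shell
# Reads a device list (as printed by "lctl dl") on stdin and prints the
# distinct filesystem names of the mds and obdfilter targets it contains.
lustre_fsnames() {
    awk '$3 == "mds" || $3 == "obdfilter" {
        sub(/-[^-]*$/, "", $4)   # strip the trailing -MDTxxxx / -OSTxxxx
        print $4
    }' | sort -u
}
```

Used as "lctl dl | lustre_fsnames" on the MDS and on each OSS, every
node should report the same single name once the OSTs have been
renamed.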

On Dec 30, 2007, at 4:00 AM, Albert Ozilov wrote:

> Hi Aaron,
>
> Thanks in advance for your help.
>
> Here is some Info from my setup:
>
> MDT/MGS server IP is 11.4.3.241
> Client IP is 11.4.3.242.
> The OST IPs are 11.4.3.243 and 11.4.3.244.
> ===========================================
>
> [root@sw241 ~]# cat /proc/fs/lustre/mds/datafs-MDT0000/recovery_status
> status: INACTIVE
> ===========================================
>
> cat /proc/fs/lustre/devices:
>
> [root@sw241 ~]# cat /proc/fs/lustre/devices
> 0 UP mgs MGS MGS 9
> 1 UP mgc MGC11.4.3.241@o2ib a4bcf80c-5a47-b79d-326a-6520a237ce4a 5
> 2 UP mdt MDS MDS_uuid 3
> 3 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
> 4 UP mds datafs-MDT0000 datafs-MDT0000_UUID 3
> [root@sw241 ~]#
> ===========================================
> [root@sw243 ~]# cat /proc/fs/lustre/devices
> 0 UP mgc MGC11.4.3.241@o2ib c44415f9-2bd0-420f-99a3-8792c190d45e 5
> 1 UP ost OSS OSS_uuid 3
> 2 UP obdfilter spfs-OST0000 spfs-OST0000_UUID 3
> [root@sw243 ~]#
> ===========================================
> [root@sw244 ~]# cat /proc/fs/lustre/devices
> 0 UP mgc MGC11.4.3.241@o2ib d4639175-68e7-9482-2edd-e5ee29e3af2a 5
> 1 UP ost OSS OSS_uuid 3
> 2 UP obdfilter spfs-OST0001 spfs-OST0001_UUID 3
> [root@sw244 ~]#
> ===========================================
>
> lctl ping:
>
> [root@sw242 ~]# lctl ping 11.4.3.241@o2ib
> 12345-0@lo
> 12345-11.4.3.241@o2ib
>
> [root@sw242 ~]# lctl ping 11.4.3.243@o2ib
> 12345-0@lo
> 12345-11.4.3.243@o2ib
>
> [root@sw242 ~]# lctl ping 11.4.3.244@o2ib
> 12345-0@lo
> 12345-11.4.3.244@o2ib
> ===========================================
> [root@sw241 ~]# lctl ping 11.4.3.242@o2ib
> 12345-0@lo
> 12345-11.4.3.242@o2ib
>
> [root@sw241 ~]# lctl ping 11.4.3.243@o2ib
> 12345-0@lo
> 12345-11.4.3.243@o2ib
>
> [root@sw241 ~]# lctl ping 11.4.3.244@o2ib
> 12345-0@lo
> 12345-11.4.3.244@o2ib
> ===========================================
>
> Your help is appreciated.
>
> Best Regards,
> Alberto.
>
>
> From: Aaron Knister [mailto:aaron at iges.org]
> Sent: Sunday, December 30, 2007 4:05 AM
> To: Albert Ozilov
> Cc: lustre-discuss at clusterfs.com
> Subject: Re: [Lustre-discuss] Can someone help?
>
> Hi!
>
> Could you post the IPs of your MDT/MGS, OSSs and Client?
>
> Also on the MDT/MGS I'm wondering if the MDT is in recovery. To  
> check recovery status run--
>
> cat /proc/fs/lustre/mds/datafs-MDT0000/recovery_status
>
> and post what you get back.
>
> You can also do an LNET ping to the various nodes in your system.
> The syntax is "lctl ping nid", so to ping 11.4.3.241@o2ib you would
> run "lctl ping 11.4.3.241@o2ib". Please post what you get back from
> that as well.
>
> -Aaron
>
> On Dec 29, 2007, at 7:29 PM, Albert Ozilov wrote:
>
>> Hi,
>>
>> I am a new user of the Lustre file system. I am trying to run it over
>> an InfiniBand fabric (ofed 1.2.xxx), using Lustre version 1.6.4.1.
>> I have created the MGS/MDT on the same node and two OSTs on different
>> nodes, and everything looked fine up to this point.
>> Now I am trying to mount the file system on the client node, but I am
>> getting connection refused from the MGS server.
>>
>> This is my command on the client node:
>> mount -t lustre 11.4.3.241@o2ib:/datafs /mnt/testfs
>>
>> This is the dmesg on the MGS node:
>>
>> Lustre: datafs-MDT0000: temporarily refusing client connection from  
>> 11.4.3.242@o2ib
>> Lustre: Skipped 19 previous similar messages
>> LustreError: 6047:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@  
>> processing error (-11)  req@ffff810219855850 x6/t0 o38-><?>@<?>:-1  
>> lens 240/0 ref 0 fl Interpret:/0/0 rc -11/0
>> LustreError: 6047:0:(ldlm_lib.c:1442:target_send_reply_msg())  
>> Skipped 19 previous similar messages
>>
>> Is anyone familiar with this problem? Can someone help?
>>
>> Best Regards,
>> Alberto.
>>
>
> Aaron Knister
> Associate Systems Analyst
> Center for Ocean-Land-Atmosphere Studies
>
> (301) 595-7000
> aaron at iges.org
>
>
>
>

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org



