[Lustre-discuss] new OST not mounting up

Evan Felix evan.felix at pnl.gov
Mon Mar 2 12:46:02 PST 2009


Ok, try this. I know you've done it already, but it may help us to
understand:

1. Reboot both the OST and the MGS machine.
2. Run tunefs.lustre --writeconf /dev/<device>
3. Mount the MGS, then the OST.
4. Run dmesg on both servers so we can see what that first mount failure
says.
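In shell form, the sequence above is roughly this (just a sketch: <mgs_device> and /mnt/mgs are placeholders for your actual MGS device and mount point; the OST paths are the ones from your earlier mail):

```shell
# After rebooting both nodes, regenerate the configuration logs
tunefs.lustre --writeconf /dev/<mgs_device>
tunefs.lustre --writeconf /dev/vg/ost002

# Mount the MGS first, then the OST
mount -t lustre /dev/<mgs_device> /mnt/mgs
mount -t lustre /dev/vg/ost002 /vol/srv1/ost002

# On each server, grab the tail of the kernel log right after the mount attempt
dmesg | tail -n 40
```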

Since it also looks like they are on the same machine, we may only get one
dmesg output here. Also, it looks like you are having trouble communicating
with the server at times. Is it possible there are communication errors?
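If you want to rule that out, LNET connectivity can be checked directly with lctl (a sketch; mds_ip_addr@tcp stands in for the real NID from your logs):

```shell
# List the NIDs this node is exporting
lctl list_nids

# From the OSS, ping the MDS/MGS NID over LNET (then repeat in the other direction)
lctl ping mds_ip_addr@tcp
```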

What version of Lustre is this?
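If you're not sure, the running version can be read off the servers once the Lustre modules are loaded, e.g.:

```shell
cat /proc/fs/lustre/version
```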

evan

On 3/1/09 5:19 PM, "Mag Gam" <magawake at gmail.com> wrote:

> OK, I did a tunefs.lustre --writeconf /dev/
> 
> and tried to mount it up, still the same error. "Operation already in
> progress". The target service is already running.
> 
> I am not sure what else I can try...
> 
> Any suggestions?
> 
> TIA
> 
> On Sat, Feb 28, 2009 at 8:39 AM, Mag Gam <magawake at gmail.com> wrote:
>> > (sorry adding the entire list for Evan's reponse)
>> >
>> > Thank you for getting back to me on this.
>> > So, when I try to mount the **new** ost I keep getting these messages.
>> >
>> > For some reason the new OST is active on the MGS side which I am not
>> > sure why.  I think I made a mistake by trying to mount up a new OST
>> > while clients were still active.
>> >
>> >
>> > When I try to activate the bad OST, I get this message.
>> >
>> > Lustre: 11647:0:(ldlm_lib.c:736:target_handle_connect())
>> > lfs001-OST0005: cookie lfs001-mdtlov_UUID seen on new NID
>> > mds_ip_addr@tcp when existing NID 0@lo is already connected
>> > Feb 27 11:59:01 oss_server kernel: Lustre:
>> > 11647:0:(ldlm_lib.c:736:target_handle_connect()) Skipped 4 previous
>> > similar messages
>> > Feb 27 11:59:01 mds_server kernel: Lustre:
>> > 3426:0:(import.c:411:import_select_connection()) lfs001-OST0005-osc:
>> > tried all connections, increasing latency to 51s
>> > Feb 27 11:59:01 oss_server kernel: LustreError:
>> > 11647:0:(ldlm_lib.c:1614:target_send_reply_msg()) @@@ processing error
>> > (-114)  req@ffff8104251a4400 x388745/t0 o8-><?>@<?>:0/0 lens 240/144 e
>> > 0 to 0 dl 1235754041 ref 1 fl Interpret:/0/0 rc -114/0
>> > Feb 27 11:59:01 mds_server kernel: Lustre:
>> > 3426:0:(import.c:411:import_select_connection()) Skipped 6 previous
>> > similar messages
>> > Feb 27 11:59:01 mds_server kernel: LustreError: 11-0: an error
>> > occurred while communicating with oss_ip@tcp. The ost_connect
>> > operation failed with -114
>> > Feb 27 11:59:01 mds_server kernel: LustreError: Skipped 12 previous
>> > similar messages
>> >
>> >
>> >  oss_server kernel: LustreError:
>> > 11556:0:(ldlm_lib.c:1614:target_send_reply_msg()) @@@ processing error
>> > (-114)  req@ffff81042150a000 x388953/t0 o8-><?>@<?>:0/0 lens 240/144 e
>> > 0 to 0 dl 1235754240 ref 1 fl Interpret:/0/0 rc -114/0
>> >
>> >
>> >
>> > Also, I was wondering if there was a way to reset the state of my OST.
>> > It keeps thinking it's already mounted, even after a reboot. Any way to
>> > say "hey, I am not mounted" ? :-)
>> >
>> > TIA
>> >
>> > On Sat, Feb 28, 2009 at 8:38 AM, Mag Gam <magawake at gmail.com> wrote:
>>> >> Would a writeconf help on the OST? I am hesitant to run one on it.
>>> >>
>>> >> TIA
>>> >>
>>> >> On Fri, Feb 27, 2009 at 11:41 AM, Evan Felix <evan.felix at pnl.gov> wrote:
>>>> >>> Mag,
>>>> >>>
>>>> >>> Can you send us the output from your kernel log after you try the mount
>>>> >>> command that is failing?
>>>> >>>
>>>> >>> Just run 'dmesg' and send us the last 20 lines or so..
>>>> >>>
>>>> >>> evan
>>>> >>>
>>>> >>>
>>>> >>> On 2/26/09 7:12 PM, "Mag Gam" <magawake at gmail.com> wrote:
>>>> >>>
>>>>> >>>> Any ideas?
>>>>> >>>>
>>>>> >>>> I am still unable to mount this new OST. I stopped the client hang
>>>>> >>>> problem by disabling the OST via lctl, but it is a crazy problem indeed.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> I would love to know how to activate the OST.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Wed, Feb 25, 2009 at 4:43 PM, Mag Gam <magawake at gmail.com> wrote:
>>>>>> >>>>> Hello.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> We created an OST on an OSS. But when I try to mount up the OST, it
>>>>>> >>>>> keeps saying:
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> mount.lustre /dev/vg/ost002 /vol/srv1/ost002
>>>>>> >>>>> mount.lustre: mount /dev/vg/ost002 at /vol/srv1/ost002 failed:
>>>>>> >>>>> Operation already in progress
>>>>>> >>>>> The target service is already running. (/dev/vg/ost002)
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> However,
>>>>>> >>>>> mount | grep -i ost002
>>>>>> >>>>> Nothing is mounted up....
>>>>>> >>>>>
>>>>>> >>>>> lctl is even showing this OST, and the client is also able to see it.
>>>>>> >>>>> lfs df -h
>>>>>> >>>>> ...
>>>>>> >>>>> lfs001-OST0005_UUID     492.2G    445.2G     22.0G   90%
>>>>>> >>>>> /lfs/srv5/lfs001[OST:5]
>>>>>> >>>>> ...
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> The MDS/OSS Version:
>>>>>> >>>>> lustre: 1.6.5.52
>>>>>> >>>>> kernel: patchless
>>>>>> >>>>> build:
>>>>>> >>>>> 1.6.5.52-19691231190000-PRISTINE-.var.tmp.linux-2.6.18.x86_64-2.6.18-prep
>>>>>> >>>>> 2.6.18 = Kernel Version
>>>>>> >>>>>
>>>>>> >>>>> I don't think it's bugzilla 11564, because my lustre fs name is only 6
>>>>>> >>>>> characters long.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> Also, when the client tries to access the new OST's space, it simply
>>>>>> >>>>> hangs. It placed it in "bloc
>>>>>> >>>>>
>>>>>> >>>>> Any thoughts about this?
>>>>>> >>>>>
>>>>>> >>>>> TIA
>>>>>> >>>>>
>>>>> >>>> _______________________________________________
>>>>> >>>> Lustre-discuss mailing list
>>>>> >>>> Lustre-discuss at lists.lustre.org
>>>>> >>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>> >>>
>>>> >>>
>>> >>
>> >
> 
