<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7235.2">

<TITLE>RE: Help needed in Building lustre using pre-packaged releases</TITLE>

</HEAD>

<BODY>

<!-- Converted from text/plain format -->

<BR>

<BR>


<P><FONT SIZE=2>Hi,<BR>

Can anyone guide me in building Lustre from a pre-packaged Lustre release? I'm using Ubuntu 7.10, and I want to build Lustre using the RHEL2.6 RPMs available on my system. I'm referring to the how-to in the wiki, but it gives no detailed step-by-step procedure for building Lustre from a pre-packaged release.<BR>

<BR>

I'm in need of this.<BR>

<BR>

Thanks and Regards,<BR>

Ashok Bharat<BR>

-----Original Message-----<BR>

From: lustre-discuss-bounces@lists.lustre.org on behalf of lustre-discuss-request@lists.lustre.org<BR>

Sent: Fri 3/14/2008 2:25 AM<BR>

To: lustre-discuss@lists.lustre.org<BR>

Subject: Lustre-discuss Digest, Vol 26, Issue 36<BR>

<BR>

Send Lustre-discuss mailing list submissions to<BR>

        lustre-discuss@lists.lustre.org<BR>

<BR>

To subscribe or unsubscribe via the World Wide Web, visit<BR>

        <A HREF="http://lists.lustre.org/mailman/listinfo/lustre-discuss">http://lists.lustre.org/mailman/listinfo/lustre-discuss</A><BR>

or, via email, send a message with subject or body 'help' to<BR>

        lustre-discuss-request@lists.lustre.org<BR>

<BR>

You can reach the person managing the list at<BR>

        lustre-discuss-owner@lists.lustre.org<BR>

<BR>

When replying, please edit your Subject line so it is more specific<BR>

than "Re: Contents of Lustre-discuss digest..."<BR>

<BR>

<BR>

Today's Topics:<BR>

<BR>

   1. Re: OSS not healty (Andreas Dilger)<BR>

   2. Re: e2scan for backup (Andreas Dilger)<BR>

   3. Howto map block devices to Lustre devices? (Chris Worley)<BR>

   4. Re: e2fsck mdsdb: DB_NOTFOUND (Aaron Knister)<BR>

   5. Re: e2fsck mdsdb: DB_NOTFOUND (Karen M. Fernsler)<BR>

   6. Re: Howto map block devices to Lustre devices? (Klaus Steden)<BR>

<BR>

<BR>

----------------------------------------------------------------------<BR>

<BR>

Message: 1<BR>

Date: Thu, 13 Mar 2008 11:11:19 -0700<BR>

From: Andreas Dilger &lt;adilger@sun.com&gt;<BR>

Subject: Re: [Lustre-discuss] OSS not healty<BR>

To: "Brian J. Murrell" &lt;Brian.Murrell@sun.com&gt;<BR>

Cc: lustre-discuss@lists.lustre.org<BR>

Message-ID: &lt;20080313181119.GB3217@webber.adilger.int&gt;<BR>

Content-Type: text/plain; charset=us-ascii<BR>

<BR>

On Mar 13, 2008  13:44 +0100, Brian J. Murrell wrote:<BR>

> On Thu, 2008-03-13 at 12:34 +0100, Frank Mietke wrote:<BR>

> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701448] attempt to access beyond end of device<BR>

> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701454] sda: rw=1, want=11287722456, limit=7796867072<BR>

><BR>

> This is pretty self-explanatory.  Something tried to read beyond the end<BR>

> of the disk.  Something has a misunderstanding of how big the disk is.<BR>

> Is it possible that the disk format process was misled about the disk<BR>

> size during initialization?<BR>

<BR>

Unlikely.<BR>

<BR>

> Andreas, does mkfs do any bounds checking to verify the sanity of the<BR>

> mkfs request?  I.e. does it make sure that if/when you specify a number<BR>

> of blocks for a filesystem that that many blocks are available?<BR>

<BR>

Yes, mke2fs will zero out the last ~128kB of the device to overwrite any<BR>

MD RAID signatures, and also verify that the device is as big as requested.<BR>

<BR>

These kinds of errors are usually a result of corruption internal to the<BR>

filesystem, and some garbage is interpreted as a block number beyond the<BR>

end of the device.<BR>

<BR>

> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701555] attempt to access beyond end of device<BR>

> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701558] sda: rw=1, want=25366292592, limit=7796867072<BR>

> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701562] Buffer I/O error on device sda, logical block 3170786573<BR>

> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701785] lost page write due to I/O error on sda<BR>

> > Mar 13 06:17:31 chic2e24 kernel: [3068633.702004] Aborting journal on device sda.<BR>

><BR>

> This is all just fallout error messages from the attempted read beyond<BR>

> EOF.<BR>

<BR>

Time to unmount the filesystem and run a full e2fsck "e2fsck -fp /dev/sdaNNN"<BR>

<BR>

Cheers, Andreas<BR>

--<BR>

Andreas Dilger<BR>

Sr. Staff Engineer, Lustre Group<BR>

Sun Microsystems of Canada, Inc.<BR>
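<BR>

A minimal sketch, not part of the original message, of how the "want"/"limit" values in the kernel messages quoted above can be compared.<BR>

It assumes both values are counted in 512-byte sectors; the function name beyond_end is hypothetical.<BR>

<PRE>
```python
import re

# Matches the block-layer message quoted above, e.g.
# "sda: rw=1, want=11287722456, limit=7796867072"
PATTERN = re.compile(r"(\w+): rw=(\d+), want=(\d+), limit=(\d+)")

SECTOR_BYTES = 512  # assumption: want/limit are counted in 512-byte sectors


def beyond_end(line):
    """Return (device, sectors_past_end, device_size_bytes), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    device = m.group(1)
    want = int(m.group(3))
    limit = int(m.group(4))
    return device, want - limit, limit * SECTOR_BYTES


line = ("Mar 13 06:17:31 chic2e24 kernel: [3068633.701454] "
        "sda: rw=1, want=11287722456, limit=7796867072")
print(beyond_end(line))
```
</PRE>

A request that lands billions of sectors past the end of the device is consistent with Andreas's reading that garbage was interpreted as a block number.<BR>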

<BR>

<BR>

<BR>

------------------------------<BR>

<BR>

Message: 2<BR>

Date: Thu, 13 Mar 2008 11:22:48 -0700<BR>

From: Andreas Dilger &lt;adilger@sun.com&gt;<BR>

Subject: Re: [Lustre-discuss] e2scan for backup<BR>

To: Jakob Goldbach &lt;jakob@goldbach.dk&gt;<BR>

Cc: Lustre User Discussion Mailing List<BR>

        &lt;lustre-discuss@lists.lustre.org&gt;<BR>

Message-ID: &lt;20080313182248.GD3217@webber.adilger.int&gt;<BR>

Content-Type: text/plain; charset=us-ascii<BR>

<BR>

On Mar 13, 2008  12:59 +0100, Jakob Goldbach wrote:<BR>

> On Wed, 2008-03-12 at 23:12 +0100, Brian J. Murrell wrote:<BR>

> > On Wed, 2008-03-12 at 14:50 -0600, Lundgren, Andrew wrote:<BR>

> > > How do you do the snapshot?<BR>

> ><BR>

> > lvcreate -s<BR>

><BR>

> No need to freeze the filesystem while creating the snapshot to ensure a<BR>

> consistent filesystem on the snapshot ?<BR>

<BR>

Yes, but this is handled internally by LVM and ext3 when the snapshot<BR>

is created.<BR>

<BR>

> (xfs has an xfs_freeze function that does just this)<BR>

<BR>

In fact I was just discussing this with an XFS developer and this is<BR>

a source of problems for them because if you do xfs_freeze before doing<BR>

the LVM snapshot it will deadlock.<BR>

<BR>

Cheers, Andreas<BR>

--<BR>

Andreas Dilger<BR>

Sr. Staff Engineer, Lustre Group<BR>

Sun Microsystems of Canada, Inc.<BR>

<BR>

<BR>

<BR>

------------------------------<BR>

<BR>

Message: 3<BR>

Date: Thu, 13 Mar 2008 13:50:51 -0600<BR>

From: "Chris Worley" &lt;worleys@gmail.com&gt;<BR>

Subject: [Lustre-discuss] Howto map block devices to Lustre devices?<BR>

To: lustre-discuss &lt;lustre-discuss@lists.lustre.org&gt;<BR>

Message-ID:<BR>

        &lt;f3177b9e0803131250n23084fd7g184ef07403a298cd@mail.gmail.com&gt;<BR>

Content-Type: text/plain; charset=ISO-8859-1<BR>

<BR>

I'm trying to deactivate some OSTs, but to find them I've been<BR>

searching through /var/log/messages, as in:<BR>

<BR>

# ssh io2 grep -e sde -e sdf -e sdj -e sdk -e sdd /var/log/messages"*" | grep Server<BR>

/var/log/messages:Mar 10 13:27:54 io2 kernel: Lustre: Server ddnlfs-OST0035 on device /dev/sdf has started<BR>

/var/log/messages.1:Mar  4 16:02:13 io2 kernel: Lustre: Server ddnlfs-OST0030 on device /dev/sdf has started<BR>

/var/log/messages.1:Mar  6 14:34:44 io2 kernel: Lustre: Server ddnlfs-OST002e on device /dev/sdd has started<BR>

/var/log/messages.1:Mar  6 14:34:55 io2 kernel: Lustre: Server ddnlfs-OST002f on device /dev/sde has started<BR>

/var/log/messages.1:Mar  6 14:35:16 io2 kernel: Lustre: Server ddnlfs-OST0030 on device /dev/sdf has started<BR>

/var/log/messages.1:Mar  6 15:20:48 io2 kernel: Lustre: Server ddnlfs-OST002f on device /dev/sde has started<BR>

/var/log/messages.1:Mar  6 16:08:38 io2 kernel: Lustre: Server ddnlfs-OST002e on device /dev/sdd has started<BR>

/var/log/messages.1:Mar  6 16:08:43 io2 kernel: Lustre: Server ddnlfs-OST0030 on device /dev/sdf has started<BR>

/var/log/messages.1:Mar  6 16:08:53 io2 kernel: Lustre: Server ddnlfs-OST0034 on device /dev/sdj has started<BR>

<BR>

Note that there isn't an entry for sdk (probably rotated out), and sdf<BR>

has two different names.<BR>

<BR>

Is there a better way to map a Lustre device name to the right<BR>

Linux block device?<BR>

<BR>

I'm trying to cull out slow disks.  I'm hoping that just by<BR>

"deactivating" the device in lctl, Lustre will quit using it, and that this is the<BR>

best way to get rid of a slow drive... correct?<BR>

<BR>

Thanks,<BR>

<BR>

Chris<BR>
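<BR>

A minimal sketch, not part of the original message, of how the "has started" lines shown above could be reduced to one Lustre server name per block device.<BR>

It assumes the lines are supplied oldest first, so the last entry for a device wins; the function name latest_server_per_device is hypothetical, and the result only reflects what the logs say, not what is currently mounted.<BR>

<PRE>
```python
import re

# Pattern taken from the startup messages quoted above.
STARTED = re.compile(r"Lustre: Server (\S+) on device (\S+) has started")


def latest_server_per_device(lines):
    """Map each block device to the last Lustre server name seen for it.

    Assumes `lines` is in chronological order (oldest rotated log first).
    """
    mapping = {}
    for line in lines:
        m = STARTED.search(line)
        if m:
            server, device = m.groups()
            mapping[device] = server  # later entries overwrite earlier ones
    return mapping


log = [
    "Mar  4 16:02:13 io2 kernel: Lustre: Server ddnlfs-OST0030 on device /dev/sdf has started",
    "Mar  6 16:08:53 io2 kernel: Lustre: Server ddnlfs-OST0034 on device /dev/sdj has started",
    "Mar 10 13:27:54 io2 kernel: Lustre: Server ddnlfs-OST0035 on device /dev/sdf has started",
]
print(latest_server_per_device(log))
```
</PRE>

Devices whose entries have been rotated out of the logs, such as sdk here, will simply be missing from the result.<BR>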

<BR>

<BR>

------------------------------<BR>

<BR>

Message: 4<BR>

Date: Thu, 13 Mar 2008 16:50:04 -0400<BR>

From: Aaron Knister &lt;aaron@iges.org&gt;<BR>

Subject: Re: [Lustre-discuss] e2fsck mdsdb: DB_NOTFOUND<BR>

To: Michelle Butler &lt;mbutler@ncsa.uiuc.edu&gt;<BR>

Cc: Andreas Dilger &lt;adilger@sun.com&gt;, lustre-discuss@clusterfs.com,<BR>

        abe-admin@ncsa.uiuc.edu, ckerner@ncsa.uiuc.edu, alex parga<BR>

        &lt;aparga@ncsa.uiuc.edu&gt;, set@ncsa.uiuc.edu<BR>

Message-ID: &lt;85E6EB25-EC03-4D93-BD8B-B267F65A5400@iges.org&gt;<BR>

Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes<BR>

<BR>

What version of lustre/kernel is running on the problematic server?<BR>

<BR>

On Mar 13, 2008, at 11:02 AM, Michelle Butler wrote:<BR>

<BR>

> We got past that point by running e2fsck on the individual partitions first.<BR>

><BR>

> But I'm sorry to say we are still having problems. We have an I/O server<BR>

> that is fine until we start Lustre. It then starts spewing Lustre call<BR>

> traces:<BR>

><BR>

> Call<BR>

> Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}<BR>

> <ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}<BR>

>        <ffffffff8013327d>{default_wake_function+0}<BR>

> <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}<BR>

>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}<BR>

> <ffffffff80110ebb>{child_rip+8}<BR>

>        <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}<BR>

> <ffffffff80110eb3>{child_rip+0}<BR>

><BR>

> ll_ost_io_232 S 000001037d6bbee8     0 26764      1         26765 <BR>

> 26763 (L-TLB)<BR>

> 000001037d6bbe58 0000000000000046 0000000100000246 0000000000000003<BR>

>        0000000000000016 0000000000000001 00000104100bcb20 <BR>

> 0000000300000246<BR>

>        00000103f5470030 000000000001d381<BR>

> Call<BR>

> Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}<BR>

> <ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}<BR>

>        <ffffffff8013327d>{default_wake_function+0}<BR>

> <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}<BR>

>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}<BR>

> <ffffffff80110ebb>{child_rip+8}<BR>

>        <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}<BR>

> <ffffffff80110eb3>{child_rip+0}<BR>

><BR>

> ll_ost_io_233 S 00000103de847ee8     0 26765      1         26766 <BR>

> 26764 (L-TLB)<BR>

> 00000103de847e58 0000000000000046 0000000100000246 0000000000000001<BR>

>        0000000000000016 0000000000000001 000001040f83c620 <BR>

> 0000000100000246<BR>

>        00000103e627e030 000000000001d487<BR>

> Call<BR>

> Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}<BR>

> <ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}<BR>

>        <ffffffff8013327d>{default_wake_function+0}<BR>

> <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}<BR>

>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}<BR>

> <ffffffff80110ebb>{child_rip+8}<BR>

>        <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}<BR>

> <ffffffff80110eb3>{child_rip+0}<BR>

><BR>

> ll_ost_io_234 S 00000100c4353ee8     0 26766      1         26767 <BR>

> 26765 (L-TLB)<BR>

> 00000100c4353e58 0000000000000046 0000000100000246 0000000000000003<BR>

>        0000000000000016 0000000000000001 00000104100bcc60 <BR>

> 0000000300000246<BR>

>        00000103de81b810 000000000001d945<BR>

> Call<BR>

> Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}<BR>

> <ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}<BR>

>        <ffffffff8013327d>{default_wake_function+0}<BR>

> <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}<BR>

>        <BR>

> <ffffffffa03e0156>{:ptlrpc:ptlrpc_retr???f?????????c?????????c??????<BR>

>                                                          <BR>

> Ks[F????????????<BR>

> <ffffffff8013327d>{default_wake_function+0}<BR>

> <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}<BR>

>        <ffffffffa03e0156>{:ptl<BR>

><BR>

> It then panics the kernel.. ??<BR>

><BR>

> Michelle Butler<BR>

><BR>

> At 02:39 AM 3/13/2008, Andreas Dilger wrote:<BR>

>> On Mar 12, 2008  06:44 -0500, Karen M. Fernsler wrote:<BR>

>>> I'm running:<BR>

>>><BR>

>>> e2fsck -y -v --mdsdb mdsdb --ostdb osth3_1 /dev/mapper/27l4<BR>

>>><BR>

>>> and getting:<BR>

>>><BR>

>>> Pass 6: Acquiring information for lfsck<BR>

>>> error getting mds_hdr (3685469441:8) in<BR>

>> /post/cfg/mdsdb: DB_NOTFOUND: No matching key/data pair found<BR>

>>> e2fsck: aborted<BR>

>>><BR>

>>> Any ideas how to get around this?<BR>

>><BR>

>> Does "mdsdb" actually exist?  This should be created by first <BR>

>> running:<BR>

>><BR>

>> e2fsck --mdsdb mdsdb /dev/{mdsdevicename}<BR>

>><BR>

>> before running your above command on the OST.<BR>

>><BR>

>> Please also try specifying the absolute pathname for the mdsdb and <BR>

>> ostdb<BR>

>> files.<BR>

>><BR>

>> Cheers, Andreas<BR>

>> --<BR>

>> Andreas Dilger<BR>

>> Sr. Staff Engineer, Lustre Group<BR>

>> Sun Microsystems of Canada, Inc.<BR>

><BR>

><BR>


<BR>

Aaron Knister<BR>

Associate Systems Analyst<BR>

Center for Ocean-Land-Atmosphere Studies<BR>

<BR>

(301) 595-7000<BR>

aaron@iges.org<BR>
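<BR>

A minimal sketch, not part of the original message, of the order Andreas describes in the quoted text: build the MDS database first, then run the OST pass, giving both database files as absolute paths.<BR>

It only assembles the command lines and runs nothing. Only flags that appear in this thread are used; the function name and the /dev/mdsdev and /tmp paths are placeholders.<BR>

<PRE>
```python
import os


def lfsck_db_commands(mds_device, ost_device, mdsdb, ostdb):
    """Build the two e2fsck invocations, MDS first, with absolute database paths.

    Only flags that appear in the thread are used; nothing is executed here.
    """
    mdsdb = os.path.abspath(mdsdb)
    ostdb = os.path.abspath(ostdb)
    mds_cmd = ["e2fsck", "--mdsdb", mdsdb, mds_device]
    ost_cmd = ["e2fsck", "-y", "-v", "--mdsdb", mdsdb, "--ostdb", ostdb, ost_device]
    return [mds_cmd, ost_cmd]


# /dev/mdsdev and the /tmp paths are placeholder names, not taken from the thread.
for cmd in lfsck_db_commands("/dev/mdsdev", "/dev/mapper/27l4",
                             "/tmp/mdsdb", "/tmp/osth3_1"):
    print(" ".join(cmd))
```
</PRE>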

<BR>

<BR>

<BR>

<BR>

<BR>

<BR>

------------------------------<BR>

<BR>

Message: 5<BR>

Date: Thu, 13 Mar 2008 15:51:22 -0500<BR>

From: "Karen M. Fernsler" &lt;fernsler@ncsa.uiuc.edu&gt;<BR>

Subject: Re: [Lustre-discuss] e2fsck mdsdb: DB_NOTFOUND<BR>

To: Aaron Knister &lt;aaron@iges.org&gt;<BR>

Cc: Andreas Dilger &lt;adilger@sun.com&gt;, lustre-discuss@clusterfs.com,<BR>

        Michelle Butler &lt;mbutler@ncsa.uiuc.edu&gt;, abe-admin@ncsa.uiuc.edu,<BR>

        ckerner@ncsa.uiuc.edu, alex parga &lt;aparga@ncsa.uiuc.edu&gt;,<BR>

        set@ncsa.uiuc.edu<BR>

Message-ID: &lt;20080313205122.GA17635@ncsa.uiuc.edu&gt;<BR>

Content-Type: text/plain; charset=iso-8859-1<BR>

<BR>

2.6.9-42.0.10.EL_lustre-1.4.10.1smp<BR>

<BR>

This is a 2.6.9-42.0.10.EL kernel with Lustre 1.4.10.1.<BR>

<BR>

This has been working OK for almost a year.  We did try to<BR>

export this filesystem to another cluster over NFS before<BR>

we started seeing problems, but I don't know how related that is,<BR>

if at all.<BR>

<BR>

We are now trying to dissect the problem by inspecting<BR>

the logs of the switch these nodes are connected to.<BR>

<BR>

thanks,<BR>

-k<BR>

<BR>

On Thu, Mar 13, 2008 at 04:50:04PM -0400, Aaron Knister wrote:<BR>

> What version of lustre/kernel is running on the problematic server?<BR>


<BR>

--<BR>

Karen Fernsler Systems Engineer<BR>

National Center for Supercomputing Applications<BR>

ph: (217) 265 5249<BR>

email: fernsler@ncsa.uiuc.edu<BR>

<BR>

<BR>

------------------------------<BR>

<BR>

Message: 6<BR>

Date: Thu, 13 Mar 2008 13:55:45 -0700<BR>

From: Klaus Steden &lt;klaus.steden@thomson.net&gt;<BR>

Subject: Re: [Lustre-discuss] Howto map block devices to Lustre<BR>

        devices?<BR>

To: Chris Worley &lt;worleys@gmail.com&gt;,   lustre-discuss<BR>

        &lt;lustre-discuss@lists.lustre.org&gt;<BR>

Message-ID: &lt;C3FEE2E1.59E7%klaus.steden@thomson.net&gt;<BR>

Content-Type: text/plain;       charset="US-ASCII"<BR>

<BR>

<BR>

Hi Chris,<BR>

<BR>

Don't your Lustre volumes have a label on them?<BR>

<BR>

On the one cluster I've got, the physical storage is shared with a number of<BR>

other systems, so the device information can change over time ... so I use<BR>

device labels in my /etc/fstab and friends.<BR>

<BR>

Something like 'lustre-OST0000', 'lustre-OST0001' ... although when the<BR>

devices are actually mounted, they show up with their /dev node names.<BR>

<BR>

Look through /proc/fs/lustre for Lustre volume names (they show up when<BR>

they're mounted), and you can winnow your list down by mounting by name,<BR>

checking the device ID, and removing it that way.<BR>

<BR>

If you have a lot of devices on the same bus, it will likely take a bit for<BR>

the right one to be found, but it's there.<BR>

<BR>

hth,<BR>

Klaus<BR>

<BR>


<BR>

<BR>

<BR>

------------------------------<BR>

<BR>


<BR>

<BR>

End of Lustre-discuss Digest, Vol 26, Issue 36<BR>

**********************************************<BR>

<BR>

</FONT>

</P>


</BODY>

</HTML>