[Lustre-discuss] Another server question.

Brian J. Murrell Brian.Murrell at Sun.COM
Tue Feb 3 08:42:59 PST 2009


On Tue, 2009-02-03 at 10:29 -0600, Robert Minvielle wrote:
> 
> I have five OSTs,

Do you really mean OSTs here or OSSes?  An OST is a disk device.  An OSS
is the server that serves one or more OSTs.

> one of them is the MGS/MDT. Yes, it is a totally bad
> idea to have a MGS/MDT on the same node as an OST/OSS,

Yes, it is.  If you really do have 5 OSSes (and not 5 OSTs in a single
OSS for example) why don't you just dedicate one of those OSSes to being
an MDS/MGS?

> I down one of the servers (normal shutdown, not the MGS of course). 
> OK, so the clients seem to be frozen in regards to the lustre.

Only if they want to access objects (files, or file stripes) on the
server that you shut down, yes.
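
To illustrate, here is a toy Python model (not Lustre code) of which
files are affected when one OST is unavailable.  The file names, OST
names and stripe layouts are all made up for the example; on a real
system "lfs getstripe" reports the actual layout of a file.

```python
# Toy model: each file maps to the set of OSTs holding its stripes.
# Every name below is hypothetical.
layouts = {
    "a.dat": {"OST0000"},
    "b.dat": {"OST0001", "OST0002"},
    "c.dat": {"OST0002", "OST0003"},
}

def blocked_files(layouts, down_osts):
    """Return the files that have at least one stripe on a down OST."""
    down = set(down_osts)
    return sorted(name for name, osts in layouts.items() if osts & down)

print(blocked_files(layouts, ["OST0002"]))
```

Anything that walks every file will sooner or later touch one of the
blocked entries, while files striped only on the remaining OSTs stay
usable.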

> Many here 
> have noted that it should be ok, with the exception of files that were
> stored on the downed server,

Yes.

> but that does not seem to be the case here. 
> That is not my main concern however, the real question is, I bring the server
> back up; check its ID by issuing lctl dl; I check the MGS by a cat /proc/fs/lustre/devices
> and see the ID in there as UP. OK, so it all seems well again, but the client
> is still (somewhat) stuck.

How long are you waiting after you bring the server up?  Recovery is not
instantaneous.
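
If you want to check the device list from a script rather than by eye,
a minimal Python sketch follows.  It assumes the six-column layout
(index, state, type, name, UUID, refcount) printed by "lctl dl"; the
sample text and the "testfs" names are made up, not taken from your
system.  I am also assuming that the state column only describes
whether a device is set up, so "UP" there should not be read as
"recovery finished".

```python
from collections import namedtuple

Device = namedtuple("Device", "index state type name uuid refcount")

def parse_device_list(text):
    """Parse device-list text into Device tuples, skipping odd lines."""
    devices = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # blank line or a layout this sketch does not expect
        index, state, dev_type, name, uuid, refcount = parts
        devices.append(
            Device(int(index), state, dev_type, name, uuid, int(refcount))
        )
    return devices

def not_up(devices):
    """Return the names of devices whose state is anything but UP."""
    return [d.name for d in devices if d.state != "UP"]

# Hypothetical sample, for illustration only.
sample = """\
  0 UP mgs MGS MGS 5
  1 UP obdfilter testfs-OST0000 testfs-OST0000_UUID 7
  2 ST obdfilter testfs-OST0001 testfs-OST0001_UUID 3
"""
devices = parse_device_list(sample)
print(not_up(devices))
```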

> I reboot the client, hrm, it still
> can not perform certain filesystem operations (ls -lR, df, du, find . all hang). 
> I can create files and read files if I know their location, but I can not seem 
> to perform any "recursive" type actions on the mount point on the client. 

Because you are likely looking for something from the down (and maybe
now recovering) OSS.

> I was going to restart the MGS/OSS servers, but the last time I did that
> nothing worked again and I had to start over.

Yes.  This is exactly why combining the MDS and an OSS on one node is a
bad idea.  When you reset that node, recovery has to be aborted because
you took a client down with the server.  The client is a hidden one: the
MDS is itself a client of the OSTs.

> I have to be missing something
> here. I thought you could reboot a OST at will with more or less no side effects
> other than clients not seeing the files that were on that OST.

Yes, until it comes back up and recovery is finished.  Look at the
syslog of the OSS that you rebooted for details about the recovery.
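
Besides the syslog, each OST on the OSS has a recovery_status file
under /proc/fs/lustre/obdfilter/.  A minimal Python sketch for reading
it follows, assuming the usual "key: value" layout.  The sample text is
illustrative rather than captured from a real system, and the exact set
of fields may differ between Lustre versions.

```python
def parse_recovery_status(text):
    """Parse 'key: value' lines into a dict; values stay as strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            fields[key.strip()] = value.strip()
    return fields

def recovery_done(fields):
    """True once the status field reports COMPLETE."""
    return fields.get("status") == "COMPLETE"

# Hypothetical sample, for illustration only.
sample = """\
status: RECOVERING
time_remaining: 120
connected_clients: 3/5
"""
fields = parse_recovery_status(sample)
print(fields["status"], recovery_done(fields))
```

In real use the text would come from reading the recovery_status file
for the OST in question instead of the sample string.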

b.
