[Lustre-discuss] OSS panicking.....
Phill Harvey-Smith
p.harvey-smith at warwick.ac.uk
Tue Aug 6 06:22:59 PDT 2013
Hi all,
Our OSS has started panicking in the last couple of days. It seems to be
related to NFS4, but I am not sure, so I am asking the group for pointers.
Firstly, a couple of screen grabs are at:
http://penguin.stats.warwick.ac.uk/~stsxab/Lustre/
The OSS server is currently running Ubuntu 10.04 LTS with an alien
(Red Hat, I believe) kernel installed.
The running kernel is:
2.6.32-131.6.1.el6_lustre.g65156ed.x86_64
I believe that it is running Lustre 1.6.x. The MDS is also set up in a
similar manner.
The clients are a mixture of Ubuntu 10.04 LTS with Lustre 1.6.x, and the
three most recent nodes are Ubuntu 12.04 LTS with Lustre 2.5.x, which I
built recently.
The OSS has two RAID arrays. The first is on the onboard SAS controller and
holds two of the Lustre volumes (/home and /scratch), along with the NFS
exported file system on a separate XFS partition. The second RAID array
is on an external PCIe RAID controller with an external disk array, and
holds the other Lustre filesystem on two virtual disks.
The OSS also has a couple of NFS4 shares:
/export
192.168.0.0/24(rw,async,fsid=0,crossmnt,no_root_squash,no_subtree_check)
192.168.1.0/24(rw,sync,fsid=0,no_root_squash,crossmnt,no_subtree_check)
/export/software/packages-x86_64-linux-gnu
192.168.0.0/24(rw,async,no_subtree_check,no_root_squash)
Which are on a separate disk.
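
In case it helps anyone comparing the two /export entries, here is a minimal Python sketch that lists which options differ between the client subnets. It is only an illustration: the function names parse_exports and option_differences are made up for this example, and it assumes each export path and its client specs sit on a single line (the listing above is wrapped by the mailer).

```python
import re

# Export entries copied from the listing above, joined onto one line per path
# (assumption: the real /etc/exports has them on a single line each).
EXPORTS = """\
/export 192.168.0.0/24(rw,async,fsid=0,crossmnt,no_root_squash,no_subtree_check) 192.168.1.0/24(rw,sync,fsid=0,no_root_squash,crossmnt,no_subtree_check)
/export/software/packages-x86_64-linux-gnu 192.168.0.0/24(rw,async,no_subtree_check,no_root_squash)
"""


def parse_exports(text):
    """Return {path: {client: set_of_options}} for simple exports lines."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path, *clients = line.split()
        entry = table.setdefault(path, {})
        for spec in clients:
            match = re.fullmatch(r"([^()]+)\(([^()]*)\)", spec)
            if match:
                entry[match.group(1)] = set(match.group(2).split(","))
    return table


def option_differences(table):
    """For each path with several clients, list options not shared by all."""
    diffs = {}
    for path, clients in table.items():
        if len(clients) < 2:
            continue
        common = set.intersection(*clients.values())
        per_client = {c: sorted(opts - common) for c, opts in clients.items()}
        if any(per_client.values()):
            diffs[path] = per_client
    return diffs


if __name__ == "__main__":
    for path, per_client in option_differences(parse_exports(EXPORTS)).items():
        for client, opts in per_client.items():
            print(path, client, ",".join(opts))
```

For the entries above, the only difference it reports is async on 192.168.0.0/24 versus sync on 192.168.1.0/24 for /export.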
If I disable the NFS shares then the OSS server seems to stay up and
client machines can access the Lustre file systems. But once I enable
the NFS shares the OSS will panic within a few minutes, which is why I
suspect some interaction with NFS.
The odd thing is that the machine only started doing this yesterday. I have
replaced / re-seated the RAM, CPUs and cards (Ethernet & SAS), but this
doesn't seem to have changed anything.
I am aware that this setup is not a supported architecture (I inherited
custody of the cluster from a previous admin) and am planning on
re-installing both the OSS and MDS with (probably) CentOS, as that is
supported for the server. Is there anything I need to be aware of in
planning this upgrade?
Does anyone have any clue as to what I might try? Is there an easy way I
can check the integrity of the Lustre volumes?
Cheers.
Phill.