[Lustre-discuss] Is it safe to run MDS, MGS & OSS on the same machine ?

Keith Mannthey keith.mannthey at intel.com
Wed Mar 5 09:05:57 PST 2014



On Wed, 2014-03-05 at 10:24 +0100, Rafal Maszkowski wrote:
> On Tue, Mar 04, 2014 at 10:55:05PM +0000, Dilger, Andreas wrote:
> > On 2014/03/04, 2:38 AM, "邓尧" <torshie at gmail.com> wrote:
> > > We're running low on physical machines and want to deploy MGS, MDS and OSS on the same machine. Is it officially supported?
> > > I know that MGS and MDS can be put on the same machine, but I'm not sure about OSS and MDS.
> > This will work, but if the node fails there is no recovery for operations in progress, and the clients can get an I/O error for them.
> 
> We mostly use this mode of operation and our experience is that after a
> machine crash* the nodes and the heavy computing programs on them survive
> an outage of several hours.
> 
> R.
> *The machines which crash are our aging Thumpers. We replace memory
> chips but we still do not know how to interpret the ILOM messages like:
> ID =  60c : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7 DIMM 1
> ID =  60b : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7 DIMM 0

These messages mean ECC on the memory hit an error it could not correct:
a read (or possibly a write) returned incorrect data at the HW level.
Some firmware will reboot your system on such an event in order to
protect it. This is not healthy for the system.
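
One way to see whether the errors keep landing on the same DIMMs is to
tally them from the ILOM event log. Below is a minimal Python sketch,
assuming every event line has the same layout as the two lines quoted
above (ID, date, time, sensor, source, message). The function name and
the sample list are made up for illustration, and other firmware
versions may format events differently. It does not tell you how the
"Node 7" numbering maps to physical slots, only which node/DIMM pairs
repeat.

```python
import re
from collections import Counter

# Layout inferred from the two sample lines in this thread (assumption):
# ID = <hex> : <MM/DD/YYYY> : <HH:MM:SS> : <sensor> : <source> : <message>
EVENT_RE = re.compile(
    r"ID\s*=\s*(?P<id>[0-9a-fA-F]+)\s*:\s*"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s*:\s*"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s*:\s*"
    r"(?P<sensor>[^:]+?)\s*:\s*"
    r"(?P<source>[^:]+?)\s*:\s*"
    r"(?P<message>.+)$"
)
ECC_RE = re.compile(r"Uncorrectable ECC Node (?P<node>\d+) DIMM (?P<dimm>\d+)")


def count_uncorrectable_ecc(lines):
    """Count uncorrectable ECC events per (node, dimm) pair."""
    counts = Counter()
    for line in lines:
        event = EVENT_RE.search(line)
        if not event:
            continue  # not an event line in the assumed layout
        ecc = ECC_RE.search(event.group("message"))
        if ecc:
            counts[(int(ecc.group("node")), int(ecc.group("dimm")))] += 1
    return counts


# The two lines from the quoted message, used as sample input.
SAMPLE = [
    "ID =  60c : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7 DIMM 1",
    "ID =  60b : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7 DIMM 0",
]

if __name__ == "__main__":
    for (node, dimm), n in sorted(count_uncorrectable_ecc(SAMPLE).items()):
        print(f"node {node} DIMM {dimm}: {n} uncorrectable ECC event(s)")
```

If the same pair shows up again after a DIMM swap, that would point
away from the DIMM itself, but the script only counts; it does not
diagnose.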


> Thumpers have only two nodes with four memory chips in each. The crashes
> are rare though so we cannot test various hypotheses easily.

Thanks,
 Keith 



