[Lustre-discuss] Lustre Mount Crashing

Andreas Dilger adilger at sun.com
Mon Jun 2 12:30:09 PDT 2008


On Jun 02, 2008  10:05 -0700, Kilian CAVALOTTI wrote:
> On Monday 02 June 2008 08:35:35 am Charles Taylor wrote:
> > Unfortunately, getting the messages off the console (in the machine
> > room) means using a pencil and paper (you'd think we have something
> > as fancy as an IP-KVM console server, but alas, we do things, ahem,
> > "inexpensively" here).
> 
> There are a couple of solutions to help you there:
> * using a serial console connected to a remote machine (costs a serial 
>   cable and some configuration).

One very practical and low-cost mechanism is to cross-cable the serial
console from one machine to its neighbour.  Most server-class machines
have two serial ports, so each machine can use one as an inbound port
that receives the neighbour's console, and the other as an outbound
port configured as its own serial console.
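
As a rough sketch of the receiving side (not something we ship), the
neighbour could run a small Python script that copies whatever arrives
on the inbound port into a timestamped log file.  The function name
log_console, the device /dev/ttyS1, the host label and the log path are
all assumptions for illustration, and the port speed is assumed to have
been set already to match the sending console.

```python
import io
import time
from typing import BinaryIO, TextIO


def log_console(src: BinaryIO, dst: TextIO, host: str = "neighbour") -> int:
    """Copy console lines from src to dst with a timestamp and host label.

    Returns the number of non-empty lines written.  Runs until src
    reports end-of-file.
    """
    count = 0
    for raw in src:
        # Console output is not guaranteed to be clean ASCII after a crash,
        # so replace undecodable bytes instead of raising.
        line = raw.decode("ascii", errors="replace").rstrip("\r\n")
        if not line:
            continue
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        dst.write(f"{stamp} {host}: {line}\n")
        dst.flush()  # keep the log current in case this machine dies too
        count += 1
    return count


# Hypothetical usage on the neighbour (device and path are assumptions):
#
#   with open("/dev/ttyS1", "rb", buffering=0) as port, \
#        open("/var/log/console-neighbour.log", "a") as log:
#       log_console(port, log, host="oss1")
```

Flushing after every line is deliberate: the last few lines before a
crash are usually the ones you most want to have on disk.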

> * and maybe the easiest, most inexpensive (no hardware involved) and 
>   most convenient one: using netdump [1]. You configure a netdump client 
>   on the machine you want to gather logs and traces from, and a 
>   netdump-server on another host, to receive those messages. This 
>   solution proved to be really efficient in gathering Lustre's debug 
>   logs and crash dumps.
> 
> [1] http://www.redhat.com/support/wpapers/redhat/netdump/
> and http://docs.freevps.com/doku.php?id=how-to:netdump

Yes, LLNL has been using netdump to good effect.  It works with the
"normal" crashdump utilities like "crash" (modified gdb).  Netdump
support isn't in all kernels, however.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



