[Lustre-discuss] Lustre Mount Crashing
Andreas Dilger
adilger at sun.com
Mon Jun 2 12:30:09 PDT 2008
On Jun 02, 2008 10:05 -0700, Kilian CAVALOTTI wrote:
> On Monday 02 June 2008 08:35:35 am Charles Taylor wrote:
> > Unfortunately, getting the messages off the console (in the machine
> > room) means using a pencil and paper (you'd think we have something
> > as fancy as an ip-kvm console server, but alas, we do things, ahem,
> > "inexpensively" here).
>
> There are a couple of solutions to help you there:
> * using a serial console connected to a remote machine (costs a serial
> cable and some configuration).
One very practical and low-cost mechanism is to cross-cable the serial
console from one machine to its neighbour. Most server-class machines
have 2 serial ports, so on each machine you can use one as an inbound
port that receives the neighbour's console output, and configure the
other as the outbound serial console of that machine itself.
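
On the receiving side of such a cross-cabled pair, something as simple
as the following could stand in for the pencil and paper. This is only a
sketch in Python, not anything shipped with Lustre: the device name
/dev/ttyS1, the log file name, and the helpers log_console() and main()
are made up for illustration, and it assumes the port speed has already
been set (e.g. with stty) to match the neighbour's serial console.

```python
import time


def log_console(source, sink, clock=time.time, max_lines=None):
    """Copy console lines from `source` to `sink`, prefixing a UTC timestamp.

    `source` is a binary stream with readline() (e.g. an opened serial
    device); `sink` is a text stream (e.g. a log file opened for append).
    Returns the number of lines copied.
    """
    count = 0
    # readline() returns b"" at end of stream, which ends the loop.
    for raw in iter(source.readline, b""):
        # Console output may contain stray bytes after a crash; never raise.
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(clock()))
        sink.write(f"{stamp} {text}\n")
        sink.flush()  # keep the log current in case this host is rebooted
        count += 1
        if max_lines is not None and count >= max_lines:
            break
    return count


def main(device="/dev/ttyS1", logfile="neighbour-console.log"):
    # Hypothetical defaults: adjust to the inbound port and a log location.
    with open(device, "rb", buffering=0) as port, open(logfile, "a") as log:
        log_console(port, log)
```

Calling main() on the neighbour leaves a timestamped copy of whatever
the crashing node printed last, which can then be pasted into a mail to
the list.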
> * and maybe the easiest, most inexpensive (no hardware involved) and
> most convenient one: using netdump [1]. You configure a netdump client
> on the machine you want to gather logs and traces from, and a
> netdump-server on another host, to receive those messages. This
> solution proved to be really efficient in gathering Lustre's debug
> logs and crash dumps.
>
> [1] http://www.redhat.com/support/wpapers/redhat/netdump/
> and http://docs.freevps.com/doku.php?id=how-to:netdump
Yes, LLNL has been using netdump to good effect. Its dumps work with the
"normal" crashdump utilities like "crash" (a modified gdb). Netdump
support isn't in all kernels, however.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.