[Lustre-discuss] slow on flaky system
Ms. Megan Larko
dobsonunit at gmail.com
Tue Jan 20 08:21:53 PST 2009
When I have seen an "input/output error" on my Lustre system, it has
usually indicated a networking problem. I would start with the usual
checks of the cable connections. I also look at the OST log files. On
my FC5 systems, the /var/log/messages file on the OSS computer with
the bad network connection would contain "IMP_INVALID" messages.
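
As a rough illustration of that log check, here is a minimal Python sketch
that counts lines containing "IMP_INVALID" and groups them by the hostname
field. The function name count_imp_invalid and the regular expression are
my own for illustration, and the sketch assumes the common syslog layout
("Mon DD HH:MM:SS host ..."); lines that do not fit that layout are counted
under "unknown".

```python
import re
from collections import Counter

# Assumption: conventional syslog prefix "Mon DD HH:MM:SS host rest-of-line".
SYSLOG_PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+(\S+)\s")


def count_imp_invalid(lines, marker="IMP_INVALID"):
    """Count lines containing `marker`, grouped by the syslog host field.

    `lines` is any iterable of strings, for example an open log file.
    Lines that do not match the assumed syslog prefix go under "unknown".
    """
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        match = SYSLOG_PREFIX.match(line)
        host = match.group(1) if match else "unknown"
        counts[host] += 1
    return counts
```

To use it, open /var/log/messages on the OSS (or a collected copy) and pass
the file object to count_imp_invalid(); a host with a count far above the
others is a reasonable place to start checking cables.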
Sometimes the fix has been as easy as replacing the cable (in my
specific case, an InfiniBand cable). I have also increased my timeout
values above 100, but that really only masks the issue. A higher
timeout value gives I/O more time to complete so that user jobs do not
fail, but the tradeoff is that the jobs take longer to run while they
wait for the I/O connection to work properly.
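
Before changing the timeout it helps to know what it is currently set to.
The sketch below reads the value from a file. The default path
/proc/sys/lustre/timeout is an assumption on my part; the location differs
between Lustre versions, so check where your installation exposes it. The
function name read_lustre_timeout is hypothetical.

```python
from pathlib import Path


def read_lustre_timeout(path="/proc/sys/lustre/timeout"):
    """Return the timeout stored at `path` as an int, or None.

    The default path is an assumption and may not exist on every Lustre
    version. None is returned if the file is missing, unreadable, or does
    not contain a plain integer.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

Calling read_lustre_timeout() on a client or server and comparing the
result across nodes is a quick way to see whether a raised value was
actually applied everywhere.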
Just my $.02.