[Lustre-discuss] New Errors in Lustre 1.6.7.1

Alex Lyashkov Alexey.Lyashkov at Sun.COM
Fri Apr 17 20:46:45 PDT 2009


On Fri, 2009-04-17 at 15:51 -0400, Roger Spellman wrote:
> Hi,
> 
> I just upgraded some servers to 1.6.7.1, and I started getting some
> error messages.  So, I reformatted my file system, and started again:
> Here are the messages on the MDS:
> 
>  
> 
> Lustre: MDT storage-MDT0000 now serving storage-MDT0000_UUID
> (storage-MDT0000/82f9201e-d26f-a513-4ad8-bdb4091f8afd) with recovery
> enabled
> 
> Lustre: 6890:0:(lproc_mds.c:271:lprocfs_wr_group_upcall())
> storage-MDT0000: group upcall set to /usr/sbin/l_getgroups
> 
> Lustre: storage-MDT0000.mdt: set parameter
> group_upcall=/usr/sbin/l_getgroups
> 
> Lustre: Server storage-MDT0000 on device /dev/sdb2 has started
> 
> Lustre: Request x7 sent from storage-OST0000-osc to NID 10.2.46.2 at o2ib
> 0s ago has timed out (limit 5s).
> 
> Lustre: Request x8 sent from storage-OST0001-osc to NID 10.2.46.3 at o2ib
> 0s ago has timed out (limit 5s).
Hm, the MDS sent the request when o2ib was not ready to send. The '0s ago'
says this was a network issue with sending the request, not a real timeout.

> 
> Lustre: 6749:0:(lproc_mds.c:271:lprocfs_wr_group_upcall())
> storage-MDT0000: group upcall set to NONE
> 
> Lustre: 6544:0:(import.c:507:import_select_connection())
> storage-OST0000-osc: tried all connections, increasing latency to 5s
> 
> Lustre: 6544:0:(import.c:507:import_select_connection())
> storage-OST0001-osc: tried all connections, increasing latency to 5s
> 
> Lustre: 6543:0:(quota_master.c:1642:mds_quota_recovery()) Not all osts
> are active, abort quota recovery
> 
> Lustre: 6543:0:(quota_master.c:1642:mds_quota_recovery()) Not all osts
> are active, abort quota recovery
> 
> Lustre: MDS storage-MDT0000: storage-OST0000_UUID now active,
> resetting orphans
> 
> Lustre: MDS storage-MDT0000: storage-OST0002_UUID now active,
> resetting orphans
> 
> Lustre: Skipped 1 previous similar message
> 
>  
But later it looks like the connections completed fine.

> 
> 
> I don’t seem to have any error messages on the OSTs.  I tested my
> network, and it is running well.
> 
>  
> 
> Any thoughts?
These aren't errors, just notices. They say that the first connect request
sent from the MDS to the OST timed out (or o2ib was not ready to send), so
the MDS could not connect to the OST on the first pass, but after 5s it
reconnected successfully.
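
To check a longer log for the same pattern, here is a minimal sketch in
Python. It is not a Lustre tool: the function name classify, the labels
"send_not_ready" and "timeout", and the two regular expressions are
assumptions built only from the messages quoted above. It treats a request
reported as timed out '0s ago' as a likely send problem rather than a real
timeout, and then looks for a later "now active" line for the same target.

```python
import re

# Timeout notice as quoted above. The list archive renders "@" in a NID as
# " at ", so both spellings are accepted.
TIMEOUT_RE = re.compile(
    r"Request x\d+ sent from (?P<target>\S+)-osc to NID \S+?(?: at |@)\S+ "
    r"(?P<age>\d+)s ago has timed out \(limit (?P<limit>\d+)s\)"
)
# "<target>_UUID now active" appears in the log once the OST is connected.
ACTIVE_RE = re.compile(r"(?P<target>\S+)_UUID now active")


def classify(messages):
    """Map each timed-out target to (kind, seen_active).

    kind is "send_not_ready" when the request was reported 0s after it was
    sent (a heuristic taken from this reply, not an official Lustre rule),
    otherwise "timeout".
    """
    kinds = {}
    active = set()
    for msg in messages:
        m = TIMEOUT_RE.search(msg)
        if m:
            kind = "send_not_ready" if int(m.group("age")) == 0 else "timeout"
            kinds[m.group("target")] = kind
            continue
        m = ACTIVE_RE.search(msg)
        if m:
            active.add(m.group("target"))
    return {target: (kind, target in active) for target, kind in kinds.items()}


# Messages from the MDS log above, with the wrapped lines joined.
SAMPLE = [
    "Lustre: Request x7 sent from storage-OST0000-osc to NID 10.2.46.2 at o2ib "
    "0s ago has timed out (limit 5s).",
    "Lustre: Request x8 sent from storage-OST0001-osc to NID 10.2.46.3 at o2ib "
    "0s ago has timed out (limit 5s).",
    "Lustre: MDS storage-MDT0000: storage-OST0000_UUID now active, "
    "resetting orphans",
    "Lustre: MDS storage-MDT0000: storage-OST0002_UUID now active, "
    "resetting orphans",
    "Lustre: Skipped 1 previous similar message",
]

if __name__ == "__main__":
    for target, (kind, seen_active) in sorted(classify(SAMPLE).items()):
        state = "now active" if seen_active else "no 'now active' line seen"
        print(target, kind, state)
```

Note that a missing "now active" line is not proof of a failed connection:
in the log above one similar message was skipped, so the line for
storage-OST0001 may simply have been suppressed.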



> 
-- 
Alex Lyashkov <alexey.lyashkov at sun.com>
Lustre Group, Sun Microsystems



