[Lustre-discuss] troubleshooting lustre

Andreas Dilger adilger at whamcloud.com
Tue Dec 13 20:37:15 PST 2011


On 2011-12-13, at 4:36 PM, Andrus, Brian Contractor wrote:
> A large Volume Group off a DDN connected via infiniband.
> This is broken into several Logical Volumes. Some are just regular ext3/4 filesystems. Quite a few are partitioned out (in 4TB chunks) for OSTs.

Anything using partitions/LVs that are smaller than a full RAID LUN
(e.g. 8+2 RAID-6) is going to suffer a serious performance loss.
Putting multiple OSTs on the same disks only increases the contention
on those disks, and doesn't provide any functional benefit.
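
One way to check a layout for this is to list which OSTs end up on the
same underlying LUN. The following is only a minimal sketch under stated
assumptions: the OST-to-LUN mapping is made up for illustration (the LUN
names "lun0"/"lun1" and the helper shared_luns are hypothetical), and on a
real system the mapping would have to be collected from the LVM layout.

```python
from collections import defaultdict


def shared_luns(ost_to_luns):
    """Return {lun: sorted OST names} for every LUN backing more than one OST."""
    lun_to_osts = defaultdict(set)
    for ost, luns in ost_to_luns.items():
        for lun in luns:
            lun_to_osts[lun].add(ost)
    return {lun: sorted(osts) for lun, osts in lun_to_osts.items() if len(osts) > 1}


# Hypothetical layout for illustration only; not taken from the poster's system.
layout = {
    "work-OST0004": {"lun0"},
    "work-OST0005": {"lun0"},
    "scratch-OST0000": {"lun1"},
}

for lun, osts in sorted(shared_luns(layout).items()):
    print(f"{lun} is shared by {len(osts)} OSTs: {', '.join(osts)}")
```

Any LUN reported here has several OSTs competing for the same spindles.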

> I have 3 lustre filesystems: home, scratch and work.
> Home consists of a single OST
> Scratch consists of 2 OSTs
> Work consists of 10 OSTs
>  
> Each filesystem has its own combined MGS/MGT
> Each OSS has 2 OSTs where possible
> Each MGS will also serve one OST
>  
> I have 8 systems that are OSSes (The MGSes are also among those 8)
>  
> Now, ONE of my nodes (an OSS that is only serving 2 OSTs) has a helluva load:
>  
> [root@nas-0-3 ~]# uptime
> 15:34:06 up 77 days, 22:39,  1 user,  load average: 352.59, 339.80, 318.11
>  
> I see lots of:
> Lustre: work-OST0004: slow commitrw commit 91s due to heavy IO load
>  
> And:
> Dec 13 15:32:48 nas-0-3 kernel: LustreError: 6413:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-107)  req@ffff8105c557ac00 x1381121762230130/t0 o400-><?>@<?>:0/0 lens 192/0 e 0 to 0 dl 1323819184 ref 1 fl Interpret:H/0/0 rc -107/0
> Dec 13 15:32:48 nas-0-3 kernel: LustreError: 6413:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 1900 previous similar messages
>  
> Not sure what that one means, but it seems significant.
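
The trailing "rc -107/0" field in that message is a negated errno value.
A minimal sketch for pulling the number out of such a line and looking up
its name, assuming it is run on a Linux host (errno numbering is platform
specific); parse_rc is a hypothetical helper name, not a Lustre tool. On
Linux, 107 should come back as ENOTCONN.

```python
import errno
import os
import re


def parse_rc(line):
    """Extract the positive errno from a trailing 'rc -N/M' field, or None."""
    match = re.search(r"\brc (-?\d+)/", line)
    return abs(int(match.group(1))) if match else None


# Tail of the LustreError line quoted above.
sample = "ref 1 fl Interpret:H/0/0 rc -107/0"

code = parse_rc(sample)
if code is not None:
    # errno.errorcode maps numbers to symbolic names on the current platform.
    print(code, errno.errorcode.get(code, "unknown"), os.strerror(code))
```
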
>  
> Things get VERY slow and start timing out. Users see it as the system ‘hanging’.
>  
> Could someone point me in the right direction for figuring out the culprit here?
>  
> Thanks in advance!
>  
>  
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
>  
>  


Cheers, Andreas
--
Andreas Dilger 
Principal Engineer
Whamcloud, Inc.





