[Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues

Brock Palen brockp at umich.edu
Mon Oct 13 08:31:47 PDT 2008


I never uninstalled it (I still use some of the tools in it).
Faultmond is a service; just chkconfig it off.
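A minimal Python 3 sketch of that step. It assumes a RHEL-style host with the `service` and `chkconfig` tools and an init script named `faultmond`, which is the tag it logs under in the syslog quoted below. The DCMU package may install it under a different name. The sketch also stops the running daemon, because `chkconfig ... off` only changes what starts at boot. It defaults to a dry run that only prints the two commands.

```python
import subprocess

# Assumption: the init script is named "faultmond" (the tag it logs under
# in syslog); the DCMU package may install it under another name.
SERVICE = "faultmond"


def disable_service_commands(name):
    """Commands that stop a SysV service now and keep it off at boot."""
    return [
        ["service", name, "stop"],   # stop the currently running daemon
        ["chkconfig", name, "off"],  # do not start it in the default runlevels
    ]


def disable_service(name, dry_run=True):
    """Print (dry run) or execute the commands; returns the command list."""
    cmds = disable_service_commands(name)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a command fails
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    disable_service(SERVICE)  # dry run: only prints the two commands
```

Pass `dry_run=False` only on the OSS itself, as root.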

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
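In the syslog Malcolm attached (quoted below), faultmond logs an "N:Polling all 48 slots for drive fault" line and a "Polling cycle N is complete" line roughly once an hour. The last faultmond entry before the kernel BUG is cycle 27 starting at 18:46:15, and it has no completion line. Here is a sketch of a check for polling cycles that started but never finished. It assumes one syslog entry per line, as in /var/log/messages, not the flattened text in this archive. The regular expressions are written against the message forms seen in this log only.

```python
import re

# Syslog pads single-digit days with an extra space ("Oct  1"), hence " +".
START_RE = re.compile(
    r"(\w{3} +\d+ [\d:]{8}) \S+ faultmond: (\d+):Polling all \d+ slots")
DONE_RE = re.compile(r"faultmond: Polling cycle (\d+) is complete")


def unfinished_cycles(lines):
    """Return (cycle, start_timestamp) for every polling cycle that
    started but never logged its completion line."""
    started = {}
    finished = set()
    for line in lines:
        m = START_RE.search(line)
        if m:
            started[int(m.group(2))] = m.group(1)
            continue
        m = DONE_RE.search(line)
        if m:
            finished.add(int(m.group(1)))
    return sorted((c, ts) for c, ts in started.items() if c not in finished)


sample = [
    "Oct 10 17:46:06 oss-1 faultmond: 26:Polling all 48 slots for drive fault",
    "Oct 10 17:46:15 oss-1 faultmond: Polling cycle 26 is complete",
    "Oct 10 18:46:15 oss-1 faultmond: 27:Polling all 48 slots for drive fault",
    "Kernel BUG at spinlock:76",
]
print(unfinished_cycles(sample))  # [(27, 'Oct 10 18:46:15')]
```

An unfinished cycle immediately before a crash fits Brock's suspicion. By itself it does not prove that faultmond caused the BUG.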



On Oct 13, 2008, at 11:03 AM, Malcolm Cowe wrote:

> Brock Palen wrote:
>>
>> I know you said the only addition was RDAC, for the MDSs I
>> assume (we use it too, without problems).
> Yes, the MDSs share an STK 6140.
>> When I ran faultmond from Sun's DCMU RPM (RHEL 4 here), the X4500s
>> would crash like clockwork every ~48 hours. For such a simple bit of
>> code, I was surprised; I only found out because I once forgot to
>> turn it on while working on the load. Just FYI, it was unrelated
>> to Lustre (I was using the provided RPMs, no kernel build). Turning
>> faultmond off solved my problem on the X4500.
> The DCMU RPM is installed. I didn't explicitly install it, so it
> must have been bundled with the SIA CD... I'll try removing the
> RPM to see what happens. Thanks for the heads up.
>
> Regards,
>
> Malcolm.
>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> brockp at umich.edu
>> (734)936-1985
>>
>> On Oct 13, 2008, at 4:41 AM, Malcolm Cowe wrote:
>>>
>>> The X4200m2 MDS systems and the X4500 OSS were rebuilt using the
>>> stock Lustre packages (kernel + modules + userspace). With the
>>> exception of the RDAC kernel module, no additional software was
>>> applied to the systems. We recreated our volumes and ran the
>>> servers over the weekend. However, the OSS crashed about 8 hours
>>> in. The syslog output is attached to this message.
>>>
>>> It looks like it could be similar to bug #16404, which means
>>> patching and rebuilding the kernel. Given my lack of success at
>>> building from source, I am again asking for some guidance on how
>>> to do this. On the 7th I sent out the steps I used to try to build
>>> from source, because I was encountering problems and was unable to
>>> get a working set of packages. Included in that message was output
>>> from quilt implying that the kernel patching process was not
>>> working properly.
>>>
>>> Regards,
>>>
>>> Malcolm.
>>>
>>> --
>>> Malcolm Cowe
>>> Solutions Integration Engineer
>>>
>>> Sun Microsystems, Inc.
>>> Blackness Road
>>> Linlithgow, West Lothian EH49 7LR UK
>>> Phone: x73602 / +44 1506 673 602
>>> Email: Malcolm.Cowe at Sun.COM
>>>
>>> Oct 10 06:49:39 oss-1 kernel: LDISKFS FS on md15,
>>> internal journal Oct 10 06:49:39 oss-1 kernel: LDISKFS-fs:  
>>> mounted filesystem with ordered data mode. Oct 10 06:53:42 oss-1  
>>> kernel: kjournald starting. Commit interval 5 seconds Oct 10  
>>> 06:53:42 oss-1 kernel: LDISKFS FS on md16, internal journal Oct  
>>> 10 06:53:42 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>>> ordered data mode. Oct 10 06:57:49 oss-1 kernel: kjournald  
>>> starting. Commit interval 5 seconds Oct 10 06:57:49 oss-1 kernel:  
>>> LDISKFS FS on md17, internal journal Oct 10 06:57:49 oss-1  
>>> kernel: LDISKFS-fs: mounted filesystem with ordered data mode.  
>>> Oct 10 07:44:55 oss-1 faultmond: 16:Polling all 48 slots for  
>>> drive fault Oct 10 07:45:00 oss-1 faultmond: Polling cycle 16 is  
>>> complete Oct 10 07:56:23 oss-1 kernel: Lustre: OBD class driver,  
>>> info at clusterfs.com Oct 10 07:56:23 oss-LDISKFS-fs: file extents  
>>> enabled1 kernel: Lustre VersionLDISKFS-fs: mballoc enabled :  
>>> 1.6.5.1 Oct 10 07:56:23 oss-1 kernel: Build Version:  
>>> 1.6.5.1-19691231190000-PRISTINE-.cache.OLDRPMS. 
>>> 20080618230526.linux- smp-2.6.9-67.0.7.EL_lustre. 
>>> 1.6.5.1.x86_64-2.6.9-67.0.7.EL_lustre. 1.6.5.1smp Oct 10 07:56:24  
>>> oss-1 kernel: Lustre: Added LNI 192.168.30.111 at o2ib [8/64] Oct 10  
>>> 07:56:24 oss-1 kernel: Lustre: Lustre Client File System;  
>>> info at clusterfs.com Oct 10 07:56:24 oss-1 kernel: kjournald  
>>> starting. Commit interval 5 seconds Oct 10 07:56:24 oss-1 kernel:  
>>> LDISKFS FS on md11, external journal on md21 Oct 10 07:56:24  
>>> oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data  
>>> mode. Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit  
>>> interval 5 seconds Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on  
>>> md11, external journal on md21 Oct 10 07:56:24 oss-1 kernel:  
>>> LDISKFS-fs: mounted filesystem with   journal data mode. Oct 10  
>>> 07:56:24 oss-1 kernel: LDISKFS-fs: file extents enabled Oct 10  
>>> 07:56:24 oss-1 kernel: LDISKFS-fs: mballoc enabled Lustre:  
>>> Request x1 sent from MGC192.168.30.101 at o2ib to NID  
>>> 192.168.30.101 at o2ib 5s ago has timed out (limit 5s). Oct 10  
>>> 07:56:30 oss-1 kernel: Lustre: Request x1 sent from  
>>> MGC192.168.30.101 at o2ib to NID 192.168.30.101 at o2ib 5s ago has  
>>> timed out (limit 5s). LustreError: 4685:0:(events.c: 
>>> 55:request_out_callback()) @@@ type 4, status -113  
>>> req at 00000101f8ef3200 x3/t0 o250->MGS at MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl
>>> 1223621815 ref 2 fl Rpc:/0/0 rc 0/0 Lustre: Request x3 sent from  
>>> MGC192.168.30.101 at o2ib to NID 192.168.30.102 at o2ib 0s ago has  
>>> timed out (limit 5s). LustreError: 18125:0:(obd_mount.c: 
>>> 1062:server_start_targets()) Required registration failed for  
>>> lfs01-OSTffff: -5 LustreError: 15f-b: Communication error with  
>>> the MGS. Is the MGS running? LustreError: 18125:0:(obd_mount.c: 
>>> 1597:server_fill_super()) Unable to start targets: -5  
>>> LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd  
>>> lfs01-OSTffff LustreError: 18125:0:(obd_mount.c: 
>>> 119:server_deregister_mount()) lfs01-OSTffff not registered  
>>> LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success) LDISKFS-fs:  
>>> mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0   breaks,  
>>> 0 lost LDISKFS-fs: mballoc: 0 generated and it took 0 LDISKFS-fs:  
>>> mballoc: 0 preallocated, 0 discarded Oct 10 07:56:50 oss-1  
>>> kernel: Lustre: Changing connection for MGC192.168.30.101 at o2ib to  
>>> MGC192.1Lustre: server umount lfs01- OSTffff complete  
>>> 68.30.101 at o2ib_1LustreError: 18125:0:(obd_mount.c:  
>>> 1951:lustre_fill_super()) Unable to mount (-5) / 
>>> 192.168.30.102 at o2ib Oct 10 07:56:50 oss-1 kernel: LustreError:  
>>> 4685:0:(events.c: 55:request_out_callback()) @@@ type 4, status  
>>> -113 req at 00000101f8ef3200 x3/t0 o250->MGS at MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl
>>> 1223621815 ref 2 fl Rpc:/0/0 rc 0/0Oct 10 07:56:50 oss-1 kernel:  
>>> Lustre: Request x3 sent from   MGC192.168.30.101 at o2ib to NID  
>>> 192.168.30.102 at o2ib 0s ago has timed out (limit 5s). Oct 10  
>>> 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:  
>>> 1062:server_start_targets()) Required registration failed for  
>>> lfs01- OSTffff: -5 Oct 10 07:56:50 oss-1 kernel: LustreError: 15f- 
>>> b: Communication   error with the MGS. Is the MGS running? Oct 10  
>>> 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:  
>>> 1597:server_fill_super()) Unable to start targets: -5 Oct 10  
>>> 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:  
>>> 1382:server_put_super()) no obd lfs01-OSTffff Oct 10 07:56:50  
>>> oss-1 kernel: LustreError: 18125:0:(obd_mount.c:  
>>> 119:server_deregister_mount()) lfs01-OSTffff not registered Oct  
>>> 10 07:56:50 oss-1 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0  
>>> success) Oct 10 07:56:50 oss-1 kernel: LDISKFS-fs: mballoc: 0  
>>> extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost Oct 10  
>>> 07:56:51 oss-1 kernel: LDISKFS-fs: mballoc: 0 generated and it  
>>> took 0 Oct 10 07:56:51 oss-1 kernel: LDISKFS-fs: mballoc: 0  
>>> preallocated, 0 discarded Oct 10 07:56:51 oss-1 kernel: Lustre:  
>>> server umount lfs01-OSTffff complete Oct 10 07:56:51 oss-1  
>>> kernel: LustreError: 18125:0:(obd_mount.c: 1951:lustre_fill_super 
>>> ()) Unable to mount (-5) LustreError: 6644:0:(events.c: 
>>> 55:request_out_callback()) @@@ type 4, status -113  
>>> req at 00000103f7a50600 x1/t0 o250->MGS at MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl
>>> 1223621790 ref 1 fl Complete:EX/0/0 rc -110/0 Oct 10 07:57:15  
>>> oss-1 kernel: LustreError: 6644:0:(events.c:  
>>> 55:request_out_callback()) @@@ type 4, status -113  
>>> req at 00000103f7a50600 x1/t0 o250->MGS at MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl
>>> 1223621790 ref 1 fl Complete:EX/0/0 rc -110/0 Oct 10 08:04:09  
>>> oss-1 sshd(pam_unix)[18530]: session opened for user root by root 
>>> (uid=0) LDISKFS-fs: file extents enabled LDISKFS-fs: mballoc  
>>> enabled Lustre: lfs01-OST0000: new disk, initializing Lustre:  
>>> Server lfs01-OST0000 on device /dev/md11 has started Oct 10  
>>> 08:06:49 oss-1 kernel: kjournald starting. Commit interval 5  
>>> seconds Oct 10 08:06:49 oss-1 kernel: LDISKFS FS on md11,  
>>> external journal on md21 Oct 10 08:06:49 oss-1 kernel: LDISKFS- 
>>> fs: mounted filesystem with journal data mode. Oct 10 08:06:49  
>>> oss-1 kernel: kjournald starting. Commit interval 5 seconds Oct  
>>> 10 08:06:49 oss-1 kernel: LDISKFS FS on md11, external journal on  
>>> md21 Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mounted filesystem  
>>> with journal data mode. Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs:  
>>> file extents enabled Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs:  
>>> mballoc enabled Oct 10 08:06:49 oss-1 kernel: Lustre: Filtering  
>>> OBD driver; info at clusterfs.com Oct 10 08:06:49 oss-1 kernel:  
>>> Lustre: lfs01-OST0000: new disk, initializing Oct 10 08:06:49  
>>> oss-1 kernel: Lustre: OST lfs01-OST0000 now serving dev (lfs01- 
>>> OST0000/ccc68ac6-5b58-acd6-455b-2df9d2980009) with recovery  
>>> enabled Oct 10 08:06:49 oss-1 kernel: Lustre: Server lfs01- 
>>> OST0000 on device /dev/md11 has started Lustre: lfs01-OST0000:  
>>> received MDS connection from 192.168.30.101 at o2ib Oct 10 08:06:54  
>>> oss-1 kernel: Lustre: lfs01-OST0000: received MDS connection from  
>>> 192.168.30.101 at o2ib LDISKFS-fs: file extents enabled LDISKFS-fs:  
>>> mballoc enabled Lustre: lfs01-OST0001: new disk, initializing  
>>> Lustre: Server lfs01-OST0001 on device /dev/md12 has started Oct  
>>> 10 08:06:56 oss-1 kernel: kjournald starting. Commit interval 5  
>>> seconds Oct 10 08:06:56 oss-1 kernel: LDISKFS FS on md12,  
>>> external journal on md22 Oct 10 08:06:56 oss-1 kernel: LDISKFS- 
>>> fs: mounted filesystem with journal data mode. Oct 10 08:06:56  
>>> oss-1 kernel: kjournald starting. Commit interval 5 seconds Oct  
>>> 10 08:06:56 oss-1 kernel: LDISKFS FS on md12, external journal on  
>>> md22 Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mounted filesystem  
>>> with journal data mode. Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs:  
>>> file extents enabled Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs:  
>>> mballoc enabled Oct 10 08:06:56 oss-1 kernel: Lustre: lfs01- 
>>> OST0001: new disk, initializing Oct 10 08:06:56 oss-1 kernel:  
>>> Lustre: OST lfs01-OST0001 now serving dev (lfs01-OST0001/b2122e87- 
>>> be36-bd1a-4e40-fdd41e626d0b) with recovery enabled Oct 10  
>>> 08:06:56 oss-1 kernel: Lustre: Server lfs01-OST0001 on device / 
>>> dev/md12 has started Lustre: lfs01-OST0001: received MDS  
>>> connection from 192.168.30.101 at o2ib Oct 10 08:07:01 oss-1 kernel:  
>>> Lustre: lfs01-OST0001: received MDS connection from  
>>> 192.168.30.101 at o2ib LDISKFS-fs: file extents enabled LDISKFS-fs:  
>>> mballoc enabled Lustre: lfs01-OST0002: new disk, initializing  
>>> Lustre: Server lfs01-OST0002 on device /dev/md13 has started Oct  
>>> 10 08:07:02 oss-1 kernel: kjournald starting. Commit interval 5  
>>> seconds Oct 10 08:07:02 oss-1 kernel: LDISKFS FS on md13,  
>>> external journal on md23 Oct 10 08:07:02 oss-1 kernel: LDISKFS- 
>>> fs: mounted filesystem with journal data mode. Oct 10 08:07:02  
>>> oss-1 kernel: kjournald starting. Commit interval 5 seconds Oct  
>>> 10 08:07:02 oss-1 kernel: LDISKFS FS on md13, external journal on  
>>> md23 Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mounted filesystem  
>>> with journal data mode. Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs:  
>>> file extents enabled Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs:  
>>> mballoc enabled Oct 10 08:07:02 oss-1 kernel: Lustre: lfs01- 
>>> OST0002: new disk, initializing Oct 10 08:07:02 oss-1 kernel:  
>>> Lustre: OST lfs01-OST0002 now serving   dev (lfs01- 
>>> OST0002/13c66dfa-47c5-b350-43e3-3c3b67c358b6) with recovery  
>>> enabled Oct 10 08:07:02 oss-1 kernel: Lustre: Server lfs01- 
>>> OST0002 on device /dev/md13 has started Lustre: lfs01-OST0002:  
>>> received MDS connection from 192.168.30.101 at o2ib Oct 10 08:07:06  
>>> oss-1 kernel: Lustre: lfs01-OST0002: received MDS connection from  
>>> 192.168.30.101 at o2ib LDISKFS-fs: file extents enabled LDISKFS-fs:  
>>> mballoc enabled Oct 10 08:07:08 oss-1 kernel: kjournald starting.  
>>> Commit interval 5 seconds OcLustre: lfs01-OST0003: new disk,  
>>> initializing t 10 08:07:08 oss-1 kernel: LDISKFS FS on md15,  
>>> external journalLustre: Server lfs01-OST0003 on device /dev/md15  
>>> has started on md25 Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs:  
>>> mounted filesystem with journal data mode. Oct 10 08:07:08 oss-1  
>>> kernel: kjournald starting. Commit interval 5 seconds Oct 10  
>>> 08:07:08 oss-1 kernel: LDISKFS FS on md15, external journal on  
>>> md25 Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mounted filesystem  
>>> with journal data mode. Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs:  
>>> file extents enabled Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs:  
>>> mballoc enabled Oct 10 08:07:08 oss-1 kernel: Lustre: lfs01- 
>>> OST0003: new disk, initializing Oct 10 08:07:08 oss-1 kernel:  
>>> Lustre: OST lfs01-OST0003 now serving   dev (lfs01-OST0003/ 
>>> d6fd7a9d-3bb8-ae05-41ed-bbfb1b6b0303) with recovery enabled Oct  
>>> 10 08:07:08 oss-1 kernel: Lustre: Server lfs01-OST0003 on device / 
>>> dev/md15 has started Lustre: lfs01-OST0003: received MDS  
>>> connection from 192.168.30.101 at o2ib Oct 10 08:07:12 oss-1 kernel:  
>>> Lustre: lfs01-OST0003: received MDS connection from  
>>> 192.168.30.101 at o2ib LDISKFS-fs: file extents enabled LDISKFS-fs:  
>>> mballoc enabled Lustre: lfs01-OST0004: new disk, initializing Oct  
>>> 10 08:07:14 oss-1 kernel: kjournald starting. Commit  
>>> intervLustre: Server lfs01-OST0004 on device /dev/md16 has  
>>> started al 5 seconds Oct 10 08:07:14 oss-1 kernel: LDISKFS FS on  
>>> md16, external journal on md26 Oct 10 08:07:14 oss-1 kernel:  
>>> LDISKFS-fs: mounted filesystem with journal data mode. Oct 10  
>>> 08:07:14 oss-1 kernel: kjournald starting. Commit interval 5  
>>> seconds Oct 10 08:07:14 oss-1 kernel: LDISKFS FS on md16,  
>>> external journal on md26 Oct 10 08:07:14 oss-1 kernel: LDISKFS- 
>>> fs: mounted filesystem with journal data mode. Oct 10 08:07:14  
>>> oss-1 kernel: LDISKFS-fs: file extents enabled Oct 10 08:07:14  
>>> oss-1 kernel: LDISKFS-fs: mballoc enabled Oct 10 08:07:14 oss-1  
>>> kernel: Lustre: lfs01-OST0004: new disk, initializing Oct 10  
>>> 08:07:14 oss-1 kernel: Lustre: OST lfs01-OST0004 now serving    
>>> dev (lfs01-OST0004/661dcb52-7ef9-8274-45d7-4441e36410d1) with  
>>> recovery enabled Oct 10 08:07:14 oss-1 kernel: Lustre: Server  
>>> lfs01-OST0004 on device /dev/md16 has started Lustre: lfs01- 
>>> OST0004: received MDS connection from 192.168.30.101 at o2ib Oct 10  
>>> 08:07:18 oss-1 kernel: Lustre: lfs01-OST0004: received MDS  
>>> connection from 192.168.30.101 at o2ib LDISKFS-fs: file extents  
>>> enabled LDISKFS-fs: mballoc enabled Lustre: lfs01-OST0005: new  
>>> disk, initializing Lustre: Server lfs01-OST0005 on device /dev/ 
>>> md17 has started Oct 10 08:07:19 oss-1 kernel: kjournald  
>>> starting. Commit interval 5 seconds Oct 10 08:07:19 oss-1 kernel:  
>>> LDISKFS FS on md17, external journal on md27 Oct 10 08:07:19  
>>> oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data  
>>> mode. Oct 10 08:07:19 oss-1 kernel: kjournald starting. Commit  
>>> interval 5 seconds Oct 10 08:07:19 oss-1 kernel: LDISKFS FS on  
>>> md17, external journal on md27 Oct 10 08:07:19 oss-1 kernel:  
>>> LDISKFS-fs: mounted filesystem with journal data mode. Oct 10  
>>> 08:07:19 oss-1 kernel: LDISKFS-fs: file extents enabled Oct 10  
>>> 08:07:20 oss-1 kernel: LDISKFS-fs: mballoc enabled Oct 10  
>>> 08:07:20 oss-1 kernel: Lustre: lfs01-OST0005: new disk,  
>>> initializing Oct 10 08:07:20 oss-1 kernel: Lustre: OST lfs01- 
>>> OST0005 now serving   dev (lfs01- 
>>> OST0005/978ba68c-0ba7-9ac7-439f-964ca7bf86a3) with recovery  
>>> enabled Oct 10 08:07:20 oss-1 kernel: Lustre: Server lfs01- 
>>> OST0005 on device /dev/md17 has started Lustre: lfs01-OST0005:  
>>> received MDS connection from 192.168.30.101 at o2ib Oct 10 08:07:25  
>>> oss-1 kernel: Lustre: lfs01-OST0005: received MDS connection from  
>>> 192.168.30.101 at o2ib Oct 10 08:45:00 oss-1 faultmond: 17:Polling  
>>> all 48 slots for drive fault Oct 10 08:45:06 oss-1 faultmond:  
>>> Polling cycle 17 is complete Oct 10 09:45:06 oss-1 faultmond:  
>>> 18:Polling all 48 slots for drive fault Oct 10 09:45:12 oss-1  
>>> faultmond: Polling cycle 18 is complete Oct 10 10:45:12 oss-1  
>>> faultmond: 19:Polling all 48 slots for drive fault Oct 10  
>>> 10:45:17 oss-1 faultmond: Polling cycle 19 is complete  
>>> LustreError: 18732:0:(lustre_fsfilt.h:312:fsfilt_setattr())  
>>> lfs01- OST0001: slow setattr 85s Oct 10 10:48:14 oss-1 kernel:  
>>> LustreError: 18732:0:(lustre_fsfilt.h: 312:fsfilt_setattr())  
>>> lfs01-OST0001: slow setattr 85s Oct 10 11:45:17 oss-1 faultmond:  
>>> 20:Polling all 48 slots for drive fault Oct 10 11:45:25 oss-1  
>>> faultmond: Polling cycle 20 is complete Oct 10 12:45:25 oss-1  
>>> faultmond: 21:Polling all 48 slots for drive fault Oct 10  
>>> 12:45:33 oss-1 faultmond: Polling cycle 21 is complete Lustre:  
>>> 18805:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- OST0005:  
>>> slow setattr 33s Oct 10 13:14:46 oss-1 kernel: Lustre: 18805:0: 
>>> (lustre_fsfilt.h: 312:fsfilt_setattr()) lfs01-OST0005: slow  
>>> setattr 33s Lustre: 18794:0:(lustre_fsfilt.h:312:fsfilt_setattr 
>>> ()) lfs01- OST0000: slow setattr 43s Oct 10 13:15:03 oss-1  
>>> kernel: Lustre: 18794:0:(lustre_fsfilt.h: 312:fsfilt_setattr())  
>>> lfs01-OST0000: slow setattr 43s Lustre: 18815:0:(lustre_fsfilt.h: 
>>> 312:fsfilt_setattr()) lfs01- OST0004: slow setattr 40s Oct 10  
>>> 13:15:13 oss-1 kernel: Lustre: 18815:0:(lustre_fsfilt.h:  
>>> 312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s Lustre:  
>>> 18809:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-  
>>> OST0003: slow i_mutex 31s Lustre: 18753:0:(filter_io_26.c: 
>>> 700:filter_commitrw_write()) lfs01- OST0003: slow i_mutex 31s Oct  
>>> 10 13:15:25 oss-1 kernel: Lustre: 18809:0:(filter_io_26.c:  
>>> 700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s Oct  
>>> 10 13:15:25 oss-1 kernel: Lustre: 18753:0:(filter_io_26.c:  
>>> 700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s  
>>> Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write())  
>>> lfs01- OST0002: slow i_mutex 34s Lustre: 18768:0:(filter_io_26.c: 
>>> 700:filter_commitrw_write()) Skipped 2 previous similar messages  
>>> Oct 10 13:15:28 oss-1 kernel: Lustre: 18768:0:(filter_io_26.c:  
>>> 700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 34s Oct  
>>> 10 13:15:28 oss-1 kernel: Lustre: 18768:0:(filter_io_26.c:  
>>> 700:filter_commitrw_write()) Skipped 2 previous similar messages  
>>> Lustre: 18833:0:(filter_io_26.c:700:filter_commitrw_write())  
>>> lfs01- OST0001: slow i_mutex 37s Oct 10 13:15:31 oss-1 kernel:  
>>> Lustre: 18833:0:(filter_io_26.c: 700:filter_commitrw_write())  
>>> lfs01-OST0001: slow i_mutex 37s Lustre: 18812:0:(filter_io_26.c: 
>>> 700:filter_commitrw_write()) lfs01- OST0002: slow i_mutex 40s  
>>> Lustre: 18844:0:(filter_io_26.c:765:filter_commitrw_write())  
>>> lfs01- OST0003: slow direct_io 40s Oct 10 13:15:34 oss-1 kernel:  
>>> Lustre: 18812:0:(filter_io_26.c: 700:filter_commitrw_write())  
>>> lfs01-OST0002: slow i_mutex 40s Oct 10 13:15:34 oss-1 kernel:  
>>> Lustre: 18844:0:(filter_io_26.c: 765:filter_commitrw_write())  
>>> lfs01-OST0003: slow direct_io 40s Lustre: 18741:0: 
>>> (lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- OST0001: slow  
>>> setattr 41s Lustre: 18849:0:(filter_io_26.c: 
>>> 765:filter_commitrw_write()) lfs01- OST0001: slow direct_io 31s  
>>> Oct 10 13:15:35 oss-1 kernel: Lustre: 18741:0:(lustre_fsfilt.h:  
>>> 312:fsfilt_setattr()) lfs01-OST0001: slow setattr 41s Oct 10  
>>> 13:15:35 oss-1 kernel: Lustre: 18849:0:(filter_io_26.c:  
>>> 765:filter_commitrw_write()) lfs01-OST0001: slow direct_io 31s  
>>> LustreError: 18765:0:(lustre_fsfilt.h:312:fsfilt_setattr())  
>>> lfs01- OST0002: slow setattr 51s Oct 10 13:15:38 oss-1 kernel:  
>>> LustreError: 18765:0:(lustre_fsfilt.h: 312:fsfilt_setattr())  
>>> lfs01-OST0002: slow setattr 51s Lustre: 18756:0:(filter_io_26.c: 
>>> 700:filter_commitrw_write()) lfs01- OST0002: slow i_mutex 45s Oct  
>>> 10 13:15:39 oss-1 kernel: Lustre: 18756:0:(filter_io_26.c:  
>>> 700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 45s Oct  
>>> 10 13:45:33 oss-1 faultmond: 22:Polling all 48 slots for drive  
>>> fault Oct 10 13:45:41 oss-1 faultmond: Polling cycle 22 is  
>>> complete Oct 10 14:45:41 oss-1 faultmond: 23:Polling all 48 slots  
>>> for drive fault Oct 10 14:45:49 oss-1 faultmond: Polling cycle 23  
>>> is complete Lustre: 18740:0:(lustre_fsfilt.h:312:fsfilt_setattr 
>>> ()) lfs01- OST0000: slow setattr 38s Oct 10 15:40:41 oss-1  
>>> kernel: Lustre: 18740:0:(lustre_fsfilt.h: 312:fsfilt_setattr())  
>>> lfs01-OST0000: slow setattr 38s LustreError: 18830:0: 
>>> (lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- OST0004: slow  
>>> setattr 60s Lustre: 18767:0:(lustre_fsfilt.h:312:fsfilt_setattr 
>>> ()) lfs01- OST0005: slow setattr 38s Oct 10 15:41:13 oss-1  
>>> kernel: LustreError: 18830:0:(lustre_fsfilt.h: 312:fsfilt_setattr 
>>> ()) lfs01-OST0004: slow setattr 60s Oct 10 15:41:13 oss-1 kernel:  
>>> Lustre: 18767:0:(lustre_fsfilt.h: 312:fsfilt_setattr()) lfs01- 
>>> OST0005: slow setattr 38s Lustre: 18796:0:(lustre_fsfilt.h: 
>>> 312:fsfilt_setattr()) lfs01- OST0001: slow setattr 44s Oct 10  
>>> 15:41:20 oss-1 kernel: Lustre: 18796:0:(lustre_fsfilt.h:  
>>> 312:fsfilt_setattr()) lfs01-OST0001: slow setattr 44s  
>>> LustreError: 18831:0:(lustre_fsfilt.h:312:fsfilt_setattr())  
>>> lfs01- OST0002: slow setattr 62s Oct 10 15:41:21 oss-1 kernel:  
>>> LustreError: 18831:0:(lustre_fsfilt.h: 312:fsfilt_setattr())  
>>> lfs01-OST0002: slow setattr 62s Oct 10 15:45:49 oss-1 faultmond:  
>>> 24:Polling all 48 slots for drive fault Oct 10 15:45:58 oss-1  
>>> faultmond: Polling cycle 24 is complete Oct 10 16:45:58 oss-1  
>>> faultmond: 25:Polling all 48 slots for drive fault Oct 10  
>>> 16:46:06 oss-1 faultmond: Polling cycle 25 is complete Oct 10  
>>> 17:46:06 oss-1 faultmond: 26:Polling all 48 slots for drive fault  
>>> Oct 10 17:46:15 oss-1 faultmond: Polling cycle 26 is complete  
>>> Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-  
>>> OST0000: slow setattr 41s Lustre: 18726:0:(service.c: 
>>> 918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s  
>>> req at 00000101e8f1de00 x15789/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@  
>>> Slow req_in handling 7s req at 00000101e8f1da00 x15790/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in())  
>>> Skipped 3 previous similar messages Lustre: 18764:0: 
>>> (lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- OST0004: slow  
>>> setattr 40s Oct 10 18:06:33 oss-1 kernel: Lustre: 18741:0: 
>>> (lustre_fsfilt.h: 312:fsfilt_setattr()) lfs01-OST0000: slow  
>>> setattr 41s Oct 10 18:06:33 oss-1 kernel: Lustre: 18726:0: 
>>> (service.c: 918:ptlrpc_server_handle_req_in()) @@@ Slow req_in  
>>> handling 7s req at 00000101e8f1de00 x15789/t0 o13-><?>@<?>:0/0 lens  
>>> 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0 Oct 10 18:06:33  
>>> oss-1 kernel: Lustre: 18726:0:(service.c:  
>>> 918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s  
>>> req at 00000101e8f1da00 x15790/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to  
>>> 0 dl 0 ref 1 fl New:/0/0 rc 0/0 Lustre: 18845:0:(lustre_fsfilt.h: 
>>> 312:fsfilt_setattr()) lfs01- OST0002: slow setattr 44s Lustre:  
>>> 18579:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow  
>>> req_in handling 14s req at 00000103f8dabe00 x7271650/t0 o103-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18726:0:(service.c:  
>>> 918:ptlrpc_server_handle_req_in()) Skipped 3 previous similar  
>>> messages Oct 10 18:06:54 oss-1 kernel: Lustre: 18764:0: 
>>> (lustre_fsfilt.h: 312:fsfilt_setattr()) lfs01-OST0004: slow  
>>> setattr 40s Oct 10 18:06:54 oss-1 kernel: Lustre: 18845:0: 
>>> (lustre_fsfilt.h: 312:fsfilt_setattr()) lfs01-OST0002: slow  
>>> setattr 44s Oct 10 18:06:54 oss-1 kernel: Lustre: 18579:0: 
>>> (service.c: 918:ptlrpc_server_handle_req_in()) @@@ Slow req_in  
>>> handling 14s req at 00000103f8dabe00 x7271650/t0 o103-><?>@<?>:0/0  
>>> lens 232/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0 Lustre: 18766:0: 
>>> (lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- OST0005: slow  
>>> setattr 32s Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr 
>>> ()) Skipped 1 previous similar message Oct 10 18:06:59 oss-1  
>>> kernel: Lustre: 18766:0:(lustre_fsfilt.h: 312:fsfilt_setattr())  
>>> lfs01-OST0005: slow setattr 32s Oct 10 18:06:59 oss-1 kernel:  
>>> Lustre: 18766:0:(lustre_fsfilt.h: 312:fsfilt_setattr()) Skipped 1  
>>> previous similar message Lustre: 18826:0:(lustre_fsfilt.h: 
>>> 312:fsfilt_setattr()) lfs01-OST0003: slow setattr 45s
>>> Oct 10 18:07:04 oss-1 kernel: Lustre: 18826:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0003: slow setattr 45s
>>> Oct 10 18:46:15 oss-1 faultmond: 27:Polling all 48 slots for drive fault
>>> ----------- [cut here ] --------- [please bite here ] ---------
>>> Kernel BUG at spinlock:76
>>> invalid operand: 0000 [1] SMP
>>> CPU 2
>>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)
>>> lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U)
>>> obdclass(U) lvfs(U) ldiskfs(U) lnet(U) libcfs(U) raid5(U) xor(U)
>>> parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U)
>>> ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) sunrpc(U) rdma_ucm(U)
>>> qlgc_vnic(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U)
>>> md5(U) ipv6(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U) mlx4_ib(U)
>>> mlx4_core(U) ds(U) yenta_socket(U) pcmcia_core(U) dm_mirror(U)
>>> dm_multipath(U) dm_mod(U) button(U) battery(U) ac(U) joydev(U)
>>> ohci_hcd(U) ehci_hcd(U) hw_random(U) edac_mc(U) ib_mthca(U)
>>> ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U)
>>> ib_core(U) e1000(U) ext3(U) jbd(U) raid1(U) mv_sata(U) sd_mod(U)
>>> scsi_mod(U)
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> -- 
> Malcolm Cowe
> Solutions Integration Engineer
>
> Sun Microsystems, Inc.
> Blackness Road
> Linlithgow, West Lothian EH49 7LR UK
> Phone: x73602 / +44 1506 673 602
> Email: Malcolm.Cowe at Sun.COM
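Before the crash, the attached log has many `slow setattr`, `slow i_mutex` and `slow direct_io` warnings spread across all six OSTs (lfs01-OST0000 to lfs01-OST0005). This sketch tallies those warnings for each OST and records the worst delay. It makes the same one-entry-per-line assumption as above. It counts only timestamped `kernel:` syslog lines, because the attachment also contains un-timestamped console copies of the same messages and those would double the counts. The pattern `<fsname>-OSTxxxx: slow <op> <N>s` is taken from this log and is not a documented format.

```python
import re
from collections import defaultdict

# Assumes one syslog entry per line, e.g. from /var/log/messages.
SYSLOG_PREFIX = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ kernel: ")
SLOW_RE = re.compile(r"(\w+-OST[0-9a-f]{4}): slow (\w+) (\d+)s")


def slow_ops_by_ost(lines):
    """Map OST name -> {operation: (count, worst_seconds)}."""
    stats = defaultdict(dict)
    for line in lines:
        if not SYSLOG_PREFIX.match(line):
            continue  # skip un-timestamped console copies of the same message
        m = SLOW_RE.search(line)
        if not m:
            continue
        ost, op, secs = m.group(1), m.group(2), int(m.group(3))
        count, worst = stats[ost].get(op, (0, 0))
        stats[ost][op] = (count + 1, max(worst, secs))
    return {ost: dict(ops) for ost, ops in stats.items()}


sample = [
    "Oct 10 13:15:25 oss-1 kernel: Lustre: 18809:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s",
    "Lustre: 18809:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s",
    "Oct 10 15:41:13 oss-1 kernel: LustreError: 18830:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 60s",
    "Oct 10 18:06:54 oss-1 kernel: Lustre: 18764:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s",
]
print(slow_ops_by_ost(sample))
# {'lfs01-OST0003': {'i_mutex': (1, 31)}, 'lfs01-OST0004': {'setattr': (2, 60)}}
```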



