[Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues

Malcolm Cowe Malcolm.Cowe at Sun.COM
Mon Oct 13 08:03:46 PDT 2008


Brock Palen wrote:
> I know you say the only addition was the RDAC, for the MDSs I assume
> (we use it too, without problems).
>
>   
Yes, the MDSs share an STK 6140.
> When I ran faultmond from Sun's DCMU RPM (RHEL 4 here), the X4500s
> would crash like clockwork about every 48 hours. I was surprised that
> such a simple bit of code could do this; I only noticed once when I
> forgot to turn it on while working on the load. Just FYI, it was
> unrelated to Lustre (using the provided RPMs, no kernel build), and
> leaving faultmond off solved my problem on the X4500.
>
>   
The DCMU RPM is installed. I didn't explicitly install this, so it must 
have been bundled in with the SIA CD... I'll try removing the rpm to see 
what happens. Thanks for the heads up.
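
Before removing anything, it may help to confirm which installed package
actually provides faultmond. A minimal sketch in Python, assuming the
package name contains the string "dcmu" (an assumption; the exact name
shipped on the SIA CD is not given in this thread). The helper names are
hypothetical, and `rpm -qa` is the only external command used:

```python
import subprocess


def matching_packages(rpm_output, needle="dcmu"):
    """Return package names from `rpm -qa` output that contain `needle`.

    The match is case-insensitive; "dcmu" is an assumed substring, not a
    confirmed package name.
    """
    needle = needle.lower()
    return [
        line.strip()
        for line in rpm_output.splitlines()
        if needle in line.lower()
    ]


def installed_packages():
    """Run `rpm -qa` and return its raw text output."""
    result = subprocess.run(
        ["rpm", "-qa"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example use on an RPM-based host:
#   for name in matching_packages(installed_packages()):
#       print(name)
```

Any package this reports could then be removed with `rpm -e <name>`,
ideally after stopping the daemon it installed.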

Regards,

Malcolm.

> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
>
> On Oct 13, 2008, at 4:41 AM, Malcolm Cowe wrote:
>
>   
>> The X4200m2 MDS systems and the X4500 OSS were rebuilt using the  
>> stock Lustre packages (Kernel + modules + userspace). With the  
>> exception of the RDAC kernel module, no additional software was  
>> applied to the systems. We recreated our volumes and ran the  
>> servers over the weekend. However, the OSS crashed about 8 hours  
>> in. The syslog output is attached to this message.
>>
>> Looks like it could be similar to bug #16404, which means patching  
>> and rebuilding the kernel. Given my lack of success at trying to  
>> build from source, I am again asking for some guidance on how to do  
>> this. I sent out the steps I used to try and build from source on  
>> the 7th because I was encountering problems and was unable to get a  
>> working set of packages. Included in that message was output from  
>> quilt that implies that the kernel patching process was not working  
>> properly.
>>
>>
>> Regards,
>>
>> Malcolm.
>>
>> -- 
>> Malcolm Cowe
>> Solutions Integration Engineer
>>
>> Sun Microsystems, Inc.
>> Blackness Road
>> Linlithgow, West Lothian EH49 7LR UK
>> Phone: x73602 / +44 1506 673 602
>> Email: Malcolm.Cowe at Sun.COM
>> Oct 10 06:49:39 oss-1 kernel: LDISKFS FS on md15, internal journal
>> Oct 10 06:49:39 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> ordered data mode.
>> Oct 10 06:53:42 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 06:53:42 oss-1 kernel: LDISKFS FS on md16, internal journal
>> Oct 10 06:53:42 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> ordered data mode.
>> Oct 10 06:57:49 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 06:57:49 oss-1 kernel: LDISKFS FS on md17, internal journal
>> Oct 10 06:57:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> ordered data mode.
>> Oct 10 07:44:55 oss-1 faultmond: 16:Polling all 48 slots for drive  
>> fault
>> Oct 10 07:45:00 oss-1 faultmond: Polling cycle 16 is complete
>> Oct 10 07:56:23 oss-1 kernel: Lustre: OBD class driver,  
>> info at clusterfs.com
>> Oct 10 07:56:23 oss-1 kernel: Lustre Version: 1.6.5.1
>> LDISKFS-fs: file extents enabled
>> LDISKFS-fs: mballoc enabled
>> Oct 10 07:56:23 oss-1 kernel:         Build Version:  
>> 1.6.5.1-19691231190000-PRISTINE-.cache.OLDRPMS.20080618230526.linux- 
>> smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64-2.6.9-67.0.7.EL_lustre. 
>> 1.6.5.1smp
>> Oct 10 07:56:24 oss-1 kernel: Lustre: Added LNI 192.168.30.111 at o2ib  
>> [8/64]
>> Oct 10 07:56:24 oss-1 kernel: Lustre: Lustre Client File System;  
>> info at clusterfs.com
>> Oct 10 07:56:24 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal  
>> on md21
>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 07:56:24 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal  
>> on md21
>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: file extents enabled
>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mballoc enabled
>> Lustre: Request x1 sent from MGC192.168.30.101 at o2ib to NID  
>> 192.168.30.101 at o2ib 5s ago has timed out (limit 5s).
>> Oct 10 07:56:30 oss-1 kernel: Lustre: Request x1 sent from  
>> MGC192.168.30.101 at o2ib to NID 192.168.30.101 at o2ib 5s ago has timed  
>> out (limit 5s).
>> LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type
>> 4, status -113  req at 00000101f8ef3200 x3/t0
>> o250->MGS at MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl
>> 1223621815 ref 2 fl Rpc:/0/0 rc 0/0
>> Lustre: Request x3 sent from MGC192.168.30.101 at o2ib to NID  
>> 192.168.30.102 at o2ib 0s ago has timed out (limit 5s).
>> LustreError: 18125:0:(obd_mount.c:1062:server_start_targets())  
>> Required registration failed for lfs01-OSTffff: -5
>> LustreError: 15f-b: Communication error with the MGS.  Is the MGS  
>> running?
>> LustreError: 18125:0:(obd_mount.c:1597:server_fill_super()) Unable  
>> to start targets: -5
>> LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd  
>> lfs01-OSTffff
>> LustreError: 18125:0:(obd_mount.c:119:server_deregister_mount())  
>> lfs01-OSTffff not registered
>> LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
>> LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0  
>> breaks, 0 lost
>> LDISKFS-fs: mballoc: 0 generated and it took 0
>> LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
>> Oct 10 07:56:50 oss-1 kernel: Lustre: Changing connection for
>> MGC192.168.30.101 at o2ib to
>> MGC192.168.30.101 at o2ib_1/192.168.30.102 at o2ib
>> Lustre: server umount lfs01-OSTffff complete
>> LustreError: 18125:0:(obd_mount.c:1951:lustre_fill_super()) Unable
>> to mount  (-5)
>> Oct 10 07:56:50 oss-1 kernel: LustreError: 4685:0:(events.c: 
>> 55:request_out_callback()) @@@ type 4, status -113   
>> req at 00000101f8ef3200 x3/t0 o250->MGS at MGC192.168.30.101@o2ib_1:26/25  
>> lens 240/400 e 0 to 5 dl 1223621815 ref 2 fl Rpc:/0/0 rc 0/0
>> Oct 10 07:56:50 oss-1 kernel: Lustre: Request x3 sent from  
>> MGC192.168.30.101 at o2ib to NID 192.168.30.102 at o2ib 0s ago has timed  
>> out (limit 5s).
>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c: 
>> 1062:server_start_targets()) Required registration failed for lfs01- 
>> OSTffff: -5
>> Oct 10 07:56:50 oss-1 kernel: LustreError: 15f-b: Communication  
>> error with the MGS.  Is the MGS running?
>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c: 
>> 1597:server_fill_super()) Unable to start targets: -5
>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c: 
>> 1382:server_put_super()) no obd lfs01-OSTffff
>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c: 
>> 119:server_deregister_mount()) lfs01-OSTffff not registered
>> Oct 10 07:56:50 oss-1 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs  
>> (0 success)
>> Oct 10 07:56:50 oss-1 kernel: LDISKFS-fs: mballoc: 0 extents  
>> scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
>> Oct 10 07:56:51 oss-1 kernel: LDISKFS-fs: mballoc: 0 generated and  
>> it took 0
>> Oct 10 07:56:51 oss-1 kernel: LDISKFS-fs: mballoc: 0 preallocated,  
>> 0 discarded
>> Oct 10 07:56:51 oss-1 kernel: Lustre: server umount lfs01-OSTffff  
>> complete
>> Oct 10 07:56:51 oss-1 kernel: LustreError: 18125:0:(obd_mount.c: 
>> 1951:lustre_fill_super()) Unable to mount  (-5)
>> LustreError: 6644:0:(events.c:55:request_out_callback()) @@@ type
>> 4, status -113  req at 00000103f7a50600 x1/t0
>> o250->MGS at MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl
>> 1223621790 ref 1 fl Complete:EX/0/0 rc -110/0
>> Oct 10 07:57:15 oss-1 kernel: LustreError: 6644:0:(events.c: 
>> 55:request_out_callback()) @@@ type 4, status -113   
>> req at 00000103f7a50600 x1/t0 o250->MGS at MGC192.168.30.101@o2ib_1:26/25  
>> lens 240/400 e 0 to 5 dl 1223621790 ref 1 fl Complete:EX/0/0 rc -110/0
>> Oct 10 08:04:09 oss-1 sshd(pam_unix)[18530]: session opened for  
>> user root by root(uid=0)
>> LDISKFS-fs: file extents enabled
>> LDISKFS-fs: mballoc enabled
>> Lustre: lfs01-OST0000: new disk, initializing
>> Lustre: Server lfs01-OST0000 on device /dev/md11 has started
>> Oct 10 08:06:49 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 08:06:49 oss-1 kernel: LDISKFS FS on md11, external journal  
>> on md21
>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:06:49 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 08:06:49 oss-1 kernel: LDISKFS FS on md11, external journal  
>> on md21
>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: file extents enabled
>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mballoc enabled
>> Oct 10 08:06:49 oss-1 kernel: Lustre: Filtering OBD driver;  
>> info at clusterfs.com
>> Oct 10 08:06:49 oss-1 kernel: Lustre: lfs01-OST0000: new disk,  
>> initializing
>> Oct 10 08:06:49 oss-1 kernel: Lustre: OST lfs01-OST0000 now serving  
>> dev (lfs01-OST0000/ccc68ac6-5b58-acd6-455b-2df9d2980009) with  
>> recovery enabled
>> Oct 10 08:06:49 oss-1 kernel: Lustre: Server lfs01-OST0000 on  
>> device /dev/md11 has started
>> Lustre: lfs01-OST0000: received MDS connection from  
>> 192.168.30.101 at o2ib
>> Oct 10 08:06:54 oss-1 kernel: Lustre: lfs01-OST0000: received MDS  
>> connection from 192.168.30.101 at o2ib
>> LDISKFS-fs: file extents enabled
>> LDISKFS-fs: mballoc enabled
>> Lustre: lfs01-OST0001: new disk, initializing
>> Lustre: Server lfs01-OST0001 on device /dev/md12 has started
>> Oct 10 08:06:56 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 08:06:56 oss-1 kernel: LDISKFS FS on md12, external journal  
>> on md22
>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:06:56 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 08:06:56 oss-1 kernel: LDISKFS FS on md12, external journal  
>> on md22
>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: file extents enabled
>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mballoc enabled
>> Oct 10 08:06:56 oss-1 kernel: Lustre: lfs01-OST0001: new disk,  
>> initializing
>> Oct 10 08:06:56 oss-1 kernel: Lustre: OST lfs01-OST0001 now serving  
>> dev (lfs01-OST0001/b2122e87-be36-bd1a-4e40-fdd41e626d0b) with  
>> recovery enabled
>> Oct 10 08:06:56 oss-1 kernel: Lustre: Server lfs01-OST0001 on  
>> device /dev/md12 has started
>> Lustre: lfs01-OST0001: received MDS connection from  
>> 192.168.30.101 at o2ib
>> Oct 10 08:07:01 oss-1 kernel: Lustre: lfs01-OST0001: received MDS  
>> connection from 192.168.30.101 at o2ib
>> LDISKFS-fs: file extents enabled
>> LDISKFS-fs: mballoc enabled
>> Lustre: lfs01-OST0002: new disk, initializing
>> Lustre: Server lfs01-OST0002 on device /dev/md13 has started
>> Oct 10 08:07:02 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 08:07:02 oss-1 kernel: LDISKFS FS on md13, external journal  
>> on md23
>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:07:02 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 08:07:02 oss-1 kernel: LDISKFS FS on md13, external journal  
>> on md23
>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: file extents enabled
>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mballoc enabled
>> Oct 10 08:07:02 oss-1 kernel: Lustre: lfs01-OST0002: new disk,  
>> initializing
>> Oct 10 08:07:02 oss-1 kernel: Lustre: OST lfs01-OST0002 now serving  
>> dev (lfs01-OST0002/13c66dfa-47c5-b350-43e3-3c3b67c358b6) with  
>> recovery enabled
>> Oct 10 08:07:02 oss-1 kernel: Lustre: Server lfs01-OST0002 on  
>> device /dev/md13 has started
>> Lustre: lfs01-OST0002: received MDS connection from  
>> 192.168.30.101 at o2ib
>> Oct 10 08:07:06 oss-1 kernel: Lustre: lfs01-OST0002: received MDS  
>> connection from 192.168.30.101 at o2ib
>> LDISKFS-fs: file extents enabled
>> LDISKFS-fs: mballoc enabled
>> Oct 10 08:07:08 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Lustre: lfs01-OST0003: new disk, initializing
>> Lustre: Server lfs01-OST0003 on device /dev/md15 has started
>> Oct 10 08:07:08 oss-1 kernel: LDISKFS FS on md15, external journal
>> on md25
>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:07:08 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 08:07:08 oss-1 kernel: LDISKFS FS on md15, external journal  
>> on md25
>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: file extents enabled
>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mballoc enabled
>> Oct 10 08:07:08 oss-1 kernel: Lustre: lfs01-OST0003: new disk,  
>> initializing
>> Oct 10 08:07:08 oss-1 kernel: Lustre: OST lfs01-OST0003 now serving  
>> dev (lfs01-OST0003/d6fd7a9d-3bb8-ae05-41ed-bbfb1b6b0303) with  
>> recovery enabled
>> Oct 10 08:07:08 oss-1 kernel: Lustre: Server lfs01-OST0003 on  
>> device /dev/md15 has started
>> Lustre: lfs01-OST0003: received MDS connection from  
>> 192.168.30.101 at o2ib
>> Oct 10 08:07:12 oss-1 kernel: Lustre: lfs01-OST0003: received MDS  
>> connection from 192.168.30.101 at o2ib
>> LDISKFS-fs: file extents enabled
>> LDISKFS-fs: mballoc enabled
>> Lustre: lfs01-OST0004: new disk, initializing
>> Lustre: Server lfs01-OST0004 on device /dev/md16 has started
>> Oct 10 08:07:14 oss-1 kernel: kjournald starting.  Commit interval
>> 5 seconds
>> Oct 10 08:07:14 oss-1 kernel: LDISKFS FS on md16, external journal  
>> on md26
>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:07:14 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 08:07:14 oss-1 kernel: LDISKFS FS on md16, external journal  
>> on md26
>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: file extents enabled
>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mballoc enabled
>> Oct 10 08:07:14 oss-1 kernel: Lustre: lfs01-OST0004: new disk,  
>> initializing
>> Oct 10 08:07:14 oss-1 kernel: Lustre: OST lfs01-OST0004 now serving  
>> dev (lfs01-OST0004/661dcb52-7ef9-8274-45d7-4441e36410d1) with  
>> recovery enabled
>> Oct 10 08:07:14 oss-1 kernel: Lustre: Server lfs01-OST0004 on  
>> device /dev/md16 has started
>> Lustre: lfs01-OST0004: received MDS connection from  
>> 192.168.30.101 at o2ib
>> Oct 10 08:07:18 oss-1 kernel: Lustre: lfs01-OST0004: received MDS  
>> connection from 192.168.30.101 at o2ib
>> LDISKFS-fs: file extents enabled
>> LDISKFS-fs: mballoc enabled
>> Lustre: lfs01-OST0005: new disk, initializing
>> Lustre: Server lfs01-OST0005 on device /dev/md17 has started
>> Oct 10 08:07:19 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 08:07:19 oss-1 kernel: LDISKFS FS on md17, external journal  
>> on md27
>> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:07:19 oss-1 kernel: kjournald starting.  Commit interval  
>> 5 seconds
>> Oct 10 08:07:19 oss-1 kernel: LDISKFS FS on md17, external journal  
>> on md27
>> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
>> journal data mode.
>> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: file extents enabled
>> Oct 10 08:07:20 oss-1 kernel: LDISKFS-fs: mballoc enabled
>> Oct 10 08:07:20 oss-1 kernel: Lustre: lfs01-OST0005: new disk,  
>> initializing
>> Oct 10 08:07:20 oss-1 kernel: Lustre: OST lfs01-OST0005 now serving  
>> dev (lfs01-OST0005/978ba68c-0ba7-9ac7-439f-964ca7bf86a3) with  
>> recovery enabled
>> Oct 10 08:07:20 oss-1 kernel: Lustre: Server lfs01-OST0005 on  
>> device /dev/md17 has started
>> Lustre: lfs01-OST0005: received MDS connection from  
>> 192.168.30.101 at o2ib
>> Oct 10 08:07:25 oss-1 kernel: Lustre: lfs01-OST0005: received MDS  
>> connection from 192.168.30.101 at o2ib
>> Oct 10 08:45:00 oss-1 faultmond: 17:Polling all 48 slots for drive  
>> fault
>> Oct 10 08:45:06 oss-1 faultmond: Polling cycle 17 is complete
>> Oct 10 09:45:06 oss-1 faultmond: 18:Polling all 48 slots for drive  
>> fault
>> Oct 10 09:45:12 oss-1 faultmond: Polling cycle 18 is complete
>> Oct 10 10:45:12 oss-1 faultmond: 19:Polling all 48 slots for drive  
>> fault
>> Oct 10 10:45:17 oss-1 faultmond: Polling cycle 19 is complete
>>
>> LustreError: 18732:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0001: slow setattr 85s
>> Oct 10 10:48:14 oss-1 kernel: LustreError: 18732:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0001: slow setattr 85s
>> Oct 10 11:45:17 oss-1 faultmond: 20:Polling all 48 slots for drive  
>> fault
>> Oct 10 11:45:25 oss-1 faultmond: Polling cycle 20 is complete
>> Oct 10 12:45:25 oss-1 faultmond: 21:Polling all 48 slots for drive  
>> fault
>> Oct 10 12:45:33 oss-1 faultmond: Polling cycle 21 is complete
>> Lustre: 18805:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0005: slow setattr 33s
>> Oct 10 13:14:46 oss-1 kernel: Lustre: 18805:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0005: slow setattr 33s
>> Lustre: 18794:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0000: slow setattr 43s
>> Oct 10 13:15:03 oss-1 kernel: Lustre: 18794:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0000: slow setattr 43s
>> Lustre: 18815:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0004: slow setattr 40s
>> Oct 10 13:15:13 oss-1 kernel: Lustre: 18815:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
>> Lustre: 18809:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
>> OST0003: slow i_mutex 31s
>> Lustre: 18753:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
>> OST0003: slow i_mutex 31s
>> Oct 10 13:15:25 oss-1 kernel: Lustre: 18809:0:(filter_io_26.c: 
>> 700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
>> Oct 10 13:15:25 oss-1 kernel: Lustre: 18753:0:(filter_io_26.c: 
>> 700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
>> Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
>> OST0002: slow i_mutex 34s
>> Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write())  
>> Skipped 2 previous similar messages
>> Oct 10 13:15:28 oss-1 kernel: Lustre: 18768:0:(filter_io_26.c: 
>> 700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 34s
>> Oct 10 13:15:28 oss-1 kernel: Lustre: 18768:0:(filter_io_26.c: 
>> 700:filter_commitrw_write()) Skipped 2 previous similar messages
>> Lustre: 18833:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
>> OST0001: slow i_mutex 37s
>> Oct 10 13:15:31 oss-1 kernel: Lustre: 18833:0:(filter_io_26.c: 
>> 700:filter_commitrw_write()) lfs01-OST0001: slow i_mutex 37s
>> Lustre: 18812:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
>> OST0002: slow i_mutex 40s
>> Lustre: 18844:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01- 
>> OST0003: slow direct_io 40s
>> Oct 10 13:15:34 oss-1 kernel: Lustre: 18812:0:(filter_io_26.c: 
>> 700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 40s
>> Oct 10 13:15:34 oss-1 kernel: Lustre: 18844:0:(filter_io_26.c: 
>> 765:filter_commitrw_write()) lfs01-OST0003: slow direct_io 40s
>> Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0001: slow setattr 41s
>> Lustre: 18849:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01- 
>> OST0001: slow direct_io 31s
>> Oct 10 13:15:35 oss-1 kernel: Lustre: 18741:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0001: slow setattr 41s
>> Oct 10 13:15:35 oss-1 kernel: Lustre: 18849:0:(filter_io_26.c: 
>> 765:filter_commitrw_write()) lfs01-OST0001: slow direct_io 31s
>> LustreError: 18765:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0002: slow setattr 51s
>> Oct 10 13:15:38 oss-1 kernel: LustreError: 18765:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0002: slow setattr 51s
>> Lustre: 18756:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
>> OST0002: slow i_mutex 45s
>> Oct 10 13:15:39 oss-1 kernel: Lustre: 18756:0:(filter_io_26.c: 
>> 700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 45s
>> Oct 10 13:45:33 oss-1 faultmond: 22:Polling all 48 slots for drive  
>> fault
>> Oct 10 13:45:41 oss-1 faultmond: Polling cycle 22 is complete
>> Oct 10 14:45:41 oss-1 faultmond: 23:Polling all 48 slots for drive  
>> fault
>> Oct 10 14:45:49 oss-1 faultmond: Polling cycle 23 is complete
>> Lustre: 18740:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0000: slow setattr 38s
>> Oct 10 15:40:41 oss-1 kernel: Lustre: 18740:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0000: slow setattr 38s
>> LustreError: 18830:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0004: slow setattr 60s
>> Lustre: 18767:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0005: slow setattr 38s
>> Oct 10 15:41:13 oss-1 kernel: LustreError: 18830:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0004: slow setattr 60s
>> Oct 10 15:41:13 oss-1 kernel: Lustre: 18767:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0005: slow setattr 38s
>> Lustre: 18796:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0001: slow setattr 44s
>> Oct 10 15:41:20 oss-1 kernel: Lustre: 18796:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0001: slow setattr 44s
>> LustreError: 18831:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0002: slow setattr 62s
>> Oct 10 15:41:21 oss-1 kernel: LustreError: 18831:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0002: slow setattr 62s
>> Oct 10 15:45:49 oss-1 faultmond: 24:Polling all 48 slots for drive  
>> fault
>> Oct 10 15:45:58 oss-1 faultmond: Polling cycle 24 is complete
>> Oct 10 16:45:58 oss-1 faultmond: 25:Polling all 48 slots for drive  
>> fault
>> Oct 10 16:46:06 oss-1 faultmond: Polling cycle 25 is complete
>> Oct 10 17:46:06 oss-1 faultmond: 26:Polling all 48 slots for drive  
>> fault
>> Oct 10 17:46:15 oss-1 faultmond: Polling cycle 26 is complete
>> Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0000: slow setattr 41s
>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@
>> Slow req_in handling 7s  req at 00000101e8f1de00 x15789/t0
>> o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@
>> Slow req_in handling 7s  req at 00000101e8f1da00 x15790/t0
>> o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in())  
>> Skipped 3 previous similar messages
>> Lustre: 18764:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0004: slow setattr 40s
>> Oct 10 18:06:33 oss-1 kernel: Lustre: 18741:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0000: slow setattr 41s
>> Oct 10 18:06:33 oss-1 kernel: Lustre: 18726:0:(service.c: 
>> 918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s   
>> req at 00000101e8f1de00 x15789/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0  
>> dl 0 ref 1 fl New:/0/0 rc 0/0
>> Oct 10 18:06:33 oss-1 kernel: Lustre: 18726:0:(service.c: 
>> 918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s   
>> req at 00000101e8f1da00 x15790/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0  
>> dl 0 ref 1 fl New:/0/0 rc 0/0
>> Lustre: 18845:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0002: slow setattr 44s
>> Lustre: 18579:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@
>> Slow req_in handling 14s  req at 00000103f8dabe00 x7271650/t0
>> o103-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18726:0:(service.c: 
>> 918:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages
>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18764:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18845:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0002: slow setattr 44s
>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18579:0:(service.c: 
>> 918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 14s   
>> req at 00000103f8dabe00 x7271650/t0 o103-><?>@<?>:0/0 lens 232/0 e 0  
>> to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>> Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0005: slow setattr 32s
>> Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) Skipped 1  
>> previous similar message
>> Oct 10 18:06:59 oss-1 kernel: Lustre: 18766:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0005: slow setattr 32s
>> Oct 10 18:06:59 oss-1 kernel: Lustre: 18766:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) Skipped 1 previous similar message
>> Lustre: 18826:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
>> OST0003: slow setattr 45s
>> Oct 10 18:07:04 oss-1 kernel: Lustre: 18826:0:(lustre_fsfilt.h: 
>> 312:fsfilt_setattr()) lfs01-OST0003: slow setattr 45s
>> Oct 10 18:46:15 oss-1 faultmond: 27:Polling all 48 slots for drive  
>> fault
>> ----------- [cut here ] --------- [please bite here ] ---------
>> Kernel BUG at spinlock:76
>> invalid operand: 0000 [1] SMP
>> CPU 2
>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)  
>> lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U)  
>> obdclass(U) lvfs(U) ldiskfs(U) lnet(U) libcfs(U) raid5(U) xor(U)  
>> parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U)  
>> ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) sunrpc(U) rdma_ucm(U)  
>> qlgc_vnic(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U)  
>> md5(U) ipv6(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U) mlx4_ib(U) mlx4_core 
>> (U) ds(U) yenta_socket(U) pcmcia_core(U) dm_mirror(U) dm_multipath 
>> (U) dm_mod(U) button(U) battery(U) ac(U) joydev(U) ohci_hcd(U)  
>> ehci_hcd(U) hw_random(U) edac_mc(U) ib_mthca(U) ib_umad(U) ib_ucm 
>> (U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) e1000(U)  
>> ext3(U) jbd(U) raid1(U) mv_sata(U) sd_mod(U) scsi_mod(U)
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>     
>
>   
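
The slow-operation warnings in the quoted syslog output can be summarised
per OST to see how the delays developed before the crash. A minimal sketch
in Python, assuming the raw syslog file with each message on one line (not
the wrapped, quoted form above). The function name and regular expression
are illustrative and derived only from the message format visible in the
log:

```python
import re

# Matches e.g. "lfs01-OST0002: slow i_mutex 45s" (format taken from the log).
SLOW_RE = re.compile(r"(\S+-OST[0-9a-fA-F]{4}): slow (\w+) (\d+)s")


def slowest_ops(lines):
    """Map (ost, operation) to the longest delay in seconds seen in `lines`.

    The same warning can appear more than once (the quoted capture shows
    both a bare and a timestamped copy), so the maximum is kept rather
    than a count.
    """
    worst = {}
    for line in lines:
        match = SLOW_RE.search(line)
        if match is None:
            continue
        ost, op, seconds = match.group(1), match.group(2), int(match.group(3))
        key = (ost, op)
        worst[key] = max(worst.get(key, 0), seconds)
    return worst


# Example use (the path is hypothetical):
#   with open("/var/log/messages") as log:
#       for (ost, op), secs in sorted(slowest_ops(log).items()):
#           print(ost, op, secs)
```

Keeping the maximum per OST and operation avoids double counting the
duplicated lines while still showing which targets stalled longest.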

-- 
*Malcolm Cowe*
/Solutions Integration Engineer/

*Sun Microsystems, Inc.*
Blackness Road
Linlithgow, West Lothian EH49 7LR UK
Phone: x73602 / +44 1506 673 602
Email: Malcolm.Cowe at Sun.COM
