[Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues

Brock Palen brockp at umich.edu
Mon Oct 13 06:53:45 PDT 2008


I know you say the only addition was the RDAC module, for the MDSs I
assume (we use it also, just fine).

When I ran faultmond from Sun's dcmu RPM (RHEL 4 here), the X4500s
would crash like clockwork every ~48 hours. For such a simple bit of code
I was surprised; I only found out once when I forgot to turn it on while
working on the load. Just FYI, it was unrelated to Lustre (we use the
provided RPMs, no kernel build). Not running faultmond solved my problem
on the X4500.
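A quick way to test this theory against the log quoted below is to find which faultmond event comes last before the kernel BUG. The sketch below is a hypothetical helper (the name `last_faultmond_before_bug` is illustrative, not part of the dcmu package or Lustre) and it assumes the mail-wrapped syslog lines have been rejoined so that each message sits on one line.

```python
def last_faultmond_before_bug(lines):
    """Return the last faultmond message seen before the first
    'Kernel BUG' line, or None if there is no BUG or no prior poll.

    Hypothetical helper; assumes one syslog message per line.
    """
    last_poll = None
    for line in lines:
        if "Kernel BUG" in line:
            # Stop at the first BUG and report the most recent poll.
            return last_poll
        if "faultmond:" in line:
            last_poll = line.strip()
    return None  # no kernel BUG found in the input
```

Applied to the attached log, this should return the 18:46:15 entry that starts polling cycle 27; unlike the earlier cycles, no "Polling cycle 27 is complete" line follows it before the BUG. That is a timing correlation only, not proof that faultmond caused the crash.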

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On Oct 13, 2008, at 4:41 AM, Malcolm Cowe wrote:

> The X4200m2 MDS systems and the X4500 OSS were rebuilt using the  
> stock Lustre packages (Kernel + modules + userspace). With the  
> exception of the RDAC kernel module, no additional software was  
> applied to the systems. We recreated our volumes and ran the  
> servers over the weekend. However, the OSS crashed about 8 hours  
> in. The syslog output is attached to this message.
>
> Looks like it could be similar to bug #16404, which means patching  
> and rebuilding the kernel. Given my lack of success at trying to  
> build from source, I am again asking for some guidance on how to do  
> this. I sent out the steps I used to try and build from source on  
> the 7th because I was encountering problems and was unable to get a  
> working set of packages. Included in that message was output from
> quilt that implies that the kernel patching process was not working  
> properly.
>
>
> Regards,
>
> Malcolm.
>
> -- 
> Malcolm Cowe
> Solutions Integration Engineer
>
> Sun Microsystems, Inc.
> Blackness Road
> Linlithgow, West Lothian EH49 7LR UK
> Phone: x73602 / +44 1506 673 602
> Email: Malcolm.Cowe at Sun.COM
> Oct 10 06:49:39 oss-1 kernel: LDISKFS FS on md15, internal journal
> Oct 10 06:49:39 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> ordered data mode.
> Oct 10 06:53:42 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 06:53:42 oss-1 kernel: LDISKFS FS on md16, internal journal
> Oct 10 06:53:42 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> ordered data mode.
> Oct 10 06:57:49 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 06:57:49 oss-1 kernel: LDISKFS FS on md17, internal journal
> Oct 10 06:57:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> ordered data mode.
> Oct 10 07:44:55 oss-1 faultmond: 16:Polling all 48 slots for drive  
> fault
> Oct 10 07:45:00 oss-1 faultmond: Polling cycle 16 is complete
> Oct 10 07:56:23 oss-1 kernel: Lustre: OBD class driver,  
> info at clusterfs.com
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> Oct 10 07:56:23 oss-1 kernel:         Lustre Version: 1.6.5.1
> Oct 10 07:56:23 oss-1 kernel:         Build Version: 1.6.5.1-19691231190000-PRISTINE-.cache.OLDRPMS.20080618230526.linux-smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64-2.6.9-67.0.7.EL_lustre.1.6.5.1smp
> Oct 10 07:56:24 oss-1 kernel: Lustre: Added LNI 192.168.30.111@o2ib [8/64]
> Oct 10 07:56:24 oss-1 kernel: Lustre: Lustre Client File System;  
> info at clusterfs.com
> Oct 10 07:56:24 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal  
> on md21
> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 07:56:24 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal  
> on md21
> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: file extents enabled
> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mballoc enabled
> Lustre: Request x1 sent from MGC192.168.30.101@o2ib to NID 192.168.30.101@o2ib 5s ago has timed out (limit 5s).
> Oct 10 07:56:30 oss-1 kernel: Lustre: Request x1 sent from MGC192.168.30.101@o2ib to NID 192.168.30.101@o2ib 5s ago has timed out (limit 5s).
> LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type 4, status -113  req@00000101f8ef3200 x3/t0 o250->MGS@MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621815 ref 2 fl Rpc:/0/0 rc 0/0
> Lustre: Request x3 sent from MGC192.168.30.101@o2ib to NID 192.168.30.102@o2ib 0s ago has timed out (limit 5s).
> LustreError: 18125:0:(obd_mount.c:1062:server_start_targets())  
> Required registration failed for lfs01-OSTffff: -5
> LustreError: 15f-b: Communication error with the MGS.  Is the MGS  
> running?
> LustreError: 18125:0:(obd_mount.c:1597:server_fill_super()) Unable  
> to start targets: -5
> LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd  
> lfs01-OSTffff
> LustreError: 18125:0:(obd_mount.c:119:server_deregister_mount())  
> lfs01-OSTffff not registered
> LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
> LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0  
> breaks, 0 lost
> LDISKFS-fs: mballoc: 0 generated and it took 0
> LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
> Lustre: server umount lfs01-OSTffff complete
> LustreError: 18125:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount  (-5)
> Oct 10 07:56:50 oss-1 kernel: Lustre: Changing connection for MGC192.168.30.101@o2ib to MGC192.168.30.101@o2ib_1/192.168.30.102@o2ib
> Oct 10 07:56:50 oss-1 kernel: LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type 4, status -113  req@00000101f8ef3200 x3/t0 o250->MGS@MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621815 ref 2 fl Rpc:/0/0 rc 0/0
> Oct 10 07:56:50 oss-1 kernel: Lustre: Request x3 sent from MGC192.168.30.101@o2ib to NID 192.168.30.102@o2ib 0s ago has timed out (limit 5s).
> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c: 
> 1062:server_start_targets()) Required registration failed for lfs01- 
> OSTffff: -5
> Oct 10 07:56:50 oss-1 kernel: LustreError: 15f-b: Communication  
> error with the MGS.  Is the MGS running?
> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c: 
> 1597:server_fill_super()) Unable to start targets: -5
> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c: 
> 1382:server_put_super()) no obd lfs01-OSTffff
> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c: 
> 119:server_deregister_mount()) lfs01-OSTffff not registered
> Oct 10 07:56:50 oss-1 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs  
> (0 success)
> Oct 10 07:56:50 oss-1 kernel: LDISKFS-fs: mballoc: 0 extents  
> scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
> Oct 10 07:56:51 oss-1 kernel: LDISKFS-fs: mballoc: 0 generated and  
> it took 0
> Oct 10 07:56:51 oss-1 kernel: LDISKFS-fs: mballoc: 0 preallocated,  
> 0 discarded
> Oct 10 07:56:51 oss-1 kernel: Lustre: server umount lfs01-OSTffff  
> complete
> Oct 10 07:56:51 oss-1 kernel: LustreError: 18125:0:(obd_mount.c: 
> 1951:lustre_fill_super()) Unable to mount  (-5)
> LustreError: 6644:0:(events.c:55:request_out_callback()) @@@ type 4, status -113  req@00000103f7a50600 x1/t0 o250->MGS@MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621790 ref 1 fl Complete:EX/0/0 rc -110/0
> Oct 10 07:57:15 oss-1 kernel: LustreError: 6644:0:(events.c:55:request_out_callback()) @@@ type 4, status -113  req@00000103f7a50600 x1/t0 o250->MGS@MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621790 ref 1 fl Complete:EX/0/0 rc -110/0
> Oct 10 08:04:09 oss-1 sshd(pam_unix)[18530]: session opened for  
> user root by root(uid=0)
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> Lustre: lfs01-OST0000: new disk, initializing
> Lustre: Server lfs01-OST0000 on device /dev/md11 has started
> Oct 10 08:06:49 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 08:06:49 oss-1 kernel: LDISKFS FS on md11, external journal  
> on md21
> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:06:49 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 08:06:49 oss-1 kernel: LDISKFS FS on md11, external journal  
> on md21
> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: file extents enabled
> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mballoc enabled
> Oct 10 08:06:49 oss-1 kernel: Lustre: Filtering OBD driver;  
> info at clusterfs.com
> Oct 10 08:06:49 oss-1 kernel: Lustre: lfs01-OST0000: new disk,  
> initializing
> Oct 10 08:06:49 oss-1 kernel: Lustre: OST lfs01-OST0000 now serving  
> dev (lfs01-OST0000/ccc68ac6-5b58-acd6-455b-2df9d2980009) with  
> recovery enabled
> Oct 10 08:06:49 oss-1 kernel: Lustre: Server lfs01-OST0000 on  
> device /dev/md11 has started
> Lustre: lfs01-OST0000: received MDS connection from 192.168.30.101@o2ib
> Oct 10 08:06:54 oss-1 kernel: Lustre: lfs01-OST0000: received MDS connection from 192.168.30.101@o2ib
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> Lustre: lfs01-OST0001: new disk, initializing
> Lustre: Server lfs01-OST0001 on device /dev/md12 has started
> Oct 10 08:06:56 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 08:06:56 oss-1 kernel: LDISKFS FS on md12, external journal  
> on md22
> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:06:56 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 08:06:56 oss-1 kernel: LDISKFS FS on md12, external journal  
> on md22
> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: file extents enabled
> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mballoc enabled
> Oct 10 08:06:56 oss-1 kernel: Lustre: lfs01-OST0001: new disk,  
> initializing
> Oct 10 08:06:56 oss-1 kernel: Lustre: OST lfs01-OST0001 now serving  
> dev (lfs01-OST0001/b2122e87-be36-bd1a-4e40-fdd41e626d0b) with  
> recovery enabled
> Oct 10 08:06:56 oss-1 kernel: Lustre: Server lfs01-OST0001 on  
> device /dev/md12 has started
> Lustre: lfs01-OST0001: received MDS connection from 192.168.30.101@o2ib
> Oct 10 08:07:01 oss-1 kernel: Lustre: lfs01-OST0001: received MDS connection from 192.168.30.101@o2ib
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> Lustre: lfs01-OST0002: new disk, initializing
> Lustre: Server lfs01-OST0002 on device /dev/md13 has started
> Oct 10 08:07:02 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 08:07:02 oss-1 kernel: LDISKFS FS on md13, external journal  
> on md23
> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:07:02 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 08:07:02 oss-1 kernel: LDISKFS FS on md13, external journal  
> on md23
> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: file extents enabled
> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mballoc enabled
> Oct 10 08:07:02 oss-1 kernel: Lustre: lfs01-OST0002: new disk,  
> initializing
> Oct 10 08:07:02 oss-1 kernel: Lustre: OST lfs01-OST0002 now serving  
> dev (lfs01-OST0002/13c66dfa-47c5-b350-43e3-3c3b67c358b6) with  
> recovery enabled
> Oct 10 08:07:02 oss-1 kernel: Lustre: Server lfs01-OST0002 on  
> device /dev/md13 has started
> Lustre: lfs01-OST0002: received MDS connection from 192.168.30.101@o2ib
> Oct 10 08:07:06 oss-1 kernel: Lustre: lfs01-OST0002: received MDS connection from 192.168.30.101@o2ib
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> Oct 10 08:07:08 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Lustre: lfs01-OST0003: new disk, initializing
> Lustre: Server lfs01-OST0003 on device /dev/md15 has started
> Oct 10 08:07:08 oss-1 kernel: LDISKFS FS on md15, external journal on md25
> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:07:08 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 08:07:08 oss-1 kernel: LDISKFS FS on md15, external journal  
> on md25
> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: file extents enabled
> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mballoc enabled
> Oct 10 08:07:08 oss-1 kernel: Lustre: lfs01-OST0003: new disk,  
> initializing
> Oct 10 08:07:08 oss-1 kernel: Lustre: OST lfs01-OST0003 now serving  
> dev (lfs01-OST0003/d6fd7a9d-3bb8-ae05-41ed-bbfb1b6b0303) with  
> recovery enabled
> Oct 10 08:07:08 oss-1 kernel: Lustre: Server lfs01-OST0003 on  
> device /dev/md15 has started
> Lustre: lfs01-OST0003: received MDS connection from 192.168.30.101@o2ib
> Oct 10 08:07:12 oss-1 kernel: Lustre: lfs01-OST0003: received MDS connection from 192.168.30.101@o2ib
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> Lustre: lfs01-OST0004: new disk, initializing
> Lustre: Server lfs01-OST0004 on device /dev/md16 has started
> Oct 10 08:07:14 oss-1 kernel: kjournald starting.  Commit interval 5 seconds
> Oct 10 08:07:14 oss-1 kernel: LDISKFS FS on md16, external journal  
> on md26
> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:07:14 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 08:07:14 oss-1 kernel: LDISKFS FS on md16, external journal  
> on md26
> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: file extents enabled
> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mballoc enabled
> Oct 10 08:07:14 oss-1 kernel: Lustre: lfs01-OST0004: new disk,  
> initializing
> Oct 10 08:07:14 oss-1 kernel: Lustre: OST lfs01-OST0004 now serving  
> dev (lfs01-OST0004/661dcb52-7ef9-8274-45d7-4441e36410d1) with  
> recovery enabled
> Oct 10 08:07:14 oss-1 kernel: Lustre: Server lfs01-OST0004 on  
> device /dev/md16 has started
> Lustre: lfs01-OST0004: received MDS connection from 192.168.30.101@o2ib
> Oct 10 08:07:18 oss-1 kernel: Lustre: lfs01-OST0004: received MDS connection from 192.168.30.101@o2ib
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> Lustre: lfs01-OST0005: new disk, initializing
> Lustre: Server lfs01-OST0005 on device /dev/md17 has started
> Oct 10 08:07:19 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 08:07:19 oss-1 kernel: LDISKFS FS on md17, external journal  
> on md27
> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:07:19 oss-1 kernel: kjournald starting.  Commit interval  
> 5 seconds
> Oct 10 08:07:19 oss-1 kernel: LDISKFS FS on md17, external journal  
> on md27
> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
> journal data mode.
> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: file extents enabled
> Oct 10 08:07:20 oss-1 kernel: LDISKFS-fs: mballoc enabled
> Oct 10 08:07:20 oss-1 kernel: Lustre: lfs01-OST0005: new disk,  
> initializing
> Oct 10 08:07:20 oss-1 kernel: Lustre: OST lfs01-OST0005 now serving  
> dev (lfs01-OST0005/978ba68c-0ba7-9ac7-439f-964ca7bf86a3) with  
> recovery enabled
> Oct 10 08:07:20 oss-1 kernel: Lustre: Server lfs01-OST0005 on  
> device /dev/md17 has started
> Lustre: lfs01-OST0005: received MDS connection from 192.168.30.101@o2ib
> Oct 10 08:07:25 oss-1 kernel: Lustre: lfs01-OST0005: received MDS connection from 192.168.30.101@o2ib
> Oct 10 08:45:00 oss-1 faultmond: 17:Polling all 48 slots for drive  
> fault
> Oct 10 08:45:06 oss-1 faultmond: Polling cycle 17 is complete
> Oct 10 09:45:06 oss-1 faultmond: 18:Polling all 48 slots for drive  
> fault
> Oct 10 09:45:12 oss-1 faultmond: Polling cycle 18 is complete
> Oct 10 10:45:12 oss-1 faultmond: 19:Polling all 48 slots for drive  
> fault
> Oct 10 10:45:17 oss-1 faultmond: Polling cycle 19 is complete
>
> LustreError: 18732:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0001: slow setattr 85s
> Oct 10 10:48:14 oss-1 kernel: LustreError: 18732:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0001: slow setattr 85s
> Oct 10 11:45:17 oss-1 faultmond: 20:Polling all 48 slots for drive  
> fault
> Oct 10 11:45:25 oss-1 faultmond: Polling cycle 20 is complete
> Oct 10 12:45:25 oss-1 faultmond: 21:Polling all 48 slots for drive  
> fault
> Oct 10 12:45:33 oss-1 faultmond: Polling cycle 21 is complete
> Lustre: 18805:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0005: slow setattr 33s
> Oct 10 13:14:46 oss-1 kernel: Lustre: 18805:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0005: slow setattr 33s
> Lustre: 18794:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0000: slow setattr 43s
> Oct 10 13:15:03 oss-1 kernel: Lustre: 18794:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0000: slow setattr 43s
> Lustre: 18815:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0004: slow setattr 40s
> Oct 10 13:15:13 oss-1 kernel: Lustre: 18815:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
> Lustre: 18809:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
> OST0003: slow i_mutex 31s
> Lustre: 18753:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
> OST0003: slow i_mutex 31s
> Oct 10 13:15:25 oss-1 kernel: Lustre: 18809:0:(filter_io_26.c: 
> 700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
> Oct 10 13:15:25 oss-1 kernel: Lustre: 18753:0:(filter_io_26.c: 
> 700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
> Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
> OST0002: slow i_mutex 34s
> Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write())  
> Skipped 2 previous similar messages
> Oct 10 13:15:28 oss-1 kernel: Lustre: 18768:0:(filter_io_26.c: 
> 700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 34s
> Oct 10 13:15:28 oss-1 kernel: Lustre: 18768:0:(filter_io_26.c: 
> 700:filter_commitrw_write()) Skipped 2 previous similar messages
> Lustre: 18833:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
> OST0001: slow i_mutex 37s
> Oct 10 13:15:31 oss-1 kernel: Lustre: 18833:0:(filter_io_26.c: 
> 700:filter_commitrw_write()) lfs01-OST0001: slow i_mutex 37s
> Lustre: 18812:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
> OST0002: slow i_mutex 40s
> Lustre: 18844:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01- 
> OST0003: slow direct_io 40s
> Oct 10 13:15:34 oss-1 kernel: Lustre: 18812:0:(filter_io_26.c: 
> 700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 40s
> Oct 10 13:15:34 oss-1 kernel: Lustre: 18844:0:(filter_io_26.c: 
> 765:filter_commitrw_write()) lfs01-OST0003: slow direct_io 40s
> Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0001: slow setattr 41s
> Lustre: 18849:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01- 
> OST0001: slow direct_io 31s
> Oct 10 13:15:35 oss-1 kernel: Lustre: 18741:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0001: slow setattr 41s
> Oct 10 13:15:35 oss-1 kernel: Lustre: 18849:0:(filter_io_26.c: 
> 765:filter_commitrw_write()) lfs01-OST0001: slow direct_io 31s
> LustreError: 18765:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0002: slow setattr 51s
> Oct 10 13:15:38 oss-1 kernel: LustreError: 18765:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0002: slow setattr 51s
> Lustre: 18756:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01- 
> OST0002: slow i_mutex 45s
> Oct 10 13:15:39 oss-1 kernel: Lustre: 18756:0:(filter_io_26.c: 
> 700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 45s
> Oct 10 13:45:33 oss-1 faultmond: 22:Polling all 48 slots for drive  
> fault
> Oct 10 13:45:41 oss-1 faultmond: Polling cycle 22 is complete
> Oct 10 14:45:41 oss-1 faultmond: 23:Polling all 48 slots for drive  
> fault
> Oct 10 14:45:49 oss-1 faultmond: Polling cycle 23 is complete
> Lustre: 18740:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0000: slow setattr 38s
> Oct 10 15:40:41 oss-1 kernel: Lustre: 18740:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0000: slow setattr 38s
> LustreError: 18830:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0004: slow setattr 60s
> Lustre: 18767:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0005: slow setattr 38s
> Oct 10 15:41:13 oss-1 kernel: LustreError: 18830:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0004: slow setattr 60s
> Oct 10 15:41:13 oss-1 kernel: Lustre: 18767:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0005: slow setattr 38s
> Lustre: 18796:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0001: slow setattr 44s
> Oct 10 15:41:20 oss-1 kernel: Lustre: 18796:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0001: slow setattr 44s
> LustreError: 18831:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0002: slow setattr 62s
> Oct 10 15:41:21 oss-1 kernel: LustreError: 18831:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0002: slow setattr 62s
> Oct 10 15:45:49 oss-1 faultmond: 24:Polling all 48 slots for drive  
> fault
> Oct 10 15:45:58 oss-1 faultmond: Polling cycle 24 is complete
> Oct 10 16:45:58 oss-1 faultmond: 25:Polling all 48 slots for drive  
> fault
> Oct 10 16:46:06 oss-1 faultmond: Polling cycle 25 is complete
> Oct 10 17:46:06 oss-1 faultmond: 26:Polling all 48 slots for drive  
> fault
> Oct 10 17:46:15 oss-1 faultmond: Polling cycle 26 is complete
> Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0000: slow setattr 41s
> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s  req@00000101e8f1de00 x15789/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s  req@00000101e8f1da00 x15790/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in())  
> Skipped 3 previous similar messages
> Lustre: 18764:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0004: slow setattr 40s
> Oct 10 18:06:33 oss-1 kernel: Lustre: 18741:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0000: slow setattr 41s
> Oct 10 18:06:33 oss-1 kernel: Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s  req@00000101e8f1de00 x15789/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
> Oct 10 18:06:33 oss-1 kernel: Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s  req@00000101e8f1da00 x15790/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
> Lustre: 18845:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0002: slow setattr 44s
> Lustre: 18579:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 14s  req@00000103f8dabe00 x7271650/t0 o103-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
> Oct 10 18:06:54 oss-1 kernel: Lustre: 18726:0:(service.c: 
> 918:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages
> Oct 10 18:06:54 oss-1 kernel: Lustre: 18764:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
> Oct 10 18:06:54 oss-1 kernel: Lustre: 18845:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0002: slow setattr 44s
> Oct 10 18:06:54 oss-1 kernel: Lustre: 18579:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 14s  req@00000103f8dabe00 x7271650/t0 o103-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
> Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0005: slow setattr 32s
> Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) Skipped 1  
> previous similar message
> Oct 10 18:06:59 oss-1 kernel: Lustre: 18766:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0005: slow setattr 32s
> Oct 10 18:06:59 oss-1 kernel: Lustre: 18766:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) Skipped 1 previous similar message
> Lustre: 18826:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01- 
> OST0003: slow setattr 45s
> Oct 10 18:07:04 oss-1 kernel: Lustre: 18826:0:(lustre_fsfilt.h: 
> 312:fsfilt_setattr()) lfs01-OST0003: slow setattr 45s
> Oct 10 18:46:15 oss-1 faultmond: 27:Polling all 48 slots for drive  
> fault
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at spinlock:76
> invalid operand: 0000 [1] SMP
> CPU 2
> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ldiskfs(U) lnet(U) libcfs(U) raid5(U) xor(U) parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) sunrpc(U) rdma_ucm(U) qlgc_vnic(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) md5(U) ipv6(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U) mlx4_ib(U) mlx4_core(U) ds(U) yenta_socket(U) pcmcia_core(U) dm_mirror(U) dm_multipath(U) dm_mod(U) button(U) battery(U) ac(U) joydev(U) ohci_hcd(U) ehci_hcd(U) hw_random(U) edac_mc(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) e1000(U) ext3(U) jbd(U) raid1(U) mv_sata(U) sd_mod(U) scsi_mod(U)
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
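The negative numbers in the Lustre messages above (status -113, rc -110, the trailing -5) are negated Linux errno values. Below is a minimal sketch for decoding them with Python's standard `errno` and `os` modules; the regex only covers the three forms that appear in this log, and the function name `decode_codes` is an illustrative assumption. Run it on Linux, since errno numbering differs between platforms.

```python
import errno
import os
import re

# Matches the negative return codes seen in this log, e.g.
# "status -113", "rc -110/0", or a trailing ": -5".
CODE_RE = re.compile(r"(?:status |rc |: )-(\d+)")


def decode_codes(line):
    """Return (code, symbolic name, message) for each negative code in line."""
    out = []
    for m in CODE_RE.finditer(line):
        n = int(m.group(1))
        name = errno.errorcode.get(n, "UNKNOWN")  # e.g. 113 -> "EHOSTUNREACH"
        out.append((-n, name, os.strerror(n)))
    return out
```

On Linux, -113 decodes to EHOSTUNREACH, -110 to ETIMEDOUT and -5 to EIO, which fits the 07:56 attempt to register the OSTs failing because the MGS could not be reached, before the targets mounted cleanly at 08:06.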



