[Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues
Brock Palen
brockp at umich.edu
Mon Oct 13 08:31:47 PDT 2008
I never uninstalled it (I still use some of the tools in it).
Faultmond is a service, so just chkconfig it off.
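
A minimal sketch of what that looks like on a RHEL-style system. It assumes the init script is registered under the name faultmond (check /etc/init.d for the actual name), and the RUN variable is a hypothetical dry-run switch added here for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: stop a SysV init service and keep it from starting at boot.
# ASSUMPTION: the DCMU init script is called "faultmond"; verify in /etc/init.d.
SERVICE="${SERVICE:-faultmond}"

# RUN defaults to "echo", so the commands are only printed.
# Export RUN as an empty string and run as root to actually execute them.
RUN="${RUN-echo}"

disable_service() {
    $RUN service "$1" stop        # stop the running daemon now
    $RUN chkconfig "$1" off       # remove it from the boot runlevels
    $RUN chkconfig --list "$1"    # show the runlevels to confirm it is off
}

disable_service "$SERVICE"
```

Leaving the RPM installed and only disabling the service keeps the other DCMU tools available.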
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
On Oct 13, 2008, at 11:03 AM, Malcolm Cowe wrote:
> Brock Palen wrote:
>>
>> I know you say the only addition was the RDAC for the MDS's I
>> assume (we use it also just fine).
> Yes, the MDS's share a STK 6140.
>> When I ran faultmond from Sun's DCMU RPM (RHEL 4 here) the X4500's
>> would crash like clockwork, ~48 hours. For a very simple bit of
>> code I was surprised that once when I forgot to turn it on when
>> working on the load this would happen. Just FYI, it was unrelated
>> to Lustre (using the provided RPMs, no kernel build); this solved my
>> problem on the X4500.
> The DCMU RPM is installed. I didn't explicitly install this, so it
> must have been bundled in with the SIA CD... I'll try removing the
> rpm to see what happens. Thanks for the heads up.
>
> Regards,
>
> Malcolm.
>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> brockp at umich.edu
>> (734)936-1985
>>
>> On Oct 13, 2008, at 4:41 AM, Malcolm Cowe wrote:
>>>
>>> The X4200m2 MDS systems and the X4500 OSS were rebuilt using the
>>> stock Lustre packages (Kernel + modules + userspace). With the
>>> exception of the RDAC kernel module, no additional software was
>>> applied to the systems.
>>>
>>> We recreated our volumes and ran the servers over the weekend.
>>> However, the OSS crashed about 8 hours in. The syslog output is
>>> attached to this message.
>>>
>>> Looks like it could be similar to bug #16404, which means patching
>>> and rebuilding the kernel. Given my lack of success at trying to
>>> build from source, I am again asking for some guidance on how to
>>> do this. I sent out the steps I used to try and build from source
>>> on the 7th because I was encountering problems and was unable to
>>> get a working set of packages. Included in that message was
>>> output from quilt that implies that the kernel patching process
>>> was not working properly.
>>>
>>> Regards,
>>>
>>> Malcolm.
>>>
>>> --
>>> Malcolm Cowe
>>> Solutions Integration Engineer
>>>
>>> Sun Microsystems, Inc.
>>> Blackness Road
>>> Linlithgow, West Lothian EH49 7LR UK
>>> Phone: x73602 / +44 1506 673 602
>>> Email: Malcolm.Cowe at Sun.COM
>>>
>>> Oct 10 06:49:39 oss-1 kernel: LDISKFS FS on md15, internal journal
>>> Oct 10 06:49:39 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>>> Oct 10 06:53:42 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 06:53:42 oss-1 kernel: LDISKFS FS on md16, internal journal
>>> Oct 10 06:53:42 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>>> Oct 10 06:57:49 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 06:57:49 oss-1 kernel: LDISKFS FS on md17, internal journal
>>> Oct 10 06:57:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>>> Oct 10 07:44:55 oss-1 faultmond: 16:Polling all 48 slots for drive fault
>>> Oct 10 07:45:00 oss-1 faultmond: Polling cycle 16 is complete
>>> Oct 10 07:56:23 oss-1 kernel: Lustre: OBD class driver, info at clusterfs.com
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Oct 10 07:56:23 oss-1 kernel: Lustre Version: 1.6.5.1
>>> Oct 10 07:56:23 oss-1 kernel: Build Version: 1.6.5.1-19691231190000-PRISTINE-.cache.OLDRPMS.20080618230526.linux-smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64-2.6.9-67.0.7.EL_lustre.1.6.5.1smp
>>> Oct 10 07:56:24 oss-1 kernel: Lustre: Added LNI 192.168.30.111 at o2ib [8/64]
>>> Oct 10 07:56:24 oss-1 kernel: Lustre: Lustre Client File System; info at clusterfs.com
>>> Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal on md21
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal on md21
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Lustre: Request x1 sent from MGC192.168.30.101 at o2ib to NID 192.168.30.101 at o2ib 5s ago has timed out (limit 5s).
>>> Oct 10 07:56:30 oss-1 kernel: Lustre: Request x1 sent from MGC192.168.30.101 at o2ib to NID 192.168.30.101 at o2ib 5s ago has timed out (limit 5s).
>>> LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 req at 00000101f8ef3200 x3/t0 o250->MGS at MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621815 ref 2 fl Rpc:/0/0 rc 0/0
>>> Lustre: Request x3 sent from MGC192.168.30.101 at o2ib to NID 192.168.30.102 at o2ib 0s ago has timed out (limit 5s).
>>> LustreError: 18125:0:(obd_mount.c:1062:server_start_targets()) Required registration failed for lfs01-OSTffff: -5
>>> LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
>>> LustreError: 18125:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -5
>>> LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd lfs01-OSTffff
>>> LustreError: 18125:0:(obd_mount.c:119:server_deregister_mount()) lfs01-OSTffff not registered
>>> LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
>>> LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
>>> LDISKFS-fs: mballoc: 0 generated and it took 0
>>> LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
>>> Lustre: server umount lfs01-OSTffff complete
>>> LustreError: 18125:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount (-5)
>>> Oct 10 07:56:50 oss-1 kernel: Lustre: Changing connection for MGC192.168.30.101 at o2ib to MGC192.168.30.101 at o2ib_1/192.168.30.102 at o2ib
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 req at 00000101f8ef3200 x3/t0 o250->MGS at MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621815 ref 2 fl Rpc:/0/0 rc 0/0
>>> Oct 10 07:56:50 oss-1 kernel: Lustre: Request x3 sent from MGC192.168.30.101 at o2ib to NID 192.168.30.102 at o2ib 0s ago has timed out (limit 5s).
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:1062:server_start_targets()) Required registration failed for lfs01-OSTffff: -5
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -5
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd lfs01-OSTffff
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:119:server_deregister_mount()) lfs01-OSTffff not registered
>>> Oct 10 07:56:50 oss-1 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
>>> Oct 10 07:56:50 oss-1 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
>>> Oct 10 07:56:51 oss-1 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
>>> Oct 10 07:56:51 oss-1 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
>>> Oct 10 07:56:51 oss-1 kernel: Lustre: server umount lfs01-OSTffff complete
>>> Oct 10 07:56:51 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount (-5)
>>> LustreError: 6644:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 req at 00000103f7a50600 x1/t0 o250->MGS at MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621790 ref 1 fl Complete:EX/0/0 rc -110/0
>>> Oct 10 07:57:15 oss-1 kernel: LustreError: 6644:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 req at 00000103f7a50600 x1/t0 o250->MGS at MGC192.168.30.101@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621790 ref 1 fl Complete:EX/0/0 rc -110/0
>>> Oct 10 08:04:09
>>> oss-1 sshd(pam_unix)[18530]: session opened for user root by root (uid=0)
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0000: new disk, initializing
>>> Lustre: Server lfs01-OST0000 on device /dev/md11 has started
>>> Oct 10 08:06:49 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS FS on md11, external journal on md21
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:06:49 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS FS on md11, external journal on md21
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:06:49 oss-1 kernel: Lustre: Filtering OBD driver; info at clusterfs.com
>>> Oct 10 08:06:49 oss-1 kernel: Lustre: lfs01-OST0000: new disk, initializing
>>> Oct 10 08:06:49 oss-1 kernel: Lustre: OST lfs01-OST0000 now serving dev (lfs01-OST0000/ccc68ac6-5b58-acd6-455b-2df9d2980009) with recovery enabled
>>> Oct 10 08:06:49 oss-1 kernel: Lustre: Server lfs01-OST0000 on device /dev/md11 has started
>>> Lustre: lfs01-OST0000: received MDS connection from 192.168.30.101 at o2ib
>>> Oct 10 08:06:54 oss-1 kernel: Lustre: lfs01-OST0000: received MDS connection from 192.168.30.101 at o2ib
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0001: new disk, initializing
>>> Lustre: Server lfs01-OST0001 on device /dev/md12 has started
>>> Oct 10 08:06:56 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS FS on md12, external journal on md22
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:06:56 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS FS on md12, external journal on md22
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:06:56 oss-1 kernel: Lustre: lfs01-OST0001: new disk, initializing
>>> Oct 10 08:06:56 oss-1 kernel: Lustre: OST lfs01-OST0001 now serving dev (lfs01-OST0001/b2122e87-be36-bd1a-4e40-fdd41e626d0b) with recovery enabled
>>> Oct 10 08:06:56 oss-1 kernel: Lustre: Server lfs01-OST0001 on device /dev/md12 has started
>>> Lustre: lfs01-OST0001: received MDS connection from 192.168.30.101 at o2ib
>>> Oct 10 08:07:01 oss-1 kernel: Lustre: lfs01-OST0001: received MDS connection from 192.168.30.101 at o2ib
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0002: new disk, initializing
>>> Lustre: Server lfs01-OST0002 on device /dev/md13 has started
>>> Oct 10 08:07:02 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS FS on md13, external journal on md23
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:02 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS FS on md13, external journal on md23
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:07:02 oss-1 kernel: Lustre: lfs01-OST0002: new disk, initializing
>>> Oct 10 08:07:02 oss-1 kernel: Lustre: OST lfs01-OST0002 now serving dev (lfs01-OST0002/13c66dfa-47c5-b350-43e3-3c3b67c358b6) with recovery enabled
>>> Oct 10 08:07:02 oss-1 kernel: Lustre: Server lfs01-OST0002 on device /dev/md13 has started
>>> Lustre: lfs01-OST0002: received MDS connection from 192.168.30.101 at o2ib
>>> Oct 10 08:07:06 oss-1 kernel: Lustre: lfs01-OST0002: received MDS connection from 192.168.30.101 at o2ib
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0003: new disk, initializing
>>> Lustre: Server lfs01-OST0003 on device /dev/md15 has started
>>> Oct 10 08:07:08 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS FS on md15, external journal on md25
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:08 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS FS on md15, external journal on md25
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:07:08 oss-1 kernel: Lustre: lfs01-OST0003: new disk, initializing
>>> Oct 10 08:07:08 oss-1 kernel: Lustre: OST lfs01-OST0003 now serving dev (lfs01-OST0003/d6fd7a9d-3bb8-ae05-41ed-bbfb1b6b0303) with recovery enabled
>>> Oct 10 08:07:08 oss-1 kernel: Lustre: Server lfs01-OST0003 on device /dev/md15 has started
>>> Lustre: lfs01-OST0003: received MDS connection from 192.168.30.101 at o2ib
>>> Oct 10 08:07:12 oss-1 kernel: Lustre: lfs01-OST0003: received MDS connection from 192.168.30.101 at o2ib
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0004: new disk, initializing
>>> Lustre: Server lfs01-OST0004 on device /dev/md16 has started
>>> Oct 10 08:07:14 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS FS on md16, external journal on md26
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:14 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS FS on md16, external journal on md26
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:07:14 oss-1 kernel: Lustre: lfs01-OST0004: new disk, initializing
>>> Oct 10 08:07:14 oss-1 kernel: Lustre: OST lfs01-OST0004 now serving dev (lfs01-OST0004/661dcb52-7ef9-8274-45d7-4441e36410d1) with recovery enabled
>>> Oct 10 08:07:14 oss-1 kernel: Lustre: Server lfs01-OST0004 on device /dev/md16 has started
>>> Lustre: lfs01-OST0004: received MDS connection from 192.168.30.101 at o2ib
>>> Oct 10 08:07:18 oss-1 kernel: Lustre: lfs01-OST0004: received MDS connection from 192.168.30.101 at o2ib
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0005: new disk, initializing
>>> Lustre: Server lfs01-OST0005 on device /dev/md17 has started
>>> Oct 10 08:07:19 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:19 oss-1 kernel: LDISKFS FS on md17, external journal on md27
>>> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:19 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:19 oss-1 kernel: LDISKFS FS on md17, external journal on md27
>>> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:07:20 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:07:20 oss-1 kernel: Lustre: lfs01-OST0005: new disk, initializing
>>> Oct 10 08:07:20 oss-1 kernel: Lustre: OST lfs01-OST0005 now serving dev (lfs01-OST0005/978ba68c-0ba7-9ac7-439f-964ca7bf86a3) with recovery enabled
>>> Oct 10 08:07:20 oss-1 kernel: Lustre: Server lfs01-OST0005 on device /dev/md17 has started
>>> Lustre: lfs01-OST0005: received MDS connection from 192.168.30.101 at o2ib
>>> Oct 10 08:07:25 oss-1 kernel: Lustre: lfs01-OST0005: received MDS connection from 192.168.30.101 at o2ib
>>> Oct 10 08:45:00 oss-1 faultmond: 17:Polling all 48 slots for drive fault
>>> Oct 10 08:45:06 oss-1 faultmond: Polling cycle 17 is complete
>>> Oct 10 09:45:06 oss-1 faultmond: 18:Polling all 48 slots for drive fault
>>> Oct 10 09:45:12 oss-1 faultmond: Polling cycle 18 is complete
>>> Oct 10 10:45:12 oss-1 faultmond: 19:Polling all 48 slots for drive fault
>>> Oct 10 10:45:17 oss-1 faultmond: Polling cycle 19 is complete
>>> LustreError: 18732:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 85s
>>> Oct 10 10:48:14 oss-1 kernel: LustreError: 18732:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 85s
>>> Oct 10 11:45:17 oss-1 faultmond: 20:Polling all 48 slots for drive fault
>>> Oct 10 11:45:25 oss-1 faultmond: Polling cycle 20 is complete
>>> Oct 10 12:45:25 oss-1 faultmond: 21:Polling all 48 slots for drive fault
>>> Oct 10 12:45:33 oss-1 faultmond: Polling cycle 21 is complete
>>> Lustre: 18805:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 33s
>>> Oct 10 13:14:46 oss-1 kernel: Lustre: 18805:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 33s
>>> Lustre: 18794:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 43s
>>> Oct 10 13:15:03 oss-1 kernel: Lustre: 18794:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 43s
>>> Lustre: 18815:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
>>> Oct 10 13:15:13 oss-1 kernel: Lustre: 18815:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
>>> Lustre: 18809:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
>>> Lustre: 18753:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
>>> Oct 10 13:15:25 oss-1 kernel: Lustre: 18809:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
>>> Oct 10 13:15:25 oss-1 kernel: Lustre: 18753:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
>>> Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 34s
>>> Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write()) Skipped 2 previous similar messages
>>> Oct 10 13:15:28 oss-1 kernel: Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 34s
>>> Oct 10 13:15:28 oss-1 kernel: Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write()) Skipped 2 previous similar messages
>>> Lustre: 18833:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0001: slow i_mutex 37s
>>> Oct 10 13:15:31 oss-1 kernel: Lustre: 18833:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0001: slow i_mutex 37s
>>> Lustre: 18812:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 40s
>>> Lustre: 18844:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01-OST0003: slow direct_io 40s
>>> Oct 10 13:15:34 oss-1 kernel: Lustre: 18812:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 40s
>>> Oct 10 13:15:34 oss-1 kernel: Lustre: 18844:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01-OST0003: slow direct_io 40s
>>> Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 41s
>>> Lustre: 18849:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01-OST0001: slow direct_io 31s
>>> Oct 10 13:15:35 oss-1 kernel: Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 41s
>>> Oct 10 13:15:35 oss-1 kernel: Lustre: 18849:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01-OST0001: slow direct_io 31s
>>> LustreError: 18765:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 51s
>>> Oct 10 13:15:38 oss-1 kernel: LustreError: 18765:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 51s
>>> Lustre: 18756:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 45s
>>> Oct 10 13:15:39 oss-1 kernel: Lustre: 18756:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 45s
>>> Oct 10 13:45:33 oss-1 faultmond: 22:Polling all 48 slots for drive fault
>>> Oct 10 13:45:41 oss-1 faultmond: Polling cycle 22 is complete
>>> Oct 10 14:45:41 oss-1 faultmond: 23:Polling all 48 slots for drive fault
>>> Oct 10 14:45:49 oss-1 faultmond: Polling cycle 23 is complete
>>> Lustre: 18740:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 38s
>>> Oct 10 15:40:41 oss-1 kernel: Lustre: 18740:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 38s
>>> LustreError: 18830:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 60s
>>> Lustre: 18767:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 38s
>>> Oct 10 15:41:13 oss-1 kernel: LustreError: 18830:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 60s
>>> Oct 10 15:41:13 oss-1 kernel: Lustre: 18767:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 38s
>>> Lustre: 18796:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 44s
>>> Oct 10 15:41:20 oss-1 kernel: Lustre: 18796:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 44s
>>> LustreError: 18831:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 62s
>>> Oct 10 15:41:21 oss-1 kernel: LustreError: 18831:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 62s
>>> Oct 10 15:45:49 oss-1 faultmond: 24:Polling all 48 slots for drive fault
>>> Oct 10 15:45:58 oss-1 faultmond: Polling cycle 24 is complete
>>> Oct 10 16:45:58 oss-1 faultmond: 25:Polling all 48 slots for drive fault
>>> Oct 10 16:46:06 oss-1 faultmond: Polling cycle 25 is complete
>>> Oct 10 17:46:06 oss-1 faultmond: 26:Polling all 48 slots for drive fault
>>> Oct 10 17:46:15 oss-1 faultmond: Polling cycle 26 is complete
>>> Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 41s
>>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req at 00000101e8f1de00 x15789/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req at 00000101e8f1da00 x15790/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages
>>> Lustre: 18764:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
>>> Oct 10 18:06:33 oss-1 kernel: Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 41s
>>> Oct 10 18:06:33 oss-1 kernel: Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req at 00000101e8f1de00 x15789/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Oct 10 18:06:33 oss-1 kernel: Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req at 00000101e8f1da00 x15790/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Lustre: 18845:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 44s
>>> Lustre: 18579:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 14s req at 00000103f8dabe00 x7271650/t0 o103-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages
>>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18764:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
>>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18845:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 44s
>>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18579:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 14s req at 00000103f8dabe00 x7271650/t0 o103-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 32s
>>> Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) Skipped 1 previous similar message
>>> Oct 10 18:06:59 oss-1 kernel: Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 32s
>>> Oct 10 18:06:59 oss-1 kernel: Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) Skipped 1 previous similar message
>>> Lustre: 18826:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0003: slow setattr 45s
>>> Oct 10 18:07:04 oss-1 kernel: Lustre: 18826:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0003: slow setattr 45s
>>> Oct 10 18:46:15 oss-1 faultmond: 27:Polling all 48 slots for drive fault
>>> ----------- [cut here ] --------- [please bite here ] ---------
>>> Kernel BUG at spinlock:76
>>> invalid operand: 0000 [1] SMP
>>> CPU 2
>>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)
>>> lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U)
>>> obdclass(U) lvfs(U) ldiskfs(U) lnet(U) libcfs(U) raid5(U) xor(U)
>>> parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U)
>>> ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) sunrpc(U) rdma_ucm(U)
>>> qlgc_vnic(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U)
>>> md5(U) ipv6(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U) mlx4_ib(U)
>>> mlx4_core(U) ds(U) yenta_socket(U) pcmcia_core(U) dm_mirror(U)
>>> dm_multipath(U) dm_mod(U) button(U) battery(U) ac(U) joydev(U)
>>> ohci_hcd(U) ehci_hcd(U) hw_random(U) edac_mc(U) ib_mthca(U)
>>> ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U)
>>> ib_core(U) e1000(U) ext3(U) jbd(U) raid1(U) mv_sata(U) sd_mod(U)
>>> scsi_mod(U)
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> --
> Malcolm Cowe
> Solutions Integration Engineer
>
> Sun Microsystems, Inc.
> Blackness Road
> Linlithgow, West Lothian EH49 7LR UK
> Phone: x73602 / +44 1506 673 602
> Email: Malcolm.Cowe at Sun.COM