[Lustre-discuss] OSS1 Node issue

Kevin Van Maren KVanMaren at fusionio.com
Tue Feb 21 22:06:13 PST 2012


The logs you attached start some time after the problem began. To tell what happened, you need to find the error that was logged before these messages started appearing:
  Feb  5 04:03:13 oss1 kernel: LustreError: 9222:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
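
One way to locate that earlier error is to scan the complete log for the first "rc = -30" line and print the kernel messages logged just before it. This is a minimal Python sketch, assuming the syslog format shown above. The function name context_before_first_erofs and the context size of 50 lines are only illustrative.

```python
import re
from collections import deque

# Matches the symptom line quoted above ("error starting transaction: rc = -30").
SYMPTOM = re.compile(r"error starting transaction: rc = -30\b")

def context_before_first_erofs(lines, context=50):
    """Return the kernel messages logged just before the first rc = -30 line.

    `lines` is any iterable of log lines in chronological order. The result
    holds up to `context` earlier kernel lines. It is an empty list if the log
    already starts with the symptom, and None if the symptom never appears.
    """
    recent = deque(maxlen=context)
    for line in lines:
        if SYMPTOM.search(line):
            return list(recent)
        if " kernel: " in line:
            recent.append(line.rstrip("\n"))
    return None

# Possible usage (the path is an assumption; older rotated logs may be needed):
#   with open("/var/log/messages", errors="replace") as log:
#       for entry in context_before_first_erofs(log) or []:
#           print(entry)
```

An empty result means the log file already begins after the original failure, so an older rotated log would have to be searched instead.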

It looks like you rebooted the server and OST0 and OST1 were mounted again. You are no longer getting those errors, but both OSTs reported errors on mount.

So unmount the OSTs, and run:
  e2fsck /dev/dm-0
  e2fsck /dev/dm-1
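
Before running e2fsck, it may be worth confirming that the OST really is unmounted and doing a first pass that changes nothing. This is a minimal Python sketch, assuming a Linux /proc/mounts and the device names above. The helpers is_mounted and readonly_fsck_command are hypothetical names. The -f (force a check) and -n (open read-only, answer "no" to all questions) switches are standard e2fsck options.

```python
import os

def is_mounted(device, mounts_text):
    """Return True if `device` appears as a mount source in /proc/mounts text."""
    target = os.path.realpath(device)
    for line in mounts_text.splitlines():
        fields = line.split()
        # Only the first field (the mount source) is compared, and only
        # when it looks like a device path.
        if fields and fields[0].startswith("/"):
            if os.path.realpath(fields[0]) == target:
                return True
    return False

def readonly_fsck_command(device):
    """Build an e2fsck command that only reports problems and fixes nothing."""
    return ["e2fsck", "-f", "-n", device]

# Possible usage (not executed here):
#   import subprocess
#   with open("/proc/mounts") as f:
#       mounts = f.read()
#   if not is_mounted("/dev/dm-0", mounts):
#       subprocess.run(readonly_fsck_command("/dev/dm-0"))
```

A read-only pass shows how much damage e2fsck finds before any repair is attempted.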

I don't know how mangled your OSTs are, so I don't know what e2fsck will report.  See also http://wiki.lustre.org/index.php/Handling_File_System_Errors

Kevin



On Feb 21, 2012, at 10:43 PM, VIJESH EK wrote:

Dear Kevin,

Herewith I have attached /var/log/messages. Kindly go through the logs and
give me a solution for this immediately.
Can you tell me how to run e2fsck on an OST?
Please give the exact command, with switches, to run e2fsck
without affecting the data.

We are waiting for your reply.

Thanks & Regards

VIJESH E K


On Tue, Feb 21, 2012 at 8:38 PM, Kevin Van Maren <KVanMaren at fusionio.com> wrote:
This is not the correct list for help with SGE.

That being said, the real issue (as has been mentioned by several people) is that an OST has gone read-only due to some issue.  The file system will not function properly until this is resolved, irrespective of where you put SGE.
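
The "rc = -30" in the log lines is a negated Linux errno value, which can be decoded with Python's standard library. A small sketch follows; the helper name decode_rc is only illustrative.

```python
import errno
import os

def decode_rc(rc):
    """Translate a negative kernel return code into (errno name, description)."""
    code = abs(rc)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

name, text = decode_rc(-30)
print(name, "-", text)  # on Linux: EROFS - Read-only file system
```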

You will need to check the logs on oss1 to find the initial issue, stop the bad OST, and take corrective action (the details of which depend on the issue).

Kevin


On Feb 21, 2012, at 3:23 AM, "VIJESH EK" <ekvijesh at gmail.com> wrote:



We are waiting for your feedback.

Thanks & Regards

VIJESH E K



On Tue, Feb 21, 2012 at 12:22 PM, VIJESH EK <ekvijesh at gmail.com> wrote:
Dear All,

We have made the following change on the exec nodes, but we are still
getting the same errors in /var/log/messages.

1. We changed the exec nodes' spool directory to a local directory by editing the file /home/appl/sge-root/default/common/configuration and changing the parameter execd_spool_dir.

Even after this change, the error below still appears on the OSS1 node. It is generated only on the OSS1 node.

Feb  6 18:32:10 oss1 kernel: LustreError: 9362:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  6 18:32:05 oss1 kernel: LustreError: 9422:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  6 18:32:06 oss1 kernel: LustreError: 9432:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  6 18:32:07 oss1 kernel: LustreError: 9369:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  6 18:32:10 oss1 kernel: LustreError: 9362:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30


Can you tell me how to change the master spool directory?
Is it possible to change the directory while the system is live?

Kindly explain briefly, so that we can proceed to the next step.


Thanks and Regards

VIJESH







On Fri, Feb 10, 2012 at 1:19 PM, Carlos Thomaz <cthomaz at ddn.com> wrote:
Hi Vijesh.

Are you running the SGE master spooling on Lustre?! What about the exec node spooling?!

I strongly recommend that you do not run the master spooling on Lustre. And, if possible, use local spooling on local disk for the exec nodes.

SGE (at least until version 6.2u7) is known to become unstable when its spooling runs on Lustre.

Carlos

On Feb 10, 2012, at 1:18 AM, "VIJESH EK" <ekvijesh at gmail.com> wrote:

Dear All,

Kindly provide a solution for the issue below.

Thanks & Regards

VIJESH E K



On Thu, Feb 9, 2012 at 3:26 PM, VIJESH EK <ekvijesh at gmail.com> wrote:
Dear Sir,

I am getting the error messages below continuously on the OSS1 node. As a result,
the SGE service intermittently stops running.


Feb  5 04:03:37 oss1 kernel: LustreError: 9193:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:03:47 oss1 kernel: LustreError: 9164:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:03:47 oss1 kernel: LustreError: 28420:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:03:48 oss1 kernel: LustreError: 9266:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:03:50 oss1 kernel: LustreError: 9200:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:03:53 oss1 kernel: LustreError: 9230:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:03:57 oss1 kernel: LustreError: 9212:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:04:03 oss1 kernel: LustreError: 9262:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:04:08 oss1 kernel: LustreError: 9162:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:04:15 oss1 kernel: LustreError: 9271:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:04:23 oss1 kernel: LustreError: 9191:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30
Feb  5 04:04:32 oss1 kernel: LustreError: 9242:0:(filter_io_26.c:693:filter_commitrw_write()) error starting transaction: rc = -30


I have attached the detailed log information. The attached file contains the continuous
/var/log/messages logs, separated by *.

So kindly give me a solution for this issue.

Thanks & Regards

VIJESH E K





Confidentiality Notice: This e-mail message, its contents and any attachments to it are confidential to the intended recipient, and may contain information that is privileged and/or exempt from disclosure under applicable law. If you are not the intended recipient, please immediately notify the sender and destroy the original e-mail message and any attachments (and any copies that may have been made) from your system or otherwise. Any unauthorized use, copying, disclosure or distribution of this information is strictly prohibited. Email addresses that end with a “-c” identify the sender as a Fusion-io contractor.