<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

The compute nodes on Redstorm run Catamount. Here at SNL we've

traditionally had a preoccupation with lightweight kernels. I don't

have anything to do with these decisions or the discussion in general,

but this may finally give way now that we can get gigabytes of memory

for each processor.<br>

<br>

MLB<br>

<br>

<br>

Weikuan Yu wrote:

<blockquote cite="mid:47D03FDB.4080000@gmail.com" type="cite">

  <pre wrap="">Thanks for the information. The right stripe count

depends on the targeted access pattern.

Is redstorm running under CNL or Catamount?

--Weikuan

Marty Barnaby wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">I had tried the Direct I/O last year and it didn't seem to be working at

the time, so I gave up and haven't been back there again.

For the file-per-processor vs. shared, I made many different benchmark

trials, but never really head-to-head. My efforts were all with our

redstorm:/scratch_grande:

/home/mlbarna> lfs getstripe -v /scratch_grande | grep ACTIVE | wc -l

320

/home/mlbarna> lfs getstripe -v /scratch_grande | grep -v ACTIVE

OBDS:

/scratch_grande/

default stripe_count: 4 stripe_size: 2097152 stripe_offset: -1

/scratch_grande/test.sh

lmm_magic:          0x0BD10BD0

lmm_object_gr:      0

lmm_object_id:      0x4e92503

lmm_stripe_count:   4

lmm_stripe_size:    2097152

lmm_stripe_pattern: 1

        obdidx           objid          objid            group

           281         2777792       0x2a62c0                0

           282         2780317       0x2a6c9d                0

           283         2778125       0x2a640d                0

           284         2778316       0x2a64cc                0

My one-file-per-processor mode was executed with a NetCDF benchmark code

someone had put together. I can't remember final numbers, or processor

count, but, at the time, we were interested in actual, scientific

computing usage patterns, so we had only an 80-400 KB range in

blocksizes, per processor, respectively, which will never demonstrate a

maximal byte-rate with a huge Lustre FS. The one point here I do know is

the performance was always highest when the directory the files were

written into had been set with lfs setstripe to the values 0 -1 1. I found no

improvement in adjusting the stripe_size from the default 2 MB, but, for

large processor count runs, a stripe_count of 1 was patently fastest.

My best-case benchmark of MPI-IO collective writes to a shared file,

again with a simple, custom program, wrote into a directory defined with

the lfs setstripe settings 0 -1 160. I reached my peak of 26 GB/s running on

only 160 processors with a per-processor blocksize of 20 MB.

To clarify my use of blocksize, the NetCDF trials are something like

running IOR with '-b 100m -t 80k'; and for the MPI-IO collective, I'd

have '-b100m -t 20m'. The exact -b value is not important; one would

want it to be as large as the available memory allows.
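
As a quick check on those two IOR settings, the sketch below counts how
many transfers make up one block. It assumes IOR's binary size suffixes
(k = 1024 bytes, m = 1024 * 1024 bytes); the helper names are made up
for illustration and are not part of IOR.

```python
# Hypothetical helpers, not part of IOR: count the transfers per block
# for the two configurations mentioned above.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}  # assumed binary suffixes


def parse_size(text):
    """Convert a size such as '80k' or '100m' to a number of bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def transfers_per_block(block, transfer):
    """Number of transfer-sized writes needed to fill one block."""
    return parse_size(block) // parse_size(transfer)


print(transfers_per_block("100m", "80k"))  # NetCDF-like case: 1280
print(transfers_per_block("100m", "20m"))  # MPI-IO collective case: 5
```

So the NetCDF-style runs issue many small writes per block, while the
collective case issues only a handful of large ones.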

Both the benchmarking codes I employed differed somewhat from the

approach in IOR. They each simply malloced a single buffer of the

specified blocksize, and, after opening the file or files, iterated on

a barrier-synchronized loop, appending the same buffer for 'n' rotations.

Usually, the timer is stopped as soon as the loop is exited, before the

files are closed.
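
A minimal single-process sketch of that loop, in Python rather than the
original C, might look like the following. The function name and the
demo parameters are assumptions for illustration, and the per-rotation
barrier is only marked with a comment, since a real barrier would need
an MPI binding.

```python
import os
import tempfile
import time


def timed_append_loop(path, blocksize, rotations):
    """Append one preallocated buffer `rotations` times.

    Returns (bytes_written, seconds); the timer stops before the close.
    """
    buf = bytes(blocksize)  # single buffer, allocated once and reused
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        written = 0
        start = time.perf_counter()
        for _ in range(rotations):
            # A parallel version would synchronize all ranks with a barrier here.
            view = memoryview(buf)
            while len(view) > 0:  # handle short writes
                n = os.write(fd, view)
                written += n
                view = view[n:]
        elapsed = time.perf_counter() - start  # stop timing before closing
    finally:
        os.close(fd)
    return written, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        nbytes, secs = timed_append_loop(os.path.join(tmp, "out.dat"), 80 * 1024, 10)
        print(nbytes, "bytes in", secs, "seconds")
```

Because the close is outside the timed region, any flushing that happens
at close time is not counted in the reported rate.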

I recently completed some modifications to my own copy of IOR, to execute more

like this. I moved the repetitions loop inside the file open and

close, and adjusted the offset to be continuous, so every blocksize of

transfers appends to the end of the still open file; then sum up the

product of the blocksize and the repetitions for the total written to

the file. I have this basically working for POSIX single-shared-file,

and also PNetCDF.
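
To illustrate the continuous-offset idea for a single shared file, here
is a small sketch of the bookkeeping. The rank-interleaved layout (each
repetition adds one block per rank, in rank order) is an assumption made
for the example, not necessarily what the modified IOR does, and the
function names are hypothetical.

```python
def block_offset(rank, nranks, blocksize, repetition):
    """Byte offset of `rank`'s block in a given repetition (assumed layout)."""
    return (repetition * nranks + rank) * blocksize


def total_written(nranks, blocksize, repetitions):
    """Total bytes in the shared file: blocksize * repetitions, over all ranks."""
    return nranks * blocksize * repetitions


# Small demo with made-up parameters: 4 ranks, 20 MiB blocks, 3 repetitions.
nranks, blocksize, repetitions = 4, 20 * 1024 * 1024, 3
offsets = sorted(
    block_offset(rank, nranks, blocksize, rep)
    for rep in range(repetitions)
    for rank in range(nranks)
)
total = total_written(nranks, blocksize, repetitions)

# The blocks should tile the file with no gaps and no overlaps.
assert offsets == list(range(0, total, blocksize))
print(total)
```

With this layout the file only ever grows at its end, and the total
reported is the per-rank product of blocksize and repetitions summed
over the ranks.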

MLB

Weikuan Yu wrote:

    </pre>

    <blockquote type="cite">

      <blockquote type="cite">

        <pre wrap="">What is the stripe_size of this test? 4M? If it is 4M, then the

transfer_size is rather large by

comparison (64M). We have seen this situation before; in the end it seemed to be

because each client holds

too large a lock in each write (because of Lustre's down-forward extent lock

policy), which might

block other clients from writing and so hurt the parallelism of the whole system.

Maybe you could try

decreasing the transfer size to the stripe_size, or increasing the stripe_size to 64M,

and see how it goes?

        </pre>

      </blockquote>

      <pre wrap="">Yes, this difference between a shared file and separate files has been seen

before. But I have never seen an explanation regarding CNL. BTW, this

performance difference between shared/separate stays the same,

regardless of what the transfer size is.

Does anybody want to post a reason regarding direct I/O too?

--Weikuan

_______________________________________________

Lustre-discuss mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Lustre-discuss@lists.lustre.org">Lustre-discuss@lists.lustre.org</a>

<a class="moz-txt-link-freetext" href="http://lists.lustre.org/mailman/listinfo/lustre-discuss">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a>

      </pre>

    </blockquote>

    <pre wrap="">


    </pre>

  </blockquote>

  <pre wrap=""><!---->

--

Weikuan Yu <+> 1-865-574-7990

<a class="moz-txt-link-freetext" href="http://ft.ornl.gov/~wyu/">http://ft.ornl.gov/~wyu/</a>


  </pre>

</blockquote>

<br>

</body>

</html>