[Lustre-discuss] Lustre on an Altix4700

Craig Tierney Craig.Tierney at noaa.gov
Fri Jul 25 08:07:14 PDT 2008


Andreas Dilger wrote:
> On Jul 24, 2008  11:52 -0600, Craig Tierney wrote:
>> Is anyone running, or does anyone know of someone
>> running Lustre on an Altix 4700 (or other large
>> Itanium SMP system)?  I was wondering if there
>> are any quirks to getting very large aggregate
>> performance to a single node (1024+ cores).
> 
> I believe there were some patches added to CVS (not sure if they are in
> 1.6.5 or not) that addressed allocation problems with per-CPU data
> structures that were hit on a 128-node system.
> 
> There are also patches in bug 11817 that are addressing issues in
> many-core SMP clients, but there is likely still work to be done in
> this area.
> 
> What kind of network do you have on such a system?  Do all of the
> cores have equal access to the external network?  If not, it would
> be good to e.g. bind the ptlrpcd thread to one of the IO nodes for
> better performance.
> 
> There hasn't been any effort yet to e.g. have multiple ptlrpcd threads
> (1 per IO node) to handle RPC requests from a thousand other cores.
> If that became a bottleneck I suspect it wouldn't be too hard to bind
> multiple ptlrpcd threads to multiple IO nodes, each having a ptlrpcd_pc
> list and ptlrpc_add_set() could get some kind of smarts about locality
> for which ptlrpcd_pc to add the outgoing request to.
> 
> There have been tests in the past to get 2GB/s+ from clients with
> good networks and 32 IA64 CPUs, but depending on what kind of throughput
> you are looking at there may still be a bunch of work to be done.
> 
> We'd be very interested to get feedback about any issues you hit on
> such a large system, because we don't get much chance to test on a
> single system with so many cores.
> 

I don't have any Altix SMP systems myself, but I know others who do.
They are running into issues where a very large scratch space could
be very helpful.  We just brought in a DDN9900 with Lustre 1.6.5.1,
and I have been extraordinarily happy with its performance so far.
I was asking to find out whether the setup we have could also be
useful on the Altix systems.

thanks,
Craig


> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.


-- 
Craig Tierney (craig.tierney at noaa.gov)


