[Lustre-discuss] lustre ram usage

Balagopal Pillai pillai at mathstat.dal.ca
Sat Dec 22 15:15:45 PST 2007


Hi,

        One of our Lustre installations has the following hardware:
one Dell PE1950 with 4 MD1000s connected and 8 OSTs exported. The
second OSS server, a Dell 2950 (which also functions as the MGS and
MDS), also has 4 MD1000s connected and 8 OSTs exported. The 16 OSTs from
both OSS servers serve 6 volumes of about 50 TB in total. Both servers
have dual quad-core Xeons and 4 GB of RAM. The 2950 also has a second
PERC 5 that serves the MGS and MDS.

      Now one of the volumes is getting full and the overnight rsync
crashes the 2950 (and sometimes the 1950 too), likely due to RAM
exhaustion. The crash always seems to happen about half an hour after the
rsync is scheduled to start in the early morning. Today I ran the rsync
manually and found that it takes almost 3 GB of RAM on the 2950 and more
than 2 GB on the 1950 for the rsync to complete. Rsync is run from one of
the compute nodes, where the Lustre volumes are mounted. I reduced the
journal size on all OSTs to 128 MB as per the advice in one of the emails
on the list, but that still doesn't reduce the memory consumption on the
Lustre OSS when a Lustre client runs an rsync backup. Here is the current
status on the 2950:

             total       used       free     shared    buffers     cached
Mem:       4041880    3985712      56168          0     883388      53448
-/+ buffers/cache:    3048876     993004
Swap:      4096564        240    4096324
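
As a sanity check on the figures above, the "-/+ buffers/cache" row can be reproduced from the "Mem:" row. The sketch below assumes the older procps `free` layout, where "used" includes buffers and page cache and all values are in KiB. The function name is made up for illustration and is not part of any tool.

```python
# Values copied from the "Mem:" row of the `free` output above.
# Assumption: units are KiB and "used" includes buffers and page cache.
total, used, free, buffers, cached = 4041880, 3985712, 56168, 883388, 53448


def buffers_cache_adjusted(used, free, buffers, cached):
    """Return (used, free) with buffers and page cache counted as free."""
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


adj_used, adj_free = buffers_cache_adjusted(used, free, buffers, cached)
print(adj_used, adj_free)  # 3048876 993004
```

Both results match the "-/+ buffers/cache" row, so roughly 3 GB of the 4 GB is in use even after buffers and page cache are discounted.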


        Even after I terminated the rsync, the used RAM doesn't seem to
get freed up. In hindsight, I could have got 16 GB per OSS, but I wasn't
expecting such high memory usage by Lustre. The version of Lustre is
1.6.3. The RAM utilization on the OSS seems to go up during the "building
file list" stage of the rsync. Other than the occasional hang of the NFS
server that re-exports Lustre volumes to other servers, Lustre is working
fine and is stable when the cluster is fully loaded with compute jobs.

        Is there anything else that can be tried (maybe tuning
ost_num_threads, for example), other than adding more RAM to both OSS
servers, that could reduce the high memory consumption during the rsync
backup on the two OSS servers? Thanks in advance.


Regards
Balagopal Pillai



 



