[Lustre-discuss] Large scale delete results in lag on clients

Thu Aug 6 13:27:26 PDT 2009

On Aug 06, 2009  15:08 -0400, Jim McCusker wrote:
> We have a 15 TB Lustre volume across 4 OSTs and we recently deleted over 4
> million files from it in order to free up the 80 GB MDT/MDS (going from 100%
> capacity on it to 81%). As a result, after the rm completed, there is
> significant lag on most file system operations (but fast access once it
> occurs) even after the two servers that host the targets were rebooted. It
> seems to clear up for a little while after reboot, but comes back after some
> time.
> 
> Any ideas?

The Lustre unlink processing is somewhat asynchronous, so you may still be
catching up with unlinks.  You can check this by looking at the OSS service
RPC stats file to see if there are still object destroys being processed
by the OSTs.  You could also just check the system load/io on the OSTs to
see how busy they are in a "no load" situation.
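
As a rough sketch of the stats check described above, the following Python
snippet samples the stats file twice and reports how many destroy RPCs were
handled in between. The path, the "ost_destroy" counter name, and the
"<name> <count> samples ..." line layout are assumptions that may differ
between Lustre versions, so adjust them to whatever the file on your OSS
actually contains.

```python
import time

# Assumed location of the OSS service RPC stats on an OSS node (may vary by version).
STATS_PATH = "/proc/fs/lustre/ost/OSS/ost/stats"


def parse_stats(text):
    """Map each counter name to its sample count from stats-file text."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Counter lines are assumed to look like: "<name> <count> samples [unit] ..."
        if len(fields) >= 3 and fields[2] == "samples" and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def destroys_between(before, after, counter="ost_destroy"):
    """Number of destroy RPCs handled between two snapshots of the stats text."""
    return parse_stats(after).get(counter, 0) - parse_stats(before).get(counter, 0)


def read_stats(path=STATS_PATH):
    """Read the raw stats text from the (assumed) stats file."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    interval = 10  # seconds between the two snapshots
    first = read_stats()
    time.sleep(interval)
    second = read_stats()
    print("object destroys in the last %d s: %d"
          % (interval, destroys_between(first, second)))
```

If the count stays above zero while no clients are deleting anything, the
OSTs are most likely still working through the backlog of unlinks.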

> For the curious, we host a large image archive (almost 400k images) and do
> research on processing them. We had a lot of intermediate files that we
> needed to clean up:
> 
>  http://krauthammerlab.med.yale.edu/imagefinder (currently laggy and
> unresponsive due to this problem)
> 
> Thanks,
> Jim
> --
> Jim McCusker
> Programmer Analyst
> Krauthammer Lab, Pathology Informatics
> Yale School of Medicine
> james.mccusker at yale.edu | (203) 785-6330
> http://krauthammerlab.med.yale.edu


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.