[lustre-discuss] liblustreapi.so llapi_layout_get_by_fd() taking a long time to complete

John Bauer bauerj at iodoctors.com
Tue Nov 22 12:57:23 PST 2022


Hi all,

I am calling *llapi_layout_get_by_fd()* from each rank of a 16-rank
MPI job, with one rank per node.

About 75% of the time, one of the ranks, typically rank 0, takes a very
long time to complete this call.  I have placed fprintf() calls with
wall-clock timers around the call.  When the call is slow, it generally
takes about 260 seconds.  Otherwise it takes only microseconds.

How I access llapi_layout_get_by_fd():

liblustreapi = dlopen("liblustreapi.so", RTLD_LAZY ) ;
LLAPI.layout_get_by_fd = dlsym( liblustreapi, "llapi_layout_get_by_fd" ) ;
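
For anyone reproducing this, the lookup above can be sketched with error checks, so that a failed load is not mistaken for a slow call. This is only a sketch under assumptions: the helper name load_layout_get_by_fd() and the typedef are hypothetical, and the uint32_t flags parameter is assumed rather than taken from the post, so check it against the lustreapi.h installed on your system. Older glibc versions may need -ldl at link time.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

struct llapi_layout;  /* opaque type; the real definition lives in liblustreapi */

/* ASSUMPTION: prototype of llapi_layout_get_by_fd(); verify against lustreapi.h. */
typedef struct llapi_layout *(*layout_get_by_fd_fn)(int fd, uint32_t flags);

/*
 * Hypothetical helper: resolve llapi_layout_get_by_fd from the named shared
 * library. Returns the function pointer, or NULL after reporting the
 * dlerror() text on stderr.
 */
layout_get_by_fd_fn load_layout_get_by_fd(const char *libname)
{
    assert(libname != NULL);

    void *handle = dlopen(libname, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", libname, dlerror());
        return NULL;
    }

    dlerror();  /* clear any stale error state before calling dlsym() */
    void *sym = dlsym(handle, "llapi_layout_get_by_fd");
    const char *err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return NULL;
    }

    /* POSIX allows converting a dlsym() result to a function pointer. */
    return (layout_get_by_fd_fn)sym;
}
```

A caller would store the result once, for example LLAPI.layout_get_by_fd = load_layout_get_by_fd("liblustreapi.so"), and treat NULL as "Lustre API not available".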

How I call llapi_layout_get_by_fd():

if (dbg) fprintf(stderr, "%s %12.8f %s() before LLAPI.layout_get_by_fd()\n", host, rtc(), __func__);
struct llapi_layout *layout = (*LLAPI.layout_get_by_fd)(fd, 0);
if (dbg) fprintf(stderr, "%s %12.8f %s() after LLAPI.layout_get_by_fd()\n", host, rtc(), __func__);
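
The before/after timing can also be wrapped in a small helper that uses a monotonic clock and only logs when a threshold is exceeded, which keeps 16 ranks of output readable. This is a minimal sketch, not the code used in the job: mono_seconds(), timed_call(), the timed_fn callback type, and demo_call() are all hypothetical names, and demo_call() merely sleeps as a stand-in for the Lustre call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Seconds from a monotonic clock, unaffected by wall-clock adjustments. */
double mono_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical callback type standing in for the call being timed. */
typedef void *(*timed_fn)(void *arg);

/*
 * Run fn(arg), store its return value in *result (if result is non-NULL),
 * print a line to stderr when the call took longer than warn_after seconds,
 * and return the elapsed time in seconds.
 */
double timed_call(const char *label, timed_fn fn, void *arg,
                  void **result, double warn_after)
{
    assert(fn != NULL);

    double t0 = mono_seconds();
    void *r = fn(arg);
    double elapsed = mono_seconds() - t0;

    if (result != NULL)
        *result = r;
    if (elapsed > warn_after)
        fprintf(stderr, "%s took %.6f s\n", label, elapsed);
    return elapsed;
}

/* Stand-in for the real call: sleeps about 10 ms and returns its argument. */
void *demo_call(void *arg)
{
    struct timespec nap = { 0, 10 * 1000 * 1000 };
    nanosleep(&nap, NULL);
    return arg;
}
```

In the real job the callback would be a thin wrapper around (*LLAPI.layout_get_by_fd)(fd, 0), with warn_after set to something like 1.0 so that only the slow rank reports.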

The resulting output from rank 0:

r401i2n10 7.22477698 LustreLayout_get_by_fd() before LLAPI.layout_get_by_fd()
r401i2n10 269.52539992 LustreLayout_get_by_fd() after LLAPI.layout_get_by_fd()

Any ideas on what might be triggering this?  The layout returned seems
to be correct every time, whether the call takes a long time or not.  It
has the correct striping information, but the component has no OSTs
assigned, because it has not yet been instantiated for the new file.

cat /sys/fs/lustre/version

2.12.8_ddn12

