[lustre-discuss] OST Mount Error

Mohr Jr, Richard Frank rmohr at utk.edu
Mon Jun 1 12:10:01 PDT 2020


It looks like the writeconf flag is set on the OST you are trying to mount.  Did you completely replace the OST with a newly formatted one?  Or did you set the writeconf flag on the existing OST?
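
One way to confirm this from the quoted mount output is to look for "writeconf" in the comma-separated options string that mount.lustre prints. The snippet below is only a sketch: has_writeconf is a hypothetical helper name, and the options string is copied from the quoted output with "@" restored on the assumption that the list archive rewrote it as " at ".

```shell
#!/bin/sh
# Hypothetical helper: succeeds (returns 0) if a comma-separated
# Lustre mount option string contains the "writeconf" option.
has_writeconf() {
    case ",$1," in
        *,writeconf,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Options string taken from the mount.lustre output quoted below
# (assumption: " at " in the archive stands for "@").
opts='osd=osd-ldiskfs,,errors=remount-ro,mgsnode=172.21.49.70@tcp,writeconf,param=mgsnode=172.21.49.70@tcp,svname=ana04-OST0005,device=/dev/sdg'

if has_writeconf "$opts"; then
    echo "writeconf is set"
else
    echo "writeconf is not set"
fi
```

On the OSS itself, "tunefs.lustre --dryrun /dev/sdg" should print the flags stored on the target without changing anything.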

The writeconf flag tells Lustre to regenerate its configuration logs, but the logs need to be regenerated on all the MDTs and OSTs, not just on one target.  This is why the MDS logs contained the message to writeconf the MDT first.  The Lustre manual describes the procedure for doing this.
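
As a rough sketch of the ordering involved (the manual remains the authoritative reference), the script below only prints the commands rather than running them. writeconf_plan is a hypothetical helper, /dev/mdt_dev is a placeholder for your MDT device, and /dev/sdg is the OST device from the quoted mount output; a real run would list every OST in the file system.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the writeconf order.
# First argument: MDT device. Remaining arguments: OST devices.
writeconf_plan() {
    mdt_dev=$1
    shift
    echo "# 1. Unmount all clients, then the MDT, then every OST."
    echo "# 2. Regenerate the logs: MDT first, then each OST."
    echo "tunefs.lustre --writeconf $mdt_dev"
    for ost_dev in "$@"; do
        echo "tunefs.lustre --writeconf $ost_dev"
    done
    echo "# 3. Remount in order: MGS/MDT, then the OSTs, then the clients."
}

# /dev/mdt_dev is a placeholder; /dev/sdg comes from the quoted output.
writeconf_plan /dev/mdt_dev /dev/sdg
```

The point of printing first is that writeconf affects the whole file system, so it is worth checking the device list and the order before running anything for real.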

--
Rick Mohr
Senior HPC System Administrator
Joint Institute for Computational Sciences
University of Tennessee

> On Jun 1, 2020, at 12:05 PM, Quijano, Omar E. <omarq at slac.stanford.edu> wrote:
> 
> [External Email]
> 
> Dear Lustre Users,
> 
> There was an issue with a degraded volume group.
> After replacing the failed disks and mounting the OST in question, I get the following error:
> 
> From OSS side:
> # mount -v -t lustre /ost_5
> arg[0] = /sbin/mount.lustre
> arg[1] = -v
> arg[2] = -o
> arg[3] = rw
> arg[4] = /dev/sdg
> arg[5] = /ost_5
> source = /dev/sdg (/dev/sdg), target = /ost_5
> options = rw
> checking for existing Lustre data: found
> Reading CONFIGS/mountdata
> mounting device /dev/sdg at /ost_5, flags=0x1000000 options=osd=osd-ldiskfs,,errors=remount-ro,mgsnode=172.21.49.70 at tcp,writeconf,param=mgsnode=172.21.49.70 at tcp,svname=ana04-OST0005,device=/dev/sdg
> mount.lustre: mount /dev/sdg at /ost_5 failed: File exists retries left: 0
> mount.lustre: mount /dev/sdg at /ost_5 failed: File exists
> 
> From the MDS Side:
>  MGS: Connection restored to 172.21.52.57 at o2ib (at 172.21.49.57 at tcp)
> Jun  1 08:52:13 kernel: [283815.063427] Lustre: MGS: Regenerating ana04-OST0005 log by user request.
> Jun  1 08:52:13  kernel: [283815.063435] Lustre: Found index 5 for ana04-OST0005, updating log
> Jun  1 08:52:13  kernel: [283815.063588] Lustre: Client log for ana04-OST0005 was not updated; writeconf the MDT first to regenerate it.
> Jun  1 08:52:16  kernel: [283818.785764] Lustre: ana04-MDT0000: Connection restored to 172.21.52.57 at o2ib (at 172.21.49.57 at tcp)
> Jun  1 08:56:56  kernel: [284098.343206] Lustre: 21769:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591026960/real 1591026960]  req at ffff8803cbb52d00 x1668283989359148/t0(0) o8->ana04-OST0004-osc-MDT0000 at 172.21.49.57@tcp:28/4 lens 520/544 e 0 to 1 dl 1591027016 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
> Jun  1 08:56:56  kernel: [284098.343214] Lustre: 21769:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 96 previous similar messages
> 
> Any input would be greatly appreciated.
> Thank you,
> Omar E. Quijano
> LCLS IT/Networking Department Head
> SLAC National Accelerator Laboratory 
> T: (650) 926-5436
> 
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org







