[lustre-discuss] Re: Cannot add new OST after upgrade from 2.5.3 to 2.10.6

wanglu wanglu at ihep.ac.cn
Fri Dec 28 19:06:55 PST 2018


Hi, 

These new OSTs were formatted with e2fsprogs-1.44.3.wc1-0.el7.x86_64, while the MGS and the older OSTs were formatted with e2fsprogs-1.42.12.wc1 last year and are now mounted under e2fsprogs-1.44.3.wc1-0.el7.x86_64.
Do we need to run writeconf on all the devices, following this procedure?
https://lustre-discuss.lustre.narkive.com/Z5s6LU8B/lustre-2-5-2-unable-to-mount-ost 
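
For reference, here is a minimal sketch of the order in which a writeconf is usually carried out, assuming it turns out to be needed at all. It only builds and prints the command list and runs nothing. The device paths and the helper name writeconf_plan are hypothetical placeholders, not the devices of this system.

```python
from typing import List


def writeconf_plan(mdt_device: str, ost_devices: List[str]) -> List[str]:
    """Return the ordered steps of a writeconf as plain strings (dry run only).

    Assumption: a combined MGS/MDT, as on the MDS+MGS node described here.
    """
    steps = ["# Unmount all clients, then the MDT, then every OST."]
    # Regenerate the configuration logs on the MDT (with the MGS) first ...
    steps.append(f"tunefs.lustre --writeconf {mdt_device}")
    # ... and then on each OST.
    for dev in ost_devices:
        steps.append(f"tunefs.lustre --writeconf {dev}")
    steps.append("# Remount the MGS/MDT first, then the OSTs, then the clients.")
    return steps


if __name__ == "__main__":
    # Hypothetical device paths, for illustration only.
    for step in writeconf_plan("/dev/mdt0", ["/dev/ost0", "/dev/ost1"]):
        print(step)
```

The point of the ordering is that the MGS/MDT comes first both when the logs are regenerated and when the targets are remounted, so that the OSTs re-register against a fresh configuration.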

Thanks,
Lu

====================================================================
Computing center,the Institute of High Energy Physics, CAS, China
Wang Lu                                        Tel: (+86) 10 8823 6087
P.O. Box 918-7                               Fax: (+86) 10 8823 6839
Beijing 100049  P.R. China            Email: Lu.Wang at ihep.ac.cn
===================================================================
 
From: wanglu
Date: 2018-12-28 10:45
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Cannot add new OST after upgrade from 2.5.3 to 2.10.6
Hi, 

For hardware compatibility reasons, we just upgraded a 2.5.3 instance to 2.10.6. After that, when we tried to mount a newly formatted OST on 2.10.6, the mount failed on the OSS. Here are the symptoms:
1. The OST mount operation hangs for about 10 minutes, and then we get "Is the MGS running?..." on the terminal.
2. In syslog, we found:
   LustreError: 166-1: MGC192.168.50.63@tcp: Connection to MGS (at 192.168.50.63@tcp) was lost; in progress operations using this service will fail
   LustreError: 105461:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1545962328, 300s ago), entering recovery for MGS at MGC192.168.50.63@tcp_0 ns: MGC192.168.50.63@tcp lock: ffff9ae9283b8200/0xa4c148c2f2e256b9 lrc: 4/1,0 mode: -/CR res: [0x73666361:0x0:0x0].0x0 rrc: 3 type: PLN flags: 0x1000000000000 nid: local remote: 0x38d3cf901311c189 expref: -99 pid: 105461 timeout: 0 lvb_type: 0
3. During the hang, we can see ll_OST_XX and lazyldiskfsinit running on the new OSS, but the obdfilter directory cannot be found under /proc/fs/lustre.
4. On the MDS+MGS node, we got
   "166-1: MGC192.168.50.63@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail" on the MGS.
5. After that, new clients could not mount the file system.
6. It seemed the OST mount operation had caused problems on the MGS, so we unmounted the MDT, ran e2fsck, and remounted it.
7. After that, client mounts were possible again, and "lfs df" showed a deactivated OST.
8. When we tried the mount on the new OSS again, the symptoms repeated.
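
To make the lock-timeout message in item 2 easier to read, here is a small sketch that pulls out the enqueue time, the wait time, and the resource word. It rests on one assumption: that the first word of an MGC config-lock resource is the file system name packed into a little-endian 64-bit integer. The sample string is abridged from the syslog line above, and the function names are only illustrative.

```python
import re
from datetime import datetime, timezone

# Abridged from the syslog line in item 2 (only the fields parsed below).
SAMPLE = ("### lock timed out (enqueued at 1545962328, 300s ago), "
          "entering recovery for MGS res: [0x73666361:0x0:0x0].0x0 rrc: 3 type: PLN")


def decode_resource_name(word: int) -> str:
    """Interpret a resource word as ASCII bytes stored little-endian.

    Assumption: the MGC packs the file system name into this 64-bit word.
    """
    raw = word.to_bytes(8, "little").rstrip(b"\x00")
    return raw.decode("ascii", errors="replace")


def parse_lock_timeout(line: str) -> dict:
    """Extract enqueue time (UTC), seconds waited, and decoded resource name."""
    when = re.search(r"enqueued at (\d+), (\d+)s ago", line)
    res = re.search(r"res: \[0x([0-9a-fA-F]+):", line)
    if when is None or res is None:
        raise ValueError("not a lock-timeout line in the expected format")
    return {
        "enqueued_utc": datetime.fromtimestamp(int(when.group(1)), tz=timezone.utc),
        "waited_s": int(when.group(2)),
        "resource": decode_resource_name(int(res.group(1), 16)),
    }


if __name__ == "__main__":
    info = parse_lock_timeout(SAMPLE)
    print(info["enqueued_utc"].isoformat(), info["waited_s"], info["resource"])
```

If the assumption holds, the resource decodes to "acfs", which would be the file system whose configuration lock the new OST was waiting on, and the enqueue time works out to 2018-12-28 01:58:48 UTC.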

Does anyone have a hint about this problem?

Cheers,
Lu
