[lustre-discuss] Clients hang mounting after IO node crash

Stepan Nassyr s.nassyr at fz-juelich.de
Tue May 3 13:23:57 PDT 2022


Hi Lustre community,

I am running a Lustre setup on a small cluster. It has 2 nodes that can be used for IO, with 36 HDDs and 4 SSDs each. All nodes are AArch64 and run Rocky Linux 8.5. I decided on a non-HA configuration on those two nodes using Lustre 2.14.56 + ZFS 2.1.2 (both built from source), with the SSDs for the MDTs and the HDDs for the OSTs. On one of the nodes I dedicated some HDDs to the MGT. So both nodes run an MDS and an OSS, and one of them also runs the MGS.

One of the nodes crashed unexpectedly. After I brought it back up, the filesystem could not be mounted on clients; the mount hung indefinitely. The dmesg output on the IO nodes showed that both MDTs were stuck in recovery.

What I have tried so far:

  1.  sudo lctl --device <mdt device> abort_recovery
  2.  The usual restarts and reboots and remounts
  3.  lfsck
  4.  tunefs.lustre --writeconf on every MDT/OST
  5.  (getting frustrated and desperate)
  6.  reformat the MGS
  7.  lctl clear_conf <all MDTs/OSTs>

I'm not sure how much worse I made the issue. The data on the filesystem was non-critical, but it would take a couple of days to rebuild, so I'd rather recover it.

Right now the situation is as follows:

io01 lctl dl:

[snassyr@io01 ~]$ sudo lctl dl
  0 UP osd-zfs MGS-osd MGS-osd_UUID 4
  1 UP mgs MGS MGS 20
  2 UP mgc MGC10.31.7.61@o2ib 7fc3c479-903d-4bf5-8239-77f6bb25d72f 4
  3 UP osd-zfs storage-MDT0000-osd storage-MDT0000-osd_UUID 9
  4 UP mds MDS MDS_uuid 2
  5 UP lod storage-MDT0000-mdtlov storage-MDT0000-mdtlov_UUID 3
  6 UP mdt storage-MDT0000 storage-MDT0000_UUID 22
  7 UP mdd storage-MDD0000 storage-MDD0000_UUID 3
  8 UP qmt storage-QMT0000 storage-QMT0000_UUID 3
  9 UP osp storage-OST0000-osc-MDT0000 storage-MDT0000-mdtlov_UUID 4
 10 UP osp storage-OST0001-osc-MDT0000 storage-MDT0000-mdtlov_UUID 4
 11 UP osp storage-MDT0001-osp-MDT0000 storage-MDT0000-mdtlov_UUID 4
 12 UP lwp storage-MDT0000-lwp-MDT0000 storage-MDT0000-lwp-MDT0000_UUID 4
 13 UP osd-zfs storage-OST0000-osd storage-OST0000-osd_UUID 4
 14 UP ost OSS OSS_uuid 2
 15 UP obdfilter storage-OST0000 storage-OST0000_UUID 6
 16 UP lwp storage-MDT0000-lwp-OST0000 storage-MDT0000-lwp-OST0000_UUID 4
 17 UP lwp storage-MDT0001-lwp-OST0000 storage-MDT0001-lwp-OST0000_UUID 4

io02 lctl dl:

[snassyr@io02 ~]$ sudo lctl dl
  0 UP osd-zfs storage-MDT0001-osd storage-MDT0001-osd_UUID 8
  1 UP mgc MGC10.31.7.61@o2ib 30fafefb-17d5-4b7e-b37d-f57b8cec706f 4
  2 UP mds MDS MDS_uuid 2
  3 UP lod storage-MDT0001-mdtlov storage-MDT0001-mdtlov_UUID 3
  4 UP mdt storage-MDT0001 storage-MDT0001_UUID 10
  5 UP mdd storage-MDD0001 storage-MDD0001_UUID 3
  6 UP osp storage-MDT0000-osp-MDT0001 storage-MDT0001-mdtlov_UUID 4
  7 UP osp storage-OST0000-osc-MDT0001 storage-MDT0001-mdtlov_UUID 4
  8 UP osp storage-OST0001-osc-MDT0001 storage-MDT0001-mdtlov_UUID 4
  9 UP lwp storage-MDT0000-lwp-MDT0001 storage-MDT0000-lwp-MDT0001_UUID 4
 10 UP osd-zfs storage-OST0001-osd storage-OST0001-osd_UUID 4
 11 UP ost OSS OSS_uuid 2
 12 UP obdfilter storage-OST0001 storage-OST0001_UUID 6
 13 UP lwp storage-MDT0000-lwp-OST0001 storage-MDT0000-lwp-OST0001_UUID 4
 14 UP lwp storage-MDT0001-lwp-OST0001 storage-MDT0001-lwp-OST0001_UUID 4
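
To compare the two device listings programmatically, the output can be parsed into records. The following is a minimal Python sketch, not part of the original report. It assumes the six columns are device index, state, type, name, UUID and reference count; the names `ObdDevice` and `parse_lctl_dl` are made up for illustration.

```python
from typing import List, NamedTuple


class ObdDevice(NamedTuple):
    # Column meanings are an assumption based on the listings above.
    index: int
    state: str
    kind: str
    name: str
    uuid: str
    refcount: int


def parse_lctl_dl(text: str) -> List[ObdDevice]:
    """Parse `lctl dl` output into a list of ObdDevice records."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip shell prompts, blank lines and anything that is not a
        # six-column device row with numeric first and last columns.
        if len(fields) != 6 or not fields[0].isdigit() or not fields[5].isdigit():
            continue
        devices.append(
            ObdDevice(int(fields[0]), fields[1], fields[2],
                      fields[3], fields[4], int(fields[5]))
        )
    return devices


SAMPLE = """[snassyr@io02 ~]$ sudo lctl dl
  0 UP osd-zfs storage-MDT0001-osd storage-MDT0001-osd_UUID 8
  1 UP mgc MGC10.31.7.61@o2ib 30fafefb-17d5-4b7e-b37d-f57b8cec706f 4
  4 UP mdt storage-MDT0001 storage-MDT0001_UUID 10
"""

if __name__ == "__main__":
    for dev in parse_lctl_dl(SAMPLE):
        if dev.kind == "mdt":
            print(dev.name, dev.state, dev.refcount)
```

Filtering on `kind == "mdt"` makes it easy to put the two MDT rows from io01 and io02 side by side.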

When mounting the filesystem on a client:

io01 lctl dk:

010000:00080000:44.0F:1651608945.066329:1728:7485:0:(ldlm_lib.c:1363:target_handle_connect()) MGS: connection from 2977c7ca-d420-4590-8587-6dcca277b9ce@10.31.7.2@o2ib t0 exp 0000000080b2f6b3 cur 7766 last 0
00000020:00000080:44.0:1651608945.066340:1904:7485:0:(genops.c:1357:class_connect()) connect: client 2977c7ca-d420-4590-8587-6dcca277b9ce, cookie 0x58c556ebddf199e6
00000020:01000000:44.0:1651608945.066343:2176:7485:0:(lprocfs_status_server.c:513:lprocfs_exp_setup()) using hash 00000000a7721ebc
00000100:00080000:44.0:1651608945.066354:1984:7485:0:(import.c:85:import_set_state_nolock()) 000000004427b505 : changing import state from RECOVER to FULL
20000000:01000000:44.0:1651608945.069929:1712:7485:0:(mgs_nids.c:636:mgs_get_ir_logs()) Reading IR log storage-cliir bufsize 1048576.
20000000:01000000:44.0:1651608945.069933:1920:7485:0:(mgs_nids.c:193:mgs_nidtbl_read()) fsname storage, entry size 32, pages 4064/1/16/255.
20000000:01000000:44.0:1651608945.069934:1920:7485:0:(mgs_nids.c:193:mgs_nidtbl_read()) fsname storage, entry size 32, pages 4032/1/16/255.
20000000:01000000:44.0:1651608945.069935:1920:7485:0:(mgs_nids.c:193:mgs_nidtbl_read()) fsname storage, entry size 32, pages 4000/1/16/255.
20000000:01000000:44.0:1651608945.069936:1920:7485:0:(mgs_nids.c:193:mgs_nidtbl_read()) fsname storage, entry size 32, pages 3968/1/16/255.
20000000:01000000:44.0:1651608945.069937:1920:7485:0:(mgs_nids.c:205:mgs_nidtbl_read()) Read IR logs storage return with 128, version 15
00000040:00080000:44.0:1651608945.070263:1936:7485:0:(llog_osd.c:233:llog_osd_read_header()) not reading header from 0-byte log
00010000:00080000:76.0F:1651608945.070500:1728:107218:0:(ldlm_lib.c:1363:target_handle_connect()) storage-MDT0000: connection from fcf3db3e-1e74-404e-9b5b-c5ddd03f655e@10.31.7.2@o2ib t0 exp 0000000080b2f6b3 cur 7766 last 0
00010000:00080000:76.0:1651608945.073446:1728:107218:0:(ldlm_lib.c:1363:target_handle_connect()) storage-MDT0000: connection from fcf3db3e-1e74-404e-9b5b-c5ddd03f655e@10.31.7.2@o2ib t0 exp 0000000080b2f6b3 cur 7766 last 0
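
Because these `lctl dk` dumps are long, it can help to pull out only the import state transitions. The following is a rough Python sketch, not part of the original report. It assumes each line has the colon-separated layout seen in these logs (subsystem mask, debug mask, CPU, timestamp, stack, PID, one further numeric field, then `(file:line:function())` and the message); the names `DK_LINE`, `STATE_CHANGE` and `import_state_changes` are made up for illustration.

```python
import re
from typing import List, Tuple

# Layout inferred from the log lines in this post; treat it as an assumption.
DK_LINE = re.compile(
    r"^(?P<subsys>[0-9a-f]+):(?P<mask>[0-9a-f]+):(?P<cpu>[^:]+):"
    r"(?P<time>\d+\.\d+):(?P<stack>\d+):(?P<pid>\d+):(?P<ext>\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)$"
)
STATE_CHANGE = re.compile(r"changing import state from (\w+) to (\w+)")


def import_state_changes(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_state, new_state) for each import state change."""
    changes = []
    for raw in log_text.splitlines():
        parsed = DK_LINE.match(raw.strip())
        if parsed is None:
            continue  # not a debug log line, e.g. a heading or blank line
        change = STATE_CHANGE.search(parsed.group("msg"))
        if change:
            # Keep the timestamp as a string to avoid float rounding.
            changes.append((parsed.group("time"), change.group(1), change.group(2)))
    return changes


SAMPLE_LOG = (
    "00000100:00080000:44.0:1651608945.066354:1984:7485:0:"
    "(import.c:85:import_set_state_nolock()) 000000004427b505 : "
    "changing import state from RECOVER to FULL\n"
    "00000100:00080000:20.0:1651608945.066213:0:41661:0:"
    "(pinger.c:207:ptlrpc_pinger_ir_up()) IR up\n"
)

if __name__ == "__main__":
    for when, old, new in import_state_changes(SAMPLE_LOG):
        print(when, old, "->", new)
```

An import that never reaches `FULL` in such a list is a candidate for the connection the mount is waiting on.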

client lctl dk:

00000080:01200004:23.0F:1651608945.064695:0:97748:0:(super25.c:114:lustre_fill_super()) VFS Op: sb 00000000671f73ed
00000020:01000004:23.0:1651608945.064703:0:97748:0:(obd_mount.c:951:lmd_print())   mount data:
00000020:01000004:23.0:1651608945.064704:0:97748:0:(obd_mount.c:953:lmd_print()) profile: storage-client
00000020:01000004:23.0:1651608945.064704:0:97748:0:(obd_mount.c:954:lmd_print()) device:  10.31.7.61@o2ib:/storage
00000020:01000004:23.0:1651608945.064705:0:97748:0:(obd_mount.c:955:lmd_print()) flags:   2
00000080:01000004:23.0:1651608945.064705:0:97748:0:(super25.c:159:lustre_fill_super()) Mounting client storage-client
00000020:01000004:23.0:1651608945.064744:0:97748:0:(obd_mount.c:340:lustre_start_mgc()) Start MGC 'MGC10.31.7.61@o2ib'
00000020:00000080:23.0:1651608945.064747:0:97748:0:(obd_config.c:1356:class_process_config()) processing cmd: cf005
00000020:00000080:23.0:1651608945.064749:0:97748:0:(obd_config.c:1368:class_process_config()) adding mapping from uuid MGC10.31.7.61@o2ib_0 to nid 0x500000a1f073d (10.31.7.61@o2ib)
00000020:01000004:23.0:1651608945.064759:0:97748:0:(obd_mount.c:191:lustre_start_simple()) Starting OBD MGC10.31.7.61@o2ib (typ=mgc)
00000020:00000080:23.0:1651608945.064760:0:97748:0:(obd_config.c:1356:class_process_config()) processing cmd: cf001
00000020:00000080:23.0:1651608945.064771:0:97748:0:(genops.c:415:class_newdev()) Allocate new device MGC10.31.7.61@o2ib (000000009d5385c4)
00000020:00000080:23.0:1651608945.064792:0:97748:0:(obd_config.c:648:class_attach()) OBD: dev 0 attached type mgc with refcount 1
00000020:00000080:23.0:1651608945.064793:0:97748:0:(obd_config.c:1356:class_process_config()) processing cmd: cf003
00010000:00080000:6.0F:1651608945.065940:0:97748:0:(ldlm_lib.c:115:import_set_conn()) imp 00000000d1420d43@MGC10.31.7.61@o2ib: add connection MGC10.31.7.61@o2ib_0 at head
00000040:01000000:6.0:1651608945.065968:0:97748:0:(llog_obd.c:212:llog_setup()) obd MGC10.31.7.61@o2ib ctxt 1 is initialized
10000000:01000000:30.0F:1651608945.066034:0:97767:0:(mgc_request.c:628:mgc_requeue_thread()) Starting requeue thread
00000020:00000080:6.0:1651608945.066039:0:97748:0:(obd_config.c:752:class_setup()) finished setup of obd MGC10.31.7.61@o2ib (uuid 2977c7ca-d420-4590-8587-6dcca277b9ce)
00000020:00000080:6.0:1651608945.066044:0:97748:0:(genops.c:1357:class_connect()) connect: client 2977c7ca-d420-4590-8587-6dcca277b9ce, cookie 0xb433eadff9fbd3f0
00000100:00080000:6.0:1651608945.066048:0:97748:0:(import.c:533:import_select_connection()) MGC10.31.7.61@o2ib: connect to NID 10.31.7.61@o2ib last attempt 0
00000100:00080000:6.0:1651608945.066049:0:97748:0:(import.c:614:import_select_connection()) MGC10.31.7.61@o2ib: import 00000000d1420d43 using connection MGC10.31.7.61@o2ib_0/10.31.7.61@o2ib
00000100:00080000:6.0:1651608945.066061:0:97748:0:(pinger.c:388:ptlrpc_pinger_add_import()) adding pingable import 2977c7ca-d420-4590-8587-6dcca277b9ce->MGS
00000080:01000000:6.0:1651608945.066081:0:97748:0:(llite_lib.c:1252:ll_fill_super()) llite sb uuid: fcf3db3e-1e74-404e-9b5b-c5ddd03f655e
10000000:01000000:6.0:1651608945.066151:0:97748:0:(mgc_request.c:2201:mgc_process_config()) parse_log storage-client from 0
10000000:01000000:6.0:1651608945.066152:0:97748:0:(mgc_request.c:334:config_log_add()) add config log storage-client-ffff910943185800
10000000:01000000:6.0:1651608945.066154:0:97748:0:(mgc_request.c:215:do_config_log_add()) do adding config log storage-sptlrpc-ffff910e650a0000
10000000:01000000:6.0:1651608945.066155:0:97748:0:(mgc_request.c:90:mgc_name2resid()) log storage-sptlrpc to resid 0x656761726f7473/0x0 (storage)
10000000:01000000:6.0:1651608945.066157:0:97748:0:(mgc_request.c:2060:mgc_process_log()) Process log storage-sptlrpc-ffff910e650a0000 from 1
10000000:01000000:6.0:1651608945.066158:0:97748:0:(mgc_request.c:1102:mgc_enqueue()) Enqueue for storage-sptlrpc (res 0x656761726f7473)
00000100:00080000:6.0:1651608945.066178:0:97748:0:(client.c:1659:ptlrpc_send_new_req()) @@@ req waiting for recovery: (FULL != CONNECTING)  req@00000000a5867c8c x1731815472708096/t0(0) o101->MGC10.31.7.61@o2ib@10.31.7.61@o2ib:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:WQU/0/ffffffff rc 0/-1 job:''
00000100:00080000:20.0F:1651608945.066200:0:41661:0:(import.c:1073:ptlrpc_connect_interpret()) MGC10.31.7.61@o2ib: connect to target with instance 0
00000100:00080000:20.0:1651608945.066205:0:41661:0:(import.c:933:ptlrpc_connect_set_flags()) MGC10.31.7.61@o2ib: Resetting ns_connect_flags to server flags: 0xa000011001002020
10000000:01000000:20.0:1651608945.066207:0:41661:0:(mgc_request.c:1327:mgc_import_event()) import event 0x808005
00000100:00080000:20.0:1651608945.066209:0:41661:0:(import.c:85:import_set_state_nolock()) 00000000d1420d43 MGS: changing import state from CONNECTING to FULL
10000000:01000000:20.0:1651608945.066211:0:41661:0:(mgc_request.c:1327:mgc_import_event()) import event 0x808004
00000100:00080000:20.0:1651608945.066213:0:41661:0:(pinger.c:207:ptlrpc_pinger_ir_up()) IR up
00000100:00080000:20.0:1651608945.066215:0:41661:0:(recover.c:218:ptlrpc_wake_delayed()) @@@ waking (set 00000000ffe9ef2e):  req@00000000a5867c8c x1731815472708096/t0(0) o101->MGC10.31.7.61@o2ib@10.31.7.61@o2ib:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:WQU/0/ffffffff rc 0/-1 job:''
10000000:01000000:6.0:1651608945.066426:0:97748:0:(mgc_request.c:2132:mgc_process_log()) MGC10.31.7.61@o2ib: configuration from log 'storage-sptlrpc' failed (-2).
10000000:01000000:6.0:1651608945.066428:0:97748:0:(mgc_request.c:215:do_config_log_add()) do adding config log params-ffff910943185800
10000000:01000000:6.0:1651608945.066430:0:97748:0:(mgc_request.c:90:mgc_name2resid()) log params to resid 0x736d61726170/0x3 (params)
10000000:01000000:6.0:1651608945.066430:0:97748:0:(mgc_request.c:215:do_config_log_add()) do adding config log storage-client-ffff910943185800
10000000:01000000:6.0:1651608945.066431:0:97748:0:(mgc_request.c:90:mgc_name2resid()) log storage-client to resid 0x656761726f7473/0x0 (storage)
10000000:01000000:6.0:1651608945.066432:0:97748:0:(mgc_request.c:215:do_config_log_add()) do adding config log storage-cliir-ffff910943185800
10000000:01000000:6.0:1651608945.066433:0:97748:0:(mgc_request.c:90:mgc_name2resid()) log storage-cliir to resid 0x656761726f7473/0x2 (storage)
10000000:01000000:6.0:1651608945.066433:0:97748:0:(mgc_request.c:2060:mgc_process_log()) Process log storage-client-ffff910943185800 from 1
10000000:01000000:6.0:1651608945.066434:0:97748:0:(mgc_request.c:1102:mgc_enqueue()) Enqueue for storage-client (res 0x656761726f7473)
00000020:01000000:4.0F:1651608945.067039:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x0 mark_flg=0x1
00000020:00000080:4.0:1651608945.067043:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:4.0:1651608945.067045:0:97769:0:(obd_config.c:1432:class_process_config()) marker 4 (0x1) storage-clilov lov setup
00000020:01000000:4.0:1651608945.067047:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf001, instance name: storage-clilov-ffff910943185800
00000020:00000080:4.0:1651608945.067048:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf001
00000020:00000080:4.0:1651608945.067052:0:97769:0:(genops.c:415:class_newdev()) Allocate new device storage-clilov-ffff910943185800 (00000000e39226db)
00000020:00000080:4.0:1651608945.067074:0:97769:0:(obd_config.c:648:class_attach()) OBD: dev 1 attached type lov with refcount 1
00000020:01000000:4.0:1651608945.067076:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf003, instance name: storage-clilov-ffff910943185800
00000020:00000080:4.0:1651608945.067077:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf003
00000020:00000080:4.0:1651608945.067139:0:97769:0:(obd_config.c:752:class_setup()) finished setup of obd storage-clilov-ffff910943185800 (uuid fcf3db3e-1e74-404e-9b5b-c5ddd03f655e)
00000020:01000000:4.0:1651608945.067140:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x2 mark_flg=0x2
00000020:00000080:4.0:1651608945.067141:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:4.0:1651608945.067142:0:97769:0:(obd_config.c:1432:class_process_config()) marker 4 (0x2) storage-clilov lov setup
00000020:01000000:4.0:1651608945.067142:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x0 mark_flg=0x1
00000020:00000080:4.0:1651608945.067143:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:4.0:1651608945.067144:0:97769:0:(obd_config.c:1432:class_process_config()) marker 5 (0x1) storage-clilmv lmv setup
00000020:01000000:4.0:1651608945.067144:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf001, instance name: storage-clilmv-ffff910943185800
00000020:00000080:4.0:1651608945.067145:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf001
00000020:00000080:4.0:1651608945.067147:0:97769:0:(genops.c:415:class_newdev()) Allocate new device storage-clilmv-ffff910943185800 (00000000c0d1efc4)
00000020:00000080:4.0:1651608945.067166:0:97769:0:(obd_config.c:648:class_attach()) OBD: dev 2 attached type lmv with refcount 1
00000020:01000000:4.0:1651608945.067167:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf003, instance name: storage-clilmv-ffff910943185800
00000020:00000080:4.0:1651608945.067168:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf003
00000020:00000080:4.0:1651608945.067189:0:97769:0:(obd_config.c:752:class_setup()) finished setup of obd storage-clilmv-ffff910943185800 (uuid fcf3db3e-1e74-404e-9b5b-c5ddd03f655e)
00000020:01000000:4.0:1651608945.067190:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x2 mark_flg=0x2
00000020:00000080:4.0:1651608945.067191:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:4.0:1651608945.067191:0:97769:0:(obd_config.c:1432:class_process_config()) marker 5 (0x2) storage-clilmv lmv setup
00000020:01000000:4.0:1651608945.067192:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x0 mark_flg=0x1
00000020:00000080:4.0:1651608945.067192:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:4.0:1651608945.067193:0:97769:0:(obd_config.c:1432:class_process_config()) marker 6 (0x1) storage-MDT0000 add mdc
00000020:00000080:4.0:1651608945.067194:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf005
00000020:00000080:4.0:1651608945.067195:0:97769:0:(obd_config.c:1368:class_process_config()) adding mapping from uuid 10.31.7.61@o2ib to nid 0x500000a1f073d (10.31.7.61@o2ib)
00000020:01000000:4.0:1651608945.067198:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf001, instance name: storage-MDT0000-mdc-ffff910943185800
00000020:00000080:4.0:1651608945.067199:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf001
00000020:00000080:4.0:1651608945.067201:0:97769:0:(genops.c:415:class_newdev()) Allocate new device storage-MDT0000-mdc-ffff910943185800 (0000000049300b8f)
00000020:00000080:4.0:1651608945.067220:0:97769:0:(obd_config.c:648:class_attach()) OBD: dev 3 attached type mdc with refcount 1
00000020:01000000:4.0:1651608945.067221:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf003, instance name: storage-MDT0000-mdc-ffff910943185800
00000020:00000080:4.0:1651608945.067222:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf003
00010000:00080000:4.0:1651608945.067234:0:97769:0:(ldlm_lib.c:115:import_set_conn()) imp 00000000ec1d7cd0@storage-MDT0000-mdc-ffff910943185800: add connection 10.31.7.61@o2ib at head
00000040:01000000:4.0:1651608945.068071:0:97769:0:(llog_obd.c:212:llog_setup()) obd storage-MDT0000-mdc-ffff910943185800 ctxt 13 is initialized
00000020:00000080:5.0F:1651608945.068125:0:97769:0:(obd_config.c:752:class_setup()) finished setup of obd storage-MDT0000-mdc-ffff910943185800 (uuid fcf3db3e-1e74-404e-9b5b-c5ddd03f655e)
00000020:01000000:5.0:1651608945.068128:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf014, instance name: storage-clilmv-ffff910943185800
00000020:00000080:5.0:1651608945.068129:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf014
00800000:01000000:5.0:1651608945.068132:0:97769:0:(lmv_obd.c:386:lmv_add_target()) Target uuid: storage-MDT0000_UUID. index 0
00000020:01000000:5.0:1651608945.068140:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x2 mark_flg=0x2
00000020:00000080:5.0:1651608945.068141:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.068142:0:97769:0:(obd_config.c:1432:class_process_config()) marker 6 (0x2) storage-MDT0000 add mdc
00000020:01000000:5.0:1651608945.068143:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x0 mark_flg=0x1
00000020:00000080:5.0:1651608945.068144:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.068144:0:97769:0:(obd_config.c:1432:class_process_config()) marker 7 (0x1) storage-client mount opts
00000020:00000080:5.0:1651608945.068146:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf007
00000020:00000080:5.0:1651608945.068146:0:97769:0:(obd_config.c:1386:class_process_config()) mountopt: profile storage-client osc storage-clilov mdc storage-clilmv
00000020:01000000:5.0:1651608945.068147:0:97769:0:(obd_config.c:1058:class_add_profile()) Add profile storage-client
00000020:01000000:5.0:1651608945.068149:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x2 mark_flg=0x2
00000020:00000080:5.0:1651608945.068149:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.068150:0:97769:0:(obd_config.c:1432:class_process_config()) marker 7 (0x2) storage-client mount opts
00000020:01000000:5.0:1651608945.068150:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x0 mark_flg=0x1
00000020:01000004:5.0:1651608945.068151:0:97769:0:(obd_mount.c:990:lustre_check_exclusion()) Check exclusion storage-OST0000 (0) in 0 of 10.31.7.61@o2ib:/storage
00000020:00000080:5.0:1651608945.068153:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.068153:0:97769:0:(obd_config.c:1432:class_process_config()) marker 10 (0x1) storage-OST0000 add osc
00000020:00000080:5.0:1651608945.068154:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf005
00000020:00000080:5.0:1651608945.068155:0:97769:0:(obd_config.c:1368:class_process_config()) adding mapping from uuid 10.31.7.61@o2ib to nid 0x500000a1f073d (10.31.7.61@o2ib)
00000020:01000000:5.0:1651608945.068157:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf001, instance name: storage-OST0000-osc-ffff910943185800
00000020:00000080:5.0:1651608945.068158:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf001
00000020:00000080:5.0:1651608945.068165:0:97769:0:(genops.c:415:class_newdev()) Allocate new device storage-OST0000-osc-ffff910943185800 (000000008f5e9249)
00000020:00000080:5.0:1651608945.068184:0:97769:0:(obd_config.c:648:class_attach()) OBD: dev 4 attached type osc with refcount 1
00000020:01000000:5.0:1651608945.068185:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf003, instance name: storage-OST0000-osc-ffff910943185800
00000020:00000080:5.0:1651608945.068186:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf003
00010000:00080000:5.0:1651608945.068200:0:97769:0:(ldlm_lib.c:115:import_set_conn()) imp 00000000027d94f9@storage-OST0000-osc-ffff910943185800: add connection 10.31.7.61@o2ib at head
00000020:00000080:5.0:1651608945.068344:0:97769:0:(obd_config.c:752:class_setup()) finished setup of obd storage-OST0000-osc-ffff910943185800 (uuid fcf3db3e-1e74-404e-9b5b-c5ddd03f655e)
00000020:01000000:5.0:1651608945.068346:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf00d, instance name: storage-clilov-ffff910943185800
00000020:00000080:5.0:1651608945.068346:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf00d
00020000:01000000:5.0:1651608945.068350:0:97769:0:(lov_obd.c:480:lov_add_target()) uuid:storage-OST0000_UUID idx:0 gen:1 active:1
00020000:01000000:5.0:1651608945.068352:0:97769:0:(lov_obd.c:531:lov_add_target()) tgts: 00000000703900ea size: 2
00020000:01000000:5.0:1651608945.068353:0:97769:0:(lov_obd.c:560:lov_add_target()) idx=0 ltd_gen=1 ld_tgt_count=1
00000020:01000000:5.0:1651608945.068356:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x2 mark_flg=0x2
00000020:00000080:5.0:1651608945.068356:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.068357:0:97769:0:(obd_config.c:1432:class_process_config()) marker 10 (0x2) storage-OST0000 add osc
00000020:01000000:5.0:1651608945.068358:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x0 mark_flg=0x1
00000020:01000004:5.0:1651608945.068358:0:97769:0:(obd_mount.c:990:lustre_check_exclusion()) Check exclusion storage-OST0001 (1) in 0 of 10.31.7.61@o2ib:/storage
00000020:00000080:5.0:1651608945.068359:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.068359:0:97769:0:(obd_config.c:1432:class_process_config()) marker 13 (0x1) storage-OST0001 add osc
00000020:00000080:5.0:1651608945.068360:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf005
00000020:00000080:5.0:1651608945.068361:0:97769:0:(obd_config.c:1368:class_process_config()) adding mapping from uuid 10.31.7.62@o2ib to nid 0x500000a1f073e (10.31.7.62@o2ib)
00000020:01000000:5.0:1651608945.068363:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf001, instance name: storage-OST0001-osc-ffff910943185800
00000020:00000080:5.0:1651608945.068363:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf001
00000020:00000080:5.0:1651608945.068365:0:97769:0:(genops.c:415:class_newdev()) Allocate new device storage-OST0001-osc-ffff910943185800 (000000003e523917)
00000020:00000080:5.0:1651608945.068390:0:97769:0:(obd_config.c:648:class_attach()) OBD: dev 5 attached type osc with refcount 1
00000020:01000000:5.0:1651608945.068391:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf003, instance name: storage-OST0001-osc-ffff910943185800
00000020:00000080:5.0:1651608945.068392:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf003
00010000:00080000:5.0:1651608945.068398:0:97769:0:(ldlm_lib.c:115:import_set_conn()) imp 00000000024053c3@storage-OST0001-osc-ffff910943185800: add connection 10.31.7.62@o2ib at head
00000020:00000080:5.0:1651608945.068505:0:97769:0:(obd_config.c:752:class_setup()) finished setup of obd storage-OST0001-osc-ffff910943185800 (uuid fcf3db3e-1e74-404e-9b5b-c5ddd03f655e)
00000020:01000000:5.0:1651608945.068506:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf00d, instance name: storage-clilov-ffff910943185800
00000020:00000080:5.0:1651608945.068507:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf00d
00020000:01000000:5.0:1651608945.068508:0:97769:0:(lov_obd.c:480:lov_add_target()) uuid:storage-OST0001_UUID idx:1 gen:1 active:1
00020000:01000000:5.0:1651608945.068509:0:97769:0:(lov_obd.c:560:lov_add_target()) idx=1 ltd_gen=1 ld_tgt_count=2
00000020:01000000:5.0:1651608945.068511:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x2 mark_flg=0x2
00000020:00000080:5.0:1651608945.068511:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.068512:0:97769:0:(obd_config.c:1432:class_process_config()) marker 13 (0x2) storage-OST0001 add osc
00000020:01000000:5.0:1651608945.068512:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x0 mark_flg=0x1
00000020:00000080:5.0:1651608945.068513:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.068513:0:97769:0:(obd_config.c:1432:class_process_config()) marker 21 (0x1) storage-MDT0001 add mdc
00000020:00000080:5.0:1651608945.068514:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf005
00000020:00000080:5.0:1651608945.068515:0:97769:0:(obd_config.c:1368:class_process_config()) adding mapping from uuid 10.31.7.62@o2ib to nid 0x500000a1f073e (10.31.7.62@o2ib)
00000020:01000000:5.0:1651608945.068516:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf001, instance name: storage-MDT0001-mdc-ffff910943185800
00000020:00000080:5.0:1651608945.068517:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf001
00000020:00000080:5.0:1651608945.068519:0:97769:0:(genops.c:415:class_newdev()) Allocate new device storage-MDT0001-mdc-ffff910943185800 (0000000003506110)
00000020:00000080:5.0:1651608945.068537:0:97769:0:(obd_config.c:648:class_attach()) OBD: dev 6 attached type mdc with refcount 1
00000020:01000000:5.0:1651608945.068538:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf003, instance name: storage-MDT0001-mdc-ffff910943185800
00000020:00000080:5.0:1651608945.068539:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf003
00010000:00080000:5.0:1651608945.068544:0:97769:0:(ldlm_lib.c:115:import_set_conn()) imp 00000000ab8fbc69@storage-MDT0001-mdc-ffff910943185800: add connection 10.31.7.62@o2ib at head
00000040:01000000:5.0:1651608945.069342:0:97769:0:(llog_obd.c:212:llog_setup()) obd storage-MDT0001-mdc-ffff910943185800 ctxt 13 is initialized
00000020:00000080:5.0:1651608945.069380:0:97769:0:(obd_config.c:752:class_setup()) finished setup of obd storage-MDT0001-mdc-ffff910943185800 (uuid fcf3db3e-1e74-404e-9b5b-c5ddd03f655e)
00000020:01000000:5.0:1651608945.069381:0:97769:0:(obd_config.c:1885:class_config_llog_handler()) cmd cf014, instance name: storage-clilmv-ffff910943185800
00000020:00000080:5.0:1651608945.069382:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf014
00800000:01000000:5.0:1651608945.069383:0:97769:0:(lmv_obd.c:386:lmv_add_target()) Target uuid: storage-MDT0001_UUID. index 1
00000020:01000000:5.0:1651608945.069386:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x2 mark_flg=0x2
00000020:00000080:5.0:1651608945.069387:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.069387:0:97769:0:(obd_config.c:1432:class_process_config()) marker 21 (0x2) storage-MDT0001 add mdc
00000020:01000000:5.0:1651608945.069388:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x0 mark_flg=0x1
00000020:00000080:5.0:1651608945.069389:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.069389:0:97769:0:(obd_config.c:1432:class_process_config()) marker 22 (0x1) storage-client mount opts
00000020:00000080:5.0:1651608945.069390:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf007
00000020:00000080:5.0:1651608945.069390:0:97769:0:(obd_config.c:1386:class_process_config()) mountopt: profile storage-client osc storage-clilov mdc storage-clilmv
00000020:01000000:5.0:1651608945.069391:0:97769:0:(obd_config.c:1058:class_add_profile()) Add profile storage-client
00000020:01000000:5.0:1651608945.069564:0:97769:0:(obd_config.c:1770:class_config_llog_handler()) Marker, inst_flg=0x2 mark_flg=0x2
00000020:00000080:5.0:1651608945.069565:0:97769:0:(obd_config.c:1356:class_process_config()) processing cmd: cf010
00000020:00000080:5.0:1651608945.069565:0:97769:0:(obd_config.c:1432:class_process_config()) marker 22 (0x2) storage-client mount opts
00000040:00080000:5.0:1651608945.069566:0:97769:0:(llog.c:768:llog_process_thread()) stop processing plain 0x4:10:0 index 39 count 39
00000020:01000000:6.0:1651608945.069572:0:97748:0:(obd_config.c:2042:class_config_parse_llog()) Processed log storage-client gen 1-38 (rc=0)
10000000:01000000:6.0:1651608945.069576:0:97748:0:(mgc_request.c:2132:mgc_process_log()) MGC10.31.7.61@o2ib: configuration from log 'storage-client' succeeded (0).
10000000:01000000:6.0:1651608945.069577:0:97748:0:(mgc_request.c:2060:mgc_process_log()) Process log storage-cliir-ffff910943185800 from 1
10000000:01000000:6.0:1651608945.069577:0:97748:0:(mgc_request.c:1102:mgc_enqueue()) Enqueue for storage-cliir (res 0x656761726f7473)
00000020:00000080:6.0:1651608945.069823:0:97748:0:(obd_config.c:1356:class_process_config()) processing cmd: cf00f
00000020:00000080:6.0:1651608945.069851:0:97748:0:(obd_config.c:1356:class_process_config()) processing cmd: cf00f
00000020:00000080:6.0:1651608945.069859:0:97748:0:(obd_config.c:1356:class_process_config()) processing cmd: cf00f
00000020:00000080:6.0:1651608945.069866:0:97748:0:(obd_config.c:1356:class_process_config()) processing cmd: cf00f
10000000:01000000:6.0:1651608945.069893:0:97748:0:(mgc_request.c:2132:mgc_process_log()) MGC10.31.7.61@o2ib: configuration from log 'storage-cliir' succeeded (0).
10000000:01000000:6.0:1651608945.069894:0:97748:0:(mgc_request.c:2060:mgc_process_log()) Process log params-ffff910943185800 from 1
10000000:01000000:6.0:1651608945.069895:0:97748:0:(mgc_request.c:1102:mgc_enqueue()) Enqueue for params (res 0x736d61726170)
00000040:00080000:1.0F:1651608945.070236:0:97775:0:(llog.c:768:llog_process_thread()) stop processing plain 0x2:10:0 index 64768 count 1
00000020:01000000:6.0:1651608945.070243:0:97748:0:(obd_config.c:2042:class_config_parse_llog()) Processed log params gen 1-0 (rc=0)
10000000:01000000:6.0:1651608945.070246:0:97748:0:(mgc_request.c:2132:mgc_process_log()) MGC10.31.7.61@o2ib: configuration from log 'params' succeeded (0).
00000080:01000000:6.0:1651608945.070248:0:97748:0:(llite_lib.c:1312:ll_fill_super()) Found profile storage-client: mdc=storage-clilmv osc=storage-clilov
00000020:00000080:6.0:1651608945.070252:0:97748:0:(genops.c:1357:class_connect()) connect: client fcf3db3e-1e74-404e-9b5b-c5ddd03f655e, cookie 0xb433eadff9fbd43d
00800000:01000000:6.0:1651608945.070256:0:97748:0:(lmv_obd.c:459:lmv_check_connect()) Time to connect fcf3db3e-1e74-404e-9b5b-c5ddd03f655e to storage-clilmv-ffff910943185800
00800000:01000000:6.0:1651608945.070258:0:97748:0:(lmv_obd.c:295:lmv_connect_mdc()) connect to storage-MDT0000-mdc-ffff910943185800(fcf3db3e-1e74-404e-9b5b-c5ddd03f655e) - storage-MDT0000_UUID, fcf3db3e-1e74-404e-9b5b-c5ddd03f655e
00000020:00000080:6.0:1651608945.070260:0:97748:0:(genops.c:1357:class_connect()) connect: client fcf3db3e-1e74-404e-9b5b-c5ddd03f655e, cookie 0xb433eadff9fbd444
00000100:00080000:6.0:1651608945.070261:0:97748:0:(import.c:533:import_select_connection()) storage-MDT0000-mdc-ffff910943185800: connect to NID 10.31.7.61 at o2ib last attempt 0
00000100:00080000:6.0:1651608945.070262:0:97748:0:(import.c:614:import_select_connection()) storage-MDT0000-mdc-ffff910943185800: import 00000000ec1d7cd0 using connection 10.31.7.61 at o2ib/10.31.7.61 at o2ib
00000100:00080000:6.0:1651608945.070268:0:97748:0:(pinger.c:388:ptlrpc_pinger_add_import()) adding pingable import fcf3db3e-1e74-404e-9b5b-c5ddd03f655e->storage-MDT0000_UUID
00800000:01000000:6.0:1651608945.070284:0:97748:0:(lmv_obd.c:356:lmv_connect_mdc()) Connected to storage-MDT0000-mdc-ffff910943185800(fcf3db3e-1e74-404e-9b5b-c5ddd03f655e) successfully (4)
00800000:01000000:6.0:1651608945.070293:0:97748:0:(lmv_obd.c:295:lmv_connect_mdc()) connect to storage-MDT0001-mdc-ffff910943185800(fcf3db3e-1e74-404e-9b5b-c5ddd03f655e) - storage-MDT0001_UUID, fcf3db3e-1e74-404e-9b5b-c5ddd03f655e
00000020:00000080:6.0:1651608945.070295:0:97748:0:(genops.c:1357:class_connect()) connect: client fcf3db3e-1e74-404e-9b5b-c5ddd03f655e, cookie 0xb433eadff9fbd44b
00000100:00080000:23.0:1651608945.070296:0:41675:0:(client.c:1659:ptlrpc_send_new_req()) @@@ req waiting for recovery: (FULL != CONNECTING)  req at 0000000063b7198e x1731815472708928/t0(0) o41->storage-MDT0000-mdc-ffff910943185800 at 10.31.7.61@o2ib:12/10 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:WQU/0/ffffffff rc 0/-1 job:''
00000100:00080000:6.0:1651608945.070296:0:97748:0:(import.c:533:import_select_connection()) storage-MDT0001-mdc-ffff910943185800: connect to NID 10.31.7.62 at o2ib last attempt 0
00000100:00080000:6.0:1651608945.070297:0:97748:0:(import.c:614:import_select_connection()) storage-MDT0001-mdc-ffff910943185800: import 00000000ab8fbc69 using connection 10.31.7.62 at o2ib/10.31.7.62 at o2ib
00000100:00080000:6.0:1651608945.070302:0:97748:0:(pinger.c:388:ptlrpc_pinger_add_import()) adding pingable import fcf3db3e-1e74-404e-9b5b-c5ddd03f655e->storage-MDT0001_UUID
00800000:01000000:6.0:1651608945.070311:0:97748:0:(lmv_obd.c:356:lmv_connect_mdc()) Connected to storage-MDT0001-mdc-ffff910943185800(fcf3db3e-1e74-404e-9b5b-c5ddd03f655e) successfully (4)
00000100:00080000:22.0F:1651608945.070319:0:41676:0:(client.c:1659:ptlrpc_send_new_req()) @@@ req waiting for recovery: (FULL != CONNECTING)  req at 000000000268eff3 x1731815472709056/t0(0) o41->storage-MDT0001-mdc-ffff910943185800 at 10.31.7.62@o2ib:12/10 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:WQU/0/ffffffff rc 0/-1 job:''
00000100:00080000:6.0:1651608945.070323:0:97748:0:(client.c:1659:ptlrpc_send_new_req()) @@@ req waiting for recovery: (FULL != CONNECTING)  req at 000000004b401ce8 x1731815472709120/t0(0) o41->storage-MDT0000-mdc-ffff910943185800 at 10.31.7.61@o2ib:12/10 lens 224/368 e 0 to 0 dl 0 ref 2 fl Rpc:WQU/0/ffffffff rc 0/-1 job:''
00000100:00080000:21.0F:1651608945.070335:0:41661:0:(import.c:85:import_set_state_nolock()) 00000000ec1d7cd0 storage-MDT0000_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:21.0:1651608945.070339:0:41661:0:(import.c:1428:ptlrpc_connect_interpret()) recovery of storage-MDT0000_UUID on 10.31.7.61 at o2ib failed (-11)
00000100:00080000:21.0:1651608945.070396:0:41661:0:(import.c:85:import_set_state_nolock()) 00000000ab8fbc69 storage-MDT0001_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:21.0:1651608945.070399:0:41661:0:(import.c:1428:ptlrpc_connect_interpret()) recovery of storage-MDT0001_UUID on 10.31.7.62 at o2ib failed (-11)
00000100:00080000:23.0:1651608945.072863:0:97778:0:(import.c:233:ptlrpc_set_import_discon()) mdc: import 00000000ab8fbc69 already not connected (conn 1, was 0): DISCONN
00010000:00080000:23.0:1651608945.072867:0:97778:0:(ldlm_lib.c:98:import_set_conn()) imp 00000000ab8fbc69 at storage-MDT0001-mdc-ffff910943185800: found existing conn 10.31.7.62 at o2ib, moved to head
00000100:00080000:23.0:1651608945.072869:0:97778:0:(import.c:85:import_set_state_nolock()) 00000000ab8fbc69 storage-MDT0001_UUID: changing import state from DISCONN to CONNECTING
00000100:00080000:23.0:1651608945.072870:0:97778:0:(import.c:533:import_select_connection()) storage-MDT0001-mdc-ffff910943185800: connect to NID 10.31.7.62 at o2ib last attempt 0
00000100:00080000:23.0:1651608945.072871:0:97778:0:(import.c:614:import_select_connection()) storage-MDT0001-mdc-ffff910943185800: import 00000000ab8fbc69 using connection 10.31.7.62 at o2ib/10.31.7.62 at o2ib
00000100:00080000:21.0:1651608945.072937:0:41661:0:(import.c:85:import_set_state_nolock()) 00000000ab8fbc69 storage-MDT0001_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:21.0:1651608945.072938:0:41661:0:(import.c:1428:ptlrpc_connect_interpret()) recovery of storage-MDT0001_UUID on 10.31.7.62 at o2ib failed (-11)
00000100:00080000:30.0:1651608945.073197:0:97779:0:(import.c:233:ptlrpc_set_import_discon()) mdc: import 00000000ec1d7cd0 already not connected (conn 1, was 0): DISCONN
00010000:00080000:30.0:1651608945.073206:0:97779:0:(ldlm_lib.c:98:import_set_conn()) imp 00000000ec1d7cd0 at storage-MDT0000-mdc-ffff910943185800: found existing conn 10.31.7.61 at o2ib, moved to head
00000100:00080000:30.0:1651608945.073209:0:97779:0:(import.c:85:import_set_state_nolock()) 00000000ec1d7cd0 storage-MDT0000_UUID: changing import state from DISCONN to CONNECTING
00000100:00080000:30.0:1651608945.073211:0:97779:0:(import.c:533:import_select_connection()) storage-MDT0000-mdc-ffff910943185800: connect to NID 10.31.7.61 at o2ib last attempt 0
00000100:00080000:30.0:1651608945.073214:0:97779:0:(import.c:614:import_select_connection()) storage-MDT0000-mdc-ffff910943185800: import 00000000ec1d7cd0 using connection 10.31.7.61 at o2ib/10.31.7.61 at o2ib
00000100:00080000:4.0:1651608945.073280:0:41661:0:(import.c:85:import_set_state_nolock()) 00000000ec1d7cd0 storage-MDT0000_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:4.0:1651608945.073281:0:41661:0:(import.c:1428:ptlrpc_connect_interpret()) recovery of storage-MDT0000_UUID on 10.31.7.61 at o2ib failed (-11)

I'm not sure why the client reports requests "waiting for recovery" and "recovery ... failed (-11)" for both MDTs, since no recovery is in progress on the IO nodes. The client logs the same messages even after I explicitly abort recovery.
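
To make the reconnect loop in the client log above easier to see, the import state changes and failed connect attempts can be tallied per target. The following is only a sketch: the function name `summarize`, the regular expressions, and the `SAMPLE` text are assumptions based on the message formats visible in this excerpt, not a Lustre tool. On Linux, errno 11 is EAGAIN; what the server means by returning it here is not established by this log.

```python
import re
from collections import Counter

# Message formats assumed from the excerpt above (import.c debug lines).
STATE_RE = re.compile(
    r"import_set_state_nolock\(\)\) \S+ (\S+): "
    r"changing import state from (\w+) to (\w+)"
)
FAIL_RE = re.compile(
    r"ptlrpc_connect_interpret\(\)\) recovery of (\S+) on .+? failed \((-?\d+)\)"
)

def summarize(lines):
    """Count import state transitions and failed connects per target."""
    transitions = Counter()  # (target, old_state, new_state) -> count
    failures = Counter()     # (target, return_code) -> count
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            transitions[m.groups()] += 1
            continue
        m = FAIL_RE.search(line)
        if m:
            failures[(m.group(1), int(m.group(2)))] += 1
    return transitions, failures

# A few lines copied from the client log above.
SAMPLE = """\
00000100:00080000:21.0F:1651608945.070335:0:41661:0:(import.c:85:import_set_state_nolock()) 00000000ec1d7cd0 storage-MDT0000_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:21.0:1651608945.070339:0:41661:0:(import.c:1428:ptlrpc_connect_interpret()) recovery of storage-MDT0000_UUID on 10.31.7.61 at o2ib failed (-11)
00000100:00080000:21.0:1651608945.070396:0:41661:0:(import.c:85:import_set_state_nolock()) 00000000ab8fbc69 storage-MDT0001_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:21.0:1651608945.070399:0:41661:0:(import.c:1428:ptlrpc_connect_interpret()) recovery of storage-MDT0001_UUID on 10.31.7.62 at o2ib failed (-11)
00000100:00080000:23.0:1651608945.072869:0:97778:0:(import.c:85:import_set_state_nolock()) 00000000ab8fbc69 storage-MDT0001_UUID: changing import state from DISCONN to CONNECTING
"""

if __name__ == "__main__":
    transitions, failures = summarize(SAMPLE.splitlines())
    for (target, old, new), n in sorted(transitions.items()):
        print(f"{target}: {old} -> {new} x{n}")
    for (target, rc), n in sorted(failures.items()):
        print(f"{target}: connect failed rc={rc} x{n}")
```

Run over the full debug dump, a tally like this shows whether both MDT imports cycle between CONNECTING and DISCONN with the same return code, or whether one target behaves differently from the other.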

Could you possibly help me debug this further?

Best regards,
Stepan

