[Lustre-discuss] OST unavailable tests
Shantanu S Pavgi
pavgi at uab.edu
Mon Jul 6 13:14:33 PDT 2009
Hi,
I am exploring Lustre configuration on my test installation and am running
some tests of OST crash/unavailability and recovery. I am unable to
unmount the file system from the client, and the client hangs even after
I deactivate the corresponding OSC on it. I would like to better
understand what is happening here. The test installation is as follows:
- combined MGS/MDS with OSS/T
- separate OSS/T
- separate client
* The following steps were performed:
Step 1: unmounted the OST on the separate OSS box.
-- the df command hung waiting for a response from the corresponding OST.
Step 2: remounted the OST.
-- df returned output once the latency period had elapsed.
Step 3: unmounted the OST on the separate OSS box again and deactivated the
corresponding OSC on the MDS (lctl --device <no> deactivate).
-- the df command hung waiting for a response from the corresponding OST.
Step 4: deactivated the corresponding OSC on the client (lctl --device <no>
deactivate).
-- the df command hung waiting for a response from the corresponding OST.
Step 5: tried to unmount the file system on the client.
-- got a "device busy" message.
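Steps 3 and 4 both need the local device number `<no>` of the OSC that talks to the unavailable OST. A minimal Python sketch of that lookup, assuming `lctl dl` prints one device per line as six whitespace-separated fields (index, state, type, name, UUID, reference count); the helpers `find_osc_device` and `deactivate_command` and the sample device list are hypothetical. The sketch only builds the command string and does not run `lctl`.

```python
from typing import Optional

# Made-up sample in the ASSUMED `lctl dl` layout (client side):
# index, state, type, name, uuid, refcount.
SAMPLE_DL = """\
  0 UP mgc MGC10.0.0.18@tcp 4f1c-uuid 5
  1 UP lov pacific-clilov-c472e000 9d2e-uuid 4
  2 UP mdc pacific-MDT0000-mdc-c472e000 9d2e-uuid 5
  3 UP osc pacific-OST0000-osc-c472e000 9d2e-uuid 5
"""


def find_osc_device(dl_output: str, ost_name: str) -> Optional[int]:
    """Return the index of the first OSC device whose name starts with
    '<ost_name>-osc', or None if no such device is listed."""
    prefix = f"{ost_name}-osc"
    for line in dl_output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        index, _state, dev_type, name = fields[:4]
        if dev_type == "osc" and name.startswith(prefix):
            return int(index)
    return None


def deactivate_command(dl_output: str, ost_name: str) -> Optional[str]:
    """Build (but do not run) the lctl command used in steps 3 and 4."""
    devno = find_osc_device(dl_output, ost_name)
    if devno is None:
        return None
    return f"lctl --device {devno} deactivate"


if __name__ == "__main__":
    print(deactivate_command(SAMPLE_DL, "pacific-OST0000"))
    # prints: lctl --device 3 deactivate
```

Because device indices are local to each node, the lookup has to be repeated on the MDS (where the logs name the OSC `pacific-OST0000-osc`) and on the client (`pacific-OST0000-osc-c472e000`); the prefix match covers both names.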
* The following are the log messages on the MDS/MGS machine:
Jul 6 14:36:46 localhost kernel: Lustre: Request x18446744073241928855 sent from pacific-OST0000-osc to NID 10.0.0.15@tcp 56s ago has timed out (limit 56s).
Jul 6 14:36:46 localhost kernel: Lustre: Skipped 7 previous similar messages
Jul 6 14:40:38 localhost dhclient: DHCPREQUEST on eth0 to 10.0.0.91 port 67
Jul 6 14:40:38 localhost dhclient: DHCPACK from 10.0.0.91
Jul 6 14:40:38 localhost dhclient: bound to 10.0.0.18 -- renewal in 376 seconds.
Jul 6 14:40:50 localhost kernel: Lustre: 7082:0:(import.c:508:import_select_connection()) pacific-OST0000-osc: tried all connections, increasing latency to 51s
Jul 6 14:40:50 localhost kernel: Lustre: 7082:0:(import.c:508:import_select_connection()) Skipped 7 previous similar messages
Jul 6 14:46:54 localhost dhclient: DHCPREQUEST on eth0 to 10.0.0.91 port 67
Jul 6 14:46:54 localhost dhclient: DHCPACK from 10.0.0.91
Jul 6 14:46:54 localhost dhclient: bound to 10.0.0.18 -- renewal in 415 seconds.
Jul 6 14:48:01 localhost kernel: Lustre: Request x18446744073241928909 sent from pacific-OST0000-osc to NID 10.0.0.15@tcp 56s ago has timed out (limit 56s).
Jul 6 14:48:01 localhost kernel: Lustre: Skipped 8 previous similar messages
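To pull just the RPC timeout messages out of a syslog excerpt like the MDS one, skipping unrelated lines such as the dhclient renewals, a minimal Python sketch follows. It assumes each message sits on a single line in the format shown, with the NID in Lustre's native `address@network` form (e.g. `10.0.0.15@tcp`); the regular expression, the `RpcTimeout` record, and `parse_timeouts` are illustrative and not part of any Lustre tool.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# ASSUMED single-line syslog format for the Lustre RPC timeout message.
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8}) \S+ kernel: Lustre: "
    r"Request x(?P<xid>\d+) sent from (?P<imp>\S+) to NID (?P<nid>\S+) "
    r"(?P<elapsed>\d+)s ago has timed out \(limit (?P<limit>\d+)s\)"
)


@dataclass
class RpcTimeout:
    stamp: str        # syslog timestamp, e.g. "Jul 6 14:36:46"
    xid: int          # request id (the number after "x")
    import_name: str  # sending import, e.g. "pacific-OST0000-osc"
    nid: str          # target NID, e.g. "10.0.0.15@tcp"
    elapsed_s: int    # seconds since the request was sent
    limit_s: int      # timeout limit reported in the message


def parse_timeouts(lines: Iterable[str]) -> List[RpcTimeout]:
    """Keep only RPC timeout messages; other lines (dhclient, etc.) are ignored."""
    records = []
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            records.append(RpcTimeout(
                stamp=m["stamp"],
                xid=int(m["xid"]),
                import_name=m["imp"],
                nid=m["nid"],
                elapsed_s=int(m["elapsed"]),
                limit_s=int(m["limit"]),
            ))
    return records


# Three lines from the MDS excerpt, one message per line.
MDS_LOG = [
    "Jul 6 14:36:46 localhost kernel: Lustre: Request x18446744073241928855 "
    "sent from pacific-OST0000-osc to NID 10.0.0.15@tcp 56s ago has timed out (limit 56s).",
    "Jul 6 14:40:38 localhost dhclient: DHCPACK from 10.0.0.91",
    "Jul 6 14:48:01 localhost kernel: Lustre: Request x18446744073241928909 "
    "sent from pacific-OST0000-osc to NID 10.0.0.15@tcp 56s ago has timed out (limit 56s).",
]

if __name__ == "__main__":
    for rec in parse_timeouts(MDS_LOG):
        print(rec.stamp, rec.import_name, rec.nid,
              f"waited {rec.elapsed_s}s of {rec.limit_s}s")
```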
* The following are the log messages from the client:
Jul 6 14:30:19 localhost kernel: Lustre: Request x1685132016 sent from pacific-OST0000-osc-c472e000 to NID 10.0.0.15@tcp 56s ago has timed out (limit 56s).
Jul 6 14:30:19 localhost kernel: Lustre: Skipped 7 previous similar messages
Jul 6 14:31:53 localhost kernel: Lustre: 1910:0:(import.c:508:import_select_connection()) pacific-OST0000-osc-c472e000: tried all connections, increasing latency to 51s
Jul 6 14:31:53 localhost kernel: Lustre: 1910:0:(import.c:508:import_select_connection()) Skipped 7 previous similar messages
Jul 6 14:35:29 localhost dhclient: DHCPREQUEST on eth0 to 10.0.0.91 port 67
Jul 6 14:35:29 localhost dhclient: DHCPACK from 10.0.0.91
Jul 6 14:35:29 localhost dhclient: bound to 10.0.0.11 -- renewal in 372 seconds.
Jul 6 14:41:34 localhost kernel: Lustre: Request x1685132085 sent from pacific-OST0000-osc-c472e000 to NID 10.0.0.15@tcp 56s ago has timed out (limit 56s).
Jul 6 14:41:34 localhost kernel: Lustre: Skipped 8 previous similar messages
Jul 6 14:41:41 localhost dhclient: DHCPREQUEST on eth0 to 10.0.0.91 port 67
Jul 6 14:41:41 localhost dhclient: DHCPACK from 10.0.0.91
Jul 6 14:41:41 localhost dhclient: bound to 10.0.0.11 -- renewal in 442 seconds.
Jul 6 14:41:53 localhost kernel: Lustre: 1910:0:(import.c:508:import_select_connection()) pacific-OST0000-osc-c472e000: tried all connections, increasing latency to 51s
Jul 6 14:41:53 localhost kernel: Lustre: 1910:0:(import.c:508:import_select_connection()) Skipped 7 previous similar messages
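The client excerpt prints only a few of the messages generated; the rest are summarized as "Skipped N previous similar messages". A rough tally can be sketched in Python, assuming each "Skipped" line refers to the message printed immediately before it. The two category names and their patterns (`rpc_timeout`, `reconnect`) and the helpers `classify` and `estimate_counts` are hypothetical labels chosen for this example.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical message categories, matched by substring patterns.
KINDS = {
    "rpc_timeout": re.compile(r"Request x\d+ sent from \S+ .* has timed out"),
    "reconnect": re.compile(r"tried all connections, increasing latency"),
}
SKIPPED_RE = re.compile(r"Skipped (\d+) previous similar messages?")


def classify(line: str) -> Optional[str]:
    """Return the category of a log line, or None if it is not tracked."""
    for kind, pattern in KINDS.items():
        if pattern.search(line):
            return kind
    return None


def estimate_counts(lines: Iterable[str]) -> Counter:
    """Count printed messages per category, adding N for a 'Skipped N'
    line that directly follows a tracked message (ASSUMED pairing)."""
    counts: Counter = Counter()
    last_kind: Optional[str] = None
    for line in lines:
        skipped = SKIPPED_RE.search(line)
        if skipped:
            if last_kind is not None:
                counts[last_kind] += int(skipped.group(1))
            last_kind = None  # each Skipped line is credited at most once
            continue
        last_kind = classify(line)
        if last_kind is not None:
            counts[last_kind] += 1
    return counts


# Client excerpt, one message per line (two dhclient lines omitted).
CLIENT_LOG = [
    "Jul 6 14:30:19 localhost kernel: Lustre: Request x1685132016 sent from "
    "pacific-OST0000-osc-c472e000 to NID 10.0.0.15@tcp 56s ago has timed out (limit 56s).",
    "Jul 6 14:30:19 localhost kernel: Lustre: Skipped 7 previous similar messages",
    "Jul 6 14:31:53 localhost kernel: Lustre: 1910:0:(import.c:508:import_select_connection()) "
    "pacific-OST0000-osc-c472e000: tried all connections, increasing latency to 51s",
    "Jul 6 14:31:53 localhost kernel: Lustre: 1910:0:(import.c:508:import_select_connection()) "
    "Skipped 7 previous similar messages",
    "Jul 6 14:35:29 localhost dhclient: DHCPREQUEST on eth0 to 10.0.0.91 port 67",
    "Jul 6 14:41:34 localhost kernel: Lustre: Request x1685132085 sent from "
    "pacific-OST0000-osc-c472e000 to NID 10.0.0.15@tcp 56s ago has timed out (limit 56s).",
    "Jul 6 14:41:34 localhost kernel: Lustre: Skipped 8 previous similar messages",
    "Jul 6 14:41:53 localhost kernel: Lustre: 1910:0:(import.c:508:import_select_connection()) "
    "pacific-OST0000-osc-c472e000: tried all connections, increasing latency to 51s",
    "Jul 6 14:41:53 localhost kernel: Lustre: 1910:0:(import.c:508:import_select_connection()) "
    "Skipped 7 previous similar messages",
]

if __name__ == "__main__":
    print(dict(estimate_counts(CLIENT_LOG)))
    # prints: {'rpc_timeout': 17, 'reconnect': 16}
```

On this sample the tally is 17 timeout messages and 16 reconnect messages, against only two printed lines of each kind.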
Any insights?
Thanks,
Shantanu Pavgi.