[Lustre-discuss] OST unavailable tests

Shantanu S Pavgi pavgi at uab.edu
Mon Jul 6 13:14:33 PDT 2009


Hi,

I am exploring Lustre configuration on a test installation and running
some tests for OST crash/unavailability and recovery. I am unable to
unmount the file system from the client, and the client hangs even after
I deactivate the corresponding OSC on the client. I would like to get a
better understanding of what is happening here. The test installation is
as follows:
- combined MGS/MDS with OSS/T
- separate OSS/T
- separate client

* The following steps were performed:
Step 1: Unmounted the OST on the separate OSS box.
 -- The df command hung waiting for a response from the corresponding OST.

Step 2: Mounted the OST again.
 -- df showed output once the latency period had elapsed.

Step 3: Unmounted the OST on the separate OSS box and deactivated the
corresponding OSC on the MDS (lctl --device <no> deactivate).
 -- The df command hung waiting for a response from the corresponding OST.

Step 4: Deactivated the corresponding OSC on the client (lctl --device <no>
deactivate).
 -- The df command hung waiting for a response from the corresponding OST.

Step 5: Unmounted the file system from the client.
 -- Got a "device busy" message.
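
For anyone wanting to repeat steps 3 to 5, the sequence can be sketched as
a dry run that only prints the commands and the host each would run on.
This is a rough sketch, not the exact commands used above: the mount
points and device numbers are hypothetical placeholders, and the real
device numbers would have to be read from the "lctl dl" output on each
host.

```python
import shlex

# Hypothetical placeholders; none of these values come from the post.
OST_MOUNT = "/mnt/ost0"        # assumed OST mount point on the separate OSS
CLIENT_MOUNT = "/mnt/lustre"   # assumed Lustre mount point on the client


def deactivate_cmd(devno):
    """Build the `lctl --device <no> deactivate` command for one OSC device."""
    return ["lctl", "--device", str(devno), "deactivate"]


def build_plan(mds_devno, client_devno):
    """Return (host, argv) pairs mirroring steps 3 to 5; nothing is executed."""
    return [
        ("oss", ["umount", OST_MOUNT]),          # step 3: take the OST offline
        ("mds", ["lctl", "dl"]),                 # list devices to find the OSC number
        ("mds", deactivate_cmd(mds_devno)),      # step 3: deactivate OSC on the MDS
        ("client", ["lctl", "dl"]),              # list devices on the client
        ("client", deactivate_cmd(client_devno)),  # step 4: deactivate OSC on the client
        ("client", ["df", CLIENT_MOUNT]),        # check whether df still hangs
        ("client", ["umount", CLIENT_MOUNT]),    # step 5: unmount from the client
    ]


if __name__ == "__main__":
    # Device numbers 5 and 7 are made up for illustration.
    for host, argv in build_plan(mds_devno=5, client_devno=7):
        print(f"[{host}] {shlex.join(argv)}")
```

Printing instead of executing keeps the sketch safe to run anywhere; the
commands still have to be issued by hand on the right machine.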

* The following are the log messages on the MDS/MGS machine:
Jul  6 14:36:46 localhost kernel: Lustre: Request x18446744073241928855
sent from pacific-OST0000-osc to NID 10.0.0.15@tcp 56s ago has timed out
(limit 56s).
Jul  6 14:36:46 localhost kernel: Lustre: Skipped 7 previous similar
messages
Jul  6 14:40:38 localhost dhclient: DHCPREQUEST on eth0 to 10.0.0.91 port 67
Jul  6 14:40:38 localhost dhclient: DHCPACK from 10.0.0.91
Jul  6 14:40:38 localhost dhclient: bound to 10.0.0.18 -- renewal in 376
seconds.
Jul  6 14:40:50 localhost kernel: Lustre:
7082:0:(import.c:508:import_select_connection()) pacific-OST0000-osc:
tried all connections, increasing latency to 51s
Jul  6 14:40:50 localhost kernel: Lustre:
7082:0:(import.c:508:import_select_connection()) Skipped 7 previous
similar messages
Jul  6 14:46:54 localhost dhclient: DHCPREQUEST on eth0 to 10.0.0.91 port 67
Jul  6 14:46:54 localhost dhclient: DHCPACK from 10.0.0.91
Jul  6 14:46:54 localhost dhclient: bound to 10.0.0.18 -- renewal in 415
seconds.
Jul  6 14:48:01 localhost kernel: Lustre: Request x18446744073241928909
sent from pacific-OST0000-osc to NID 10.0.0.15@tcp 56s ago has timed out
(limit 56s).
Jul  6 14:48:01 localhost kernel: Lustre: Skipped 8 previous similar
messages

* The following are the log messages from the client:
Jul  6 14:30:19 localhost kernel: Lustre: Request x1685132016 sent from
pacific-OST0000-osc-c472e000 to NID 10.0.0.15@tcp 56s ago has timed out
(limit 56s).
Jul  6 14:30:19 localhost kernel: Lustre: Skipped 7 previous similar
messages
Jul  6 14:31:53 localhost kernel: Lustre:
1910:0:(import.c:508:import_select_connection())
pacific-OST0000-osc-c472e000: tried all connections, increasing latency
to 51s
Jul  6 14:31:53 localhost kernel: Lustre:
1910:0:(import.c:508:import_select_connection()) Skipped 7 previous
similar messages
Jul  6 14:35:29 localhost dhclient: DHCPREQUEST on eth0 to 10.0.0.91 port 67
Jul  6 14:35:29 localhost dhclient: DHCPACK from 10.0.0.91
Jul  6 14:35:29 localhost dhclient: bound to 10.0.0.11 -- renewal in 372
seconds.
Jul  6 14:41:34 localhost kernel: Lustre: Request x1685132085 sent from
pacific-OST0000-osc-c472e000 to NID 10.0.0.15@tcp 56s ago has timed out
(limit 56s).
Jul  6 14:41:34 localhost kernel: Lustre: Skipped 8 previous similar
messages
Jul  6 14:41:41 localhost dhclient: DHCPREQUEST on eth0 to 10.0.0.91 port 67
Jul  6 14:41:41 localhost dhclient: DHCPACK from 10.0.0.91
Jul  6 14:41:41 localhost dhclient: bound to 10.0.0.11 -- renewal in 442
seconds.
Jul  6 14:41:53 localhost kernel: Lustre:
1910:0:(import.c:508:import_select_connection())
pacific-OST0000-osc-c472e000: tried all connections, increasing latency
to 51s
Jul  6 14:41:53 localhost kernel: Lustre:
1910:0:(import.c:508:import_select_connection()) Skipped 7 previous
similar messages
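
Since the Lustre messages are interleaved with dhclient noise and wrapped
across lines, a small filter makes the timeouts easier to see. The
following is a minimal sketch, assuming syslog-style timestamps as in the
logs above; the function and field names are my own, and the pattern
accepts both "10.0.0.15@tcp" and the " at " spelling that some list
archives substitute for "@". The sample lines are adapted from the client
log.

```python
import re

# A syslog record starts with a timestamp such as "Jul  6 14:30:19".
RECORD_START = re.compile(r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")

# Lustre request-timeout message, matched after wrapped lines are joined.
# The NID separator may be "@" or the " at " used by some list archives.
TIMEOUT = re.compile(
    r"Lustre: Request x(?P<xid>\d+) sent from (?P<osc>\S+) "
    r"to NID (?P<addr>\S+?)(?:@| at )(?P<net>\S+) "
    r"(?P<age>\d+)s ago has timed out \(limit (?P<limit>\d+)s\)"
)


def join_wrapped(lines):
    """Merge continuation lines into the record that precedes them."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def lustre_timeouts(lines):
    """Return one dict per 'Request ... has timed out' record."""
    hits = []
    for record in join_wrapped(lines):
        m = TIMEOUT.search(record)
        if not m:
            continue  # dhclient lines and other Lustre messages are skipped
        start = RECORD_START.match(record)
        hits.append({
            "time": start.group("ts") if start else None,
            "osc": m.group("osc"),
            "addr": m.group("addr"),
            "net": m.group("net"),
            "limit_s": int(m.group("limit")),
        })
    return hits


if __name__ == "__main__":
    sample = [
        "Jul  6 14:30:19 localhost kernel: Lustre: Request x1685132016 sent from",
        "pacific-OST0000-osc-c472e000 to NID 10.0.0.15@tcp 56s ago has timed out",
        "(limit 56s).",
        "Jul  6 14:35:29 localhost dhclient: DHCPACK from 10.0.0.91",
    ]
    for hit in lustre_timeouts(sample):
        print(hit)
```

Run against the full client log, this should list only the request
timeouts for pacific-OST0000-osc-c472e000 and drop the DHCP renewals.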

Any insights? 

Thanks,
Shantanu Pavgi.





