[Lustre-discuss] liblustre sanity test

wangdi di.wang at whamcloud.com
Mon Feb 6 22:26:27 PST 2012


On 02/06/2012 10:08 PM, Jack David wrote:
> On Mon, Feb 6, 2012 at 11:52 PM, wangdi<di.wang at whamcloud.com>  wrote:
>> On 02/06/2012 03:50 AM, Jack David wrote:
>>> On Mon, Feb 6, 2012 at 1:20 PM, wangdi<di.wang at whamcloud.com>    wrote:
>>>> On 02/05/2012 11:26 PM, Jack David wrote:
>>>>> On Mon, Feb 6, 2012 at 12:42 PM, wangdi<di.wang at whamcloud.com>      wrote:
>>>>>> On 02/05/2012 10:37 PM, Jack David wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am using the following guide to understand how liblustre works.
>>>>>>>
>>>>>>> http://wiki.lustre.org/index.php/LibLustre_How-To_Guide
>>>>>>>
>>>>>>> But I am not able to run the "sanity" test. The reason may be that I
>>>>>>> am not passing the correct "profile_name" file while running the test.
>>>>>>>
>>>>>>> I have created two directories under my client
>>>>>>> /mnt/lustre
>>>>>>> /mnt/liblustre_client
>>>>>>>
>>>>>>> My MDS IP address is 10.193.123.1, and I am using the following command
>>>>>>> from my client:
>>>>>>>
>>>>>>> sanity --target 10.193.123.1:/mnt/liblustre_client
>>>>>> sanity --target mgsnid:/your_fsname
>>>>>>
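>>>>>> For example, if the MGS is the node at 10.193.123.1 and your filesystem
>>>>>> name is "yourfs" (substitute whatever fsname you formatted with), that
>>>>>> would be something like:
>>>>>>
>>>>>>     sanity --target 10.193.123.1:/yourfs
>>>>>>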
>>>>> Thanks, I used the "fsname" in the sanity command, but now I am
>>>>> getting the following error (which says the mds_connect operation
>>>>> failed with -16):
>>>>> =====================================
>>>>> <root at niteshs /usr/src/lustre-release>$ lustre/liblustre/tests/sanity
>>>>> --target nanogon:/temp | head
>>>>>
>>>>> 1328512853.118823:23449:niteshs:(class_obd.c:492:init_obdclass()):
>>>>> Lustre: 23449-niteshs:(class_obd.c:492:init_obdclass()): Lustre: Build
>>>>> Version: 2.1.52-g48452fb-CHANGED-2.6.32-lustre-patched
>>>>> 1328512853.151641:23449:niteshs:(lov_obd.c:2892:lov_init()): Lustre:
>>>>> 23449-niteshs:(lov_obd.c:2892:lov_init()): Lustre LOV module
>>>>> (0x85d180).
>>>>> 1328512853.151687:23449:niteshs:(osc_request.c:4636:osc_init()):
>>>>> Lustre: 23449-niteshs:(osc_request.c:4636:osc_init()): Lustre OSC
>>>>> module (0x85da40).
>>>>> 1328512853.158760:23449:niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()):
>>>>> Lustre: 23449-niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()): import
>>>>> mgc_dev->10.193.186.112 at tcp netid 20000: select flavor null
>>>>> 1328512853.175099:23449:niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()):
>>>>> Lustre: 23449-niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()): import
>>>>> temp-OST0000-osc-0x22c0670->10.193.184.135 at tcp netid 20000: select
>>>>> flavor null
>>>>> 1328512853.175159:23449:niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()):
>>>>> Lustre: 23449-niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()): import
>>>>> temp-MDT0000-mdc-0x22c0670->10.193.186.112 at tcp netid 20000: select
>>>>> flavor null
>>>>> 1328512853.179231:23449:niteshs:(client.c:1141:ptlrpc_check_status()):
>>>>> LustreError: 23449-niteshs:(client.c:1141:ptlrpc_check_status()):
>>>>> 11-0: an error occurred while communicating with 10.193.186.112 at tcp.
>>>>> The mds_connect operation failed with -16
>>>>> 1328512853.179763:23449:niteshs:(client.c:1141:ptlrpc_check_status()):
>>>>> LustreError: 23449-niteshs:(client.c:1141:ptlrpc_check_status()):
>>>>> 11-0: an error occurred while communicating with 10.193.186.112 at tcp.
>>>>> The mds_connect operation failed with -16
>>>>> 1328512853.180146:23449:niteshs:(client.c:1141:ptlrpc_check_status()):
>>>>> LustreError: 23449-niteshs:(client.c:1141:ptlrpc_check_status()):
>>>>> 11-0: an error occurred while communicating with 10.193.186.112 at tcp.
>>>>> The mds_connect operation failed with -16
>>>>> =====================================
>>>> It seems the MDS is somehow stuck in a long recovery, so it cannot accept
>>>> the new connection. You might wait a bit, or just umount the MDS and
>>>> remount it with -o abort_recov.
>>>>
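>>>> For example, on the MDS node (the device and mount point below are just
>>>> placeholders for whatever your setup uses):
>>>>
>>>>     umount /mnt/mds
>>>>     mount -t lustre -o abort_recov /dev/<mdt_device> /mnt/mds
>>>>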
>>> I tried again after some time, and the mds_connect failure disappeared
>>> (as mentioned in my earlier email). But now I am stuck with a new
>>> problem: the sanity test does not work. I ran it with "gdb" and found
>>> that it fails in the very first test (test t1, which does
>>> touch+unlink). The failure is in the "open" call, and I am not sure why.
>>> Does the "sanity" test have any prerequisite, such as Lustre being
>>> mounted on a specific path? I could see that the test_t1 file was
>>> created when I mounted the filesystem using the "mount" command.
>>>
>>> The following screen log shows that it now fails in the "unlink" command.
>> Hmm, you can set the environment variable LIBLUSTRE_MOUNT_POINT to indicate
>> where Lustre is mounted.
>>
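>> For example, something like this (adjust the mount point to wherever your
>> client mounts the filesystem):
>>
>>     export LIBLUSTRE_MOUNT_POINT=/mnt/lustre
>>     lustre/liblustre/tests/sanity --target nanogon:/temp
>>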
>> What is your lustre version?
>>
>> Btw: why don't you try lustre/tests/liblustre.sh, which might make things
>> easier for you?
>>
> I set the LIBLUSTRE_MOUNT_POINT as well, but it did not help.
>
> My FSNAME is "temp", so I set it to "/mnt/temp" but it didn't work. In
> the /usr/sbin/lrun file, the default path is /mnt/lustre, and I gave
> it a shot, but no luck either.
>
> I cloned the git repository (from whamcloud) a couple of weeks back, so I am
> not sure about the version. I do see a lustre-2.1.52.tar.gz file, so I am
> assuming that is the version.
>
> I tried running lustre/tests/liblustre.sh (after modifying the
> FSNAME, MOUNT2, mds_HOST and ost_HOST variables in local.sh on the client),
> but I got the same error.
>
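> For reference, the variables I changed in local.sh look roughly like this
> (FSNAME is my real value, the rest are placeholders):
>
>     FSNAME=temp
>     MOUNT2=/mnt/temp
>     mds_HOST=<mds hostname>
>     ost_HOST=<ost hostname>
>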
> Something is wrong.

Hmm, a liblustre bug (LU-703) was reported recently; you might want to retry
this after that is fixed.

Thanks
WangDi

>> Thanks
>> WangDi
>>
>>> ===== START t1: touch+unlink 1328528905 ========================================
>>>
>>> 1328528905.411406:28664:niteshs:(/usr/src/lustre-release/lustre/include/obd_class.h:1980:md_intent_lock()):
>>> LustreError:
>>> 28664-niteshs:(/usr/src/lustre-release/lustre/include/obd_class.h:1980:md_intent_lock()):
>>> obd_intent_lock: NULL export
>>>
>>> 1328528905.411425:28664:niteshs:(/usr/src/lustre-release/lustre/include/obd_class.h:1980:md_intent_lock()):
>>> LustreError:
>>> 28664-niteshs:(/usr/src/lustre-release/lustre/include/obd_class.h:1980:md_intent_lock()):
>>> obd_intent_lock: NULL export
>>> unlink(/mnt/lustre/test_t1) error: No such device
>>>
>>> What am I missing?
>>
>
>



