[lustre-discuss] client fails to mount

Strikwerda, Ger g.j.c.strikwerda at rug.nl
Tue Apr 25 01:04:45 PDT 2017


Hi Brett,

Yes, we can ibping from the rebooted client to the metadata-server:

[root@pg-gpu01 ~]# ibping -G 0xf45214030062ee91
Pong from pg-mds01.(none) (Lid 179): time 0.094 ms
Pong from pg-mds01.(none) (Lid 179): time 0.139 ms
Pong from pg-mds01.(none) (Lid 179): time 0.110 ms
Pong from pg-mds01.(none) (Lid 179): time 0.149 ms

But lctl ping fails immediately, with no timeout or anything:

[root@pg-gpu01 ~]# lctl ping 172.23.55.211@o2ib
failed to ping 172.23.55.211@o2ib: Input/output error

We also see a difference in the MGC listings. We have two metadata
servers, and the MGS is mounted and running on pg-mds01:

[root@pg-mds02 ~]# lctl dl | grep mgc
  4 UP mgc MGC172.23.55.211@o2ib 24ecba8d-1574-c649-47fc-c7bc944ce4af 5

[root@pg-mds01 ~]# lctl dl | grep mgc
  1 UP mgc MGC172.23.55.211@o2ib 0c7a07eb-a49a-189a-89b5-86e6ef805fc3 5

Any ideas or advice about the differing hex strings?
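The two MGC lines above can be compared field by field. Below is a small Python sketch (not from the thread) that splits `lctl dl` lines on whitespace into six columns. The names `index`, `status`, `dev_type`, `name`, `uuid` and `refcount` are this sketch's own labels, and reading the last column as a reference count is an assumption.

```python
from typing import NamedTuple


class Device(NamedTuple):
    index: int
    status: str
    dev_type: str
    name: str
    uuid: str
    refcount: int  # assumed meaning of the last `lctl dl` column


def parse_lctl_dl(text):
    """Parse `lctl dl` output (one device per line) into Device tuples."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or mail-wrapped lines
        idx, status, dev_type, name, uuid, ref = fields
        devices.append(Device(int(idx), status, dev_type, name, uuid, int(ref)))
    return devices


def diff_by_name(a, b):
    """For devices present on both nodes, list the fields whose values differ."""
    left = {d.name: d for d in a}
    right = {d.name: d for d in b}
    report = {}
    for name in sorted(left.keys() & right.keys()):
        changed = [f for f in Device._fields
                   if getattr(left[name], f) != getattr(right[name], f)]
        if changed:
            report[name] = changed
    return report
```

Given the two MGC lines from this message, with the NID written as `MGC172.23.55.211@o2ib`, `diff_by_name` reports that only the device index and the hex UUID column differ.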


On Mon, Apr 24, 2017 at 11:20 PM, Brett Lee <brettlee.lustre at gmail.com>
wrote:

> So, the LNet ping is not working, and LNet is running on IB.  Have you
> moved down the stack toward the hardware, running an ibping from a rebooted
> client to the MGS?
>
> Brett
> --
> Protect Yourself Against Cybercrime
> PDS Software Solutions LLC
> https://www.TrustPDS.com <https://www.trustpds.com/>
>
> On Mon, Apr 24, 2017 at 11:53 AM, Raj <rajgautam at gmail.com> wrote:
>
>> Yes, this is strange. Normally I have seen a credits mismatch cause this
>> scenario, but that doesn't look like the case here.
>>
>> You wouldn't want to put the MGS into debug-capture mode, as there will
>> be a lot of data.
>>
>> I guess you already tried removing the Lustre modules and loading them
>> again?
>> lustre_rmmod
>> modprobe -v lustre
>>
>> And check dmesg for any errors...
>>
>>
>> On Mon, Apr 24, 2017 at 12:43 PM Strikwerda, Ger <g.j.c.strikwerda at rug.nl>
>> wrote:
>>
>>> Hi Raj,
>>>
>>> When I do an lctl ping to an MGS server I do not see any logs at all, not
>>> even when I do a successful ping from a working node. Is there a way to
>>> make the Lustre logging more verbose, to see more detail at the LNet level?
>>>
>>> It is very strange that a rebooted node is able to lctl ping compute
>>> nodes, but fails to lctl ping metadata and storage nodes.
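One way to get LNet-level detail is a short, targeted capture: enable the `net` debug flag, clear the kernel debug buffer, reproduce the failing `lctl ping`, then dump the buffer with `lctl dk`. The Python sketch below (not posted in the thread) only orders those commands. The sub-commands `set_param debug=+net`, `clear` and `dk` are assumed to be available on this Lustre 2.5.3 install and should be checked against `lctl help` first; the dump path is a placeholder.

```python
import subprocess


def net_debug_capture_commands(nid, dump_path):
    """Return the lctl commands for one LNet-level debug capture, in order."""
    return [
        ["lctl", "set_param", "debug=+net"],  # add the 'net' flag to the debug mask
        ["lctl", "clear"],                    # empty the kernel debug buffer
        ["lctl", "ping", nid],                # reproduce the failing ping
        ["lctl", "dk", dump_path],            # dump the debug buffer to a file
        ["lctl", "set_param", "debug=-net"],  # remove the 'net' flag again
    ]


def run_capture(nid, dump_path):
    """Run the sequence as root on the node being debugged (Python 3.5+)."""
    for cmd in net_debug_capture_commands(nid, dump_path):
        # check=False: the ping step is expected to fail on an affected client
        subprocess.run(cmd, check=False)
```

For example, `run_capture("172.23.55.211@o2ib", "/tmp/lnet-debug.log")` on the rebooted client. Keeping the capture to a single ping limits the volume of data that Raj warns about.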
>>>
>>>
>>>
>>>
>>> On Mon, Apr 24, 2017 at 7:35 PM, Raj <rajgautam at gmail.com> wrote:
>>>
>>>> Ger,
>>>> It looks like the default Lustre configuration.
>>>>
>>>> Do you see any error message on the MGS side while you are doing lctl
>>>> ping from the rebooted clients?
>>>> On Mon, Apr 24, 2017 at 12:27 PM Strikwerda, Ger <
>>>> g.j.c.strikwerda at rug.nl> wrote:
>>>>
>>>>> Hi Eli,
>>>>>
>>>>> Nothing can be mounted from the Lustre filesystems, so the output is:
>>>>>
>>>>> [root@pg-gpu01 ~]# lfs df /home/ger/
>>>>> [root@pg-gpu01 ~]#
>>>>>
>>>>> Empty.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 24, 2017 at 7:24 PM, E.S. Rosenberg <esr at cs.huji.ac.il>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 24, 2017 at 8:19 PM, Strikwerda, Ger <
>>>>>> g.j.c.strikwerda at rug.nl> wrote:
>>>>>>
>>>>>>> Hello Eli,
>>>>>>>
>>>>>>> Logfile/syslog on the client-side:
>>>>>>>
>>>>>>> Lustre: Lustre: Build Version: 2.5.3-RC1--PRISTINE-2.6.32-573.el6.x86_64
>>>>>>> LNet: Added LNI 172.23.54.51@o2ib [8/256/0/180]
>>>>>>> LNetError: 2878:0:(o2iblnd_cb.c:2587:kiblnd_rejected()) 172.23.55.211@o2ib rejected: consumer defined fatal error
>>>>>>>
>>>>>>
>>>>>> lfs df /path/to/some/file
>>>>>>
>>>>>> gives nothing useful? (the second one will dump *a lot*)
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 24, 2017 at 7:16 PM, E.S. Rosenberg <
>>>>>>> esr+lustre at mail.hebrew.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Apr 24, 2017 at 8:13 PM, Strikwerda, Ger <
>>>>>>>> g.j.c.strikwerda at rug.nl> wrote:
>>>>>>>>
>>>>>>>>> Hi Raj (and others),
>>>>>>>>>
>>>>>>>>> In which file should I set the credits/peer_credits options?
>>>>>>>>>
>>>>>>>>> Perhaps relevant config-files:
>>>>>>>>>
>>>>>>>>> [root@pg-gpu01 ~]# cd /etc/modprobe.d/
>>>>>>>>>
>>>>>>>>> [root@pg-gpu01 modprobe.d]# ls
>>>>>>>>> anaconda.conf   blacklist-kvm.conf      dist-alsa.conf
>>>>>>>>> dist-oss.conf           ib_ipoib.conf  lustre.conf  openfwwf.conf
>>>>>>>>> blacklist.conf  blacklist-nouveau.conf  dist.conf
>>>>>>>>> freeipmi-modalias.conf  ib_sdp.conf    mlnx.conf    truescale.conf
>>>>>>>>>
>>>>>>>>> [root@pg-gpu01 modprobe.d]# cat ./ib_ipoib.conf
>>>>>>>>> alias netdev-ib* ib_ipoib
>>>>>>>>>
>>>>>>>>> [root@pg-gpu01 modprobe.d]# cat ./mlnx.conf
>>>>>>>>> # Module parameters for MLNX_OFED kernel modules
>>>>>>>>>
>>>>>>>>> [root@pg-gpu01 modprobe.d]# cat ./lustre.conf
>>>>>>>>> options lnet networks=o2ib(ib0)
>>>>>>>>>
>>>>>>>>> Are there more Lustre/LNET options that could help in this
>>>>>>>>> situation?
>>>>>>>>>
>>>>>>>>
>>>>>>>> What about the logfiles?
>>>>>>>> Any error messages in syslog? lctl debug options?
>>>>>>>> Good luck,
>>>>>>>> Eli
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Apr 24, 2017 at 7:02 PM, Raj <rajgautam at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> It may be worth checking your LNet credits and peer_credits in
>>>>>>>>>> /etc/modprobe.d?
>>>>>>>>>> You can compare them between working and non-working hosts.
>>>>>>>>>> Thanks
>>>>>>>>>> _Raj
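Raj's comparison can be scripted. Below is a minimal Python sketch (not from the thread), assuming the `/etc/modprobe.d` contents of a working and a rebooted client have already been collected as text, for example by copying `lustre.conf` from each. It parses `options <module> key=value` lines and lists the parameters whose values differ. In Lustre, `credits` and `peer_credits` are typically `ko2iblnd` parameters; the numbers in the usage note are illustrative only, not taken from this cluster.

```python
import shlex


def parse_modprobe_options(text):
    """Map module -> {parameter: value} from 'options' lines in modprobe.d text."""
    result = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        tokens = shlex.split(line)  # honours quoted values such as networks="o2ib(ib0)"
        if len(tokens) < 2 or tokens[0] != "options":
            continue
        params = result.setdefault(tokens[1], {})
        for token in tokens[2:]:
            key, _, value = token.partition("=")
            params[key] = value
    return result


def diff_options(good, bad):
    """List (module, parameter, good_value, bad_value) where the two hosts disagree."""
    diffs = []
    for module in sorted(good.keys() | bad.keys()):
        g, b = good.get(module, {}), bad.get(module, {})
        for key in sorted(g.keys() | b.keys()):
            if g.get(key) != b.get(key):
                diffs.append((module, key, g.get(key), b.get(key)))
    return diffs
```

If a working host had `options ko2iblnd peer_credits=128 credits=1024` and the rebooted host only had `options lnet networks=o2ib(ib0)`, `diff_options` would return the two `ko2iblnd` entries with `None` on the rebooted side and leave the matching `lnet` line out.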
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 24, 2017 at 10:10 AM Strikwerda, Ger <
>>>>>>>>>> g.j.c.strikwerda at rug.nl> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Rick,
>>>>>>>>>>>
>>>>>>>>>>> Even with no iptables rules, and after loading the correct modules
>>>>>>>>>>> again, we get the same result:
>>>>>>>>>>>
>>>>>>>>>>> [root@pg-gpu01 sysconfig]# iptables --list
>>>>>>>>>>> Chain INPUT (policy ACCEPT)
>>>>>>>>>>> target     prot opt source               destination
>>>>>>>>>>>
>>>>>>>>>>> Chain FORWARD (policy ACCEPT)
>>>>>>>>>>> target     prot opt source               destination
>>>>>>>>>>>
>>>>>>>>>>> Chain OUTPUT (policy ACCEPT)
>>>>>>>>>>> target     prot opt source               destination
>>>>>>>>>>>
>>>>>>>>>>> Chain LOGDROP (0 references)
>>>>>>>>>>> target     prot opt source               destination
>>>>>>>>>>> LOG        all  --  anywhere             anywhere            LOG level warning
>>>>>>>>>>> DROP       all  --  anywhere             anywhere
>>>>>>>>>>>
>>>>>>>>>>> [root@pg-gpu01 sysconfig]# modprobe lnet
>>>>>>>>>>>
>>>>>>>>>>> [root@pg-gpu01 sysconfig]# modprobe lustre
>>>>>>>>>>>
>>>>>>>>>>> [root@pg-gpu01 sysconfig]# lctl ping 172.23.55.211@o2ib
>>>>>>>>>>> failed to ping 172.23.55.211@o2ib: Input/output error
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 24, 2017 at 4:59 PM, Mohr Jr, Richard Frank (Rick
>>>>>>>>>>> Mohr) <rmohr at utk.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This might be a long shot, but have you checked for possible
>>>>>>>>>>>> firewall rules that might be causing the issue?  I’m wondering if there is
>>>>>>>>>>>> a chance that some rules were added after the nodes were up to allow Lustre
>>>>>>>>>>>> access, and when a node got rebooted, it lost the rules.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Rick Mohr
>>>>>>>>>>>> Senior HPC System Administrator
>>>>>>>>>>>> National Institute for Computational Sciences
>>>>>>>>>>>> http://www.nics.tennessee.edu
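To check the firewall theory on many nodes at once, a sketch like the following (Python, not from the thread) parses `iptables --list` output in the layout shown in Ger's reply and flags every built-in chain whose policy is not ACCEPT or that holds any rule. It is a coarse heuristic under that assumption: a rule is flagged even if it only accepts or logs traffic, and user-defined chains are reported but not flagged.

```python
import re

CHAIN_RE = re.compile(r"^Chain (\S+) \((.+)\)$")


def parse_iptables_list(text):
    """Return {chain: {"header": ..., "rules": [...]}} from `iptables --list` text."""
    chains = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = CHAIN_RE.match(line)
        if match:
            current = match.group(1)
            chains[current] = {"header": match.group(2), "rules": []}
        elif current and line and not line.startswith("target "):
            # anything after the column-header row counts as a rule
            chains[current]["rules"].append(line)
    return chains


def suspicious_builtin_chains(chains):
    """Built-in chains (those showing a policy) that are not a bare ACCEPT."""
    return [name for name, info in chains.items()
            if info["header"].startswith("policy ")
            and (info["header"] != "policy ACCEPT" or info["rules"])]
```

On the listing Ger posted, the three built-in chains have an ACCEPT policy and no rules, and LOGDROP has zero references, so nothing is flagged.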
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> > On Apr 24, 2017, at 10:19 AM, Strikwerda, Ger <g.j.c.strikwerda at rug.nl> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Hi Russell,
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks for the IB subnet clues:
>>>>>>>>>>>> >
>>>>>>>>>>>> > [root@pg-gpu01 ~]# ibv_devinfo
>>>>>>>>>>>> > hca_id: mlx4_0
>>>>>>>>>>>> >         transport:                      InfiniBand (0)
>>>>>>>>>>>> >         fw_ver:                         2.32.5100
>>>>>>>>>>>> >         node_guid:                      f452:1403:00f5:4620
>>>>>>>>>>>> >         sys_image_guid:                 f452:1403:00f5:4623
>>>>>>>>>>>> >         vendor_id:                      0x02c9
>>>>>>>>>>>> >         vendor_part_id:                 4099
>>>>>>>>>>>> >         hw_ver:                         0x1
>>>>>>>>>>>> >         board_id:                       MT_1100120019
>>>>>>>>>>>> >         phys_port_cnt:                  1
>>>>>>>>>>>> >                 port:   1
>>>>>>>>>>>> >                         state:                  PORT_ACTIVE (4)
>>>>>>>>>>>> >                         max_mtu:                4096 (5)
>>>>>>>>>>>> >                         active_mtu:             4096 (5)
>>>>>>>>>>>> >                         sm_lid:                 1
>>>>>>>>>>>> >                         port_lid:               185
>>>>>>>>>>>> >                         port_lmc:               0x00
>>>>>>>>>>>> >                         link_layer:             InfiniBand
>>>>>>>>>>>> >
>>>>>>>>>>>> > [root@pg-gpu01 ~]# sminfo
>>>>>>>>>>>> > sminfo: sm lid 1 sm guid 0xf452140300f62320, activity count 80878098 priority 0 state 3 SMINFO_MASTER
>>>>>>>>>>>> >
>>>>>>>>>>>> > It looks like the rebooted node is able to contact the IB subnet manager.
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Mon, Apr 24, 2017 at 4:14 PM, Russell Dekema <dekemar at umich.edu> wrote:
>>>>>>>>>>>> > At first glance, this sounds like your InfiniBand subnet manager may
>>>>>>>>>>>> > be down or malfunctioning. In this case, nodes which were already up
>>>>>>>>>>>> > when the subnet manager was working will continue to be able to
>>>>>>>>>>>> > communicate over IB, but nodes which reboot after the SM goes down
>>>>>>>>>>>> > will not.
>>>>>>>>>>>> >
>>>>>>>>>>>> > You can test this theory by running the 'ibv_devinfo' command on one
>>>>>>>>>>>> > of your rebooted nodes. If the relevant IB port is in state PORT_INIT,
>>>>>>>>>>>> > this confirms there is a problem with your subnet manager.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Sincerely,
>>>>>>>>>>>> > Rusty Dekema
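Rusty's check can be applied to saved `ibv_devinfo` output from many nodes. The Python sketch below (not from the thread) assumes the indented `hca_id:` / `port:` / `state:` layout shown in Ger's reply, records the state of each (HCA, port) pair, and applies the PORT_INIT rule of thumb. The function names are hypothetical.

```python
import re


def port_states(devinfo_text):
    """Map (hca_id, port) -> state name such as PORT_ACTIVE, from `ibv_devinfo` text."""
    states = {}
    hca, port = None, None
    for raw in devinfo_text.splitlines():
        line = raw.strip()
        hca_match = re.match(r"hca_id:\s+(\S+)", line)
        if hca_match:
            hca, port = hca_match.group(1), None
            continue
        port_match = re.match(r"port:\s+(\d+)", line)  # not phys_port_cnt or port_lid
        if port_match:
            port = int(port_match.group(1))
            continue
        state_match = re.match(r"state:\s+(\S+)", line)
        if state_match and port is not None:
            states[(hca, port)] = state_match.group(1)
    return states


def subnet_manager_suspect(states):
    """Rusty's rule of thumb: a port stuck in PORT_INIT points at the subnet manager."""
    return any(state == "PORT_INIT" for state in states.values())
```

For the pg-gpu01 output above, this yields `{("mlx4_0", 1): "PORT_ACTIVE"}`, so the rule of thumb does not point at the subnet manager.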
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Mon, Apr 24, 2017 at 9:57 AM, Strikwerda, Ger
>>>>>>>>>>>> > <g.j.c.strikwerda at rug.nl> wrote:
>>>>>>>>>>>> > > Hi everybody,
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Here at the University of Groningen we are now experiencing a strange
>>>>>>>>>>>> > > Lustre error. If a client reboots, it fails to mount the Lustre
>>>>>>>>>>>> > > storage. The client is not able to reach the MGS service. The storage
>>>>>>>>>>>> > > and nodes communicate over IB, and until now without any problems.
>>>>>>>>>>>> > > It looks like an issue inside LNet. Clients cannot LNet ping/connect
>>>>>>>>>>>> > > to the metadata and/or storage servers, but the clients are able to
>>>>>>>>>>>> > > LNet ping each other. Clients which have not been rebooted are
>>>>>>>>>>>> > > working fine and have their mounts on our Lustre filesystem.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Lustre client log:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Lustre: Lustre: Build Version: 2.5.3-RC1--PRISTINE-2.6.32-573.el6.x86_64
>>>>>>>>>>>> > > LNet: Added LNI 172.23.54.51@o2ib [8/256/0/180]
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > LustreError: 15c-8: MGC172.23.55.211@o2ib: The configuration from log 'pgdata01-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
>>>>>>>>>>>> > > LustreError: 3812:0:(llite_lib.c:1046:ll_fill_super()) Unable to process log: -5
>>>>>>>>>>>> > > Lustre: Unmounted pgdata01-client
>>>>>>>>>>>> > > LustreError: 3812:0:(obd_mount.c:1325:lustre_fill_super()) Unable to mount (-5)
>>>>>>>>>>>> > > LNetError: 2882:0:(o2iblnd_cb.c:2587:kiblnd_rejected()) 172.23.55.212@o2ib rejected: consumer defined fatal error
>>>>>>>>>>>> > > LNetError: 2882:0:(o2iblnd_cb.c:2587:kiblnd_rejected()) Skipped 1 previous similar message
>>>>>>>>>>>> > > Lustre: 3765:0:(client.c:1918:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1492789626/real 1492789626] req@ffff88105af2cc00 x1565303228072004/t0(0) o250->MGC172.23.55.211@o2ib@172.23.55.212@o2ib:26/25 lens 400/544 e 0 to 1 dl 1492789631 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
>>>>>>>>>>>> > > Lustre: 3765:0:(client.c:1918:ptlrpc_expire_one_request()) Skipped 1 previous similar message
>>>>>>>>>>>> > > LustreError: 3826:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff882041ffc000 x1565303228071996/t0(0) o101->MGC172.23.55.211@o2ib@172.23.55.211@o2ib:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
>>>>>>>>>>>> > > LustreError: 3826:0:(client.c:1083:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
>>>>>>>>>>>> > > LustreError: 15c-8: MGC172.23.55.211@o2ib: The configuration from log 'pghome01-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
>>>>>>>>>>>> > > LustreError: 3826:0:(llite_lib.c:1046:ll_fill_super()) Unable to process log: -5
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > LNetError: 2882:0:(o2iblnd_cb.c:2587:kiblnd_rejected()) 172.23.55.212@o2ib rejected: consumer defined fatal error
>>>>>>>>>>>> > > LNetError: 2882:0:(o2iblnd_cb.c:2587:kiblnd_rejected()) Skipped 1 previous similar message
>>>>>>>>>>>> > > LNet: 3755:0:(o2iblnd_cb.c:475:kiblnd_rx_complete()) Rx from 172.23.55.211@o2ib failed: 5
>>>>>>>>>>>> > > LNetError: 2882:0:(o2iblnd_cb.c:2587:kiblnd_rejected()) 172.23.55.211@o2ib rejected: consumer defined fatal error
>>>>>>>>>>>> > > LNetError: 2882:0:(o2iblnd_cb.c:2587:kiblnd_rejected()) Skipped 1 previous similar message
>>>>>>>>>>>> > > LNet: 2882:0:(o2iblnd_cb.c:2072:kiblnd_peer_connect_failed()) Deleting messages for 172.23.55.211@o2ib: connection failed
>>>>>>>>>>>> > > LNet: 2882:0:(o2iblnd_cb.c:2072:kiblnd_peer_connect_failed()) Deleting messages for 172.23.55.212@o2ib: connection failed
>>>>>>>>>>>> > > LNet: 3754:0:(o2iblnd_cb.c:475:kiblnd_rx_complete()) Rx from 172.23.55.212@o2ib failed: 5
>>>>>>>>>>>> > > LNet: 3754:0:(o2iblnd_cb.c:475:kiblnd_rx_complete()) Skipped 17 previous similar messages
>>>>>>>>>>>> > > LNet: 2882:0:(o2iblnd_cb.c:2072:kiblnd_peer_connect_failed()) Deleting messages for 172.23.55.211@o2ib: connection failed
>>>>>>>>>>>> > > LNet: 3754:0:(o2iblnd_cb.c:475:kiblnd_rx_complete()) Rx from 172.23.55.212@o2ib failed: 5
>>>>>>>>>>>> > > LNet: 2882:0:(o2iblnd_cb.c:2072:kiblnd_peer_connect_failed()) Deleting messages for 172.23.55.212@o2ib: connection failed
>>>>>>>>>>>> > >
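A log like this can be summarised per peer. The Python sketch below (not from the thread) expects unwrapped syslog lines with `@` in the NIDs, as the kernel writes them. It collects the `kiblnd_rejected` and `kiblnd_peer_connect_failed` messages for each NID and decodes the negative return codes such as `(-5)`: Lustre logs negated errno values, and errno 5 is EIO, the same "Input/output error" that `lctl ping` prints. The regular expressions are assumptions fitted to the lines above.

```python
import errno
import os
import re

NID = r"\d{1,3}(?:\.\d{1,3}){3}@[a-z0-9]+"
REJECTED = re.compile(r"kiblnd_rejected\(\)\)\s+(" + NID + r")\s+rejected:\s*(.*)")
CONN_FAILED = re.compile(r"messages for (" + NID + r"): connection failed")


def failing_peers(log_text):
    """Map each peer NID to the set of o2iblnd failure messages logged for it."""
    peers = {}
    for line in log_text.splitlines():
        m = REJECTED.search(line)
        if m:
            peers.setdefault(m.group(1), set()).add("rejected: " + m.group(2).strip())
        m = CONN_FAILED.search(line)
        if m:
            peers.setdefault(m.group(1), set()).add("connection failed")
    return peers


def describe_rc(rc):
    """Decode a Lustre return code such as -5 into (errno symbol, message)."""
    code = abs(rc)
    return errno.errorcode.get(code, "?"), os.strerror(code)
```

On the client log above, both metadata NIDs (172.23.55.211 and 172.23.55.212) would show up with both "rejected: consumer defined fatal error" and "connection failed", while the `Skipped ...` lines are ignored.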
>>>>>>>>>>>> > > LNet ping of a metadata node:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > [root@pg-gpu01 ~]# lctl ping 172.23.55.211@o2ib
>>>>>>>>>>>> > > failed to ping 172.23.55.211@o2ib: Input/output error
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > LNet ping of the second metadata node:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > [root@pg-gpu01 ~]# lctl ping 172.23.55.212@o2ib
>>>>>>>>>>>> > > failed to ping 172.23.55.212@o2ib: Input/output error
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > LNet ping of a random compute node:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > [root@pg-gpu01 ~]# lctl ping 172.23.52.5@o2ib
>>>>>>>>>>>> > > 12345-0@lo
>>>>>>>>>>>> > > 12345-172.23.52.5@o2ib
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > LNet ping of OST01:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > [root@pg-gpu01 ~]# lctl ping 172.23.55.201@o2ib
>>>>>>>>>>>> > > failed to ping 172.23.55.201@o2ib: Input/output error
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > LNet ping of OST02:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > [root@pg-gpu01 ~]# lctl ping 172.23.55.202@o2ib
>>>>>>>>>>>> > > failed to ping 172.23.55.202@o2ib: Input/output error
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > 'Normal' pings (at the IP level) work fine:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > [root@pg-gpu01 ~]# ping 172.23.55.201
>>>>>>>>>>>> > > PING 172.23.55.201 (172.23.55.201) 56(84) bytes of data.
>>>>>>>>>>>> > > 64 bytes from 172.23.55.201: icmp_seq=1 ttl=64 time=0.741 ms
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > [root@pg-gpu01 ~]# ping 172.23.55.202
>>>>>>>>>>>> > > PING 172.23.55.202 (172.23.55.202) 56(84) bytes of data.
>>>>>>>>>>>> > > 64 bytes from 172.23.55.202: icmp_seq=1 ttl=64 time=0.704 ms
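The per-NID pings above can be repeated as a single sweep. The Python sketch below (not from the thread, Python 3.5+) runs `lctl ping` for each NID and classifies the result using the two output forms seen in this transcript: `12345-<nid>` lines on success and `failed to ping ...` on failure. It checks both the exit status and the text, because lctl's exit status on failure is not confirmed here. The function names are hypothetical.

```python
import subprocess


def ping_succeeded(returncode, output):
    """Classify one `lctl ping` result from its exit status and text."""
    return returncode == 0 and "failed to ping" not in output


def peer_nids(output):
    """NIDs listed by a successful ping, e.g. ['0@lo', '172.23.52.5@o2ib']."""
    return [tok.split("-", 1)[1] for tok in output.split() if tok.startswith("12345-")]


def sweep(nids):
    """Ping each NID once; returns {nid: bool}. Needs root and LNet loaded."""
    results = {}
    for nid in nids:
        proc = subprocess.run(["lctl", "ping", nid], stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, universal_newlines=True)
        results[nid] = ping_succeeded(proc.returncode, proc.stdout)
    return results
```

Running `sweep(["172.23.52.5@o2ib", "172.23.55.201@o2ib", "172.23.55.202@o2ib", "172.23.55.211@o2ib", "172.23.55.212@o2ib"])` on a rebooted client should show the same split as the transcript if nothing has changed: only the compute node answers.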
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > lctl on a rebooted node:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > [root@pg-gpu01 ~]# lctl dl
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > lctl on a node that has not been rebooted:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > [root@pg-node005 ~]# lctl dl
>>>>>>>>>>>> > >   0 UP mgc MGC172.23.55.211@o2ib 94bd1c8a-512f-b920-9a4e-a6aced3d386d 5
>>>>>>>>>>>> > >   1 UP lov pgtemp01-clilov-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 4
>>>>>>>>>>>> > >   2 UP lmv pgtemp01-clilmv-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 4
>>>>>>>>>>>> > >   3 UP mdc pgtemp01-MDT0000-mdc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >   4 UP osc pgtemp01-OST0001-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >   5 UP osc pgtemp01-OST0003-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >   6 UP osc pgtemp01-OST0005-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >   7 UP osc pgtemp01-OST0007-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >   8 UP osc pgtemp01-OST0009-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >   9 UP osc pgtemp01-OST000b-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  10 UP osc pgtemp01-OST000d-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  11 UP osc pgtemp01-OST000f-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  12 UP osc pgtemp01-OST0011-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  13 UP osc pgtemp01-OST0002-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  14 UP osc pgtemp01-OST0004-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  15 UP osc pgtemp01-OST0006-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  16 UP osc pgtemp01-OST0008-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  17 UP osc pgtemp01-OST000a-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  18 UP osc pgtemp01-OST000c-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  19 UP osc pgtemp01-OST000e-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  20 UP osc pgtemp01-OST0010-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  21 UP osc pgtemp01-OST0012-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  22 UP osc pgtemp01-OST0013-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  23 UP osc pgtemp01-OST0015-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  24 UP osc pgtemp01-OST0017-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  25 UP osc pgtemp01-OST0014-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  26 UP osc pgtemp01-OST0016-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  27 UP osc pgtemp01-OST0018-osc-ffff88206906d400
>>>>>>>>>>>> > > 281c441f-8aa3-ab56-8812-e459d308f47c 5
>>>>>>>>>>>> > >  28 UP lov pgdata01-clilov-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 4
>>>>>>>>>>>> > >  29 UP lmv pgdata01-clilmv-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 4
>>>>>>>>>>>> > >  30 UP mdc pgdata01-MDT0000-mdc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  31 UP osc pgdata01-OST0001-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  32 UP osc pgdata01-OST0003-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  33 UP osc pgdata01-OST0005-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  34 UP osc pgdata01-OST0007-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  35 UP osc pgdata01-OST0009-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  36 UP osc pgdata01-OST000b-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  37 UP osc pgdata01-OST000d-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  38 UP osc pgdata01-OST000f-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  39 UP osc pgdata01-OST0002-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  40 UP osc pgdata01-OST0004-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  41 UP osc pgdata01-OST0006-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  42 UP osc pgdata01-OST0008-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  43 UP osc pgdata01-OST000a-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  44 UP osc pgdata01-OST000c-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  45 UP osc pgdata01-OST000e-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  46 UP osc pgdata01-OST0010-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  47 UP osc pgdata01-OST0013-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  48 UP osc pgdata01-OST0015-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  49 UP osc pgdata01-OST0017-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  50 UP osc pgdata01-OST0014-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  51 UP osc pgdata01-OST0016-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  52 UP osc pgdata01-OST0018-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  53 UP osc pgdata01-OST0019-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  54 UP osc pgdata01-OST001a-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  55 UP osc pgdata01-OST001b-osc-ffff88204bab6400
>>>>>>>>>>>> > > 996b1742-82eb-281c-c322-e244672d5225 5
>>>>>>>>>>>> > >  56 UP lov pghome01-clilov-ffff88204bb50000
>>>>>>>>>>>> > > 9ae8f2a9-1cdf-901f-160c-66f70e4c10d1 4
>>>>>>>>>>>> > >  57 UP lmv pghome01-clilmv-ffff88204bb50000
>>>>>>>>>>>> > > 9ae8f2a9-1cdf-901f-160c-66f70e4c10d1 4
>>>>>>>>>>>> > >  58 UP mdc pghome01-MDT0000-mdc-ffff88204bb50000
>>>>>>>>>>>> > > 9ae8f2a9-1cdf-901f-160c-66f70e4c10d1 5
>>>>>>>>>>>> > >  59 UP osc pghome01-OST0011-osc-ffff88204bb50000
>>>>>>>>>>>> > > 9ae8f2a9-1cdf-901f-160c-66f70e4c10d1 5
>>>>>>>>>>>> > >  60 UP osc pghome01-OST0012-osc-ffff88204bb50000
>>>>>>>>>>>> > > 9ae8f2a9-1cdf-901f-160c-66f70e4c10d1 5
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Please help; any clues, advice, hints or tips are appreciated.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > --
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Kind regards,
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Ger Strikwerda
>>>>>>>>>>>> > > Chef Special
>>>>>>>>>>>> > > Rijksuniversiteit Groningen
>>>>>>>>>>>> > > Centrum voor Informatie Technologie
>>>>>>>>>>>> > > Unit Pragmatisch Systeembeheer
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Smitsborg
>>>>>>>>>>>> > > Nettelbosje 1
>>>>>>>>>>>> > > 9747 AJ Groningen
>>>>>>>>>>>> > > Tel. 050 363 9276
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > "God is hard, God is fair
>>>>>>>>>>>> > >  some men he gave brains, others he gave hair"
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > _______________________________________________
>>>>>>>>>>>> > > lustre-discuss mailing list
>>>>>>>>>>>> > > lustre-discuss at lists.lustre.org
>>>>>>>>>>>> > > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>>>>>>>>>> > >
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>