[Lustre-discuss] Problem after upgrade to 1.6.5

Enrico Morelli morelli at cerm.unifi.it
Mon Jul 7 03:27:10 PDT 2008


Dear all,

I have a problem after upgrading from 1.6.4.1 to 1.6.5.

I have four OSTs that make up the /fastfs Lustre filesystem. Each OST
node has the following fstab:

/dev/vg/fastfs_ost   /fastfs_ost   lustre defaults,_netdev        0 0
lustre-server@tcp0:/fastfs   /fastfs   lustre _netdev,defaults    0 0

On the lustre-server I have:
/dev/data_se/fastfs_mdt    /fastfs_mdt  lustre  defaults,_netdev  0 0

On one OST (192.168.100.101) I get the following error:
Lustre: Client fastfs-client has started
Lustre: Request x686 sent from fastfs-OST0000-osc-c5b6cc00 to NID 0@lo
5s ago has timed out (limit 5s).
Lustre: Skipped 62 previous similar messages

In fact, on the other OSTs I get:
Lustre: Client fastfs-client has started
Lustre: Request x463 sent from fastfs-OST0000-osc-f7d1f800 to NID
192.168.100.101@tcp 5s ago has timed out (limit 5s).
Lustre: Skipped 32 previous similar messages


However, on the lustre-server two OSTs seem to be dead and one is
timing out:
Lustre: Client fastfs-client has started
Lustre: fastfs-MDT0000: haven't heard from client
d7fd9368-3f2b-7625-9c48-3de83b5c4cd3 (at 192.168.100.103@tcp) in 231
seconds. I think it's dead, and I am evicting it.
Lustre: fastfs-MDT0000: haven't heard from client
42c0e2c4-0844-8b8b-69b2-9c16ff0ba043 (at 192.168.100.100@tcp) in 229
seconds. I think it's dead, and I am evicting it.
Lustre: Request x2950836 sent from fastfs-OST0000-osc to NID
192.168.100.101@tcp 50s ago has timed out (limit 50s).
Lustre: Skipped 65 previous similar messages
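To see at a glance which NIDs are involved, the messages above can be tallied per NID. Below is a minimal Python sketch. It assumes each message has been re-joined onto one line, with NIDs in the usual `address@network` form. The regular expressions are written only against the message wording quoted in this post, not against any official Lustre log format, and the function name `tally_by_nid` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumption: patterns match only the message wording quoted in this post;
# other Lustre versions may phrase these messages differently.
TIMEOUT_RE = re.compile(
    r"Request x\d+ sent from \S+ to NID (?P<nid>\S+) "
    r"\d+s ago has timed out \(limit \d+s\)"
)
EVICT_RE = re.compile(
    r"haven't heard from client \S+ \(at (?P<nid>[^)\s]+)\) in \d+\s+seconds"
)


def tally_by_nid(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (timeouts, evictions), each a Counter keyed by NID."""
    timeouts: Counter = Counter()
    evictions: Counter = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group("nid")] += 1
            continue
        m = EVICT_RE.search(line)
        if m:
            evictions[m.group("nid")] += 1
    return timeouts, evictions


# The messages from this post, each re-joined onto a single line.
SAMPLE = [
    "Lustre: Request x686 sent from fastfs-OST0000-osc-c5b6cc00 to NID 0@lo 5s ago has timed out (limit 5s).",
    "Lustre: Request x463 sent from fastfs-OST0000-osc-f7d1f800 to NID 192.168.100.101@tcp 5s ago has timed out (limit 5s).",
    "Lustre: fastfs-MDT0000: haven't heard from client d7fd9368-3f2b-7625-9c48-3de83b5c4cd3 (at 192.168.100.103@tcp) in 231 seconds. I think it's dead, and I am evicting it.",
    "Lustre: fastfs-MDT0000: haven't heard from client 42c0e2c4-0844-8b8b-69b2-9c16ff0ba043 (at 192.168.100.100@tcp) in 229 seconds. I think it's dead, and I am evicting it.",
    "Lustre: Request x2950836 sent from fastfs-OST0000-osc to NID 192.168.100.101@tcp 50s ago has timed out (limit 50s).",
]

if __name__ == "__main__":
    timeouts, evictions = tally_by_nid(SAMPLE)
    print("timeouts:", dict(timeouts))
    print("evictions:", dict(evictions))
```

Lustre rate-limits these messages ("Skipped N previous similar messages"), so the counts reflect only the lines that were actually printed, not every timeout.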

On all machines I have installed the following RPMs:
lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5smp
kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5
lustre-1.6.5-2.6.9_67.0.7.EL_lustre.1.6.5smp
lustre-modules-1.6.5-2.6.9_67.0.7.EL_lustre.1.6.5smp

On each node I have the following active modules:

lustre                644716  2 
lov                   414696  3 lustre
mdc                   144900  3 lustre
lquota                212116  3 
osc                   224680  6 lustre
ksocklnd              138984  1 
ptlrpc                970676  6 mgc,lustre,lov,mdc,lquota,osc
obdclass              677464  9 mgc,lustre,lov,mdc,lquota,osc,ptlrpc
lnet                  267292  4 lustre,ksocklnd,ptlrpc,obdclass
lvfs                   90360  8 mgc,lustre,lov,mdc,lquota,osc,ptlrpc,obdclass
libcfs                132044  11 mgc,lustre,lov,mdc,lquota,osc,ksocklnd,ptlrpc,obdclass,lnet,lvfs
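One way to confirm that every node really has the same modules loaded is to collect `lsmod` output from each node and diff the results. Below is a minimal Python sketch. The names `parse_lsmod` and `compare_nodes` are hypothetical. A differing module size is only a hint, not proof, that two nodes are running different module builds.

```python
from typing import Dict, List, NamedTuple, Tuple


class Module(NamedTuple):
    size: int
    use_count: int
    users: List[str]


def parse_lsmod(text: str) -> Dict[str, Module]:
    """Parse lsmod-style output (one module per line) into {name: Module}."""
    modules: Dict[str, Module] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Module Size Used by" header, and any
        # wrapped continuation lines that lack numeric size/count fields.
        if len(parts) < 3 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        users = [u for u in parts[3].split(",") if u] if len(parts) > 3 else []
        modules[parts[0]] = Module(int(parts[1]), int(parts[2]), users)
    return modules


def compare_nodes(
    a: Dict[str, Module], b: Dict[str, Module]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (only on a, only on b, loaded on both but with different sizes)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    size_differs = sorted(n for n in set(a) & set(b) if a[n].size != b[n].size)
    return only_a, only_b, size_differs
```

In practice one would save `lsmod` from each node to a file, parse each file, and compare every node against one reference node.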

With 1.6.4.1 everything worked fine. Where should I look to solve the problem?

Thanks
-- 
-------------------------------------------------------------------
       (o_
(o_    //\  Cultivate Linux, since Windows crashes all on its own.
(/)_   V_/_
+------------------------------------------------------------------+
|     ENRICO MORELLI         |  email: morelli at CERM.UNIFI.IT       |
| *     *       *       *    |  phone: +39 055 4574269             |
|  University of Florence    |  fax  : +39 055 4574253             |
|  CERM - via Sacconi, 6 -  50019 Sesto Fiorentino (FI) - ITALY    |
+------------------------------------------------------------------+