[Lustre-discuss] 1.8.1(-ish) client vs. 1.6.7.2 server

Robin Humble robin.humble+lustre at anu.edu.au
Wed Jul 15 09:53:10 PDT 2009


On Wed, Jul 15, 2009 at 11:59:54AM -0400, Brian J. Murrell wrote:
>On Wed, 2009-07-15 at 11:22 -0400, Robin Humble wrote:
>> another data point is that the above errors don't happen with
>> 2.6.18-128.1.14.el5 patched with 1.8.0.1 and using the same in-kernel
>> OFED, so it's probably something that's happened between 1.8.0.1 and
>> 1.8.1-pre.
>> or I guess it could be a rhel change between 2.6.18-128.1.14.el5 and
>> 2.6.18-128.1.16.el5, but that seems less likely.
>> I can spin up a 2.6.18-128.1.14.el5 with b_release_1_8_1 if you like...
>Yeah, it would be a great troubleshooting addition to see if the same
>kernel on the clients and servers with the different lustre versions has
>the same problem.  This would isolate the problem either to or away from
>a problem with the difference in OFED stacks.

ok - I made a 2.6.18-128.1.14.el5 with b_release_1_8_1 and it behaves
the same as 2.6.18-128.1.16.el5 with b_release_1_8_1, i.e. it spits out a
bunch of errors on the first lustre mount.

the only changes between the rhel .14 and .16 versions look pretty
unrelated to IB/lnet, so I guess that was to be expected:
  * Sat Jun 27 2009 Jiri Pirko <jpirko at redhat.com> [2.6.18-128.1.16.el5]
  - [mm] prevent panic in copy_hugetlb_page_range (Larry Woodman ) [508030 507860]
  
  * Tue Jun 23 2009 Jiri Pirko <jpirko at redhat.com> [2.6.18-128.1.15.el5]
  - [mm] fix swap race condition in fork-gup-race patch (Andrea Arcangeli) [507297 506684]

so I guess the change is between Lustre 1.8.0.1 and
b_release_1_8_1-20090712131220 somewhere.
if only we had git bisect, and if only I knew how to use it, and if only
I had the time to try it... :-)
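
the bisection idea itself doesn't need git. below is a minimal sketch,
assuming an ordered list of builds running from the known-good 1.8.0.1 to
the known-bad b_release_1_8_1 snapshot. the intermediate labels and the
mount_is_clean() check are hypothetical placeholders (not real Lustre tags),
standing in for "build it, boot a client, mount, look for errors":

```python
def first_bad(builds, is_good):
    """Return the index of the first bad build.

    Assumes builds[0] is known good, builds[-1] is known bad, and that
    once a build goes bad every later build stays bad.
    """
    lo, hi = 0, len(builds) - 1      # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            lo = mid                 # breakage is after mid
        else:
            hi = mid                 # breakage is at or before mid
    return hi


# hypothetical snapshot labels between the two known endpoints
builds = ["1.8.0.1", "snap-a", "snap-b", "snap-c", "snap-d", "b_release_1_8_1"]


def mount_is_clean(build):
    # placeholder for the real test; pretends snap-c introduced the errors
    return builds.index(build) < 3


print(builds[first_bad(builds, mount_is_clean)])
```

each round halves the range, so N candidate builds need about log2(N)
build-and-mount cycles rather than N.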

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility


