[lustre-devel] [EXTERNAL] Re: Lustre Arm stuff status and work plan

Tue Mar 15 15:09:00 PDT 2022

Hello.

    I have been watching your efforts to do your own testing, and this is something ORNL has been interested in as well.
Would you be willing to give a joint talk at LUG on this effort? We can pool our knowledge on how to do
local testing and feed the results back to WC.
________________________________
From: lustre-devel <lustre-devel-bounces at lists.lustre.org> on behalf of Kevin Zhao via lustre-devel <lustre-devel at lists.lustre.org>
Sent: Friday, March 11, 2022 1:28 AM
To: Oleg Drokin <green at whamcloud.com>
Cc: Li Xi <lixi at ddn.com>; Jian Yu <jiyu at whamcloud.com>; cloud-dev-request at op-lists.linaro.org <cloud-dev-request at op-lists.linaro.org>; Xinliang Liu via lustre-devel <lustre-devel at lists.lustre.org>
Subject: [EXTERNAL] Re: [lustre-devel] Lustre Arm stuff status and work plan

Thanks Oleg,

I will post updates on the progress of the test cluster setup on the Arm64 platform.

On Mon, 28 Feb 2022 at 13:36, Oleg Drokin <green at whamcloud.com> wrote:
Hello!

  The sizing really depends on your test scaling requirements.
  For example, my own test infrastructure is a couple of builders plus 4 nodes for VMs (each has 256G RAM), 160 VM pairs in total,
  and on a particularly busy day another 80 VM pairs can be added. This is to ensure speedy feedback to developers.
  You can operate a much smaller testing system if you want; just keep in mind how long the longest-running test takes
  to understand how many patches can be tested in parallel (sometimes patch bombs result in 20+ patches being submitted at the same time).
  Here are the stats for the last 30 days: hxxps://imgur.com/lk2ogJv (1 item means a single patch in processing). Time in testing for a patch is typically about 3.5 hours.
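The sizing reasoning above can be turned into a rough back-of-the-envelope calculation. The sketch below is only an illustration, not how the Whamcloud infrastructure schedules work: it assumes patches are tested in fixed "waves", that every patch takes the same wall-clock time, and that each patch occupies a fixed number of VM pairs. The value of 8 VM pairs per patch is a made-up placeholder (the thread does not give this number), and the function names are hypothetical.

```python
import math


def turnaround_hours(patches, pairs_per_patch, total_pairs, hours_per_patch):
    """Hours until the last patch of a burst finishes, under a simple wave model.

    pairs_per_patch is an assumption: the thread does not say how many
    VM pairs a single patch occupies while it is being tested.
    """
    concurrent = total_pairs // pairs_per_patch  # patches tested at once
    if concurrent == 0:
        raise ValueError("not enough VM pairs to test even one patch")
    waves = math.ceil(patches / concurrent)
    return waves * hours_per_patch


def pairs_needed(patches, pairs_per_patch, hours_per_patch, target_hours):
    """VM pairs needed so that a burst of patches finishes within target_hours."""
    waves_allowed = int(target_hours // hours_per_patch)
    if waves_allowed == 0:
        raise ValueError("target is shorter than a single test run")
    concurrent = math.ceil(patches / waves_allowed)
    return concurrent * pairs_per_patch


# Figures from the thread: 160 VM pairs, about 3.5 hours per patch,
# patch bombs of 20+ patches. 8 pairs per patch is a placeholder.
print(turnaround_hours(20, 8, 160, 3.5))
print(turnaround_hours(20, 8, 40, 3.5))
print(pairs_needed(20, 8, 3.5, 7.0))
```

With a smaller pool the same burst simply takes more waves, so the trade-off is between hardware and how long developers wait for feedback.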

Maloo shows the resources when you go into a test session, for example hxxps://testing.whamcloud.com/test_sessions/4de25b47-43fc-4bfc-87aa-15e4968519a7 (scroll down to see the list of nodes).

On Feb 18, 2022, at 3:05 AM, Kevin Zhao <kevin.zhao at linaro.org> wrote:

Hi All,

Greetings, and thanks a lot for your comments! Xinliang and I are from Linaro, an organization focusing on Arm open-source ecosystem development. We have been working on Lustre on Arm64, on both the server and client side, for a while now, and have already fixed a few bugs on arm64.
As Xinliang said before, we want to enable Arm64 CI, and Oleg advised that we can plug our own CI nodes into Jenkins. Now we want to understand and estimate how many machine resources are needed, so that we can plan the next stage of our hardware to meet the Lustre test requirements.

As I understand it, the test jobs will cover the ZFS and ldiskfs backends in 2 scenarios:

  * Lustre Arm64 Server + Arm64 Client (high priority)
  * Lustre Arm64 Server + x86_64 Client
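
Two backends crossed with two scenarios gives four test configurations. A minimal sketch of that matrix is below; the dictionary keys and the "unspecified" priority label are illustrative assumptions, since the list above only marks the first scenario as high priority.

```python
from itertools import product

BACKENDS = ["zfs", "ldiskfs"]

# (server arch, client arch, priority); only the first scenario is
# marked high priority in the list above, the other label is a placeholder.
SCENARIOS = [
    ("arm64", "arm64", "high"),
    ("arm64", "x86_64", "unspecified"),
]


def test_matrix():
    """Return every backend/scenario combination as a list of dicts."""
    return [
        {"backend": backend, "server": server, "client": client, "priority": prio}
        for backend, (server, client, prio) in product(BACKENDS, SCENARIOS)
    ]


for config in test_matrix():
    print(config)
```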

After going through the Lustre test website, hxxps://testing.whamcloud.com/test_sessions, the test information is quite clear, but some questions remain. It would be great if the community could answer them.
1. Is there a link that shows all the machine resources, including machine info, CPU, memory and peripheral info?
2. Do we have a CI infrastructure architecture overview diagram that shows the machine usage and communication?
3. How many machines are needed to meet the requirements of the Lustre Arm64 Server + Arm64 Client test?

Thanks a lot for your time; we look forward to your response.

On Tue, 28 Dec 2021 at 09:58, Oleg Drokin <green at whamcloud.com> wrote:

On Dec 27, 2021, at 8:53 PM, Xinliang Liu <xinliang.liu at linaro.org> wrote:

Maloo is just one place to link to so that people can actually see the results, but you can link to external resources too,
as e.g. the gatekeeper janitor helper does, or, assuming the information is small enough, it could be entirely contained
in the comment (say, for a build failure).

OK, understood. Is there any other external CI that posts results to Lustre Gerrit now that we could use as a reference?

Currently there are:
- checkpatch and Misc code checks (smach), which post their results 100% as comments only. They pretty much share a codebase.
- the Janitor (also started from the above codebase, but has been changed and extended a lot)

There was external interest in the past in posting results to Gerrit, but it never materialized in the end.

--
Best Regards

Kevin Zhao

Tech Lead, LDCG Cloud Infrastructure

Linaro Vertical Technologies

IRC(freenode): kevinz

Slack(kubernetes.slack.com): kevinz

kevin.zhao at linaro.org | Mobile/Direct/Wechat:  +86 18818270915

