[lustre-devel] Lustre Arm stuff status and work plan

Tue Mar 29 05:51:28 PDT 2022

Please find attached the content from the referenced wiki page.

From: Kevin Zhao <kevin.zhao at linaro.org>
Date: Tuesday, March 29, 2022 at 12:38 AM
To: Oleg Drokin <green at whamcloud.com>
Cc: Minh Diep <mdiep at whamcloud.com>, Xinliang Liu <xinliang.liu at linaro.org>, Peter Jones <pjones at whamcloud.com>, Xinliang Liu via lustre-devel <lustre-devel at lists.lustre.org>, "cloud-dev-request at op-lists.linaro.org" <cloud-dev-request at op-lists.linaro.org>, Jian Yu <jiyu at whamcloud.com>, Andreas Dilger <adilger at whamcloud.com>, "jsimmons at infradead.org" <jsimmons at infradead.org>, Li Xi <lixi at ddn.com>
Subject: Re: Lustre Arm stuff status and work plan

Hi Oleg,

We are now defining the test process and setting up a trial test cluster as an external Arm64 test resource.
Can I have an account to post results to the Maloo DB? The trial Arm64 external test cluster is being set up and will hopefully be finished quite soon. I want to test whether we can post data to the Maloo DB. Btw, the doc you pointed to before, https://wiki.whamcloud.com/display/TEI/Test+results+format, cannot be accessed; it just returns a 404.

On Mon, 28 Feb 2022 at 13:36, Oleg Drokin <green at whamcloud.com> wrote:
Hello!

  The sizing really depends on your test scaling requirements.
  For example, my own test infrastructure is a couple of builders + 4 nodes for VMs (each has 256G RAM), 160 VM pairs in total,
  and on a particularly busy day another 80 VM pairs can be added. This is to ensure speedy feedback to developers.
  You can operate a much smaller-scale testing system if you want; just keep in mind how long the longest-running test would take
  to understand how many patches could be tested in parallel (sometimes patch bombs result in 20+ patches being submitted at the same time).
  Here are the stats for the last 30 days: https://imgur.com/lk2ogJv 1 item means a single patch in processing. Time in testing for a patch is typically about 3.5 hours.
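
The sizing reasoning above can be sketched as a rough calculation. This is only a minimal sketch, not how the existing test infrastructure schedules work: it assumes every patch occupies a fixed number of VM pairs for the whole test run and that patches are processed in equal-length waves. The function names and the pairs_per_patch value are hypothetical; only the 160 VM pairs, the 20-patch bomb, and the 3.5 hours come from the figures above.

```python
import math


def patches_in_parallel(vm_pairs: int, pairs_per_patch: int) -> int:
    """How many patches can be tested at once (hypothetical model)."""
    if pairs_per_patch <= 0:
        raise ValueError("pairs_per_patch must be positive")
    return vm_pairs // pairs_per_patch


def hours_to_drain(patches: int, vm_pairs: int,
                   pairs_per_patch: int, hours_per_patch: float) -> float:
    """Time to finish a burst of patches, assuming equal-length waves."""
    slots = patches_in_parallel(vm_pairs, pairs_per_patch)
    if slots == 0:
        raise ValueError("not enough VM pairs to test even one patch")
    waves = math.ceil(patches / slots)
    return waves * hours_per_patch


if __name__ == "__main__":
    # 160 VM pairs, 20 patches, 3.5 h per patch are from the mail above;
    # 16 VM pairs per patch is an assumed value for illustration only.
    print(patches_in_parallel(160, 16))
    print(hours_to_drain(20, 160, 16, 3.5))
```

With a much smaller cluster, the same functions show how quickly the feedback time grows once a patch bomb no longer fits into a single wave.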

Maloo shows the resources when you go into a test session, for example https://testing.whamcloud.com/test_sessions/4de25b47-43fc-4bfc-87aa-15e4968519a7 (scroll down to see the list of nodes).

On Feb 18, 2022, at 3:05 AM, Kevin Zhao <kevin.zhao at linaro.org> wrote:

Hi All,

Greetings and thanks a lot for your comments! Xinliang and I are from Linaro (https://www.linaro.org/), an organization focusing on Arm open-source ecosystem development. We have been working on Lustre on the Arm64 server and client side for a while now and have already fixed a few bugs on Arm64.
As Xinliang said before, we want to enable Arm64 CI, and Oleg advised that we can plug our own CI nodes into Jenkins. Now we want to understand and estimate how many machines are needed to meet our requirements, and to plan the next stage of our hardware so that it meets the Lustre test requirements.

As I understand it, the test jobs will cover the ZFS and ldiskfs backends with 2 scenarios:

  *   Lustre Arm64 Server + Arm64 Client (high priority)

  *   Lustre Arm64 Server + x86_64 Client
After going through the Lustre test website, https://testing.whamcloud.com/test_sessions, the test info is shown quite clearly, but some questions remain; it would be great if the community could give me a clear answer.
1. Is there a link that shows all the machine resources, including the machine info, CPU, memory, and peripheral info?
2. Do we have a CI infra architecture overview diagram that shows the machine usage and communication?
3. How many machines are needed to meet the requirements of the Lustre Arm64 Server + Arm64 Client test?

Thanks a lot for your time; I look forward to your response.

On Tue, 28 Dec 2021 at 09:58, Oleg Drokin <green at whamcloud.com> wrote:

On Dec 27, 2021, at 8:53 PM, Xinliang Liu <xinliang.liu at linaro.org> wrote:

Maloo is just one place to link to so that people can actually see the results, but you can link to external resources too,
like e.g. the gatekeeper janitor helper does, or, assuming the information is small enough, it could be entirely contained
in the comment (like, say, for a build failure).

OK, understood now. Are there any other external CI systems that post results to Lustre Gerrit now, for reference?

Currently there are:
- checkpatch and Misc code checks (smach), which post their results 100% as comments only. They pretty much share a codebase.
- the Janitor (which also started from the above codebase but has been changed and extended a lot)

There was external interest in the past in posting results to Gerrit, but it never materialized in the end.

--
Best Regards
Kevin Zhao
Tech Lead, LDCG Cloud Infrastructure
Linaro Vertical Technologies
IRC(freenode): kevinz
Slack (kubernetes.slack.com): kevinz
kevin.zhao at linaro.org | Mobile/Direct/Wechat: +86 18818270915


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20220329/180ebb46/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test results format - DataCenter Operations - Whamcloud Community Wiki.pdf
Type: application/pdf
Size: 113184 bytes
Desc: Test results format - DataCenter Operations - Whamcloud Community Wiki.pdf
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20220329/180ebb46/attachment-0001.pdf>