[lustre-devel] Lustre Arm stuff status and work plan

Sun Feb 27 21:36:22 PST 2022

Hello!

  The sizing really depends on your test scaling requirements.
  For example, my own test infrastructure is a couple of builders plus 4 nodes for VMs (each with 256G RAM), 160 VM pairs in total,
  and on a particularly busy day another 80 VM pairs can be added. This is to ensure speedy feedback to developers.
  You can operate a much smaller-scale testing system if you want; just keep in mind how long the longest-running test takes,
  so you understand how many patches can be tested in parallel (sometimes patch bombs result in 20+ patches being submitted at the same time).
  Here are the stats for the last 30 days: https://imgur.com/lk2ogJv (1 item means a single patch in processing). Time in testing for a patch is typically about 3.5 hours.
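
  As a rough illustration of the sizing reasoning above, here is a minimal Python sketch. It is not part of the actual test infrastructure: the function names and the example figures (48 patches per day, 8 parallel slots) are hypothetical, and only the 3.5 hour test time and the 20-patch burst come from the numbers mentioned above. It assumes every patch takes about the same time and that patches run in full waves.

```python
import math


def patches_in_flight(patches_per_day: float, hours_per_patch: float) -> float:
    """Average number of patches under test at once (Little's law).

    Assumes a steady arrival rate spread evenly over 24 hours.
    """
    return patches_per_day / 24.0 * hours_per_patch


def burst_drain_hours(burst_size: int, parallel_slots: int,
                      hours_per_patch: float) -> float:
    """Hours until the last patch of a burst finishes testing.

    Assumes the burst arrives all at once and is processed in waves
    of at most parallel_slots patches, each wave taking hours_per_patch.
    """
    waves = math.ceil(burst_size / parallel_slots)
    return waves * hours_per_patch


if __name__ == "__main__":
    # Hypothetical load: 48 patches per day at 3.5 hours each.
    print(patches_in_flight(48, 3.5))
    # A 20-patch burst on a hypothetical system with 8 parallel slots.
    print(burst_drain_hours(20, 8, 3.5))
```

  With enough slots for the whole burst, the wait is a single test run (about 3.5 hours); with fewer slots, it grows by one test run per extra wave. How many VM pairs one slot needs depends on the test configuration and is not modeled here.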

Maloo shows the resources when you go into a test session, for example https://testing.whamcloud.com/test_sessions/4de25b47-43fc-4bfc-87aa-15e4968519a7 - scroll down to see the list of nodes.

On Feb 18, 2022, at 3:05 AM, Kevin Zhao <kevin.zhao at linaro.org> wrote:

Hi All,

Greetings, and thanks a lot for your comments! Xinliang and I are from Linaro (https://www.linaro.org/), an organization focusing on Arm open-source ecosystem development. We have been working on Lustre on the Arm64 server and client side for a while now, and have already fixed a few bugs on arm64.
As Xinliang said before, we want to enable Arm64 CI, and Oleg advised that we can plug our own CI nodes into Jenkins. Now we want to understand and estimate how many machines are needed to meet our requirements, so that we can plan the next stage of our hardware to meet the Lustre test requirements.

As I understand it, the test jobs will cover the ZFS and ldiskfs backends with 2 scenarios:

  * Lustre Arm64 Server + Arm64 Client (high priority)
  * Lustre Arm64 Server + x86_64 Client

After going through the Lustre test website, https://testing.whamcloud.com/test_sessions, I find that it shows the test info quite clearly, but some questions remain; it would be great if the community could answer them.
1. Is there a link that shows all the machine resources, including the machine info, CPU, memory and peripheral info?
2. Do we have a CI infrastructure architecture overview diagram that shows the machine usage and communication?
3. How many machines are needed to meet the requirements of the Lustre Arm64 Server + Arm64 Client test?

Thanks a lot for your time; we look forward to your response.

On Tue, 28 Dec 2021 at 09:58, Oleg Drokin <green at whamcloud.com> wrote:

On Dec 27, 2021, at 8:53 PM, Xinliang Liu <xinliang.liu at linaro.org> wrote:

Maloo is just one place to link to so that people can actually see the results, but you can link to external resources too,
as e.g. the gatekeeper janitor helper does, or, assuming the information is small enough, it could be entirely contained
in the comment (say, for a build failure).

Ok, understood now. Is there any other external CI, for reference, that posts results to Lustre gerrit now?

Currently there are:
- checkpatch and Misc code checks (smach), which post their results as 100% comment only. They pretty much share a codebase.
- the Janitor (also started from the above codebase, but has been changed and extended a lot)

There was external interest in the past in posting results to gerrit, but it never materialized in the end.

--
Best Regards
Kevin Zhao
Tech Lead, LDCG Cloud Infrastructure
Linaro Vertical Technologies
IRC(freenode): kevinz
Slack(kubernetes.slack.com): kevinz
kevin.zhao at linaro.org | Mobile/Direct/Wechat: +86 18818270915
