[lustre-discuss] Suspended jobs and rebooting lustre servers

Raj Ayyampalayam ansraj at gmail.com
Thu Feb 21 08:13:30 PST 2019


Our upgrade strategy is as follows:

1) Load all disks into the storage array.
2) Create RAID pools and virtual disks.
3) Format the new OSTs using the mkfs.lustre command. (I still have to
figure out all the parameters used on the existing OSTs.)
4) Create mount points on all OSSs.
5) Mount the lustre OSTs.
6) Maybe rebalance the filesystem.

My understanding is that the steps above can be done without bringing the
filesystem down. However, I also want to create the HA configuration
(corosync and pacemaker) for the new OSTs, and that step requires the
filesystem to be down. I want to know what would happen to the suspended
processes across the cluster when I bring the filesystem down to
regenerate the HA configs.
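
As a rough sketch of steps 3 to 5, the Python snippet below only builds and
prints the command lines for one new OST; it does not run anything. The
filesystem name, OST index, NIDs, device paths and mount point are made-up
placeholders, and the helper names are hypothetical. It assumes the usual
mkfs.lustre options (--ost, --fsname, --index, --mgsnode, --servicenode) and
that tunefs.lustre --dryrun is used to print the parameters stored on an
existing OST; the options actually needed here would still have to be
checked against the existing OSTs.

```python
import shlex


def inspect_command(existing_device):
    """Command that prints an existing OST's stored parameters without changing them."""
    return ["tunefs.lustre", "--dryrun", existing_device]


def ost_commands(fsname, index, mgs_nid, device, mount_point, service_nids=()):
    """Return argv lists to format one new OST, create its mount point and mount it."""
    mkfs = [
        "mkfs.lustre",
        "--ost",
        f"--fsname={fsname}",
        f"--index={index}",
        f"--mgsnode={mgs_nid}",
    ]
    # One --servicenode per OSS that may serve this OST (relevant for the HA pair).
    mkfs += [f"--servicenode={nid}" for nid in service_nids]
    mkfs.append(device)
    return [
        mkfs,
        ["mkdir", "-p", mount_point],
        ["mount", "-t", "lustre", device, mount_point],
    ]


if __name__ == "__main__":
    # All values below are placeholders, not taken from any real system.
    print(shlex.join(inspect_command("/dev/mapper/ost0000")))
    for argv in ost_commands(
        fsname="testfs",
        index=12,
        mgs_nid="10.0.0.1@tcp",
        device="/dev/mapper/ost000c",
        mount_point="/mnt/lustre/ost000c",
        service_nids=["10.0.0.11@tcp", "10.0.0.12@tcp"],
    ):
        print(shlex.join(argv))
```

Printing the commands first makes it possible to compare them with the
tunefs.lustre output from an existing OST before anything is formatted.
Rebalancing (step 6) and the corosync/pacemaker resources are not covered
by this sketch.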

Thanks,
-Raj

On Thu, Feb 21, 2019 at 12:59 AM Colin Faber <cfaber at gmail.com> wrote:

> Can you provide more details on your upgrade strategy? In some cases
> expanding your storage shouldn't impact client / job activity at all.
>
> On Wed, Feb 20, 2019, 11:09 AM Raj Ayyampalayam <ansraj at gmail.com> wrote:
>
>> Hello,
>>
>> We are planning on expanding our storage by adding more OSTs to our
>> lustre file system. We originally planned to add the new OSTs to the live
>> filesystem, but it looks like it would be easier to expand if we bring
>> the filesystem down and perform the necessary operations. We are planning
>> to suspend all the jobs running on the cluster.
>>
>> We are trying to determine the potential impact on the suspended jobs if
>> we bring down the filesystem for the upgrade.
>> One of the questions we have is what would happen to the suspended
>> processes that hold an open file handle in the lustre file system when the
>> filesystem is brought down for the upgrade?
>> Will they recover from the client eviction?
>>
>> We do have vendor support and have engaged them. I wanted to ask the
>> community and get some feedback.
>>
>> Thanks,
>> -Raj
>>