[lustre-discuss] Suspended jobs and rebooting lustre servers

Gin Tan gin.tan at monash.edu
Wed Feb 20 14:22:02 PST 2019


Hi Raj,

You can add the OSTs online; we have been doing that. But if you are
expanding the storage array, think about what is physically involved, such
as cabling, which depends on the recommendations from your storage vendor.
We added an expansion to every Dell storage array last year, and because of
the physical location of these arrays we needed a full shutdown: we created
a maintenance reservation and shut down the whole filesystem.

On many occasions when performing Lustre maintenance we have suspended
jobs, but only when we knew the filesystem would stay online. Some clients
might get evicted during the failover, but they reconnect once the jobs
are resumed.

In your case, if you want to do a full filesystem shutdown, you will have
to unmount all the Lustre clients, which means the jobs need to be killed
before the filesystem can be unmounted. We always use
cat /proc/sys/lnet/peers or lshowmount to make sure that no clients are
still connected before doing the full FS shutdown.
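
As a rough illustration of the peers check, here is a minimal sketch that
counts the entries in the LNet peers file. It assumes the first line of the
file is a column header and every later non-empty line is one peer NID. The
helper name count_lnet_peers is made up for this example and is not a
Lustre tool.

```shell
#!/bin/sh
# Sketch only: count the LNet peers listed in a peers file.
# Assumption: line 1 is a column header; each later non-empty line is a peer.
# count_lnet_peers is a hypothetical helper name, not part of Lustre.
count_lnet_peers() {
    # Default to the path mentioned above; pass another path as $1 if needed.
    peers_file="${1:-/proc/sys/lnet/peers}"
    # Skip the header (NR > 1) and blank lines (NF > 0), then print the count.
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$peers_file"
}
```

Run on a server, count_lnet_peers prints the number of listed peers. The
list can also include other servers, not only clients, so a non-zero count
is a prompt to look at the NIDs rather than proof that a client is still
mounted. On some newer Lustre releases the file may live elsewhere, in which
case pass its path as the first argument.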

Hope it helps.

-- 

*Gin Tan*
MASSIVE support and consulting services

*Monash eResearch Centre*
Monash University

15 Innovation Walk
Ground Floor, Room G22
Clayton Campus
Wellington Road
Clayton VIC 3800
Australia

T: +61 3 9902 0245
E: gin.tan at monash.edu
Z: https://monash.zoom.us/my/gintan
www.monash.edu.au/eresearch

