[Lustre-discuss] simulations

Mag Gam magawake at gmail.com
Thu Aug 7 21:50:07 PDT 2008


CliffW:

This helps out a lot!

We still have problems identifying devices. We don't know what their
numbers are (I have been using lctl dl), and I don't know how to activate
or deactivate them.


Do you have an example?


TIA

On Thu, Aug 7, 2008 at 10:59 AM, Cliff White <Cliff.White at sun.com> wrote:
> Mag Gam wrote:
>>
>> We do a lot of fluid simulations at my university, but on a related
>> note, I would like to know what the Lustre experts would do in
>> particular simulated scenarios...
>>
>> The environment is this:
>> 30 Servers (All Linux)
>> 1000+ Clients (All Linux)
>>
>> 30 Servers
>> 1 MDS
>> 30 OSTs each with 2TB of storage
>>
>> No failover capabilities.
>>
>>
>> Scenario 1:
>> Your client is trying to mount the Lustre filesystem using the lustre
>> module, and the mount hangs. What do you do?
>
> Answer 0 to all questions:
> "Read the Lustre Manual. File doc bugs in Lustre Bugzilla if there's a part
> you don't understand or a part is missing."
>
> Answer 1 to all your questions:
> "Check syslogs/consoles on the impacted clients.
> Check syslogs/consoles on _all_ Lustre servers.
> Pay careful attention to timestamps.
> Work backwards to the first error."
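A minimal sketch of "work backwards to the first error", assuming the logs from each client and server have already been collected into memory and use the classic syslog timestamp prefix (for example `Aug  7 10:59:01`), which records no year, so the year is passed in. The `LustreError` marker and the helper names are illustrative assumptions, not part of any Lustre tool.

```python
from datetime import datetime

# Assumption: BSD-style syslog lines such as
# "Aug  7 10:59:01 oss03 kernel: LustreError: ..." whose first 15
# characters are the timestamp (no year recorded).
STAMP_LEN = 15


def parse_stamp(line: str, year: int) -> datetime:
    """Parse the fixed-width syslog timestamp, supplying the missing year."""
    return datetime.strptime(f"{year} {line[:STAMP_LEN]}", "%Y %b %d %H:%M:%S")


def first_error(logs: dict[str, list[str]], year: int, marker: str = "LustreError"):
    """Merge matching lines from all hosts by time and return the earliest
    (timestamp, host, line) tuple containing `marker`, or None if none match."""
    merged = []
    for host, lines in logs.items():
        for line in lines:
            if marker in line:
                merged.append((parse_stamp(line, year), host, line))
    merged.sort()  # oldest first: this is where to start reading
    return merged[0] if merged else None
```

Because lines from every host are sorted together, the result is only meaningful if client and server clocks are reasonably synchronized (for example via NTP), which is one more reason to pay careful attention to timestamps.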
>
> Is the problem restricted to one client or seen by multiple clients?
> If multiple clients, start with the network: use lctl ping to check Lustre
> connectivity.
> If a single client, it's generally a client or network configuration issue.
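For the multiple-client case, a sketch that runs `lctl ping` against each server NID and reports the ones that fail. It assumes `lctl ping` signals failure with a non-zero exit status; the NIDs are whatever you supply (on a real system, take them from `lctl list_nids` on each server). The command runner is a parameter so the logic can be exercised without a Lustre installation.

```python
import subprocess
from typing import Callable, Iterable

# A runner takes an argv list and returns the process exit status.
Runner = Callable[[list[str]], int]


def run_cmd(argv: list[str]) -> int:
    """Run a command, discard its output, and return its exit status."""
    return subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def unreachable_nids(nids: Iterable[str], runner: Runner = run_cmd) -> list[str]:
    """Return the NIDs for which `lctl ping <nid>` exits non-zero.
    Assumption: lctl reports a failed ping through its exit status."""
    return [nid for nid in nids if runner(["lctl", "ping", nid]) != 0]
```

Usage, with hypothetical NIDs: `unreachable_nids(["192.168.0.10@tcp", "192.168.0.11@tcp"])`. If several clients report the same unreachable servers, the network is the likely culprit.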
>>
>> Scenario 2:
>> Your MDS won't mount. It says, "The server is already running".
>> You try to mount it a couple of times and it still won't mount.
>
> Be certain the server is not already running.
> Be certain no hung mount processes exist.
> Unload all Lustre modules (the lustre_rmmod script will do this).
> Retry, and if it still fails, see answer 1.
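One way to be certain the server is not already running is to look for filesystems of type `lustre` in `/proc/mounts` on the MDS. A minimal sketch, assuming the standard whitespace-separated `/proc/mounts` layout (device, mount point, type, options, ...); it does not detect hung mount processes, which still need to be checked separately (for example with `ps`).

```python
def lustre_mounts(mounts_text: str) -> list[tuple[str, str]]:
    """Return (device, mountpoint) pairs whose filesystem type is 'lustre'.
    `mounts_text` is the contents of /proc/mounts: one mount per line with
    whitespace-separated fields device, mountpoint, fstype, options, ..."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "lustre":
            found.append((fields[0], fields[1]))
    return found


def read_proc_mounts(path: str = "/proc/mounts") -> str:
    """Read the kernel's current mount table."""
    with open(path) as f:
        return f.read()
```

On the MDS, a non-empty `lustre_mounts(read_proc_mounts())` that includes the MDT device means the target is in fact already mounted.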
>
>>
>> Scenario 3:
>> An OST/OSS reboots due to a power outage. Some files are striped on it,
>> and some aren't. What happens? What should you do to minimize the outage?
>
> - Clients can be mounted with a dead OST using the exclude option to the
> mount command. lfs getstripe can be run from clients to find files
> on the bad OST. See answer 0 for detailed process.
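A sketch of building the client mount command with the `exclude` option. It assumes target names of the form `<fsname>-OST<index>` with a four-digit hexadecimal index and a colon-separated exclude list; confirm both against the manual for your release. The function only builds the argument list; pass it to `subprocess.run` to execute it.

```python
def ost_name(fsname: str, index: int) -> str:
    """Target name with a 4-digit hexadecimal index, e.g. testfs-OST000a."""
    return f"{fsname}-OST{index:04x}"


def mount_excluding(mgs_nid: str, fsname: str, mountpoint: str,
                    dead_osts: list[int]) -> list[str]:
    """Build a client mount command that skips the listed OST indices.
    Assumption: multiple excluded OSTs are joined with ':'."""
    argv = ["mount", "-t", "lustre"]
    if dead_osts:
        excluded = ":".join(ost_name(fsname, i) for i in dead_osts)
        argv += ["-o", f"exclude={excluded}"]
    argv += [f"{mgs_nid}:/{fsname}", mountpoint]
    return argv
```

For example, with a hypothetical MGS NID and filesystem name, `mount_excluding("mgs@tcp", "testfs", "/mnt/testfs", [3])` yields `mount -t lustre -o exclude=testfs-OST0003 mgs@tcp:/testfs /mnt/testfs`.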
>>
>> Scenario 4:
>> lctl dl shows some devices in "ST" state. What does that mean, and how
>> do I clear it?
>
> ST = stopped.
> Clear this by cleaning up all devices (answer 0)
> or restarting the stopped devices.
> Usually indicates an error/issue with the stopped device, so see
> answer 1.
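On the follow-up question about device numbers: in `lctl dl` output the first column is the device number and the second is its state. The sketch below assumes the column layout `<index> <state> <type> <name> <uuid> <refcount>`, lists devices in the `ST` state, and builds `lctl --device <index> activate` or `deactivate` commands. Whether activating is the right remedy depends on the device type (it is typically used for OSC devices), so treat this as a starting point and check the lctl documentation for your release.

```python
from dataclasses import dataclass


@dataclass
class Device:
    index: int
    state: str     # e.g. "UP", or "ST" for stopped
    dev_type: str  # e.g. "osc", "mdc", "mgc"
    name: str


def parse_dl(text: str) -> list[Device]:
    """Parse `lctl dl` output, assuming the layout
    <index> <state> <type> <name> <uuid> <refcount>."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].isdigit():
            devices.append(Device(int(fields[0]), fields[1], fields[2], fields[3]))
    return devices


def stopped(devices: list[Device]) -> list[Device]:
    """Devices reported in the ST (stopped) state."""
    return [d for d in devices if d.state == "ST"]


def device_cmd(dev: Device, action: str) -> list[str]:
    """Build `lctl --device <index> activate|deactivate` for one device."""
    if action not in ("activate", "deactivate"):
        raise ValueError(f"unsupported action: {action}")
    return ["lctl", "--device", str(dev.index), action]
```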
>>
>>
>> I know some of these scenarios may be ambiguous, but please let me
>> know which ones so I can elaborate further. I eventually plan to put
>> this on a wiki for future reference and for other Lustre newbies.
>
> Please contribute to wiki.lustre.org - there is considerable information
> there already, and a decent existing structure.
>>
>> If anyone else has any other scenarios, please don't be shy and ask
>> away. We can create a good troubleshooting doc similar to the
>> operations manual.
>
> Again, please file doc bugs at bugzilla.lustre.org and contribute to
> wiki.lustre.org, hope this helps!
> cliffw
>
>>
>>
>> TIA
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>


