[Lustre-discuss] simulations

Cliff White Cliff.White at Sun.COM
Fri Aug 8 10:45:18 PDT 2008


Mag Gam wrote:
> CliffW:
> 
> This helps out a lot!
> 
> We still have problems determining devices. We don't know what their
> numbers are (I have been using lctl dl), and I don't know how to activate
> or deactivate them.
> 
> 
> Do you have an example?
> 
Yup
http://manual.lustre.org/manual/LustreManual16_HTML/KnowledgeBase.html#50544717_84403

The .pdf version I think has more details.
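
A rough sketch of the device-number part, in case it helps. It assumes each
line of "lctl dl" output starts with the device number, then the state, the
type and the name, and that a device is toggled with
"lctl --device <number> activate|deactivate" as the manual describes. The
sample output, the names in it and the helper functions are all made up for
illustration. The script only builds the command strings; it does not run
anything.

```python
# Made-up sample in the assumed column order: number, state, type, name, uuid, refcount.
SAMPLE_DL = """\
  0 UP mgc MGC10.0.0.1@tcp sample-uuid-0 5
  1 UP lov testfs-clilov sample-uuid-1 4
  2 ST osc testfs-OST0000-osc sample-uuid-2 5
  3 UP osc testfs-OST0001-osc sample-uuid-3 5
"""


def parse_lctl_dl(text):
    """Turn 'lctl dl' style text into a list of dicts, one per device line."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with a device number.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        devices.append({
            "index": int(fields[0]),
            "state": fields[1],
            "type": fields[2],
            "name": fields[3],
        })
    return devices


def toggle_command(device, action):
    """Build (but do not run) the lctl command to activate or deactivate a device."""
    if action not in ("activate", "deactivate"):
        raise ValueError("action must be 'activate' or 'deactivate'")
    return "lctl --device {} {}".format(device["index"], action)


if __name__ == "__main__":
    for dev in parse_lctl_dl(SAMPLE_DL):
        if dev["state"] != "UP":
            print(dev["name"], "is", dev["state"], "->", toggle_command(dev, "activate"))
```

In practice you would feed it the real output of "lctl dl" from the node in
question and check the printed command against the manual before running it.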
cliffw

> 
> TIA
> 
> On Thu, Aug 7, 2008 at 10:59 AM, Cliff White <Cliff.White at sun.com> wrote:
>> Mag Gam wrote:
>>> We do a lot of fluid simulations at my university, but on a similar
>>> note I would like to know what the Lustre experts will do in
>>> particular simulated scenarios...
>>>
>>> The environment is this:
>>> 30 Servers (All Linux)
>>> 1000+ Clients (All Linux)
>>>
>>> 30 Servers
>>> 1 MDS
>>> 30 OSTs each with 2TB of storage
>>>
>>> No fail over capabilities.
>>>
>>>
>>> Scenario 1:
>>> Your client is trying to mount a Lustre filesystem using the lustre module,
>>> and it hangs. What do you do?
>> Answer 0 to all questions:
>> "Read the Lustre Manual. File doc bugs in Lustre Bugzilla if there's a part
>> you don't understand, or a part missing"
>>
>> Answer 1 for all your questions.
>> "Check syslogs/consoles on the impacted clients.
>> Check syslogs/consoles on _all_ Lustre servers.
>> Pay careful attention to timestamps.
>> Work backwards to the first error."
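
The "work backwards to the first error" step can be sketched roughly as
follows, assuming the syslog lines from every client and server have been
gathered in one place and use the classic "Mon DD HH:MM:SS host message"
layout. The host names and message text are made up, the year has to be
supplied because that layout omits it, and the "LustreError" marker is just
a convenient default for the search. Clocks on the nodes need to be in sync
for the ordering to mean anything.

```python
from datetime import datetime

# Made-up lines collected from several hosts, deliberately out of order.
SAMPLE_LOGS = [
    "Aug  7 10:58:40 client12 kernel: LustreError: sample client-side message",
    "Aug  7 10:57:02 oss07 kernel: LustreError: sample server-side message",
    "Aug  7 10:56:30 mds01 kernel: Lustre: sample informational message",
]


def parse_syslog_line(line, year):
    """Split a classic syslog line into (timestamp, host, message)."""
    month, day, clock, host, message = line.split(None, 4)
    stamp = datetime.strptime(
        "{} {} {} {}".format(year, month, day, clock), "%Y %b %d %H:%M:%S"
    )
    return stamp, host, message


def first_error(lines, year, marker="LustreError"):
    """Return the earliest entry whose message contains marker, or None."""
    entries = [parse_syslog_line(line, year) for line in lines if line.strip()]
    errors = [entry for entry in entries if marker in entry[2]]
    return min(errors, key=lambda entry: entry[0]) if errors else None


if __name__ == "__main__":
    stamp, host, message = first_error(SAMPLE_LOGS, 2008)
    print("first error at", stamp, "on", host)
    print(message)
```

With the sample above the earliest error is the one on the server, not the
one the client reported later, which is the point of reading the server logs
as well.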
>>
>> Is the problem restricted to one client or seen by multiple clients?
>> If multiple clients, start with the network, use lctl ping to check lustre
>> connectivity.
>> If a single client, it's generally a client config/network config issue.
>>> Scenario 2:
>>> Your MDS won't mount. It's saying, "The server is already running".
>>> You try to mount it a couple of times and it still won't mount.
>> Be certain the server is not already running.
>> Be certain no hung mount processes exist.
>> Unload all lustre modules (lustre_rmmod script will do this)
>> Retry and -> answer 1
>>
>>> Scenario 3:
>>> OST/OSS reboots due to a power outage. Some files are striped on this,
>>> and some aren't. What happens? What to do for minimal outage?
>> - Clients can be mounted with a dead OST using the exclude options to the
>> mount command. lfs getstripe can be run from clients to find files
>> on the bad OST. See answer 0 for detailed process.
>>> Scenario 4:
>>> lctl dl shows some devices in "ST" state. What does that mean, and how
>>> do I clear it?
>> ST = stopped.
>> Clear this by cleaning up all devices (answer 0)
>> or restarting the stopped devices.
>> Usually indicates an error/issue with the stopped device, so see
>> answer 1.
>>>
>>> I know some of these scenarios may be ambiguous, but please let me
>>> know which so I can further elaborate. I am eventually planning to
>>> wiki this for future reference and other lustre newbies.
>> Please contribute to wiki.lustre.org - there is considerable information
>> there already, and a decent existing structure.
>>> If anyone else has any other scenarios, please don't be shy and ask
>>> away. We can create a good troubleshooting doc similar to the
>>> operations manual.
>> Again, please file doc bugs at bugzilla.lustre.org and contribute to
>> wiki.lustre.org, hope this helps!
>> cliffw
>>
>>>
>>> TIA
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>