Silicon Graphics, Inc.
OpenVault
System Startup and Restart
Command Sequencing
$Id: booting.html,v 1.13 1997/07/01 21:38:27 curtis Exp $
Overview and High Level Goals
This document describes how the different components
of the OpenVault system assemble themselves into a
functioning whole at either boot time or
when recovering from a partial failure of the system.
This document is primarily focused on DCPs and LCPs,
not on CAPI clients or AAPI clients.
The goal is to have high flexibility without having
high administrative overhead; to support central administration
and/or locally controlled resources at the same time.
The dynamic configuration discovery approach we are using allows us to
have both good flexibility and insulation from environmental changes.
Specifically, we are using a combination of a small configuration file
next to each component and a stylized session initiation sequence
to bootstrap the pieces of the OpenVault system into full functionality.
The configuration file should have just enough information in it to
bootstrap that particular OpenVault component into communication
with the device it controls and the MLM Core.
All the rest of the information would be derived from the state
of the device, the information in the central OpenVault database,
or the parameters compiled into that particular component.
The contents of a configuration file may vary greatly from component to
component depending on the requirements of that particular component.
The stylized session initiation sequence is common to all components
and allows the component to identify itself by name,
type, and language versions that it supports.
The name an LCP or DCP uses is actually the name of the
device that it is managing, not the name of the particular instance
of LCP or DCP.
If more than one LCP or DCP claims to manage the same name,
then that device is assumed to be multi-ported between those components.
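As a rough illustration of the name-based rule above, the sketch below groups component claims by device name and reports any name claimed more than once. The component identifiers (such as "dcp@hostA") and the function name are hypothetical; the real MLM Core may track this quite differently.

```python
from collections import defaultdict


def find_multiported(claims):
    """Group (component_id, device_name) claims by device name.

    Returns only the names claimed by more than one component, which
    the text above treats as multi-ported devices.
    """
    by_name = defaultdict(list)
    for component_id, device_name in claims:
        by_name[device_name].append(component_id)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


# Hypothetical claims: two DCPs on different hosts name the same drive.
claims = [
    ("dcp@hostA", "DLT12"),
    ("dcp@hostB", "DLT12"),
    ("lcp@hostA", "robbie"),
]
multiported = find_multiported(claims)
print(multiported)
```

Because detection relies purely on name equality, it only works when every host uses the same name for the same physical device.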
Decisions, Assumptions, and Limitations
-
We want to carefully walk the line between central administration and
locally controlled resources.
We want the OpenVault architecture to be able to support either one,
depending on site preferences and configuration options.
-
We want to cleanly support multi-ported devices, thus the use of
the device name to detect multi-porting.
-
We do not need to be able to support multiple independent
MLM Cores running on the same host.
Using a single Well Known Address for the MLM Core root process limits us
to one copy of OpenVault on a given system, but that is deemed to be OK.
Open Issues
-
How do we detect and recover from configuration errors on
dual-ported drives?
If system A calls it fred and system B calls it joe,
how do we figure this out and not clobber each other?
LCP Booting
Overview
Since any given LCP could be the inactive side of a multi-ported device,
there is a specific set of requirements that an LCP must adhere to during
booting so as not to interfere with the currently active LCP for that device.
The MLM Core is the arbitrator of control for all devices,
and the LCP cannot assume that it is controlling the device until the
MLM Core tells it to do so.
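One way to picture this rule is a small gate inside the LCP that refuses hardware access until the Core has granted control. This is only a sketch: the class and method names are invented for illustration, and a real LCP would write to its control path rather than return a string.

```python
class LcpControlGate:
    """Tracks whether the MLM Core has granted this LCP control of its library."""

    def __init__(self, library_name):
        self.library_name = library_name
        self.enabled = False  # every LCP boots into the "activate disable" state

    def on_activate(self, enable):
        # Called when the Core sends "activate enable"; a matching
        # disable request is assumed here for symmetry.
        self.enabled = bool(enable)

    def send_to_library(self, command):
        if not self.enabled:
            raise PermissionError(
                f"{self.library_name}: not activated by the MLM Core yet")
        # A real LCP would talk to the library hardware here.
        return f"{self.library_name} <- {command}"


gate = LcpControlGate("robbie")
try:
    gate.send_to_library("inventory")
    refused = False
except PermissionError:
    refused = True          # before activation, hardware access is refused

gate.on_activate(True)      # the Core grants control
result = gate.send_to_library("inventory")
```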
Configuration File
Each LCP should have a configuration file that logically contains
at least the following information:
- The address of the controlling MLM Core.
This allows the LCP to initiate contact with the controlling MLM Core.
It is the name of the system (or the IP address);
the MLM Core usually lives at a Well Known Address on that machine.
- The name of the managed library.
The MLM Core uses this name as a handle for this physical library.
All device names (whether libraries or drives)
must be unique within the OpenVault domain so that OpenVault can
accurately detect multi-hosted devices.
- The control path to the library (for example: /dev/scsi/sc0d2l0).
This is how the LCP would talk to the library hardware.
This information is not visible to the MLM Core.
This is not a hard requirement as some library implementations are not
controlled in that fashion, but the vast majority of libraries will
need something equivalent, so we mention it here.
- The names of the drives contained in this library.
The MLM Core will use this information to determine the relationship
between libraries and drives (between LCPs and DCPs).
That "contained in" relationship is a big part of deciding which drives a
cartridge can be put into, based on which library the cartridge is in.
Each library has some notion of addressing for each drive inside that library,
some way to identify to the library which drive we are talking about.
The "name" to drive address mapping in an LCP is generically of the form:
the "first" drive in the library is called "fred" by OpenVault,
the "second" drive in the library is called "john" by OpenVault,
and the "third" drive in the library is called "harry" by OpenVault.
The format and content of the file are strictly up to the LCP implementor,
but we strongly encourage editable ASCII.
There may be additional information in the file;
the list above is just an idea of the likely minimum contents.
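Since the file format is left to the implementor, the following is only one possible shape for such a file, together with a small parser. The key names (core_host, library_name, control_path, drive.N) and the host name are assumptions made for this sketch, not part of OpenVault.

```python
# Hypothetical "key = value" layout; OpenVault does not mandate any format.
SAMPLE = """\
# hypothetical LCP configuration
core_host    = vault-core.example.com
library_name = robbie
control_path = /dev/scsi/sc0d2l0
drive.1      = fred
drive.2      = john
drive.3      = harry
"""


def parse_lcp_config(text):
    """Parse the hypothetical format above into a dict.

    Entries named drive.N map the library's N-th drive address to the
    drive name that OpenVault uses for it.
    """
    config = {"drives": {}}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        key, value = key.strip(), value.strip()
        if key.startswith("drive."):
            config["drives"][int(key[len("drive."):])] = value
        else:
            config[key] = value
    return config


config = parse_lcp_config(SAMPLE)
print(config["library_name"], config["drives"])
```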
(Re)Boot Sequence
- When an LCP (re)boots, the LCP will:
- Do not talk to the library yet:
All LCPs must boot into the activate disable state;
they must wait for the MLM Core to tell them to talk to the library.
If the library is actually dual-ported and the other LCP is currently
active, we do not want to interrupt anything.
The activate enable command will be issued by the MLM Core when
the Core decides that this LCP should take control of the library.
If the library is not dual-ported, the activate enable command
will be issued almost immediately.
- Read its configuration file.
- Open a connection to the controlling MLM Core:
- If it fails, then try again in 2 minutes.
-
The LCP will output a hello message upon opening the connection.
For example:
hello language["ALI"] version["1.0"] name["robbie"];
where robbie is the name of the library.
The LCP will block until it receives a welcome command letting
the LCP know which language version to use during this session.
- When the MLM Core is first contacted by an LCP it will:
- Integrate the library into the list of managed devices.
The MLM Core should check for other LCPs that manage the same
physical library.
If this is the first, the Core will tell this LCP to activate enable.
As a result of this sequencing, LCPs are given control of their
associated library on a first-come-first-served basis.
- When OpenVault tells the LCP to activate enable, the LCP will:
- Tell the MLM Core ready no
The LCP needs to tell the MLM Core that it has started to come up.
This state tells the Core that the LCP is not yet ready to
accept cartridge movement commands.
- Talk to the library to determine:
- That the library is supported by this LCP (eg: "ATL-2640" is supported).
- Whether the library supports PCLs/barcodes or not (eg: true/false).
- List of supported cartridge form factors
(eg: "DLT", may be compiled into the LCP).
- Total number of slots for each formFactor.
- Total number of used slots for each formFactor.
- Import/export port configuration.
- The slotmap (eg: all the barcodes and occupancy info for the library).
- Any other information that may be relevant to the operation of the LCP
or of the library.
- Collect any state/config information from the MLM Core
The LCP can store persistent state and/or configuration information
in the MLM Core's database.
For example, the LCP will certainly want to retrieve the loglevel
attribute so that it can resume logging only the level of messages
that the Admin wants logged.
- Push all the slotmap and drive information up into the MLM Core
The LCP "owns" the slotmap and therefore needs to update the
MLM Core's copy of the slotmap whenever required.
- Tell the MLM Core ready
The LCP needs to tell the MLM Core when it is ready to
accept cartridge movement commands.
- Respond success to the activate enable command.
This is defined to be the last step as a convenience to the MLM Core.
The MLM Core can block until it receives a response from the activate
command rather than waiting for a ready command to arrive.
- When the LCP pushes the slotmap and drive information up,
the MLM Core will:
- Crosscheck the list of drives
The MLM Core should crosscheck the list of contained
drive names with the list of known DCPs.
Not all DCPs may have checked in before this LCP does.
The MLM Core should also keep a list of DCPs that have
not yet checked in so that it can flag them as possible
hardware failures.
- Crosscheck the list of PCLs
The MLM Core should crosscheck the list of PCLs the LCP returns
against the previously known contents of the library looking
for new or missing cartridges.
A message should be sent to the Administrator and/or logfiles
if any changes are detected.
- Store all the slot and drive information in the database
The MLM Core should store all of the information that the LCP has
provided in the database.
That information will be the basis for choosing drives and cartridges
on behalf of CAPI clients.
- When the MLM Core gets a successful response to the activate command,
it will:
- Mark the library as being available for cartridge mounts
The library is now ready to accept cartridge mount/unmount/movement
operations.
This implies that the cartridges in that library will no longer be
filtered out of the list of potential candidates for a mount operation
as a result of not being where the MLM Core can get to them.
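The Core-side steps in this sequence can be sketched roughly as follows: first-come-first-served activation per library name, a check for contained drives whose DCP has not yet checked in, and a comparison of reported PCLs against the previously known contents. All class, function, and identifier names here are hypothetical, and the PCL values are made-up sample data.

```python
class CoreRegistry:
    """Minimal sketch of first-come-first-served control of a device."""

    def __init__(self):
        self.claimants = {}  # device name -> component ids, in arrival order
        self.active = {}     # device name -> component told to "activate enable"

    def on_hello(self, component_id, device_name):
        """Return True if this component should be sent "activate enable" now."""
        self.claimants.setdefault(device_name, []).append(component_id)
        if device_name not in self.active:
            self.active[device_name] = component_id
            return True
        return False


def drives_not_checked_in(contained_drives, known_dcps):
    # Drives the LCP lists for which no DCP has said hello yet.
    return sorted(set(contained_drives) - set(known_dcps))


def pcl_changes(reported, previously_known):
    # New or missing cartridges relative to the last known library contents.
    reported, previously_known = set(reported), set(previously_known)
    return {"new": sorted(reported - previously_known),
            "missing": sorted(previously_known - reported)}


registry = CoreRegistry()
first = registry.on_hello("lcp@hostA", "robbie")   # first claimant is activated
second = registry.on_hello("lcp@hostB", "robbie")  # second claimant stays disabled

pending = drives_not_checked_in(["fred", "john", "harry"], ["fred", "harry"])
changes = pcl_changes(["A001", "A003", "A004"], ["A001", "A002", "A003"])
```

In this sketch a non-empty `pending` list or `changes` dictionary would be the trigger for flagging a possible hardware failure or notifying the Administrator.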
DCP Booting
Overview
Since any given DCP could be the inactive side of a multi-ported device,
there is a specific set of requirements that a DCP must adhere to during
booting so as not to interfere with the currently active DCP for that device.
The MLM Core is the arbitrator of control for all multi-ported devices,
and the DCP cannot assume that it is controlling the device until the
MLM Core tells it to do so.
Configuration File
Each DCP should have a configuration file that logically contains
at least the following information:
- The address of the controlling MLM Core.
This allows the DCP to initiate contact with the controlling MLM Core.
It is the name of the system (or the IP address);
the MLM Core usually lives at a Well Known Address on that machine.
- The name of the managed drive.
The MLM Core uses this name as a handle for this physical drive.
All device names (whether libraries or drives)
must be unique within the OpenVault domain so that OpenVault can
accurately detect multi-hosted devices.
- The control path to the drive (for example: /dev/rmt/tps0d4).
This is how the DCP would talk to the hardware.
This information is not visible to the MLM Core.
This is not a hard requirement as some drive implementations are not
controlled in that fashion (eg: Louth controlled video cart players),
but the vast majority of drives will need something equivalent,
so we mention it here.
- A list of access-mode-names and access capabilities for this drive.
This is completely DCP implementation dependent,
and as such it might not even exist for some implementations,
but since most will probably have something equivalent, we mention it here.
The DCP needs some way for an Administrator to control the capabilities
that the DCP advertises to the MLM Core.
This is one of the possible ways.
The config file would list the tag names and their associated
capabilities and config/performance parameters.
The capabilities are simply text strings that people have agreed upon.
The MLM Core doesn't care what they are;
it simply compares them for equality when looking for a drive to
satisfy some user requirement.
The tag names are simply text handles that the MLM Core
uses to tell the DCP which combination of capabilities it would like
the DCP to use when mounting the next cartridge.
For example:
name    capabilities list for that handle    clonable dev pathname
----    ---------------------------------    ---------------------
base                                         /dev/rmt/tps0d4nrv
r       rewind                               /dev/rmt/tps0d4v
f       fixedblock                           /dev/rmt/tps0d4nr
c       compressed                           /dev/rmt/tps0d4cnrv
cr      compressed, rewind                   /dev/rmt/tps0d4cv
cf      compressed, fixedblock               /dev/rmt/tps0d4cnr
rf      rewind, fixedblock                   /dev/rmt/tps0d4
crf     compressed, rewind, fixedblock       /dev/rmt/tps0d4c
stat    status                               /dev/rmt/tps0d4stat
all     allmodes                             /dev/rmt/tps0d4
audio   audio                                /dev/rmt/tps0d4a
In this example, a UNIX device pathname is included so as to
avoid having the DCP understand the format of a dev_t minor number,
or the equivalent on some other UNIX-like operating system.
The DCP can replicate the path (ie: copy the dev_t) when it needs to
create a handle for that combination of drive and access mode
for an application.
Note that OpenVault defines the default capabilities of a drive
and a DCP must specify what capabilities it offers in terms of changes
to that default set.
The format and content of the file are strictly up to the DCP implementor,
but we strongly encourage editable ASCII.
There may be additional information in the file;
the list above is just an idea of the likely minimum contents.
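To make the access-mode idea concrete, the sketch below holds part of the example table as a mapping from tag name to capabilities and device path, then looks up the tag whose capability set equals a requested set. Treating "compare for equality" as whole-set equality is an assumption of this sketch, and the function name is hypothetical.

```python
# A few rows of the example table: tag name -> (capabilities, device path).
ACCESS_MODES = {
    "base": (set(), "/dev/rmt/tps0d4nrv"),
    "r": ({"rewind"}, "/dev/rmt/tps0d4v"),
    "f": ({"fixedblock"}, "/dev/rmt/tps0d4nr"),
    "c": ({"compressed"}, "/dev/rmt/tps0d4cnrv"),
    "cr": ({"compressed", "rewind"}, "/dev/rmt/tps0d4cv"),
    "cf": ({"compressed", "fixedblock"}, "/dev/rmt/tps0d4cnr"),
    "crf": ({"compressed", "rewind", "fixedblock"}, "/dev/rmt/tps0d4c"),
}


def pick_access_mode(modes, wanted):
    """Return the tag whose capability strings equal the requested ones.

    Capabilities are opaque text strings; the only operation used is
    equality, so the order in which they are listed does not matter here.
    """
    wanted = frozenset(wanted)
    for tag, (capabilities, _path) in modes.items():
        if frozenset(capabilities) == wanted:
            return tag
    return None


tag = pick_access_mode(ACCESS_MODES, ["fixedblock", "compressed"])
print(tag, ACCESS_MODES[tag][1])
```

A request for a combination that the drive does not advertise, such as "audio" together with "rewind", returns None in this sketch, meaning the drive is not a candidate.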
(Re)boot Sequence
- When a DCP (re)boots the DCP will:
- Do not talk to the drive yet:
All DCPs must boot into the activate disable state;
they must wait for the MLM Core to tell them to talk to the drive.
If the drive is actually dual-ported and the other DCP is currently
active, we do not want to interrupt anything.
The activate enable command will be issued by the MLM Core when
the Core decides that this DCP should take control of the drive.
If the drive is not dual-ported, the activate enable command will be
issued almost immediately.
- Read its configuration file.
The DCP should assimilate all of the information in its configuration file.
- Open a connection to the controlling MLM Core:
- If it fails, then try again in 2 minutes.
-
The DCP will output a hello message upon opening the connection.
For example:
hello language["ADI"] version["1.0"] name["DLT12"];
where DLT12 is the name of the drive.
The DCP will block until it receives a welcome command letting
the DCP know which language version to use during this session.
- When the MLM Core is first contacted by a DCP it will:
- Integrate the drive into the list of managed devices.
The MLM Core should check for other DCPs that manage the same
physical drive.
If this is the first, the Core will tell this DCP to activate enable.
As a result of this sequencing, DCPs are given control of their
associated drive on a first-come-first-served basis.
- When OpenVault tells the DCP to activate enable, the DCP will:
- Tell the MLM Core ready no
The DCP needs to tell the MLM Core that it has started to come up.
This state tells the Core that the DCP is not yet ready to
accept drive control commands.
- Talk to the drive to determine:
- Verify the drive type is supported by this DCP.
- The supported media formats (eg: EXABYTE-8mm-5GB).
- Verify that the drive can support the listed access modes.
- Whether the drive is in use or loaded at this time.
- Verify/acquire any other information that will affect the DCP.
- Collect any state/config information from the MLM Core
The DCP can store persistent state and/or configuration information
in the MLM Core's database.
For example, the DCP will certainly want to retrieve the loglevel
attribute so that it can resume logging only the level of messages
that the Admin wants logged.
- Push all the capability information up into the MLM Core
The DCP needs to update the MLM Core's copy of the capability list
at boot time, before the DCP has been activated.
Note that this is different from an LCP.
In order for the MLM Core to decide that a drive might be a candidate
to satisfy a mount request, the Core needs all the config information
from the DCP.
Using the same model the LCP uses would mean that the DCP would have
to be activated before we would know what those capabilities were,
and we do not want to activate all the DCPs that could control a given
drive just so we can see what their capability set is.
The DCP will take all of the compiled-in information and the info
from its config file and generate a config command to send up
to the MLM Core.
There is the possibility that the offered capabilities will change once
the DCP has a chance to talk to the drive hardware,
but the MLM Core will have to deal with that when it happens.
See below for more information.
- Tell the MLM Core ready
The DCP needs to tell the MLM Core when it is ready to
accept commands.
- Respond success to the activate enable command.
This is defined to be the last step as a convenience to the MLM Core.
The MLM Core can block until it receives a response from the activate
command rather than waiting for a ready command to arrive.
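The hello messages shown in the LCP and DCP sequences share one shape, so a component could build and read them with a pair of helpers like the ones below. This is a sketch based only on the two example messages in this document; it assumes one quoted value per field, and the helper names are hypothetical. The welcome reply is not modeled because its syntax is not given here.

```python
import re

# Matches fields of the form  keyword["value"]  as seen in the examples.
_FIELD = re.compile(r'(\w+)\["([^"]*)"\]')


def format_hello(language, version, name):
    """Build a hello message in the shape of the examples in this document."""
    return f'hello language["{language}"] version["{version}"] name["{name}"];'


def parse_hello(message):
    """Return the fields of a hello message as a dict of strings."""
    text = message.strip()
    if not (text.startswith("hello ") and text.endswith(";")):
        raise ValueError(f"not a hello message: {message!r}")
    return dict(_FIELD.findall(text))


dcp_hello = format_hello("ADI", "1.0", "DLT12")
fields = parse_hello(dcp_hello)
print(dcp_hello)
print(fields)
```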
MLM Core Booting
Overview
Configuration File
The MLM Core has a configuration file;
its contents are not yet specified.
(Re)boot Sequence
When the MLM Core (re)boots it will have dropped all of its
existing connections to LCPs and DCPs,
so they will all be trying to reattach.
When it boots it will do the following:
- Read its configuration file.
- All DCPs and LCPs will re-open their connections:
If they fail their connection attempt, then they will try again in 2 minutes.
The DCPs and LCPs should use TCP keepalives to monitor remote system crashes.
- The MLM Core will wait for connections and requests:
- The MLM Core will accept connections from DCPs and LCPs
as they arrive and will treat them just as detailed above
for DCP and/or LCP reboot.
-
CAPI requests will be serviced as soon as the resources needed to
service that request become available.
No "quorum" is needed.
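The reconnect behavior described above (retry every 2 minutes, with TCP keepalives enabled on the connection) might look roughly like this on the DCP or LCP side. The function name and its injectable `connect` and `sleep` parameters are choices made for this sketch so the loop can be exercised without a network; the Core's actual host and port are not specified in this document and must be supplied by the caller.

```python
import socket
import time

RETRY_SECONDS = 120  # "try again in 2 minutes"


def connect_to_core(host, port, connect=socket.create_connection,
                    sleep=time.sleep, max_attempts=None):
    """Keep trying to reach the MLM Core, then enable TCP keepalives.

    max_attempts=None retries forever; a finite value re-raises the
    last connection error once that many attempts have failed.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            sock = connect((host, port))
        except OSError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(RETRY_SECONDS)
            continue
        # Ask the TCP stack to probe an idle connection so that a crash
        # of the remote system is eventually noticed.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        return sock
```

Keepalive probe timing is governed by operating system settings, so how quickly a crashed Core is detected depends on the host configuration.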