ceph-mgr orchestrator modules
Warning
This is developer documentation describing Ceph internals. It is only relevant to people writing ceph-mgr orchestrator modules.
In this context, orchestrator refers to some external service that provides the ability to discover devices and create Ceph services. This includes external projects such as ceph-ansible, DeepSea, and Rook.
An orchestrator module is a ceph-mgr module (see the ceph-mgr module developer's guide) which implements common management operations using a particular orchestrator.
Orchestrator modules subclass the Orchestrator class: this class is an interface; it only provides method definitions to be implemented by subclasses. The purpose of defining this common interface for different orchestrators is to enable common UI code, such as the dashboard, to work with various different backends.
Behind all the abstraction, the purpose of orchestrator modules is simple: enable Ceph to do things like discover available hardware, create and destroy OSDs, and run MDS and RGW services.
A tutorial is not included here: for full and concrete examples, see the existing implemented orchestrator modules in the Ceph source tree.
Terminology
- Stateful service
a daemon that uses local storage, such as an OSD or mon.
- Stateless service
a daemon that doesn’t use any local storage, such as an MDS, RGW, nfs-ganesha, or iSCSI gateway.
- Label
arbitrary string tags that may be applied by administrators to nodes. Typically administrators use labels to indicate which nodes should run which kinds of service. Labels are advisory (from human input) and do not guarantee that nodes have particular physical capabilities.
- Drive group
collection of block devices with common/shared OSD formatting (typically one or more SSDs acting as journals/dbs for a group of HDDs).
- Placement
choice of which node is used to run a service.
Key concepts
The underlying orchestrator remains the source of truth for information about whether a service is running, what is running where, which nodes are available, etc. Orchestrator modules should avoid taking any internal copies of this information, and read it directly from the orchestrator backend as much as possible.
Bootstrapping nodes and adding them to the underlying orchestration system is outside the scope of Ceph’s orchestrator interface. Ceph can only work on nodes when the orchestrator is already aware of them.
Calls to orchestrator modules are all asynchronous, and return completion objects (see below) rather than returning values immediately.
Where possible, placement of stateless services should be left up to the orchestrator.
Completions and batching¶
All methods that read or modify the state of the system can potentially be long running. To handle that, all such methods return a Completion object. Orchestrator modules must implement the process method: this takes a list of completions, and is responsible for checking if they’re finished, and advancing the underlying operations as needed.
Each orchestrator module implements its own underlying mechanisms for completions. This might involve running the underlying operations in threads, or batching the operations up before later executing in one go in the background. If implementing such a batching pattern, the module would do no work on any operation until it appeared in a list of completions passed into process.
Some operations need to show a progress. Those operations need to add a ProgressReference to the completion. At some point, the progress reference becomes effective, meaning that the operation has really happened (e.g. a service has actually been started).
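To make the batching pattern concrete, here is a minimal, self-contained sketch. The names FakeCompletion and BatchingOrchestrator are invented for this example and are not part of the real orchestrator API; in particular, real completions are Completion instances, not plain objects. Operations are only queued when a method is called, and only executed once they appear in a list passed to process():

```python
class FakeCompletion:
    """Toy stand-in for a Completion: holds a queued operation."""

    def __init__(self, operation):
        self.operation = operation      # callable doing the real work
        self.result = None
        self.is_finished = False


class BatchingOrchestrator:
    """Does no work on any operation until it is passed into process()."""

    def add_host(self, hostname):
        # Queue the operation; nothing is executed yet.
        return FakeCompletion(lambda: "added %s" % hostname)

    def process(self, completions):
        # Advance all pending operations in one go, in the background.
        for c in completions:
            if not c.is_finished:
                c.result = c.operation()
                c.is_finished = True
```

With this pattern, `add_host()` returns immediately, and the caller later drives the work by handing the completion back to `process()`.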
Orchestrator.process(completions)
Given a list of Completion instances, process any which are incomplete.
Callers should inspect the detail of each completion to identify partial completion/progress information, and present that information to the user.
This method should not block, as this would make it slow to query a status while other long-running operations are in progress.
- Return type
None
class orchestrator.Completion(_first_promise=None, value=<object object>, on_complete=None, name=None)
Combines multiple promises into one overall operation.
Completions are composable: one completion can call another, making them re-usable via promises. E.g.:

>>> return Orchestrator().get_hosts().then(self._create_osd)

where get_hosts returns a Completion of a list of hosts and _create_osd takes a list of hosts.

The concept behind this is to store the computation steps explicitly and then explicitly evaluate the chain:

>>> p = Completion(on_complete=lambda x: x * 2).then(on_complete=lambda x: str(x))
>>> p.finalize(2)
>>> assert p.result == "4"

or graphically:

+---------------+      +------------------+
|               | then |                  |
| lambda x: x*2 | +--> | lambda x: str(x) |
|               |      |                  |
+---------------+      +------------------+
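The chain-evaluation idea can be mimicked in a few lines of plain Python. MiniCompletion below is a hypothetical stand-in for the real Completion class, showing only how computation steps are stored and later evaluated:

```python
class MiniCompletion:
    """Toy promise chain: store steps now, evaluate them on finalize()."""

    def __init__(self, on_complete=None):
        self._steps = [on_complete] if on_complete else []
        self.result = None

    def then(self, on_complete):
        # Append another computation step; nothing is evaluated yet.
        self._steps.append(on_complete)
        return self

    def finalize(self, value=None):
        # Explicitly evaluate the stored chain of steps, in order.
        for step in self._steps:
            value = step(value)
        self.result = value
```

With this sketch, the example above behaves the same way: finalize(2) runs `2 * 2` and then `str(4)`, leaving `"4"` in result.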
fail(e)
Sets the whole completion to be failed with this exception and ends the evaluation.
property has_result
Does the operation already have a result?
For write operations, it can already have a result if the orchestrator's configuration is persistently written. Typically this would indicate that an update had been written to a manifest, but that the update had not necessarily been pushed out to the cluster.
- Return type
bool
- Returns
whether a result is available
property is_errored
Has the completion failed? The default implementation looks for self.exception. Can be overridden.
- Return type
bool
property is_finished
Could the external operation be deemed as complete, or should we wait? We must wait for a read operation only if it is not complete.
- Return type
bool
property needs_result
Could the external operation be deemed as complete, or should we wait? We must wait for a read operation only if it is not complete.
- Return type
bool
property progress_reference
ProgressReference. Marks this completion as a write completion.
- Return type
Optional[ProgressReference]
property result
The result of the operation that we waited for. Only valid after calling Orchestrator.process() on this completion.
result_str()
Force a string.
class
orchestrator.
ProgressReference
(message, mgr, completion=None)¶ -
completion
: Optional[Callable[[], Completion]] = None¶ The completion can already have a result, before the write operation is effective. progress == 1 means, the services are created / removed.
property progress
If an orchestrator module can provide more detailed progress information, it also needs to call progress.update().
Error handling
The main goal of error handling within orchestrator modules is to provide debug information to assist users when dealing with deployment errors.
class orchestrator.OrchestratorError
General orchestrator-specific error.
Used for deployment, configuration, or user errors.
It's not intended for programming errors or orchestrator-internal errors.
class orchestrator.NoOrchestrator(msg='No orchestrator configured (try `ceph orch set backend`)')
No orchestrator is configured.
class orchestrator.OrchestratorValidationError
Raised when an orchestrator doesn't support a specific feature.
In detail, orchestrators need to explicitly deal with different kinds of errors:

1. No orchestrator configured
   See NoOrchestrator.
2. An orchestrator doesn't implement a specific method.
   For example, an orchestrator doesn't support add_host. In this case, a NotImplementedError is raised.
3. Missing features within implemented methods.
   E.g. optional parameters to a command that are not supported by the backend (e.g. the hosts field in the Orchestrator.update_mons() command with the rook backend).
4. Input validation errors
   The orchestrator_cli module and other calling modules are supposed to provide meaningful error messages.
5. Errors when actually executing commands
   The resulting Completion should contain an error string that assists in understanding the problem. In addition, _Completion.is_errored() is set to True.
6. Invalid configuration in the orchestrator modules
   This can be tackled similarly to 5.

All other errors are unexpected orchestrator issues and thus should raise an exception that is then logged into the mgr log file. If there is a completion object at that point, _Completion.result() may contain an error message.
Excluded functionality
Ceph’s orchestrator interface is not a general purpose framework for managing linux servers – it is deliberately constrained to manage the Ceph cluster’s services only.
Multipathed storage is not handled (multipathing is unnecessary for Ceph clusters). Each drive is assumed to be visible only on a single node.
Host management
Orchestrator.add_host(host_spec)
Add a host to the orchestrator inventory.
- Parameters
host_spec (HostSpec) – host specification
- Return type
Completion
Orchestrator.remove_host(host)
Remove a host from the orchestrator inventory.
- Parameters
host (str) – hostname
- Return type
Completion
Orchestrator.get_hosts()
Report the hosts in the cluster.
- Return type
Completion
- Returns
list of HostSpec
Orchestrator.update_host_addr(host, addr)
Update a host's address
- Parameters
host (str) – hostname
addr (str) – address (dns name or IP)
- Return type
Completion
Orchestrator.add_host_label(host, label)
Add a host label
- Return type
Completion
Orchestrator.remove_host_label(host, label)
Remove a host label
- Return type
Completion
class orchestrator.HostSpec(hostname, addr=None, labels=None, status=None)
Information about hosts, like e.g. kubectl get nodes
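To make the host-management calls concrete, here is a toy in-memory backend. ToyHostBackend and this cut-down HostSpec are hypothetical: a real module subclasses orchestrator.Orchestrator and wraps results in Completion objects, which are omitted here so the sketch stays runnable:

```python
class HostSpec:
    """Cut-down version of orchestrator.HostSpec for this sketch."""

    def __init__(self, hostname, addr=None, labels=None, status=None):
        self.hostname = hostname
        self.addr = addr or hostname
        self.labels = labels or []
        self.status = status


class ToyHostBackend:
    """In-memory host inventory standing in for a real orchestrator."""

    def __init__(self):
        self._hosts = {}                     # hostname -> HostSpec

    def add_host(self, host_spec):
        self._hosts[host_spec.hostname] = host_spec

    def remove_host(self, host):
        self._hosts.pop(host, None)

    def add_host_label(self, host, label):
        self._hosts[host].labels.append(label)

    def get_hosts(self):
        # The backend remains the source of truth: report what it has.
        return list(self._hosts.values())
```

Note how the backend, not the module, holds the inventory: callers always re-read it via get_hosts() rather than keeping their own copy, matching the "source of truth" rule above.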
Devices
Orchestrator.get_inventory(host_filter=None, refresh=False)
Returns something that was created by ceph-volume inventory.
- Return type
Completion
- Returns
list of InventoryHost
class orchestrator.InventoryFilter(labels=None, hosts=None)
When fetching inventory, use this filter to avoid unnecessarily scanning the whole estate.
Typical uses:
- filter by host when presenting a UI workflow for configuring a particular server;
- filter by label when not all of the estate consists of Ceph servers, and we only want to learn about the Ceph servers;
- filter by label when we are particularly interested in e.g. OSD servers.
class ceph.deployment.inventory.Devices(devices)
A container for Device instances with reporting
class ceph.deployment.inventory.Device(path, sys_api=None, available=None, rejected_reasons=None, lvs=None, device_id=None)
Placement
A Placement Specification defines the placement of daemons of a specific service.
In general, stateless services do not require any specific placement rules as they can run anywhere that sufficient system resources are available. However, some orchestrators may not include the functionality to choose a location in this way. Optionally, you can specify a location when creating a stateless service.
class ceph.deployment.service_spec.PlacementSpec(label=None, hosts=None, count=None, host_pattern=None)
For APIs that need to specify a host subset
classmethod from_string(arg)
A single integer is parsed as a count:

>>> PlacementSpec.from_string('3')
PlacementSpec(count=3)

A list of names is parsed as host specifications:

>>> PlacementSpec.from_string('host1 host2')
PlacementSpec(hosts=[HostPlacementSpec(hostname='host1', network='', name=''), HostPlacementSpec(hostname='host2', network='', name='')])

You can also prefix the hosts with a count as follows:

>>> PlacementSpec.from_string('2 host1 host2')
PlacementSpec(count=2, hosts=[HostPlacementSpec(hostname='host1', network='', name=''), HostPlacementSpec(hostname='host2', network='', name='')])

You can specify labels using label:<label>:

>>> PlacementSpec.from_string('label:mon')
PlacementSpec(label='mon')

Labels also support a count:

>>> PlacementSpec.from_string('3 label:mon')
PlacementSpec(count=3, label='mon')

fnmatch is also supported:

>>> PlacementSpec.from_string('data[1-3]')
PlacementSpec(host_pattern='data[1-3]')

>>> PlacementSpec.from_string(None)
PlacementSpec()
- Return type
PlacementSpec
host_pattern: Optional[str] = None
fnmatch patterns to select hosts. Can also be a single host.
Services
class orchestrator.ServiceDescription(spec, container_image_id=None, container_image_name=None, rados_config_location=None, service_url=None, last_refresh=None, created=None, size=0, running=0)
For responding to queries about the status of a particular service, stateful or stateless.
This is not about health or performance monitoring of services: it’s about letting the orchestrator tell Ceph whether and where a service is scheduled in the cluster. When an orchestrator tells Ceph “it’s running on host123”, that’s not a promise that the process is literally up this second, it’s a description of where the orchestrator has decided the service should run.
class ceph.deployment.service_spec.ServiceSpec(service_type, service_id=None, placement=None, count=None, unmanaged=False)
Details of service creation.
Request to the orchestrator for a cluster of daemons such as MDS, RGW, iscsi gateway, MONs, MGRs, Prometheus
This structure is supposed to be enough information to start the services.
Orchestrator.describe_service(service_type=None, service_name=None, refresh=False)
Describe a service (of any kind) that is already configured in the orchestrator. For example, when viewing an OSD in the dashboard we might like to also display information about the orchestrator's view of the service (like the kubernetes pod ID).
When viewing a CephFS filesystem in the dashboard, we would use this to display the pods being currently run for MDS daemons.
- Return type
Completion
- Returns
list of ServiceDescription objects.
Orchestrator.service_action(action, service_name)
Perform an action (start/stop/reload) on a service (i.e., all daemons providing the logical service).
- Parameters
action (str) – one of "start", "stop", "restart", "redeploy", "reconfig"
service_name (str) – name of logical service ("cephfs", "us-east", …)
- Return type
Completion
Orchestrator.remove_service(service_name)
Remove a service (a collection of daemons).
- Return type
Completion
- Returns
None
Daemons
Orchestrator.list_daemons(service_name=None, daemon_type=None, daemon_id=None, host=None, refresh=False)
Describe a daemon (of any kind) that is already configured in the orchestrator.
- Return type
Completion
- Returns
list of DaemonDescription objects.
Orchestrator.remove_daemons(names)
Remove specific daemon(s).
- Return type
Completion
- Returns
None
Orchestrator.daemon_action(action, daemon_type, daemon_id)
Perform an action (start/stop/reload) on a daemon.
- Parameters
action (str) – one of "start", "stop", "restart", "redeploy", "reconfig"
daemon_type – type of the daemon (e.g. "mds")
daemon_id – id of the daemon
- Return type
Completion
OSD management
Orchestrator.create_osds(drive_group)
Create one or more OSDs within a single Drive Group.
The principal argument here is the drive_group member of OsdSpec: other fields are advisory/extensible for any finer-grained OSD feature enablement (choice of backing store, compression/encryption, etc).
- Return type
Completion
Orchestrator.blink_device_light(ident_fault, on, locations)
Instructs the orchestrator to enable or disable either the ident or the fault LED.
- Parameters
ident_fault (str) – either "ident" or "fault"
on (bool) – True = on
locations (List[DeviceLightLoc]) – see orchestrator.DeviceLightLoc
- Return type
Completion
class orchestrator.DeviceLightLoc
Describes a specific device on a specific host. Used for enabling or disabling LEDs on devices.
- hostname: as in orchestrator.Orchestrator.get_hosts()
- device_id: e.g. ABC1234DEF567-1R1234_ABC8DE0Q. See ceph osd metadata | jq '.[].device_ids'
OSD replacement
See Replacing an OSD for the underlying process.
Replacing OSDs is fundamentally a two-staged process, as users need to physically replace drives. The orchestrator therefore exposes this two-staged process.
Phase one is a call to Orchestrator.remove_osds() with destroy=True in order to mark the OSD as destroyed.
Phase two is a call to Orchestrator.create_osds() with a Drive Group with DriveGroupSpec.osd_id_claims set to the destroyed OSD ids.
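The two phases can be sketched with a toy in-memory OSD table. ToyOsdBackend and its methods are simplified stand-ins for the real remove_osds()/create_osds() calls, which go through the backend and return Completions:

```python
class ToyOsdBackend:
    """In-memory OSD table illustrating the two-staged replacement."""

    def __init__(self):
        self.osds = {}                       # osd_id -> state

    def remove_osds(self, osd_ids, destroy=False):
        # Phase one: with destroy=True, mark the OSD as destroyed so
        # its id can later be claimed by a replacement OSD.
        for osd_id in osd_ids:
            self.osds[osd_id] = 'destroyed' if destroy else 'removed'

    def create_osds(self, osd_id_claims):
        # Phase two: recreate OSDs on the replacement drives, claiming
        # the destroyed ids (cf. DriveGroupSpec.osd_id_claims).
        for osd_id in osd_id_claims:
            assert self.osds.get(osd_id) == 'destroyed'
            self.osds[osd_id] = 'up'
```

Between the two phases, the user physically swaps the drive; the destroyed id survives so the replacement OSD keeps the same id.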
Monitors
Orchestrator.add_mon(spec)
Create mon daemon(s)
- Return type
Completion
Orchestrator.apply_mon(spec)
Update mon cluster
- Return type
Completion
Stateless services
Orchestrator.add_mgr(spec)
Create mgr daemon(s)
- Return type
Completion
Orchestrator.apply_mgr(spec)
Update mgr cluster
- Return type
Completion
Orchestrator.add_mds(spec)
Create MDS daemon(s)
- Return type
Completion
Orchestrator.apply_mds(spec)
Update MDS cluster
- Return type
Completion
Orchestrator.add_rbd_mirror(spec)
Create rbd-mirror daemon(s)
- Return type
Completion
Orchestrator.apply_rbd_mirror(spec)
Update rbd-mirror cluster
- Return type
Completion
class ceph.deployment.service_spec.RGWSpec(service_type='rgw', service_id=None, placement=None, rgw_realm=None, rgw_zone=None, subcluster=None, rgw_frontend_port=None, rgw_frontend_ssl_certificate=None, rgw_frontend_ssl_key=None, unmanaged=False, ssl=False)
Settings to configure a (multisite) Ceph RGW
Orchestrator.add_rgw(spec)
Create RGW daemon(s)
- Return type
Completion
Orchestrator.apply_rgw(spec)
Update RGW cluster
- Return type
Completion
class ceph.deployment.service_spec.NFSServiceSpec(service_type='nfs', service_id=None, pool=None, namespace=None, placement=None, unmanaged=False)
Orchestrator.add_nfs(spec)
Create NFS daemon(s)
- Return type
Completion
Orchestrator.apply_nfs(spec)
Update NFS cluster
- Return type
Completion
Upgrades
Orchestrator.upgrade_available()
Report on what versions are available to upgrade to
- Return type
Completion
- Returns
List of strings
Orchestrator.upgrade_start(image, version)
- Return type
Completion
Orchestrator.upgrade_status()
If an upgrade is currently underway, report on where we are in the process, or if some error has occurred.
- Return type
Completion
- Returns
UpgradeStatusSpec instance
class orchestrator.UpgradeStatusSpec
Utilities
Orchestrator.available()
Report whether we can talk to the orchestrator. This is the place to give the user a meaningful message if the orchestrator isn't running or can't be contacted.
This method may be called frequently (e.g. every page load to conditionally display a warning banner), so make sure it’s not too expensive. It’s okay to give a slightly stale status (e.g. based on a periodic background ping of the orchestrator) if that’s necessary to make this method fast.
Note
True doesn’t mean that the desired functionality is actually available in the orchestrator. I.e. this won’t work as expected:
>>> if OrchestratorClientMixin().available()[0]:  # wrong.
...     OrchestratorClientMixin().get_hosts()
- Return type
Tuple[bool, str]
- Returns
two-tuple of boolean, string
Orchestrator.get_feature_set()
Describes which methods this orchestrator implements
Note
True doesn’t mean that the desired functionality is actually possible in the orchestrator. I.e. this won’t work as expected:
>>> api = OrchestratorClientMixin()
>>> if api.get_feature_set()['get_hosts']['available']:  # wrong.
...     api.get_hosts()
It’s better to ask for forgiveness instead:
>>> try:
...     OrchestratorClientMixin().get_hosts()
... except (OrchestratorError, NotImplementedError):
...     ...
- Returns
Dict of API method names to {'available': True or False}
Client modules
class orchestrator.OrchestratorClientMixin
A module that inherits from OrchestratorClientMixin can directly call all Orchestrator methods without manually calling remote.
Every interface method from Orchestrator is converted into a stub method that internally calls OrchestratorClientMixin._oremote()
>>> class MyModule(OrchestratorClientMixin):
...     def func(self):
...         completion = self.add_host('somehost')  # calls `_oremote()`
...         self._orchestrator_wait([completion])
...         self.log.debug(completion.result)
Note
Orchestrator implementations should not inherit from OrchestratorClientMixin, because OrchestratorClientMixin magically redirects all methods to the "real" implementation of the orchestrator.
>>> import mgr_module
>>> class MyImplementation(mgr_module.MgrModule, Orchestrator):
...     def __init__(self, ...):
...         self.orch_client = OrchestratorClientMixin()
...         self.orch_client.set_mgr(self.mgr)
set_mgr(mgr)
Usable in the Dashboard that uses a global mgr
- Return type
None