Block Devices and OpenStack¶
You can use Ceph Block Device images with OpenStack through libvirt, which configures the QEMU interface to librbd. Ceph stripes block device images as objects across the cluster, which means that large Ceph Block Device images have better performance than a standalone server.
To use Ceph Block Devices with OpenStack, you must install QEMU, libvirt, and OpenStack first. We recommend using a separate physical node for your OpenStack installation, with at least 8GB of RAM and a quad-core processor. The following diagram depicts the OpenStack/Ceph technology stack.
Important
To use Ceph Block Devices with OpenStack, you must have access to a running Ceph Storage Cluster.
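A quick way to confirm that an OpenStack host really has that access is to query the cluster with the client configuration you intend to use. The commands below are a minimal sketch and assume ceph.conf and a suitable keyring are already installed on the host:
ceph -s            # should print cluster status rather than a connection or authentication error
ceph osd lspools   # lists the pools visible to this client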
Three parts of OpenStack integrate with Ceph block devices:
Images: OpenStack Glance manages images for VMs. Images are immutable. OpenStack treats images as binary blobs and downloads them accordingly.
Volumes: Volumes are block devices. OpenStack uses volumes to boot VMs, or to attach volumes to running VMs. OpenStack manages volumes using the Cinder service.
Guest Disks: Guest disks are guest operating system disks. By default, when you boot a virtual machine, its disk appears as a file on the filesystem of the hypervisor (usually under /var/lib/nova/instances/<uuid>/). Prior to OpenStack Havana, the only way to boot a VM in Ceph was to use the boot-from-volume functionality of Cinder. However, now it is possible to boot every virtual machine inside Ceph directly without using Cinder, which is advantageous because it allows you to perform maintenance operations easily with the live-migration process. Additionally, if your hypervisor dies it is also convenient to trigger nova evacuate and run the virtual machine elsewhere almost seamlessly.
You can use OpenStack Glance to store images in a Ceph Block Device, and you can use Cinder to boot a VM from a copy-on-write clone of an image.
The instructions below detail the setup for Glance, Cinder and Nova, although they do not have to be used together. You may store images in Ceph block devices while running VMs using a local disk, or vice versa.
Important
Ceph doesn’t support QCOW2 for hosting a virtual machine disk. Thus if you want to boot virtual machines in Ceph (ephemeral backend or boot from volume), the Glance image format must be RAW.
Tip
This document describes using Ceph Block Devices with OpenStack Havana. For earlier versions of OpenStack see Block Devices and OpenStack (Dumpling).
Create a Pool¶
By default, Ceph Block Devices use the rbd pool, but you may use any available pool. We recommend creating separate pools for Cinder and Glance. Ensure your Ceph cluster is running, then create the pools:
ceph osd pool create volumes 128
ceph osd pool create images 128
ceph osd pool create backups 128
ceph osd pool create vms 128
Refer to Create a Pool for details on specifying the number of placement groups for your pools, and to Placement Groups for details on how many placement groups you should set for your pools.
Newly created pools must be initialized before use. Use the rbd tool to initialize the pools:
rbd pool init volumes
rbd pool init images
rbd pool init backups
rbd pool init vms
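To double-check the result, here is a small sketch of how you might verify that the pools exist and carry the intended placement-group count (pool names as created above):
ceph osd lspools
ceph osd pool get volumes pg_num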
Configure OpenStack Ceph Clients¶
The hosts running glance-api, cinder-volume, nova-compute and cinder-backup act as Ceph clients; each requires the ceph.conf file:
ssh {your-openstack-server} sudo tee /etc/ceph/ceph.conf </etc/ceph/ceph.conf
Install Ceph client packages¶
On the glance-api node, install the Python bindings for librbd:
sudo apt-get install python-rbd
sudo yum install python-rbd
On the nova-compute, cinder-backup and cinder-volume nodes, install both the Python bindings and the client command line tools:
sudo apt-get install ceph-common
sudo yum install ceph-common
Setup Ceph Client Authentication¶
If you have cephx authentication enabled, create new users for Nova/Cinder and Glance. Execute the following:
ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images' mgr 'profile rbd pool=images'
ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes, profile rbd pool=vms'
ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' mgr 'profile rbd pool=backups'
Add the keyrings for client.glance, client.cinder and client.cinder-backup to the appropriate nodes and change their ownership:
ceph auth get-or-create client.glance | ssh {your-glance-api-server} sudo tee /etc/ceph/ceph.client.glance.keyring
ssh {your-glance-api-server} sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
ceph auth get-or-create client.cinder | ssh {your-volume-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
ssh {your-cinder-volume-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
ceph auth get-or-create client.cinder-backup | ssh {your-cinder-backup-server} sudo tee /etc/ceph/ceph.client.cinder-backup.keyring
ssh {your-cinder-backup-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring
Nodes running nova-compute need the keyring file for the nova-compute process:
ceph auth get-or-create client.cinder | ssh {your-nova-compute-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
They also need to store the secret key of the client.cinder user in libvirt; the libvirt process needs it to access the cluster while attaching a block device from Cinder.
Create a temporary copy of the secret key on the nodes running nova-compute:
ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key
Then, on the compute nodes, add the secret key to libvirt and remove the temporary copy of the key:
uuidgen
457eb676-33da-42ec-9a8c-9293d545c337
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
<uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
<usage type='ceph'>
<name>client.cinder secret</name>
</usage>
</secret>
EOF
sudo virsh secret-define --file secret.xml
Secret 457eb676-33da-42ec-9a8c-9293d545c337 created
sudo virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 --base64 $(cat client.cinder.key) && rm client.cinder.key secret.xml
Save the uuid of the secret for configuring nova-compute later.
Important
You don’t necessarily need the UUID on all the compute nodes. However from a platform consistency perspective, it’s better to keep the same UUID.
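If you lose track of the UUID later, it can be read back from libvirt on the compute node; a minimal sketch, assuming the secret was defined as shown above:
sudo virsh secret-list    # prints the UUID together with the 'client.cinder secret' usage name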
Configure OpenStack to use Ceph¶
Configuring Glance¶
Glance can use multiple back ends to store images. To use Ceph block devices by default, configure Glance as follows.
Prior to Juno¶
Edit /etc/glance/glance-api.conf and add under the [DEFAULT] section:
default_store = rbd
rbd_store_user = glance
rbd_store_pool = images
rbd_store_chunk_size = 8
Juno¶
Edit /etc/glance/glance-api.conf and add under the [glance_store] section:
[DEFAULT]
...
default_store = rbd
...
[glance_store]
stores = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_chunk_size = 8
Important
Glance has not completely moved to 'store' yet, so we still have to configure the store in the DEFAULT section.
Kilo and after¶
Edit /etc/glance/glance-api.conf and add under the [glance_store] section:
[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_chunk_size = 8
For more information about the configuration options available in Glance, see the OpenStack Configuration Reference: http://docs.openstack.org/.
Enable copy-on-write cloning of images¶
Note that this exposes the back end location via Glance's API, so the endpoint with this option enabled should not be publicly accessible.
For Mitaka only¶
To enable image locations and take advantage of copy-on-write cloning for images, add under the [DEFAULT] section:
show_multiple_locations = True
show_image_direct_url = True
Disable cache management (any OpenStack version)¶
Disable the Glance cache management to avoid images getting cached under /var/lib/glance/image-cache/, assuming your configuration file already has flavor = keystone+cachemanagement:
[paste_deploy]
flavor = keystone
Image properties¶
We recommend using the following properties for your images (see the example after this list):
hw_scsi_model=virtio-scsi: add the virtio-scsi controller to get better performance and support for discard operations
hw_disk_bus=scsi: connect every cinder block device to that controller
hw_qemu_guest_agent=yes: enable the QEMU guest agent
os_require_quiesce=yes: send fs-freeze/thaw calls through the QEMU guest agent
Configuring Cinder¶
OpenStack requires a driver to interact with Ceph block devices, and you must also specify the pool name for the block device. On your OpenStack node, edit /etc/cinder/cinder.conf and add:
[DEFAULT]
...
enabled_backends = ceph
glance_api_version = 2
...
[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
If you are using cephx authentication, also configure the user and the uuid of the secret you added to libvirt earlier:
[ceph]
...
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
Note that if you are configuring multiple cinder back ends, glance_api_version = 2 must be in the [DEFAULT] section.
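For illustration only, here is a hedged sketch of what such a multi-back-end layout could look like. The second lvm back end and its driver path are assumptions used as an example, not part of this guide, and driver names vary between releases:
[DEFAULT]
enabled_backends = ceph, lvm
glance_api_version = 2

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf

[lvm]
# hypothetical second back end; check the driver path for your release
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name = lvm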
Configuring Cinder Backup¶
OpenStack Cinder Backup requires a specific daemon, so don't forget to install it. On your Cinder Backup node, edit /etc/cinder/cinder.conf and add:
backup_driver = cinder.backup.drivers.ceph
backup_ceph_conf = /etc/ceph/ceph.conf
backup_ceph_user = cinder-backup
backup_ceph_chunk_size = 134217728
backup_ceph_pool = backups
backup_ceph_stripe_unit = 0
backup_ceph_stripe_count = 0
restore_discard_excess_bytes = true
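The backup service runs as its own daemon (cinder-backup) and is not always installed together with cinder-volume. A hedged example of installing it; the package names are assumptions and differ between distributions:
sudo apt-get install cinder-backup    # Debian/Ubuntu: the backup service ships as a separate package
sudo yum install openstack-cinder     # Red Hat based systems: cinder-backup is part of the openstack-cinder package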
Configuring Nova to attach Ceph RBD block devices¶
In order to attach Cinder devices (either normal block devices or volumes you boot from), you must tell Nova (and libvirt) which user and UUID to use when attaching the device. libvirt also uses this user when connecting to and authenticating with the Ceph cluster:
[libvirt]
...
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
These two flags are also used by the Nova ephemeral back end.
Configuring Nova¶
In order to boot all the virtual machines directly into Ceph, you must configure the ephemeral back end for Nova.
We recommend enabling the RBD cache in your Ceph configuration file (enabled by default since Giant). Moreover, enabling the admin socket brings a lot of benefits while troubleshooting; having one socket per virtual machine that uses a Ceph block device helps when investigating performance and/or wrong behaviors.
This socket can be accessed like this:
ceph daemon /var/run/ceph/ceph-client.cinder.19195.32310016.asok help
Now, on every compute node, edit your Ceph configuration file:
[client]
rbd cache = true
rbd cache writethrough until flush = true
admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/qemu/qemu-guest-$pid.log
rbd concurrent management ops = 20
Configure the permissions of these paths:
mkdir -p /var/run/ceph/guests/ /var/log/qemu/
chown qemu:libvirtd /var/run/ceph/guests /var/log/qemu/
Note that the qemu user and libvirtd group may vary depending on your system; the example above works for RedHat based systems.
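For example, on Debian/Ubuntu systems libvirt typically runs QEMU guests as the libvirt-qemu user and the kvm group, so the equivalent commands (an assumption; verify with ps on a running guest) would be:
mkdir -p /var/run/ceph/guests/ /var/log/qemu/
chown libvirt-qemu:kvm /var/run/ceph/guests /var/log/qemu/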
Tip
If your virtual machine is already running, you can simply restart it to get the socket.
Havana and Icehouse¶
Havana and Icehouse require patches to implement copy-on-write cloning and fix bugs with image size and live migration of ephemeral disks on rbd. These are available in branches based on upstream Nova stable/havana and stable/icehouse. Using them is not mandatory but highly recommended in order to take advantage of the copy-on-write clone functionality.
On every Compute node, edit /etc/nova/nova.conf and add:
libvirt_images_type = rbd
libvirt_images_rbd_pool = vms
libvirt_images_rbd_ceph_conf = /etc/ceph/ceph.conf
disk_cachemodes="network=writeback"
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
It is also a good practice to disable file injection. While booting an
instance, Nova usually attempts to open the rootfs of the virtual machine.
Then, Nova injects values such as password, ssh keys etc. directly into the
filesystem. However, it is better to rely on the metadata service and cloud-init.
On every Compute node, edit /etc/nova/nova.conf and add:
libvirt_inject_password = false
libvirt_inject_key = false
libvirt_inject_partition = -2
To ensure a proper live-migration, use the following flags:
libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
Juno¶
In Juno, Ceph block device was moved under the [libvirt] section.
On every Compute node, edit /etc/nova/nova.conf under the [libvirt] section and add:
[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
disk_cachemodes="network=writeback"
It is also a good practice to disable file injection. While booting an
instance, Nova usually attempts to open the rootfs of the virtual machine.
Then, Nova injects values such as password, ssh keys etc. directly into the
filesystem. However, it is better to rely on the metadata service and cloud-init.
On every Compute node, edit /etc/nova/nova.conf and add the following under the [libvirt] section:
inject_password = false
inject_key = false
inject_partition = -2
To ensure a proper live-migration, use the following flags under the [libvirt] section:
live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
Kilo¶
Enable discard support for virtual machine ephemeral root disks:
[libvirt]
...
...
hw_disk_discard = unmap # enable discard support (be careful of performance)
Restart OpenStack¶
To activate the Ceph block device driver and load the block device pool name into the configuration, you must restart OpenStack. On Debian based systems, execute these commands on the appropriate nodes:
sudo glance-control api restart
sudo service nova-compute restart
sudo service cinder-volume restart
sudo service cinder-backup restart
On Red Hat based systems execute:
sudo service openstack-glance-api restart
sudo service openstack-nova-compute restart
sudo service openstack-cinder-volume restart
sudo service openstack-cinder-backup restart
Once OpenStack is up and running, you should be able to create a volume and boot from it.
Booting from a Block Device¶
You can create a volume from an image using the Cinder command line tool:
cinder create --image-id {id of image} --display-name {name of volume} {size of volume}
Note that the image must be in RAW format. You can use qemu-img to convert from one format to another, for example:
qemu-img convert -f {source-format} -O {output-format} {source-filename} {output-filename}
qemu-img convert -f qcow2 -O raw precise-cloudimg.img precise-cloudimg.raw
When Glance and Cinder are both using Ceph block devices, the image is a copy-on-write clone, so new volumes can be created very quickly. In the OpenStack dashboard you can boot from that volume by performing the following steps (an equivalent command-line sketch follows the list):
Launch a new instance.
Choose the image associated to the copy-on-write clone.
Select 'boot from volume'.
Select the volume you created.
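For reference, a hedged command-line sketch of the same boot-from-volume operation. The nova client syntax shown here ( --block-device-mapping dev=id:type:size:delete-on-terminate ) is one common form; the flavor, volume id and instance name are placeholders and network options are omitted:
nova boot --flavor m1.small \
    --block-device-mapping vda={volume-id}:::0 \
    {instance-name}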