How to enable Ceph RBD mirroring

There is a lot of talk about this online but not much concrete information. This is what worked for me.

I'm enabling replication at pool level here. Note that only images that have the 'journaling' feature enabled will be mirrored, irrespective of the state of your pool. So to clarify, you need to enable replication on the pool AND enable the journaling feature on the image itself.

1 - Make sure the pool exists on the local AND remote clusters.

In this case our local cluster is ceph (the default) and our remote cluster is adleast.

On the adleast cluster, create the pool (older Ceph releases also require a placement-group count after the pool name):

ceph osd pool create ADLWEST-vms

2 - Enable mirroring on the pool on both clusters

rbd mirror pool enable ADLWEST-vms pool

rbd mirror pool enable ADLWEST-vms pool --cluster adleast
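Before moving on, it can be worth confirming that both clusters report the pool in pool mode. A small sketch, assuming `rbd mirror pool info` prints a line of the form `Mode: <mode>` (that is what I have seen, but the layout may differ between releases); the helper name `mirror_mode` is made up for this example.

```shell
# Print the mirroring mode found in `rbd mirror pool info` output on stdin.
# Assumption: the output contains a line like "Mode: pool".
mirror_mode() {
  awk -F': *' '$1 == "Mode" { print $2 }'
}
```

With the function defined in your shell, run it against each cluster; both should print pool:

rbd mirror pool info ADLWEST-vms | mirror_mode

rbd mirror pool info ADLWEST-vms --cluster adleast | mirror_mode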

3 - Add peers to the pool

rbd --cluster adleast mirror pool peer add ADLWEST-vms client.admin@ceph

rbd mirror pool peer add ADLWEST-vms client.admin@adleast

4 - Enable replication on the desired images in the pool

rbd feature enable ADLWEST-vms/VM-Cacti.raw journaling --journal-pool ADLWEST-journal

Note the --journal-pool argument: it lets you send all the journal data for that VM to a different pool, which might help reduce the performance impact of journaling/mirroring on your cluster. The journal pool needs to be as fast as, if not faster than, the pool the image resides in, or it will become a bottleneck. Also a ==really important gotcha==: if you are using KVM (or anything with cephx authentication, I guess), the user account you use to access the cluster (Cinder, for example) MUST have access to this pool, otherwise your IO will just hang inexplicably! Trust me, I learnt this one the hard way!
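A quick way to catch that gotcha before it bites is to look at the OSD caps of the client your hypervisor uses. This is only a rough sketch: it assumes `ceph auth get` prints a `caps osd = "..."` line containing `pool=<name>` entries, and the helper name `caps_mention_pool` and the client `client.cinder` are just illustrations. A client with unrestricted or profile-based caps will not match and needs checking by eye.

```shell
# Succeed if the "caps osd" line read on stdin names the given pool.
# Heuristic only: it looks for a literal pool=<name> token.
caps_mention_pool() {
  # Split the caps line on spaces, commas and quotes, then match whole tokens
  grep 'caps osd' | tr ' ,"' '\n\n\n' | grep -Fxq "pool=$1"
}
```

With the function defined in your shell:

ceph auth get client.cinder | caps_mention_pool ADLWEST-journal && echo "journal pool is listed"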

Useful script

List the info on all images in a pool

#!/bin/bash
# Print the mirror status of every image in the pool given as $1
rbd ls -p "$1" |
  while IFS= read -r line
  do
    rbd mirror image status "$1/$line"
  done

Should yield a result like

bash checkMirrorStatus.sh ADLWEST-vms

ADLWest-RGW-LB02.raw:
  global_id:   4196f19b-3ddb-4dce-a15d-0a281898298d
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
ADLWest-RGW02.raw:
  global_id:   859a6377-9872-4f0f-9c5f-4cb69bcf101d
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
ADLWest-Tunnel1.raw:
  global_id:   0e36b8bd-cf07-42e7-8875-ea4e63f9dcfa
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
VM-ADLWest-PRTG.raw:
  global_id:   473c6d0a-4e6b-492b-a143-b240e4b6194d
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
VM-Cacti.raw:
  global_id:   1aac9fbd-0eb4-47bd-a7f7-c7e0adebb5a9
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
VM-OS-Net02.raw:
  global_id:   ee6532e1-c11f-4728-a327-559e91eee39e
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
VM-SMTP01.raw:
  global_id:   4fb8a975-54e4-486a-a119-ae741c4163af
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
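With more than a handful of images, that output gets long. A small sketch that boils it down to a count per state, assuming each image contributes one `state:` line as in the output above; the helper name `count_states` is made up for this example.

```shell
# Read `rbd mirror image status` output on stdin and print one
# "<state> <count>" line per distinct state, sorted by state name.
count_states() {
  awk '$1 == "state:" { n[$2]++ } END { for (s in n) print s, n[s] }' |
    LC_ALL=C sort
}
```

With the function defined in your shell, pipe the script's output through it:

bash checkMirrorStatus.sh ADLWEST-vms | count_states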

Useful command

Show the status of your image replication

rbd mirror image status ADLWEST-vms/VM-Cacti.raw

VM-Cacti.raw:
  global_id:   1aac9fbd-0eb4-47bd-a7f7-c7e0adebb5a9
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 18:58:16

rbd mirror image status ADLWEST-vms/VM-Cacti.raw --cluster=adleast

VM-Cacti.raw:
  global_id:   1aac9fbd-0eb4-47bd-a7f7-c7e0adebb5a9
  state:       up+syncing
  description: bootstrapping, IMAGE_COPY/COPY_OBJECT 21%
  last_update: 2017-06-25 18:58:50

Then when the replication is done you'll see something like this

rbd mirror image status ADLWEST-vms/VM-Cacti.raw

VM-Cacti.raw:
  global_id:   1aac9fbd-0eb4-47bd-a7f7-c7e0adebb5a9
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:23:18

rbd mirror image status ADLWEST-vms/VM-Cacti.raw --cluster=adleast

VM-Cacti.raw:
  global_id:   1aac9fbd-0eb4-47bd-a7f7-c7e0adebb5a9
  state:       up+replaying
  description: replaying, master_position=[object_number=21, tag_tid=0, entry_tid=57097], mirror_position=[object_number=6, tag_tid=0, entry_tid=10886], entries_behind_master=46211
  last_update: 2017-06-25 19:22:57

I believe that 'entries_behind_master' is something along the lines of how far the replica (slave) is behind the master. So if you have a write-heavy VM, the replica might fall quite far behind. An idle VM should show zero.
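If you want to watch that number on its own, you can pull it out of the status text. A minimal sketch, assuming the description line has the `entries_behind_master=<number>` form shown above; the helper name `behind_master` is made up for this example.

```shell
# Read `rbd mirror image status` output on stdin and print the
# entries_behind_master value, if a line carrying one is present.
behind_master() {
  sed -n 's/.*entries_behind_master=\([0-9][0-9]*\).*/\1/p'
}
```

With the function defined in your shell:

rbd mirror image status ADLWEST-vms/VM-Cacti.raw --cluster=adleast | behind_master

It prints nothing when the image is not in the replaying state, since there is no such field to extract.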