Version: 4.1

Disaster Recovery

VirtFusion 2.3+ supports DR (backup/restore) to minimize downtime and data loss in the event of a disaster.

Data can either be stored in S3 compatible object storage or on a local disk partition.

For a list of S3 compatible providers, see S3 Compatible Storage Providers at the end of this page.

This may look complicated, but it really isn't. VirtFusion does its best to produce consistent backups that can be restored easily; you just need to configure it on the hypervisor the way you want.

caution

Although disaster recovery is supported on hypervisors that use shared storage, it only backs up locally stored disks and configurations. Disks on shared storage (e.g. Ceph) are ignored.

It's expected that later versions of VirtFusion will incorporate a GUI component for disaster recovery.

Hypervisor Preparation

S3

You only need to follow this step if you want to use object storage.

Installing the AWS CLI tools

The CLI installer requires unzip on the hypervisor. Install it with your distribution's package manager, e.g. apt install unzip or dnf install unzip.
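If you prepare several hypervisors, a small helper can pick the right package manager for you. The sketch below is hypothetical (ensure_unzip is not part of VirtFusion) and assumes it runs as root on either a Debian/Ubuntu (apt-get) or RHEL-family (dnf) hypervisor.

```shell
# Hypothetical helper: install unzip with whichever package manager exists.
# Assumes root and either apt-get (Debian/Ubuntu) or dnf (RHEL family).
ensure_unzip() {
  if command -v unzip >/dev/null 2>&1; then
    return 0  # already installed, nothing to do
  fi
  if command -v apt-get >/dev/null 2>&1; then
    apt-get install -y unzip
  elif command -v dnf >/dev/null 2>&1; then
    dnf install -y unzip
  else
    echo "Neither apt-get nor dnf found; install unzip manually." >&2
    return 1
  fi
}
```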

Once you have unzip, install the tool.

cd /tmp
curl https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip -o awscliv2.zip
unzip awscliv2.zip
./aws/install
rm -rf /tmp/aws
rm -f /tmp/awscliv2.zip

See the official AWS CLI installation instructions for more details.

Configuring the AWS config profile

Create the file /root/.aws/config.

mkdir -p /root/.aws
nano -w /root/.aws/config

In that file you can configure your access credentials for your S3 compatible service.

[profile virtfusion]
aws_access_key_id=ACCESS_KEY
aws_secret_access_key=SECRET_ACCESS_KEY
retry_mode = standard
max_attempts = 3
s3 =
  #max_concurrent_requests = 20
  #max_bandwidth = 5MB/s
  addressing_style = path

You should leave the profile name as virtfusion.
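Before configuring VirtFusion, it can help to confirm that the profile actually reaches your bucket. This is a hypothetical check (s3_check is not a VirtFusion command) that lists the bucket through the virtfusion profile using the AWS CLI's global --profile, --endpoint-url and --region options; pass the endpoint, region and bucket you plan to put in dr.json.

```shell
# Hypothetical helper (not part of VirtFusion): check that the "virtfusion"
# profile can list the bucket. Arguments are the endpoint, region and bucket
# you plan to use in dr.json.
s3_check() {
  local endpoint="${1:?usage: s3_check ENDPOINT REGION BUCKET}"
  local region="${2:?usage: s3_check ENDPOINT REGION BUCKET}"
  local bucket="${3:?usage: s3_check ENDPOINT REGION BUCKET}"
  if aws --profile virtfusion --endpoint-url "$endpoint" --region "$region" \
      s3 ls "s3://${bucket}" >/dev/null; then
    echo "S3 access OK: ${bucket}"
  else
    echo "S3 access FAILED: ${bucket} at ${endpoint}" >&2
    return 1
  fi
}
```

For example, with the values used in the sample dr.json: s3_check https://s3.eu-west-1.wasabisys.com eu-west-1 backups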

Local Storage

Nothing extra needs to be configured for local storage.

Configuring VirtFusion

Create the file /home/vf-data/conf/dr.json.

nano -w /home/vf-data/conf/dr.json

In that file, configure your S3 bucket, endpoint and region and/or your local storage.

{
  "type": "S3",
  "storeDirType": "once",
  "tmpDir": "home/vf-data/tmp",
  "threads": 1,
  "S3": {
    "endpoint": "https://s3.eu-west-1.wasabisys.com",
    "region": "eu-west-1",
    "bucket": "backups",
    "compress": true,
    "compressLevel": 1,
    "prune": {
      "enabled": true,
      "hoursGrace": 168,
      "keepLast": 30
    }
  },
  "local": {
    "path": "backups",
    "compress": true,
    "compressLevel": 1,
    "prune": {
      "enabled": true,
      "hoursGrace": 168,
      "keepLast": 30
    }
  }
}
  • type should be either S3 or local
  • storeDirType may be one of the following:
Type | Example Path | Description
once | virtfusion_dr/[HV_ID]/once | Creates a static directory named once, so each backup run overwrites the previous files.
Y-M-D-H | virtfusion_dr/[HV_ID]/y_m_d_h/[2023_06_01_04] | Typically creates a new directory every hour.
Y-M-D | virtfusion_dr/[HV_ID]/y_m_d/[2023_06_01] | Creates a new directory each day.
Y-M | virtfusion_dr/[HV_ID]/y_m/[2023_06] | Creates a new directory each month.
Y-W | virtfusion_dr/[HV_ID]/y_w/[2023_03] | Creates a new directory each week (weeks start on Monday).
DW | virtfusion_dr/[HV_ID]/dw/[1-7] | Creates a directory for the day of the week (1 for Monday, 7 for Sunday).

All date style types are based on UTC.

  • tmpDir is where temporary snapshots will be stored. You don't need to include a / at the start or end.

  • threads sets how many backups can run at the same time. This option is currently ignored: only 1 thread is used until a future version.

  • S3.endpoint should be the URL to your S3 service.

  • S3.region is the S3 region of your S3 service.

  • S3.bucket should be the name of the bucket you created to store the backups.

  • S3.compress will enable lz4 compression on server disks. Enabling compression will usually slow down the backup process, especially on a fully loaded system, but saves storage space. It's advised to keep this option enabled.

  • S3.compressLevel (1 to 16): 1 = fast (default), 16 = slow. Level 16 gives better compression but is very slow. It's advised to keep this option set to 1.

  • S3.prune.enabled can be true or false. This option enables or disables backup pruning.

  • S3.prune.hoursGrace is only used for the once storeDirType setting and will wait the specified number of hours before deleting files that no longer belong to the hypervisor.

  • S3.prune.keepLast is used for all date style storeDirType settings and is used to specify how many of the newest folders should be kept on storage.

  • local.path should be the local path where you want to store the backups. You don't need to include a / at the start or end.

  • local.compress will enable lz4 compression on server disks. Enabling compression will usually slow down the backup process, especially on a fully loaded system, but saves storage space. It's advised to keep this option enabled.

  • local.compressLevel (1 to 16): 1 = fast (default), 16 = slow. Level 16 gives better compression but is very slow. It's advised to keep this option set to 1.

  • local.prune.enabled can be true or false. This option enables or disables backup pruning.

  • local.prune.hoursGrace is only used for the once storeDirType setting and will wait the specified number of hours before deleting files that no longer belong to the hypervisor.

  • local.prune.keepLast is used for all date style storeDirType settings and is used to specify how many of the newest folders should be kept on storage.
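A typo in dr.json is easy to miss until a backup fails. The following is a hypothetical sanity check (validate_dr_json is not a VirtFusion command) that parses the file and checks a few of the values described above: type, storeDirType and compressLevel. It assumes python3 is installed on the hypervisor and that the storeDirType values must match the table exactly.

```shell
# Hypothetical sanity check for dr.json; it is not part of VirtFusion.
# It only checks JSON syntax and a few values from the option list above.
# Assumes python3 is installed on the hypervisor.
validate_dr_json() {
  python3 - "${1:-/home/vf-data/conf/dr.json}" <<'PY'
import json
import sys

path = sys.argv[1]
try:
    with open(path) as fh:
        cfg = json.load(fh)
except (OSError, ValueError) as exc:
    sys.exit(f"{path}: {exc}")
if not isinstance(cfg, dict):
    sys.exit(f"{path}: top level must be a JSON object")

errors = []
kind = cfg.get("type")
if kind not in ("S3", "local"):
    errors.append('"type" must be "S3" or "local"')
elif not isinstance(cfg.get(kind), dict):
    errors.append(f'missing "{kind}" section')
else:
    # compressLevel defaults to 1 when omitted
    level = cfg[kind].get("compressLevel", 1)
    if not isinstance(level, int) or not 1 <= level <= 16:
        errors.append(f'"{kind}.compressLevel" must be between 1 and 16')
if cfg.get("storeDirType") not in ("once", "Y-M-D-H", "Y-M-D", "Y-M", "Y-W", "DW"):
    errors.append('"storeDirType" is not one of the documented types')

for err in errors:
    print(f"{path}: {err}", file=sys.stderr)
if errors:
    sys.exit(1)
print(f"{path}: OK")
PY
}
```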

info

It's wise to back up both /root/.aws/config and /home/vf-data/conf/dr.json for easy access in case of a fatal hypervisor failure.
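One way to do this is to bundle both files into a dated archive and copy it off the hypervisor. The helper below is hypothetical (backup_dr_config, AWS_CFG and DR_CFG are illustration names, not VirtFusion settings); it skips files that don't exist, so it also works when you only use local storage.

```shell
# Hypothetical helper: bundle the DR config files into a dated tarball that
# you can copy somewhere other than this hypervisor. AWS_CFG and DR_CFG
# override the default paths; files that do not exist are skipped.
backup_dr_config() {
  local dest_dir="${1:?usage: backup_dr_config DEST_DIR}"
  local aws_cfg="${AWS_CFG:-/root/.aws/config}"
  local dr_cfg="${DR_CFG:-/home/vf-data/conf/dr.json}"
  local archive f
  archive="${dest_dir}/virtfusion_dr_config_$(date -u +%Y%m%d%H%M%S).tar.gz"

  set --  # reuse the positional parameters as the list of files to archive
  for f in "$aws_cfg" "$dr_cfg"; do
    if [ -f "$f" ]; then set -- "$@" "$f"; fi
  done
  if [ "$#" -eq 0 ]; then
    echo "no DR config files found" >&2
    return 1
  fi

  mkdir -p "$dest_dir" || return 1
  # The AWS config holds secret keys, so create the archive with mode 600.
  ( umask 077 && tar -czf "$archive" "$@" ) || return 1
  echo "$archive"
}
```

The archive contains your S3 secret key, so store the copy somewhere private.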

Creating Backups

Important
  • If you run a backup manually, it's highly advised to run the command within a screen session.
  • Even though signals are caught and the DR routine will attempt to exit cleanly, avoid pressing Ctrl+C while a backup is running. Due to the nature of external snapshots, interrupting a backup may leave disks in an inconsistent state, which can be hard to repair.
  • While a server disk is being backed up, some server actions such as reboot, reinstall and shutdown are blocked at the hypervisor level. They are unlocked once the backup of that disk has completed.
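For manual runs, a detached screen session keeps the backup going if your SSH connection drops. This is a hypothetical wrapper (start_dr_backup_in_screen and the session name vf-dr are illustration choices) built on screen's -dmS options; it keeps the session open after the backup ends so the output can still be reviewed.

```shell
# Hypothetical helper: start dr:backup in a detached screen session so it
# survives a dropped SSH connection. Extra arguments are passed through to
# dr:backup. The session stays open after the backup ends so you can review
# the output with "screen -r vf-dr".
start_dr_backup_in_screen() {
  local session="${DR_SCREEN_SESSION:-vf-dr}"
  if screen -ls | grep -q "[.]${session}[[:space:]]"; then
    echo "screen session '${session}' already exists; attach with: screen -r ${session}" >&2
    return 1
  fi
  # bash -c receives the session name as $0 and the backup options as "$@".
  screen -dmS "$session" bash -c \
    'vfcli-hv dr:backup "$@"; echo "dr:backup exited with status $?"; exec bash' \
    "$session" "$@"
  echo "Started; attach with: screen -r ${session}"
}
```

For example, start_dr_backup_in_screen --exclude-system-data runs only the server disk backup in the background.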

Generate a list of files that will be included in a backup (with filesize estimates)

vfcli-hv dr:backup --estimate-only

It will generate something similar to this.

+---------------------------------------------------------------+---------+-----------+
| Item | Type | Size |
+---------------------------------------------------------------+---------+-----------+
| /home/vf-data/disk/37fc81d8-402c-41a1-8f7b-5f421c7e2d20_1.img | file | 3.81 GB |
| /home/vf-data/disk/d33ab991-ec0e-491b-b38a-0999f01bbfc9_1.img | file | 6.76 GB |
| /home/vf-data/disk/6ba9b79c-0168-4988-ba6e-b7674c7c2c08_1.img | file | 1.8 GB |
| /home/vf-data/disk/6ba9b79c-0168-4988-ba6e-b7674c7c2c08_2.img | file | 6.69 MB |
| /home/vf-data/disk/6ba9b79c-0168-4988-ba6e-b7674c7c2c08_3.img | file | 6.69 MB |
| /home/vf-data/server | dir | 1.11 MB |
| /home/vf-data/stats | dir | 109.85 MB |
| /home/vf-data/events | dir | 0 B |
| /home/vf-data/conf | dir | 802 B |
| /opt/virtfusion/app/hypervisor/storage/stats | dir | 3.04 KB |
| /opt/virtfusion/app/hypervisor/storage/logs | dir | 314.9 KB |
| /opt/virtfusion/app/hypervisor/storage/nat | dir | 136 B |
| /opt/virtfusion/app/hypervisor/conf/auth.json | file | 339 B |
| /etc/haproxy/haproxy.cfg | file | 1.35 KB |
| /opt/virtfusion/app/hypervisor/database/database.sqlite | file | 12 KB |
| /opt/virtfusion/app/hypervisor/database/queue.sqlite | file | 16 KB |
|---------------------------------------------------------------|---------|-----------|
| Total (approx) | 12.49 GB |
+---------------------------------------------------------------+---------+-----------+
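If you only need the headline number, for example to compare against free space on your storage, you can pull it out of the table. This hypothetical helper (dr_estimate_total, with VFCLI as an override for the command path) parses the layout shown above, so it may break if the output format changes between versions.

```shell
# Hypothetical helper: print just the "Total (approx)" figure from the
# --estimate-only table. It parses the layout shown above, so treat it as
# fragile if the output format changes between versions.
dr_estimate_total() {
  "${VFCLI:-vfcli-hv}" dr:backup --estimate-only |
    awk -F'|' '/Total \(approx\)/ {
      # Walk the cells from the right and print the last non-empty one.
      for (i = NF; i > 1; i--) {
        v = $i
        sub(/^ +/, "", v)
        sub(/ +$/, "", v)
        if (v != "") { print v; exit }
      }
    }'
}
```

With the sample output above, this would print 12.49 GB.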

Back up everything

vfcli-hv dr:backup

Back up only specific servers

vfcli-hv dr:backup --only-servers=1754,1253,1002

Back up everything except disks that already exist on the backup storage

vfcli-hv dr:backup --only-missing-disks

Back up everything but exclude specific servers

vfcli-hv dr:backup --exclude-servers=1754,1253,1002

--only-servers= and --exclude-servers= should be a comma-separated list of server IDs.
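When the IDs come from a script or another tool, a small helper can build the comma-separated value and reject anything that isn't a numeric ID. join_server_ids is a hypothetical name used for illustration.

```shell
# Hypothetical helper: join server IDs into the comma-separated value used by
# --only-servers= and --exclude-servers=, rejecting non-numeric IDs.
join_server_ids() {
  local id
  for id in "$@"; do
    case "$id" in
      ''|*[!0-9]*) echo "invalid server ID: '$id'" >&2; return 1 ;;
    esac
  done
  local IFS=,
  echo "$*"  # "$*" joins the arguments with the first character of IFS
}
```

Usage: vfcli-hv dr:backup --exclude-servers="$(join_server_ids 1754 1253 1002)"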

Back up everything except the system data (only server disks)

vfcli-hv dr:backup --exclude-system-data

Automating Backups

Obviously you will want to automate your backups. You can do this using a cronjob or a systemd timer.

Cronjob

Create a file in /etc/cron.d/ called virtfusion_dr and use any of the following, or add your own.

Run daily at 2am

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
0 2 * * * root /usr/bin/vfcli-hv dr:backup >/dev/null 2>&1

Run every Friday at 2am

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
0 2 * * FRI root /usr/bin/vfcli-hv dr:backup >/dev/null 2>&1

Run on the first day of the month at 2am

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
0 2 1 * * root /usr/bin/vfcli-hv dr:backup >/dev/null 2>&1

Run Monday through Friday at 2am

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
0 2 * * 1-5 root /usr/bin/vfcli-hv dr:backup >/dev/null 2>&1
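The examples above discard all output, and a long backup could still be running when the next one is due. A hypothetical wrapper (run_dr, DR_LOCK and DR_LOG are illustration names; the lock and log paths are assumptions) can append output to a log file and use flock from util-linux to skip a run while a previous one holds the lock.

```shell
# Hypothetical wrapper for cron: appends output to a log file instead of
# discarding it, and uses flock so a new run is skipped while the previous
# one is still going. Lock and log paths are assumptions; adjust as needed.
run_dr() {
  local lock="${DR_LOCK:-/run/lock/virtfusion_dr.lock}"
  local log="${DR_LOG:-/var/log/virtfusion_dr.log}"
  local rc=1
  {
    echo "=== $(date -u '+%Y-%m-%d %H:%M:%S') UTC: vfcli-hv $* ==="
    # -n: exit with status 1 immediately if another run holds the lock
    flock -n "$lock" "${VFCLI:-/usr/bin/vfcli-hv}" "$@"
    rc=$?
    echo "=== exit status ${rc} ==="
  } >>"$log" 2>&1
  return "$rc"
}
```

To use it from cron, you could save the function in a script (for example a hypothetical /usr/local/sbin/vf-dr-run, starting with #!/bin/bash and ending with run_dr "$@") and call it as: 0 2 * * * root /usr/local/sbin/vf-dr-run dr:backup. The same wrapper works for dr:prune.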

Restoring Backups

SOURCE_PATH should be the path after the hypervisor ID. For example, if you use Y-M-D for the storeDirType, the path would be y_m_d/2023_06_01 (substitute the date you want); if you use once, it would be once.
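The mapping from storeDirType to SOURCE_PATH can be computed from the storeDirType table, remembering that all date types use UTC. This is a hypothetical helper (dr_source_path) that needs GNU date; the Y-W case assumes the week is the ISO 8601 week number (Monday start), which matches the table's description but is not confirmed here.

```shell
# Hypothetical helper: build the SOURCE_PATH for a given storeDirType and
# UTC time (epoch seconds, default: now). Requires GNU date.
dr_source_path() {
  local type="$1" ts="${2:-$(date -u +%s)}" fmt dir
  case "$type" in
    once)    echo "once"; return 0 ;;
    Y-M-D-H) dir="y_m_d_h"; fmt="%Y_%m_%d_%H" ;;
    Y-M-D)   dir="y_m_d";   fmt="%Y_%m_%d" ;;
    Y-M)     dir="y_m";     fmt="%Y_%m" ;;
    Y-W)     dir="y_w";     fmt="%Y_%V" ;;  # assumption: ISO 8601 week number
    DW)      dir="dw";      fmt="%u" ;;     # 1 = Monday ... 7 = Sunday
    *) echo "unknown storeDirType: $type" >&2; return 1 ;;
  esac
  echo "${dir}/$(date -u -d "@${ts}" "+${fmt}")"
}
```

For example, dr_source_path Y-M-D "$(date -u -d yesterday +%s)" gives yesterday's daily directory.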

Restoring system files

vfcli-hv dr:restore --mode=system --source=once --system-opts=haproxy,eventhooks,config

--system-opts= should be a comma-separated list of any of the following options.

Option | Description
haproxy | Restore the HAProxy configuration files.
eventhooks | Restore all custom event hooks.
config | Restore all system configuration files and logs.

Restoring all servers

vfcli-hv dr:restore --mode=server --servers=all --source=[SOURCE_PATH] --power-control=true

Restoring specific servers

You may restore specific servers based on their ID using the --servers= argument.

vfcli-hv dr:restore --mode=server --servers=1,5,7,45,193 --source=[SOURCE_PATH] --power-control=true

You can also use the following command to fetch a pre-populated --servers argument.

vfcli-hv dr:restore --mode=servers-arg

Which would output something like the following.

--servers=1387,1390,1393,1465,1698,1794
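The pre-populated list is also handy when you want to restore almost everything. This hypothetical helper (restore_servers_except, with VFCLI as a command override) reads the --servers argument, removes the IDs you pass after SOURCE_PATH, and restores the rest with power control. It assumes the output contains a single --servers=... token like the example above.

```shell
# Hypothetical helper: read the pre-populated --servers list, drop any IDs
# given after SOURCE_PATH, and restore the rest with power control.
restore_servers_except() {
  local src="${1:?usage: restore_servers_except SOURCE_PATH [ID...]}"
  shift
  local vfcli="${VFCLI:-vfcli-hv}" arg id skip keep=""
  # Keep only the --servers=... token in case anything else is printed.
  arg=$("$vfcli" dr:restore --mode=servers-arg | grep -o -- '--servers=[0-9,]*' | head -n 1)
  if [ -z "$arg" ]; then
    echo "could not read the --servers argument" >&2
    return 1
  fi
  for id in $(echo "${arg#--servers=}" | tr ',' ' '); do
    for skip in "$@"; do
      [ "$id" = "$skip" ] && continue 2  # excluded: move on to the next ID
    done
    keep="${keep:+${keep},}${id}"
  done
  if [ -z "$keep" ]; then
    echo "no servers left to restore" >&2
    return 1
  fi
  "$vfcli" dr:restore --mode=server --servers="$keep" --source="$src" --power-control=true
}
```

For example, restore_servers_except y_m_d/2023_06_01 1390 would restore every listed server except 1390.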

Restoring after a disaster (Replacement hypervisor)

If the worst does happen, you can use DR to restore your backups onto a replacement hypervisor.

Important

You should NOT remove the failed hypervisor from VirtFusion. It MUST remain, as this is where your virtual servers are linked.

Hypervisor preparation

At this stage you should have a new hypervisor with the base OS installed.

The next step is to run the VirtFusion installer as you normally would when installing a new hypervisor. This also includes setting up your network.

Configuring access to your backups

Once you have VirtFusion installed and a working bridge (or whatever you use for networking), you need to link the new hypervisor to your control server. This can be done using a special DR mode called get-auth.

Before you can run this mode, you need to set up access to your backup storage as described in Hypervisor Preparation above.

Re-connecting the hypervisor to the control server

You should now have working access to your storage. You can run the following command to download the auth.json file that is used to connect the control server to the hypervisor.

vfcli-hv dr:restore --mode=get-auth --source=[SOURCE_PATH]

SOURCE_PATH must include the hypervisor ID as listed in VirtFusion. For example: --source=26/y_m_d/2023_06_06

If all goes well, you should see something like the following.

root@test:~# vfcli-hv dr:restore --mode=get-auth --source=26/y_m_d/2023_06_06
S3 connection check success!
Found backup storage source: virtfusion_dr/26/y_m_d/2023_06_06

Are you sure you would like to continue? (yes/no) [no]:
> y
download: s3://backup-vf/virtfusion_dr/26/y_m_d/2023_06_06/system/auth.json to ../opt/virtfusion/app/hypervisor/conf/auth.json
Downloaded successfully
root@test:~#

The hypervisor should now be linked again and the test connection button in VirtFusion should report success.

info

In the worst case, you can manually download auth.json and save it as /opt/virtfusion/app/hypervisor/conf/auth.json.
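For S3 storage, the manual download can be done with the AWS CLI. This hypothetical helper (fetch_auth_json, with AUTH_JSON as a destination override) assumes the object layout seen in the get-auth output above, virtfusion_dr/[SOURCE_PATH]/system/auth.json, where SOURCE_PATH includes the hypervisor ID.

```shell
# Hypothetical helper for S3 storage only: copy auth.json from the backup
# into place with the AWS CLI. The object layout is taken from the get-auth
# output above; SOURCE_PATH includes the hypervisor ID (e.g. 26/y_m_d/2023_06_06).
fetch_auth_json() {
  local usage="usage: fetch_auth_json BUCKET ENDPOINT REGION SOURCE_PATH"
  local bucket="${1:?$usage}" endpoint="${2:?$usage}"
  local region="${3:?$usage}" src="${4:?$usage}"
  local dest="${AUTH_JSON:-/opt/virtfusion/app/hypervisor/conf/auth.json}"
  mkdir -p "$(dirname "$dest")" || return 1
  aws --profile virtfusion --endpoint-url "$endpoint" --region "$region" \
    s3 cp "s3://${bucket}/virtfusion_dr/${src}/system/auth.json" "$dest"
}
```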

Restoring the system files

The following command will restore the VirtFusion system files.

vfcli-hv dr:restore --mode=system --system-opts=all --source=[SOURCE_PATH]

SOURCE_PATH should NOT include the hypervisor ID. Now that the control server is linked, it already knows the ID. For example: --source=y_m_d/2023_06_06

Restoring the servers

The following command will restore all servers and configure them.

vfcli-hv dr:restore --mode=server --servers=all --power-control=true --source=[SOURCE_PATH]

That should be it. The --power-control=true option should have taken care of booting all the servers (or keeping them turned off if suspended) and they should now be online.
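The three replacement-hypervisor commands can be run in sequence, stopping at the first failure. This is a hypothetical wrapper (dr_replace_hypervisor, with VFCLI as a command override) that handles the one subtle difference: get-auth needs the hypervisor ID in the source path, while the later steps don't. get-auth asks for confirmation, so run it interactively.

```shell
# Hypothetical wrapper for the three replacement-hypervisor steps above.
# HV_ID is the failed hypervisor's ID in VirtFusion; SOURCE_PATH is the part
# after it (e.g. y_m_d/2023_06_06). Stops at the first failing step.
# get-auth asks for confirmation, so run this interactively.
dr_replace_hypervisor() {
  local hv_id="${1:?usage: dr_replace_hypervisor HV_ID SOURCE_PATH}"
  local src="${2:?usage: dr_replace_hypervisor HV_ID SOURCE_PATH}"
  local vfcli="${VFCLI:-vfcli-hv}"

  # Step 1: get-auth needs the hypervisor ID in the source path.
  "$vfcli" dr:restore --mode=get-auth --source="${hv_id}/${src}" || return 1
  # Steps 2 and 3: the control server now knows the ID, so omit it.
  "$vfcli" dr:restore --mode=system --system-opts=all --source="$src" || return 1
  "$vfcli" dr:restore --mode=server --servers=all --power-control=true --source="$src"
}
```

This sketch does not pause after get-auth; if you want to check the test connection in VirtFusion first, run the steps one at a time instead.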

Pruning Backups

You can remove any expired backups using the following command. It uses the prune settings (S3.prune or local.prune) configured in dr.json.

You may also set this command as a cronjob.

vfcli-hv dr:prune
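To reason about what a given keepLast value retains, here is a conceptual sketch of keep-the-newest-N retention. It is NOT VirtFusion's implementation; prune_candidates is a hypothetical name, and it assumes zero-padded date directory names such as y_m_d values, which sort chronologically as plain strings (DW names 1 to 7 do not).

```shell
# Conceptual sketch of keepLast retention, NOT VirtFusion's implementation:
# read directory names on stdin and print those outside the newest KEEP.
# Assumes zero-padded date names (e.g. 2023_06_01), which sort
# chronologically as plain strings; DW names (1-7) do not.
prune_candidates() {
  local keep="${1:?usage: prune_candidates KEEP}"
  LC_ALL=C sort -r | tail -n "+$((keep + 1))"
}
```

For example, with three daily directories and keepLast set to 2, only the oldest directory would fall outside the retained set.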

Known Issues

Issue | Description | Action/Fix/Workaround
CentOS/CloudLinux system freeze | The QEMU guest agent freezes the guest file systems so that nothing changes during the backup, but it does not respect the correct freezing order for loop* devices, which leads to a hung task and a kernel crash. This is a bug in the guest agent, not VirtFusion. | Disable qemu-guest-agent.
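One way to apply this workaround is inside each affected guest, not on the hypervisor. The helper below is hypothetical (disable_qga_if_affected, with OS_RELEASE as an override for testing); it reads the distribution ID from /etc/os-release and uses systemctl disable --now to stop the agent and keep it from starting at boot.

```shell
# Hypothetical helper, run INSIDE the guest (not on the hypervisor): stop and
# disable qemu-guest-agent on CentOS/CloudLinux guests. Reads the distro ID
# from /etc/os-release.
disable_qga_if_affected() {
  local os_release="${OS_RELEASE:-/etc/os-release}" id
  id=$(. "$os_release" && echo "$ID")  # os-release is shell-compatible
  case "$id" in
    centos|cloudlinux)
      systemctl disable --now qemu-guest-agent ;;
    *)
      echo "ID=${id}: not a CentOS/CloudLinux guest, leaving qemu-guest-agent alone" ;;
  esac
}
```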

S3 Compatible Storage Providers

A non-exhaustive list of providers.

Provider | America | Europe (EMEA) | Asia (APAC)
iDrive e2 | Oregon, Los Angeles, Virginia, Chicago, Miami, Dallas, San Jose, Phoenix, Montreal (Canada) | Ireland, London, Madrid, Paris, Frankfurt | Singapore
Wasabi | Oregon, Virginia, Plano (TX), Toronto (Canada) | London, Paris, Amsterdam, Frankfurt | Tokyo, Osaka, Sydney, Singapore
Cloudflare R2 | Western North America, Eastern North America | Western Europe, Eastern Europe | Asia-Pacific
Backblaze B2 | US East, US West | EU Central | -
Contabo | United States | Germany | Singapore
Scaleway | - | Amsterdam, Paris, Warsaw | -
DigitalOcean | New York City, San Francisco | Amsterdam, Frankfurt | Singapore, Sydney
Huawei OBS | Mexico City, Santiago, Sao Paulo | Johannesburg, Istanbul | Bangkok, Singapore, Shanghai, Beijing, Guangzhou, Hong Kong
Vultr | New Jersey, Silicon Valley | Amsterdam | Delhi, Bangalore, Singapore
OVH | - | Gravelines, Strasbourg, Frankfurt, Beauharnois, Roubaix, Warsaw, London | -
IONOS | - | Frankfurt, Berlin, Logrono (Spain) | -
Alibaba OSS | Silicon Valley, Virginia | Frankfurt, London, Dubai | Hangzhou, Shanghai, Nanjing, Qingdao, Beijing, Zhangjiakou, Hohhot, Ulanqab, Shenzhen, Heyuan, Guangzhou, Chengdu, Hong Kong, Tokyo, Seoul, Singapore, Sydney, Kuala Lumpur, Jakarta, Manila, Bangkok, Mumbai
DreamHost DreamObjects | US East | - | -
Amazon S3 | Ohio, N. Virginia, N. California, Oregon, Canada, São Paulo | Cape Town, Frankfurt, Ireland, London, Milan, Paris, Stockholm, Spain, Zurich, Bahrain, UAE | Hong Kong, Hyderabad, Jakarta, Melbourne, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo, Beijing, Ningxia

You can also host your own with solutions like MinIO.