To my astonishment, this has consistently worked:
# dump -0 -f - / | ssh backup-server.example.com "cd /vault && cat > dump.0"
There's a little more to it in practice (I specify ssh keys, ports, a per-server destination, and so on), but that's essentially the command. I've examined the dump files on the backup server and done restores from them. Of course, swap level 0 for any level you like.
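To make that concrete, a fuller version might look something like this (the key path, port, and per-server directory are illustrative placeholders, not my actual values):
# dump -0 -f - / | ssh -i /root/.ssh/backup_key -p 2222 backup-server.example.com "cd /vault/web01 && cat > dump.0"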
So, um, why isn't everyone using dump for backing up their VMs? I mean, I'm doing this over the WAN and ending up with a nice full/incremental rotation, I can pull out subsets for restore, it's compressed and encrypted in transit, and I could pipe a gpg step into the pipeline for encryption at rest if I wished...
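For the record, that at-rest encryption would be something like the following (untested on my end, and the recipient key is a placeholder):
# dump -0 -f - / | gpg --encrypt --recipient backups@example.com | ssh backup-server.example.com "cd /vault && cat > dump.0.gpg"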
Let's Do Some Tests
Backup source: a 1-core, 768MB Vultr VM in Seattle.
Backup destination: a DigitalOcean droplet in NYC, ~28ms away.
Backing up:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        30G   11G   18G  39% /
Dump command: dump -0 -f - /
Where compression is listed, the flag was -z (which defaults to compression level 2), -z5, or -z9.
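In other words, the level-5 run was just the same pipeline with the flag added:
# dump -0 -z5 -f - / | ssh backup-server.example.com "cd /vault && cat > dump.0"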
Test Results (SCIENCE!)
Compression Level   Time        Dump Size   Source Server Impact
None                3m28.997s   11.0G       nil (6% CPU)
2 (default)         3m36.289s   7.0G        noticeable if you look
5                   4m5.272s    6.9G        noticeable even if you don't look
9 (max)             6m5.260s    6.9G        this is all you're doing
I'm being comical about the source server impact, but to put numbers on it: with level 5 or 9 the load average was well over 2.0, while with level 2 it was usually around 1.0.
The destination side barely showed any load; sshd was using about 6% of the CPU.
On sizes, I was being lazy and measuring with du -sh, so I'm sure level 9 is a little smaller than level 5, but not by enough that I'd care.
Of course, these are all full backups of the entire OS. In practice I'd exclude a few things (/tmp and the like; see the sketch just below), and the daily incrementals would be much, much smaller: just the files changed since yesterday, compressed.
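If I'm reading the man page right, exclusions work through the ext "nodump" attribute: dump's honor level defaults to 1, so flagged files are already skipped on incrementals, and -h 0 skips them on full dumps too. Roughly:
# chattr -R +d /tmp    # mark everything under /tmp with the nodump attribute
# dump -0 -h 0 -f - / | ssh backup-server.example.com "cd /vault && cat > dump.0"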
Given SSD speeds these days, I think one could do a level 0 less frequently than the traditional once a week. That means more incrementals to play back at restore time, but SSDs are fast.
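A rough sketch of what the nightly cron job might look like, using the traditional weekly rotation as a starting point (stretch the level-0 interval to taste; the destination filename is a placeholder). Note the -u flag, which records each run in /etc/dumpdates so the incrementals know their baseline:
#!/bin/sh
# level 0 on Sundays, otherwise use the weekday number (1-6) as the dump level
DOW=$(date +%u)
[ "$DOW" -eq 7 ] && LEVEL=0 || LEVEL=$DOW
dump -$LEVEL -u -z -f - / | ssh backup-server.example.com "cd /vault && cat > dump.$LEVEL.$(date +%F)"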
Honorable Intentions
This method seems to meet all my needs/wants:
- captures everything - default include, not default "remember to include"
- can do incrementals, which saves on my bandwidth
- can extract a subset of files to restore (see the restore example after this list)
- encrypted in transit
- compressible
- haven't played with at-rest encryption yet, but that's just a gpg command in the pipeline before ssh, as sketched above
- doesn't require staging space on the client
- can run unattended with passwordless ssh
- on the backup server, I can move the backups out of the clients' reach once they're done; the client never needs to see old backups for an rsync-style incremental, and so can't destroy them with a malicious rsync
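As for pulling out a subset, that's restore's job; on the backup server (or anywhere you've copied the dump file), with the paths here being illustrative:
# restore -t -f dump.0                 # list the contents
# restore -x -f dump.0 ./etc ./home    # extract just these paths into the current directory
# restore -i -f dump.0                 # or browse and pick files interactively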
The only negative is that I'd prefer to go over sftp, so the client is completely locked down and limited to sftp only. But I can chroot the client into an incoming directory where it can only deposit files and can't escape to do anything else.
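Short of sftp, plain ssh can pin the client to a single command via a forced command in authorized_keys on the backup server; a minimal sketch (the key and paths are placeholders):
command="cd /vault/web01 && cat > dump.$(date +%s)",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-ed25519 AAAA... root@web01
With a forced command, whatever the client asks to run is ignored in favor of this one, so a compromised client can append new dumps but can't pick filenames or overwrite old ones; the level/date would have to be encoded server-side (the timestamp above) or validated from SSH_ORIGINAL_COMMAND.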
I was concerned that going over the WAN might mean broken connections and the like, but I just did half a dozen transcontinental dumps (please, no crude humor) and everything seems to be working fine...
Someone stop me before I fall in love with this solution, get it pregnant, and elope to Buffalo.