For *nix users, the answer is easy: rsync. For Macintosh users, the answer is even simpler: Time Machine (“time ‘sheen”). For Windows, the answer is a convoluted mess of choices. And the problem is that none of those choices gives you everything you want.
Why can’t you have everything? Here is everything a backup program needs:
- Permissions. If you can’t preserve your metadata, forget about making faithful backups. POSIX and Windows permissions are very different, but they still deserve the same love.
- Resilience. The restore part of a program should never produce a fatal error, unless a backup has been corrupted beyond repair. If a part has been corrupted, ignore that part, notify the user that a corrupted portion was ignored (noting, of course, what the corrupted portion actually is), and continue with the restore process.
- Compression. Many would argue that compression only makes backups harder to restore and yields minimal savings. However, compression can make a large difference when uploading from a personal home network to a storage service that bills by the gigabyte. I don’t know about you, but $1 a month was more than my tax return this year.
- Encryption. Everyone’s got their tinfoil hats on, how about you?
- Incremental backups. People are not going to run full backups every week, and they shouldn’t: that wastes time, storage space, and energy, since most files would be stored redundantly.
- Block-level. If you modified a 20 GB VHD file, are you going to copy that whole thing on every weekly incremental backup? No, you copy only the blocks of that file that changed.
- Archivable. It appears most people choose either image-based backups or file-based backups. I personally prefer file-level backups, but this should not mean “copy millions of files and spew them into the target directory.” The backup should be neatly organized in, say, 50 MB parts that can be easily uploaded to a cloud service as part of a future backup plan. Or it can be a single monolithic 800 GB file. The former works with most consumer file services, while the latter is most convenient for more enterprise-oriented services like Amazon Glacier.
- Resumable. Most backup programs hate it when you shut down your computer for the night. Yet none of them seem to understand that this is exactly what shadow copies are for. A shadow copy does not magically change when you shut down the computer. Yet the software goes ahead, restarts your entire backup, and creates yet another useless shadow copy, merely to avoid touching files in use and to make the most up-to-date backup possible.
- Snapshots. Let’s say I don’t want to restore my whole computer; I just want to see an old file and how it changed over time. Most backup programs will not let you do that, claiming it is “too complex.” No, it’s not. Track the files the software backed up in a tiny database like SQLite. In it, you can store checksums, file sizes, previous versions, and so on and so forth. The suffering ends there. The end user can view a snapshot of the computer at a certain point in time, or view the history of a specific file, perhaps with diffs (binary diffs if the backup software is user-friendly enough).
- Low profile. What is CloudBerry Backup using 2.7 GB of memory for? Just flapping around? No! Decent backup software should use 100 MB of memory, tops. Leave the heavy RAM consumption to browsers, games, and servers.
- Integration. This backup software should be robust enough to make anything either a source or a destination for backups, within the limitations of each backup medium. Roughly, from least to most liquid:
  - Least liquid: offline local storage; Amazon Glacier; Google Coldline
  - Somewhat liquid: FTP (slow when transferring many small files, and unable to perform multipart transfers); most consumer storage services
  - Most liquid: iSCSI SANs; high-availability storage services
- Drive path-agnostic. Backup software should never, ever depend on drive letters to figure out backup sources and targets.
- Predict drive failure. This goes somewhat beyond the scope of backup software, but there should be at least some kind of periodic SMART monitor to warn the user when a drive shows signs of failure. Yes, put a big popup on the notification bar with a scary message like “Your drive might fail soon” or just outright “Your drive is failing.” Show it for the first three days, make it go away, and then show it again the next week. Of course, the user can turn off the notification for a specific drive, but doing so should require them to read a message about possibly losing data on the failing drive and wait 5 seconds before the dialog can be closed; after that, they never have to see the dialog for that drive again.
- Recognize cache folders. Here’s what you need to do: just stick that CCleaner scanning stuff into your product. Make the default backup plan ignore whatever CCleaner would usually clean up. Caches can add up to gigabytes, and many users do not even care about including them in their backups, because all they want are their programs and documents. However, there is that one company that might say, “no, you can’t ignore cache folders, because we need a perfect file-level backup of the system tree.” (My argument would be to use Clonezilla and do it at the image level – but fine.)
- Import from other services. No, I don’t care much about Acronis, Veeam, or other proprietary solutions. What I do care about, however, are the crappy Windows 7 Backup and Restore backups, dd “backups,” and other image-level backup formats. Don’t just import the backups: import file history, recompress them, preserve timestamps. Give them the full treatment, and put them neatly in the new backup format as if it really were an old backup.
- Responsive (and responsible) backend. Big enterprise backup software uses a UI frontend that merely communicates with a service backend. This is generally a good design. However, when the backend stops responding, the frontend goes into limbo and ignores every command instead of explaining what is happening with the backend, and the backend makes no attempt to halt whatever blocking operation is taking too long. The gears grind to a halt, and nothing can get done on either side.
- Don’t delete anything without asking. No, I don’t even want an auto-purge feature, and if you must have one, for the love of God, make it a manual operation. There is no reason to keep purging things constantly, unless you have a disk quota to work under – in that case, the software should determine what is best to purge (start with the big stuff, at the earliest backup) to meet the size requirement.
- Only one backup mode. That backup mode better be good, and it should have a hybrid format.
- Open-source format. The software itself may stay closed-source, but publishing the format ensures that someone out there can always write restore software that is compatible with the latest and greatest operating systems.
- Bootable. Where are you going to run your restores from? A flash drive running Linux with an ncurses interface for your backup software, obviously. You could, of course, allow backups from that same bootable drive, in the case of an infected drive or as part of a standard computer emergency response procedure – but eh, that’s really pushing it. Just restores will do fine.
- Self-testable. Make sure the backups can actually be restored, by test-restoring them.
- Exportable. One day, your backup software will not be relevant anymore, so why lock users into your format? Let them export full archives of their backups, with a CSV sheet explaining all of the contents of each archive.
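To make the *Resilience* point concrete, here is a minimal sketch of a restore loop that verifies each part against a checksum recorded at backup time, skips corrupted parts by name, and keeps going. The `Part` structure, the SHA-256 manifest, and the `write` callback are all assumptions for illustration, not any existing product's format.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Part:
    # Hypothetical archive part: a name, its payload, and the SHA-256
    # recorded in the backup manifest when the part was written.
    name: str
    data: bytes
    expected_sha256: str


@dataclass
class RestoreReport:
    restored: list = field(default_factory=list)
    skipped: list = field(default_factory=list)


def restore_parts(parts, write):
    """Restore every intact part; skip and record corrupted ones instead of aborting."""
    report = RestoreReport()
    for part in parts:
        actual = hashlib.sha256(part.data).hexdigest()
        if actual != part.expected_sha256:
            # Record the corrupted part by name so the user knows exactly what was lost.
            report.skipped.append(part.name)
            continue
        write(part.name, part.data)
        report.restored.append(part.name)
    return report
```

At the end of the run, `report.skipped` is exactly the list the user should be shown, instead of a fatal error halfway through.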
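For *Incremental backups*, one cheap way to decide what to copy is to compare each file's size and modification time against the index saved by the previous run. rsync's default "quick check" uses the same heuristic. This is a sketch under that assumption; a checksum comparison is more robust but slower.

```python
from pathlib import Path


def scan(root):
    """Map each file's path (relative to root) to its (size, mtime_ns) signature."""
    index = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            st = path.stat()
            index[str(path.relative_to(root))] = (st.st_size, st.st_mtime_ns)
    return index


def plan_incremental(previous, current):
    """Return files to copy (new or changed) and files deleted since the last backup."""
    changed = sorted(p for p, sig in current.items() if previous.get(p) != sig)
    deleted = sorted(p for p in previous if p not in current)
    return changed, deleted
```

Only `changed` gets copied. `deleted` is recorded rather than acted on, in keeping with the "don't delete anything without asking" rule.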
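The *Block-level* idea can be sketched with fixed-size blocks: hash every block, keep the hashes from the last backup, and copy only the blocks whose hash changed. The 4 MiB block size is an arbitrary assumption. Fixed blocks miss data that shifts after an insertion (rsync's rolling checksum handles that case), but they tend to work well for virtual disk images like a VHD, where writes generally land at fixed offsets.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB; an arbitrary choice for this sketch


def block_hashes(path, block_size=BLOCK_SIZE):
    """Hash a file in fixed-size blocks, returning one SHA-256 digest per block."""
    hashes = []
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes


def changed_blocks(old_hashes, new_hashes):
    """Indices of blocks that are new or differ from the previous backup."""
    return [i for i, h in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != h]
```

If the file shrank, the new block count (`len(new_hashes)`) should also be stored so a restore truncates correctly.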
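The 50 MB parts from *Archivable* could be produced by simply cutting one archive stream into numbered pieces, and reassembled by concatenating them in order. The `.partNNNNN` naming scheme is a hypothetical convention. Each part is held in memory while it is written, which stays well under the 100 MB budget from *Low profile*.

```python
import os
import shutil

PART_SIZE = 50 * 1024 * 1024  # 50 MB parts, as suggested above


def split_into_parts(archive_path, out_dir, part_size=PART_SIZE):
    """Split one archive file into numbered parts; returns the part paths in order."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(archive_path)
    parts = []
    with open(archive_path, "rb") as src:
        index = 0
        while chunk := src.read(part_size):
            part_path = os.path.join(out_dir, f"{base}.part{index:05d}")
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            parts.append(part_path)
            index += 1
    return parts


def join_parts(part_paths, dest_path):
    """Reassemble the parts, in order, into the original archive."""
    with open(dest_path, "wb") as dst:
        for part_path in part_paths:
            with open(part_path, "rb") as src:
                shutil.copyfileobj(src, dst)
```

Setting `part_size` to the archive's full size gives the monolithic single-file option instead.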
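For *Resumable*, a small append-only journal is enough to pick up where a run left off. This sketch assumes the resumed run reads from the same shadow copy as the interrupted one, which is the whole point of the complaint above. The journal format (one JSON string per line) is a hypothetical choice.

```python
import json
import os


class Checkpoint:
    """Append-only journal of files already copied in the current backup run."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.done = set()
        if os.path.exists(journal_path):
            with open(journal_path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        self.done.add(json.loads(line))
                    except json.JSONDecodeError:
                        # Half-written last entry from a hard shutdown; that file is redone.
                        break

    def mark_done(self, rel_path):
        # Flush and fsync each entry so a shutdown loses at most the file in progress.
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(rel_path) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.done.add(rel_path)


def run_backup(files, copy_one, checkpoint):
    """Copy only the files the journal does not already list; returns what was copied now."""
    copied = []
    for rel_path in files:
        if rel_path in checkpoint.done:
            continue
        copy_one(rel_path)
        checkpoint.mark_done(rel_path)
        copied.append(rel_path)
    return copied
```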
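The SQLite catalog from *Snapshots* really can be this small. In this sketch, the one-table schema is an assumption: each row is one version of one file, and a `NULL` size marks a deletion. A point-in-time snapshot is then "the latest row per path at or before time T".

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS versions (
    path       TEXT    NOT NULL,
    backed_up  INTEGER NOT NULL,  -- Unix timestamp of the backup run
    size       INTEGER,           -- NULL marks a deletion
    sha256     TEXT,
    PRIMARY KEY (path, backed_up)
);
"""


def open_catalog(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, path, backed_up, size, sha256):
    with conn:  # commits the insert as one transaction
        conn.execute("INSERT INTO versions VALUES (?, ?, ?, ?)",
                     (path, backed_up, size, sha256))


def history(conn, path):
    """Every recorded version of one file, oldest first."""
    return conn.execute(
        "SELECT backed_up, size, sha256 FROM versions WHERE path = ? ORDER BY backed_up",
        (path,)).fetchall()


def snapshot(conn, at):
    """The latest version of each file as of time `at`, omitting files deleted by then."""
    return conn.execute(
        """
        SELECT v.path, v.size, v.sha256
        FROM versions v
        JOIN (SELECT path, MAX(backed_up) AS t FROM versions
              WHERE backed_up <= ? GROUP BY path) latest
          ON v.path = latest.path AND v.backed_up = latest.t
        WHERE v.size IS NOT NULL
        ORDER BY v.path
        """, (at,)).fetchall()
```

`history()` drives the per-file version view, and consecutive `sha256` values tell the UI which versions are worth diffing.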
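Short of licensing CCleaner's rules, *Recognize cache folders* could start as a default list of glob patterns that the backup plan skips. The patterns below are illustrative guesses, not CCleaner's actual rule set. Matching is case-insensitive, which suits Windows paths. Passing an empty list gives that one company its perfect file-level backup.

```python
from fnmatch import fnmatchcase

# Illustrative defaults only; these are NOT CCleaner's actual rules.
DEFAULT_EXCLUDES = [
    "*/AppData/Local/Temp/*",
    "*/AppData/Local/Google/Chrome/User Data/*/Cache/*",
    "*/AppData/Local/Mozilla/Firefox/Profiles/*/cache2/*",
    "*/.cache/*",
    "*/Thumbs.db",
]


def is_excluded(path, patterns=DEFAULT_EXCLUDES):
    """True if a path matches any exclusion pattern.

    Backslashes are normalized to forward slashes and both sides are lowercased,
    so matching behaves the same on every platform. In fnmatch, `*` also matches `/`.
    """
    normalized = path.replace("\\", "/").lower()
    return any(fnmatchcase(normalized, pat.lower()) for pat in patterns)
```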
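The quota rule in *Don’t delete anything without asking* ("start with the big stuff, at the earliest backup") could be read as: walk backups oldest first and, within each backup, propose the largest items first until enough space would be freed. That reading is an interpretation. The function only returns a proposal for the user to confirm, and it never touches the newest backup.

```python
from typing import NamedTuple


class StoredItem(NamedTuple):
    backup_time: int  # when the backup containing this item ran
    path: str
    size: int         # bytes used on the storage target


def plan_purge(items, used_bytes, quota_bytes):
    """Propose items to purge until usage fits the quota: earliest backup first, biggest first.

    Nothing is deleted here; the caller shows the plan and asks. Items from the newest
    backup are never proposed.
    """
    excess = used_bytes - quota_bytes
    if excess <= 0 or not items:
        return []
    newest = max(item.backup_time for item in items)
    candidates = sorted(
        (i for i in items if i.backup_time != newest),
        key=lambda i: (i.backup_time, -i.size),
    )
    plan, freed = [], 0
    for item in candidates:
        if freed >= excess:
            break
        plan.append(item)
        freed += item.size
    return plan
```

If even the full plan cannot meet the quota, the user should be told so, rather than the software quietly eating into the newest backup.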
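*Self-testable* can be as simple as restoring into a scratch directory and checking every file against the checksum recorded at backup time. In this sketch, `restore_to` is a hypothetical stand-in for the product's real restore routine, and `expected` would come from a catalog like the SQLite one described under *Snapshots*.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def self_test(restore_to, expected):
    """Test-restore into a scratch directory and verify each file's checksum.

    `restore_to(dest_dir)` is a hypothetical restore callback; `expected` maps
    relative paths to SHA-256 digests. Returns a list of problems (empty = passed).
    """
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        restore_to(scratch)
        for rel_path, digest in sorted(expected.items()):
            restored = Path(scratch, rel_path)
            if not restored.is_file():
                problems.append(f"missing: {rel_path}")
            elif sha256_file(restored) != digest:
                problems.append(f"checksum mismatch: {rel_path}")
    return problems
```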
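The CSV sheet from *Exportable* needs nothing beyond the standard library. The column set below is a hypothetical minimum; the point is that any spreadsheet can open it long after the backup software is gone.

```python
import csv

FIELDS = ["archive", "path", "size", "modified", "sha256"]  # hypothetical columns


def write_manifest(entries, out):
    """Write one CSV row per file, naming the exported archive that contains it.

    `entries` is an iterable of dicts keyed by FIELDS; `out` is a text file opened
    with newline="" as the csv module recommends.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry)
```

A typical call would open the file as `open("manifest.csv", "w", newline="", encoding="utf-8")` and pass it as `out`.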
At the end of the day, users just want their files safe and sound, so keep the software as close to the fundamentals as possible, and allow others to make tools around the backup software if additional functionality is needed.