Outdoor Photography and Videography

Tobi Wulff Photography

October 2015

Backup strategy for photos and videos on Linux

This is a continuation of my previous Workflow article. In that article I linked to Chase Jarvis' excellent videos and blog posts about working with digital media and keeping it safe. I highly recommend watching them if you are interested in a more in-depth look at workflow and backups.

When I started getting into photography beyond simple point-and-shoot or cellphone snapshots, I realised that keeping all my valuable photos in one location without a decent backup plan would be too risky. I've never had a hard drive die on me, but I'm sure it will happen one day, either suddenly or, maybe even worse, gradually. My old snapshots were valuable too, but RAW photography meant much more data, as well as more high-quality prints, competitions, and a growing portfolio. Of course, I had been backing up to an external hard drive, and this is often as far as people take it. That protects against data loss when a single disk suddenly fails. However, even my older photos, taken during some of the most memorable and important periods of my life, were only ever backed up to that second drive in the same room. A violent power surge while it is plugged in, a fire, water damage, or theft could easily render all those important files inaccessible.

Requirements

The requirements were easy to jot down:

  • Full system protection against a single-disk failure, so I don't get stopped dead in my tracks if something happens to my hardware,
  • off-site backups for the most important data, so that even my house being wiped off the face of the earth doesn't lead to significant data loss,
  • the ability to take snapshots of folders or partitions, so that I can experiment with files without the risk of corrupting or losing any data (this also helps with taking backups),
  • protection against data corruption and bit flips, which do happen.

Redundancy

Multi-terabyte hard disks have become fairly affordable, and if your prized digital possessions were shot with expensive cameras, there's no good reason not to invest another $100-200 to make your data, and ideally your whole system, fully redundant. A RAID1 mirrors all data across two disks, so if one disk suddenly fails the other one keeps running. It is recommended to use disks from different batches, or even different models, so that a manufacturing defect does not affect both drives. The Linux kernel handles RAID1 easily in software, and there are no complicated algorithms that could make recovering data difficult further down the road. It is as simple as this: both disks have exactly the same information stored on them.
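Because the array keeps running after one disk dies, it is worth checking regularly whether it is still complete. The sketch below assumes the mirror is built with the kernel's md driver (mdadm); btrfs's own RAID1 does not show up in /proc/mdstat. It looks for the status field md prints for each array, where [2/2] [UU] means both members are up and [2/1] [U_] means one is missing. The function name and the sample text are hypothetical; mdadm --monitor can also send such alerts.

```python
import re

# Assumes md (mdadm) software RAID. Status fragment printed per array, e.g.
# "[2/2] [UU]" (healthy) or "[2/1] [U_]" (one member missing).
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays that are running with a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        if line.startswith("md"):
            # Header line such as "md0 : active raid1 sdb1[1] sda1[0]".
            current = line.split()[0]
            continue
        match = STATUS_RE.search(line)
        if current is not None and match:
            expected, active, flags = int(match.group(1)), int(match.group(2)), match.group(3)
            if active < expected or "_" in flags:
                degraded.append(current)
            current = None  # this array's status line has been handled
    return degraded


# Hypothetical /proc/mdstat contents with one failed half of the mirror.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      3906886464 blocks super 1.2 [2/1] [U_]
      bitmap: 0/30 pages [0KB], 65536KB chunk

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # ['md0']
```

On a real system you would pass in the text of /proc/mdstat and run the check from cron, sending yourself a message whenever the list is not empty.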

RAID1 has a few other benefits, too. First, while writes are no faster than with a single disk (every block has to go to both drives), reads can be spread across both disks, which can approach twice the throughput when several files are read at once. Once the photo or video data has been offloaded from the memory cards, editing software can take advantage of that for a quicker and more fluid workflow. Additionally, when using a file system like btrfs, any data corruption on one disk can be repaired with data from the other disk. I'll talk about this further down.
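To make the speed and failure behaviour concrete, here is a toy in-memory model of a two-disk mirror. It is an illustration only, not how md or btrfs are implemented, and the class name is hypothetical. Every write goes to both disks, which is why writes don't get faster, while reads alternate between the healthy disks, which is where the read speed-up comes from when requests can be served in parallel. Marking one disk as failed shows the other one carrying on.

```python
class ToyMirror:
    """In-memory model of a two-disk RAID1 (hypothetical, illustration only)."""

    def __init__(self, num_blocks):
        self.disks = [[b""] * num_blocks for _ in range(2)]
        self.failed = [False, False]
        self.reads_served = [0, 0]
        self._turn = 0

    def _healthy(self):
        return [d for d in range(2) if not self.failed[d]]

    def write(self, index, data):
        # A write only completes once every healthy disk holds the block,
        # so it is never faster than a single disk.
        for d in self._healthy():
            self.disks[d][index] = data

    def read(self, index):
        healthy = self._healthy()
        if not healthy:
            raise OSError("both disks have failed")
        # Round-robin over healthy disks: on real hardware, two reads
        # can then be served by the two disks at the same time.
        d = healthy[self._turn % len(healthy)]
        self._turn += 1
        self.reads_served[d] += 1
        return self.disks[d][index]


mirror = ToyMirror(4)
for i in range(4):
    mirror.write(i, f"photo-{i}".encode())
print([mirror.read(i) for i in range(4)], mirror.reads_served)
mirror.failed[0] = True  # simulate one disk dying
print(mirror.read(2))    # still served by the surviving disk
```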

For hardware I use two Western Digital Red 4TB NAS hard drives. WD has an excellent reputation, and the Red drives are designed for workstations and Network Attached Storage (NAS) systems where disks run for many hours at a time (think heavy editing or transcoding) or even 24/7. For a large-scale reliability study of current hard drives, check out Backblaze's article about the drives in their data centre; personally, I'd stay away from Seagate.

Off-site backups

All your important data must be stored securely off-site. While it is fortunately quite unlikely that you will fall victim to a house-destroying catastrophe or serious theft, it is still possible, and it is the one scenario where complete data loss could happen when you least expect it. I use two identical external drives, and one of them always lives off-site in a safe location. I swap them every one to two weeks, depending on the flow of new photos, videos, and editing files.

For my big backup disks I use the more affordable Western Digital Green 3TB hard drives. They are not designed to run for long periods of time, so they shouldn't be used in workstations or servers. For backups, however, they are ideal: they are only spun up every other week, and they mean cheaper storage, enough for all of my data with room to spare for at least the next six months, for under $100. To attach the backup disk to my computer I use a cheap and fast Sabrent USB 3.0 to SATA external hard drive docking station, and then use rsync to copy new and changed data from my workstation to the disk.
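The exact rsync options aren't given here; a common choice is rsync -a, which preserves timestamps and permissions. To show what "copy new and changed data" means, the sketch below imitates rsync's default quick check: a file is copied only if it is missing on the backup disk or its size or modification time differs. It is a simplified stand-in, not a replacement: it never deletes anything on the destination and does no delta transfer, and the function name sync_changed is hypothetical.

```python
import os
import shutil


def sync_changed(src, dst):
    """Copy files from src to dst that are missing or differ in size/mtime.

    Hypothetical, simplified imitation of rsync's default quick check.
    Returns the relative paths that were copied. Never deletes anything in dst.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_dir = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            src_stat = os.stat(source)
            if os.path.exists(target):
                dst_stat = os.stat(target)
                same_size = dst_stat.st_size == src_stat.st_size
                # Compare whole seconds, ignoring sub-second differences that
                # some file systems don't store.
                same_mtime = int(dst_stat.st_mtime) == int(src_stat.st_mtime)
                if same_size and same_mtime:
                    continue  # unchanged according to the quick check
            # copy2 also copies the modification time, so the next run can skip it.
            shutil.copy2(source, target)
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

Calling it twice in a row shows the point of the quick check: the second run copies nothing, so a weekly backup only has to transfer what changed since the last swap.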

Since I don't really need my older, smaller drives for my workstation (if I run out of space, it is easier to buy new multi-terabyte disks than to reuse old sub-terabyte ones and assemble them into RAID0s and RAID1s), I'm also planning to copy some of my finished projects and previous years of photography onto those smaller drives and keep them in yet another off-site location as an archive. My only concern is that if a year goes by without spinning them up and running a btrfs scrub, data corruption might start to become noticeable.
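On a single archive disk, a btrfs scrub can detect corrupted data blocks but has no second copy to repair them from. One file-system-independent way to check such a drive after a year in a drawer, a technique not part of the setup described here, is a checksum manifest: hash every file when the archive is written, then re-hash later and compare. The sketch uses SHA-256 from Python's hashlib; the function names are hypothetical, and the manifest should be saved (for example as JSON) somewhere other than only on the drive it describes.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large video files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose contents changed."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_sha256(full) != expected:
            problems.append(rel)
    return problems
```

When the archive is written, build_manifest records the digests; a year later an empty list from verify_manifest means every file still reads back exactly as it was stored.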

Data integrity

The Linux file system btrfs provides multiple features that come in very handy for keeping data safe and running backups. For starters, btrfs scrub start /media/data starts a check of all my multimedia files: btrfs reads every block, compares it against its stored checksum, and fixes any errors it can. Btrfs stores these checksums in its metadata, separately from the actual data, so damage to the real data can be detected reliably. A checksum alone cannot rebuild a damaged block, though; in a RAID1 system the data can be restored from the other disk, which shouldn't show the same (random) data corruption.
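The toy model below shows what a scrub does on a two-copy (RAID1-style) layout and why the mirror matters: checksums are kept apart from the data, every copy of every block is verified against them, and a bad copy is rewritten from one that verifies. When no copy verifies, as on a single disk, the corruption is only reported. It is a simplification, not btrfs's code, and the function names are hypothetical; btrfs defaults to crc32c, while this sketch uses zlib.crc32 because Python's standard library has no crc32c.

```python
import zlib


def checksum(block):
    # Stand-in for btrfs's default crc32c (zlib.crc32 uses a different polynomial).
    return zlib.crc32(block)


def scrub(copies, checksums):
    """Toy scrub: verify every copy of every block, repair bad copies if possible.

    copies:    one list of blocks per disk (two lists for RAID1, one for a single disk)
    checksums: expected checksum per block, stored separately like btrfs metadata
    Returns (repaired, unrecoverable) lists of block indices.
    """
    repaired, unrecoverable = [], []
    for i, expected in enumerate(checksums):
        good = [d for d in range(len(copies)) if checksum(copies[d][i]) == expected]
        if len(good) == len(copies):
            continue  # every copy is intact
        if not good:
            unrecoverable.append(i)  # detected, but no intact copy to repair from
            continue
        for d in range(len(copies)):
            if d not in good:
                copies[d][i] = copies[good[0]][i]  # rewrite from a verified copy
        repaired.append(i)
    return repaired, unrecoverable


blocks = [b"IMG_0001.CR2", b"IMG_0002.CR2", b"IMG_0003.CR2"]
sums = [checksum(b) for b in blocks]
disk_a, disk_b = list(blocks), list(blocks)
disk_b[1] = b"IMG_0002.CR3"  # simulated bit flip on the second disk
print(scrub([disk_a, disk_b], sums))  # ([1], [])
```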

The next great feature is snapshots: btrfs subvolume snapshot photography photography-backup creates a snapshot of all my photos and keeps changes made from then on separate (through a mechanism called copy-on-write). So if I accidentally delete a file in my photography/ folder, I can get it back from the backup folder, yet the snapshot doesn't use any additional space on the drive as long as the files stay the same. This can easily be automated with cron jobs that create and rotate snapshots on a daily or weekly basis. It is also handy to create a snapshot before copying files off to an external hard disk, so that I can keep working on my photos and videos while the backup runs in the background.
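A daily cron job for such a rotation could be planned by a small script like the one below. It is a sketch under assumptions not in the text above: snapshots are named photography-YYYY-MM-DD next to the photography subvolume, the last seven are kept, and the function only returns the btrfs commands (btrfs subvolume snapshot -r for a read-only snapshot, btrfs subvolume delete to remove one) instead of running them, so they can be reviewed or passed to subprocess.run. The names PREFIX and plan_rotation are hypothetical.

```python
import datetime

PREFIX = "photography-"  # hypothetical naming scheme: photography-YYYY-MM-DD


def snapshot_name(day):
    return PREFIX + day.isoformat()


def _is_dated(name):
    # Ignores manual snapshots such as "photography-backup".
    if not name.startswith(PREFIX):
        return False
    try:
        datetime.date.fromisoformat(name[len(PREFIX):])
    except ValueError:
        return False
    return True


def plan_rotation(existing, today, keep=7):
    """Return (create_cmd or None, list of delete_cmds) for a daily rotation.

    existing: names of snapshot directories that already exist.
    """
    wanted = snapshot_name(today)
    create = None
    if wanted not in existing:
        # -r makes the snapshot read-only so it can't be changed by accident.
        create = ["btrfs", "subvolume", "snapshot", "-r", "photography", wanted]
    # ISO dates sort chronologically, so the oldest snapshots come first.
    dated = sorted(n for n in set(existing) | {wanted} if _is_dated(n))
    stale = dated[: max(len(dated) - keep, 0)]
    deletes = [["btrfs", "subvolume", "delete", name] for name in stale]
    return create, deletes


existing = [snapshot_name(datetime.date(2015, 10, d)) for d in range(14, 21)]
create, deletes = plan_rotation(existing, datetime.date(2015, 10, 21))
print(create)
print(deletes)
```

The relative paths assume the script runs in the directory that contains the photography subvolume, as in the command above.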

Other, smaller backups

I also have rsnapshot running every hour to back up the XMP sidecar files created by Darktable, my RAW photo processing application. Since Darktable automatically saves any change made to a photograph as it happens, it is quite possible to accidentally delete the whole editing history (the RAW file itself is never modified). That said, it hasn't happened to me once in almost two years of using the program. Still, keeping up to a month of history for the small XMP files is an easy and inexpensive way to keep my mind at peace.
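Hourly snapshots stay cheap because rsnapshot hard-links files that haven't changed since the previous snapshot, so only edited sidecars take up new space. The sketch below illustrates that idea for a folder of XMP files; it is not rsnapshot's code (rsnapshot builds on rsync and handles rotation and retention itself), and the function name is hypothetical.

```python
import filecmp
import os
import shutil


def hardlink_snapshot(src, prev, new):
    """Snapshot src into new, hard-linking files unchanged since snapshot prev.

    Hypothetical illustration of the hard-link idea, not rsnapshot's code.
    prev may be None for the first snapshot. Snapshots must be treated as
    read-only: editing a hard-linked file in place would change every
    snapshot that shares it.
    """
    for root, _dirs, files in os.walk(src):
        rel_dir = os.path.relpath(root, src)
        os.makedirs(os.path.normpath(os.path.join(new, rel_dir)), exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.normpath(os.path.join(new, rel_dir, name))
            previous = os.path.normpath(os.path.join(prev, rel_dir, name)) if prev else None
            if previous and os.path.isfile(previous) and filecmp.cmp(source, previous, shallow=False):
                os.link(previous, target)  # unchanged: share the existing copy
            else:
                shutil.copy2(source, target)  # new or edited: store a fresh copy
```

With a month of hourly snapshots, an XMP file that was never edited still exists on disk only once, no matter how many snapshots contain it.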

Conclusion

Just today, Caleb Pike released a video on his workflow and backup strategy as well. His setup is fairly different because of different tools and a different level of computer expertise, but the main principles and requirements stay the same. I recommend watching his take on the topic, especially if you found mine too technical or Unix-centric.

Please head over to Google+ or Twitter @tobiaswulff to discuss this article or any of my photography and videography work. My Flickr and Vimeo pages also provide space to leave comments and keep up to date with my portfolio. Lastly, if you want to get updates on future blog posts, please subscribe to my RSS feed. I plan to publish a new article every Wednesday.