Outdoor Photography and Videography

Tobi Wulff Photography

Backup strategy for photos and videos on Linux

This is a continuation of my previous Workflow article. In the workflow article I linked to Chase Jarvis' excellent videos and blog posts about working with digital media and keeping it safe. I highly recommend watching them if you are interested in a more in-depth look at workflow and backups.

When I started getting into photography beyond simple point-and-shoot or cellphone snapshots, I realised that leaving all my valuable photos in one location without a decent backup plan would be too risky. I've never had a hard drive suddenly die on me but I'm sure that one day it will happen - suddenly or, maybe even worse, gradually. Not that my old snapshots weren't valuable but RAW photography suddenly involved much more data and also more high-quality prints, competitions, and a growing portfolio. Of course, I had done backups to an external hard drive and this is often as far as most people take it. This prevents data loss due to a sudden one-disk failure. However, even my older photos which were taken during some of the most memorable and important periods of my life were always just backed up to that other drive in the same room. A violent power surge while it is plugged in, a fire, water damage, or theft could easily render all those important files inaccessible.

Requirements

The requirements were easy to jot down:

  • Full system protection against a one-disk failure so I don't get stopped dead in my tracks if something happens to my hardware,
  • off-site backups for the most important data so that even my house being swallowed off the surface of the earth doesn't lead to significant data loss,
  • ability to take snapshots of folders or partitions so that I can experiment with files without the risk of corrupting or losing any data (this also helps with taking backups),
  • protection against data corruption and bit flipping which does happen.

Redundancy

Multi-terabyte hard disks have become pretty affordable, and if your prized digital possessions are shot with expensive cameras, there's no good reason not to invest another $100-200 to make your data and ideally your whole system fully redundant. A RAID1 mirrors all data on the disk so if one disk suddenly fails the other one can keep running. It is recommended to use disks from different batches - or even models - so that a manufacturing issue does not affect both drives at once. RAID1 can easily be done in software by the Linux kernel and there are no complicated algorithms that could lead to issues recovering data further down the road. It is as simple as: both disks have exactly the same information stored on them.
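As a rough illustration of how little is involved, here is a minimal sketch that prints the commands for a two-disk mirror. It assumes btrfs's built-in RAID1 profile rather than mdadm, and the device names and mount point are placeholders. It only prints the commands because mkfs.btrfs wipes the disks it is given.

```shell
# Sketch: print (not run) the commands for a two-disk btrfs RAID1.
# Device names and mount point are placeholders, not a real setup.
raid1_commands() {
    disk_a=$1
    disk_b=$2
    mount_point=$3
    # -d raid1 mirrors file data, -m raid1 mirrors metadata.
    echo "mkfs.btrfs -d raid1 -m raid1 $disk_a $disk_b"
    # A multi-device btrfs is mounted by naming one of its member devices.
    echo "mount $disk_a $mount_point"
}
```

Calling raid1_commands /dev/sdX /dev/sdY /media/data shows the two commands so they can be checked against the right devices before anything is run by hand.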

There are also a few other benefits of RAID1. First, while writes are no faster than on a single disk (every block has to be written twice), reads can be spread across both disks, which in the best case approaches twice the throughput. Once the photo or video data has been offloaded from the memory cards, editing software can take advantage of that for a quicker and more fluid workflow. Additionally, when using a file system like btrfs, any data corruption on one disk can be repaired with data from the other disk. I'll talk about this further down.

For hardware I use two Western Digital Red 4TB NAS Hard Drives. WD has an excellent reputation and the Red drives are designed for workstations and Network Attached Storage (NAS) systems where disks can run many hours at a time (think heavy editing or transcoding) or even 24/7. For a great large-scale reliability study of current hard drives check out this Backblaze article about the drives in their data centre - personally, I'd stay away from Seagate.

Off-site backups

All your important data must be stored securely off-site. While it is luckily quite unlikely to fall victim to a house-destroying catastrophe or serious theft, it is still possible, and it is the one point where complete data loss could happen when it is least expected. I use two identical external drives and one always lives off-site in a safe location. I swap them out every one to two weeks depending on the flow of new photos, videos and editing files.

For my big backup disks I use the more affordable Western Digital Green 3TB Hard Drives, which are not designed to run for long periods of time so they shouldn't be used in workstations or servers. For backups they are ideal because they are only spun up every other week, and it means cheaper storage: all of my data with room to spare for at least the next half year for under $100. To attach the backup disk to my computer I use a cheap and fast Sabrent USB 3.0 to SATA External Hard Drive Docking Station, then rsync to copy new and changed data from my workstation to the disk.
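The rsync step can be as small as the following sketch. The paths are placeholders for the data partition and the mounted backup disk, and the RSYNC variable is a hypothetical hook added here so the call can be previewed without copying anything.

```shell
# Sketch only: source and destination paths are placeholders.
RSYNC=${RSYNC:-rsync}   # set RSYNC=echo to print the arguments instead

sync_to_backup() {
    src=${1%/}
    dest=${2%/}
    # -a (archive) recurses and preserves timestamps, permissions and
    # symlinks; files that are unchanged on the backup are skipped.
    # The trailing slashes make rsync copy the folder contents rather
    # than nesting the source folder inside the destination.
    "$RSYNC" -a "$src/" "$dest/"
}
```

Usage would be something like sync_to_backup /media/data /media/backup. Adding rsync's -n option to the call makes it a dry run that only reports what would be transferred.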

Since I don't really need any of my older, smaller drives for my workstation (if I run out of space it is easier to buy new big multi-TB disks rather than reuse old sub-TB ones and assemble them into RAID0s and RAID1s), I'm also planning to copy some of my finished projects and previous years of photography to those smaller drives and to keep them in yet another off-site location as an archive. My only concern is that after a year of not spinning them up and running a btrfs scrub, data corruption might start to get noticeable.

Data integrity

The Linux file system btrfs provides multiple features that come in very handy when keeping data safe and running backups. For starters, btrfs scrub start /media/data will start a check of all my multimedia files and fix any issues it can. Btrfs stores a checksum for every data block in its metadata, so damage to the actual data is detected when the block is read or scrubbed. A checksum alone cannot bring the data back, though: the repair needs a second, intact copy. In a RAID1 system that copy comes from the other disk, which shouldn't show the same (random) data corruption.
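A periodic check could be wrapped up as in this sketch. The mount point is the one from the text, while the BTRFS variable is a hypothetical hook added here for previewing the calls.

```shell
# Sketch: scrub a btrfs file system and print its error counters.
BTRFS=${BTRFS:-btrfs}   # set BTRFS=echo to print the calls instead

scrub_and_report() {
    mount_point=$1
    # -B keeps the scrub in the foreground so the script waits for it.
    "$BTRFS" scrub start -B "$mount_point" || return 1
    # Per-device counters for read, write and corruption errors.
    "$BTRFS" device stats "$mount_point"
}
```

Run as root, scrub_and_report /media/data waits for the scrub to finish and then lists the counters for each disk in the mirror.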

The next great feature is snapshots: btrfs subvolume snapshot photography photography-backup creates a snapshot of all my photos (photography has to be a btrfs subvolume for this to work) and keeps changes that happen from now on separate (through a mechanism called copy-on-write). So if I accidentally delete a file in my photography/ folder I can get it back from the backup folder, yet it doesn't use up any additional space on the drive if files stay the same. This can easily be automated using cron jobs to create and rotate snapshots on a daily or weekly basis. It is also handy to create a snapshot before copying files off to an external hard disk so that I can keep working on my photos and videos while the backup is running in the background.
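The rotation part of such a cron job could follow the sketch below. It assumes snapshots are named with an ISO date suffix such as photography-2015-09-30, a naming scheme picked here for illustration only. The functions just compute names, so the actual btrfs calls stay visible in the script that uses them.

```shell
# Sketch: helpers for date-stamped snapshot names (naming is an assumption).

# Build a snapshot name from a subvolume name and an ISO date.
snapshot_name() {
    echo "$1-$2"
}

# Read snapshot names on stdin and print those to delete so that only
# the newest $1 remain. ISO dates sort chronologically as plain text.
snapshots_to_prune() {
    keep=$1
    sort | awk -v keep="$keep" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}
```

A daily job could then create a read-only snapshot with btrfs subvolume snapshot -r photography "$(snapshot_name photography "$(date +%F)")" and pass each name printed by snapshots_to_prune to btrfs subvolume delete.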

Other, smaller backups

I also have rsnapshot running every hour to take backups of my XMP sidecar files that get created by Darktable, my RAW photo processing application. Since Darktable automatically saves any changes made to a photograph as they happen, it is quite possible to accidentally delete the whole editing history (the RAW file itself is never modified, though). However, I have to say that this hasn't happened to me at all in almost two years of using the program. Anyway, keeping the small XMP files available for up to a month is an easy and inexpensive way to keep my mind at peace.

Conclusion

Just today, Caleb Pike released a video on his workflow and backup strategy as well. While fairly different due to different tools and computer expertise, the main principles and requirements stay the same and I recommend watching his take on the topic, especially if you found mine too technical or Unix centric.

Please head over to Google+ or Twitter @tobiaswulff (see links on top of the page) to discuss this article or any of my photography and videography work. My Flickr and Vimeo pages also provide some space to leave comments and keep up to date with my portfolio. Lastly, if you want to get updates on future blog posts, please subscribe to my RSS feed. I plan to publish a new article every Wednesday.

Photo Management Workflow in Linux

Ever since I started taking RAW photographs and developing them on my computer, I have put a lot of thought (or "obsessing") into how I want to organize my workflow and in particular how to structure my files.

Many photographers might start by simply dumping all their pictures into a folder for each trip or shoot, hopefully ordered by date. If your folder names do not start with YYYY-MM-DD you're going to have a hard time quickly finding and grabbing your photos with a file manager - on the other hand, photo management software can simply use the date and time in the EXIF data to sort and find digital assets. However, what can happen if you take that approach is that you lock yourself into one specific workflow with one specific application.

Apart from that, one folder sounds fine. After all, different files for different purposes have different file extensions: RAW as the original ("digital negative") that never gets modified, XMP sidecar files that store additional metadata and RAW processing steps, XCF/PSD work-in-progress editing files, and finally JPEGs. Nevertheless, I keep my JPEGs and my RAWs in different folders so that I can quickly copy all the JPEGs to an external device to show them to other people. Having to go through the export functionality of a photo management tool would complicate this process unnecessarily.

A great video to watch is Chase Jarvis' TECH blog post and video on his company's workflow, and if you've got some extra time to kill: Chase Jarvis LIVE Q&A on workflow which goes into much more useful detail (but also rambles on about less important stuff from time to time). There is a lot of solid advice from highly successful professionals in there but it can easily be applied to your personal needs by scaling it down a notch. After all, most of what they do still applies to one-man-bands and enthusiasts:

  • Use redundant hardware to prevent data loss due to technical failures
  • Back up regularly and off-site to prevent data loss due to theft or human error
  • Have a standardized workflow so that all your folders are organized consistently
  • Tag and rate to find photos once you've accumulated thousands

Requirements

Availability

I want to be able to quickly get to my photos and show them to someone or copy them to a USB flash drive without having to go through my photo management software. By having separate folders for each photo shoot sorted by date, and within those separate folders for each file type ("digital negative", "developed photos", "project files"), I can always use the CLI or a file manager to get exactly the files I want without reading their EXIF or other metadata.
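With that layout, picking out the developed photos needs nothing more than find. The sketch below assumes the final images sit in a folder named jpg inside each shoot, as described further down, and that they use a lowercase .jpg extension. The root path in the usage note is a placeholder.

```shell
# Sketch: list all final JPEGs of one year, relying only on folder names.
list_final_jpegs() {
    root=$1   # top of the photo tree
    year=$2   # e.g. 2015
    # Match regular files that live somewhere below a "jpg" folder.
    find "$root/$year" -type f -path '*/jpg/*' -name '*.jpg' | sort
}
```

For example, list_final_jpegs /media/data/photography 2015 prints one path per line, which can be handed to cp or rsync to fill a USB flash drive.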

Interoperability

Interoperability might not be very important if you know that you will always use one specific program and that you can rely on this program being available, up-to-date and meeting your needs for many years to come (we are talking potentially decades here). Personally, I wouldn't put that much trust in it, and while proprietary photography applications have a slightly better track record than something like MS Office, keep in mind that things might still change very suddenly and your favorite program might not meet your requirements anymore or work in a very different, non-backwards compatible way (like when FCPX came out).

My photo management program of choice, digikam, relies on my own file system structure of my albums, and displays them basically exactly like they are stored on disk. However, it can also browse all photos by date, tag, rating, etc. This way, I can quickly search and filter for specific criteria, or just browse my albums as they are stored in folders. For much, much more information on digikam I highly recommend the eBook digikam recipes which is easily worth the little money it costs if you're looking into using Linux for your photography workflow.

Backups

I don't want to rely solely on my photo management program to do the backups. This is another reason why I started this blog post and my workflow considerations with a sane and well organized file system structure: any backup program will be able to grab those files (all of them or a subset) and copy them somewhere else doing full, incremental and differential backups. Restoring them is also easier if the photo or photo shoot in question can be found quickly.

I will talk more about my backup strategy and how I've implemented it (including tools around rsync and btrfs) in a future blog post.

Personal Workflow

My personal directory structure which reflects most of the workflow:

  • It starts with a folder for the YEAR (e.g. 2014),
  • within each year I have a sub folder DATE_PROJECT/TRIP (e.g. 2015-09-30_Trip_Location),
  • if there are a lot of photos: DAY (full date) or other categorization (e.g. by camera, sub-project, etc)
  • Further sub folders based on workflow (see below)
  • File names are usually DATE_TIMESTAMP (with suffix _1, _2 etc if multiple photos have the same timestamp). I've also seen many people keep the camera's original file name but add date and time at the front. Personally, I don't see any point in keeping the camera's numbering scheme - it doesn't convey any useful information apart from avoiding duplicates if date/time are the same.
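The duplicate handling in the file naming rule can be sketched as follows. The exact DATE_TIMESTAMP format (here 2015-09-30_143000) is an assumption made for the example, and the function name is hypothetical.

```shell
# Sketch: print the first unused name of the form STAMP[_N].EXT in a folder.
unique_name() {
    dir=$1     # target folder
    stamp=$2   # e.g. 2015-09-30_143000 (format is an assumption)
    ext=$3     # e.g. orf or jpg
    candidate="$stamp.$ext"
    n=1
    # Append _1, _2, ... until the name is not taken yet.
    while [ -e "$dir/$candidate" ]; do
        candidate="${stamp}_$n.$ext"
        n=$((n + 1))
    done
    echo "$candidate"
}
```

When copying from the SD card, something like mv "$file" "$dir/$(unique_name "$dir" "$stamp" orf)" keeps photos taken within the same second from overwriting each other.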

These are the sub folders that I use to organize the files within each shoot:

  • jpg: final JPEGs, often straight from camera and the ones developed by myself; I want all the final pictures in one folder so I can browse them easily
  • orf: my current camera's (Olympus) RAW files
  • liveworks: contains RAW files and their XMP sidecars
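Creating this structure from the CLI takes only a few lines. In this sketch the folder names follow the lists above, while the function name and the root path in the usage note are placeholders.

```shell
# Sketch: create YEAR/DATE_PROJECT/{jpg,orf,liveworks} for a new shoot.
make_shoot_dirs() {
    root=$1   # top of the photo tree
    date=$2   # YYYY-MM-DD
    name=$3   # e.g. Trip_Location
    year=${date%%-*}   # everything before the first hyphen
    shoot="$root/$year/${date}_$name"
    mkdir -p "$shoot/jpg" "$shoot/orf" "$shoot/liveworks"
    # Print the new folder so the caller can copy files into it.
    echo "$shoot"
}
```

For instance, make_shoot_dirs /media/data/photography 2015-09-30 Trip_Location creates the three sub folders under 2015/2015-09-30_Trip_Location and prints that path.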

This covers 95% of my usual projects which are trips or events with lots of out-of-camera JPEGs and a few jewels I want to work on from RAW. These are the ones you see show up in my portfolio. I usually don't have HDR, panorama or other composite shots but if I need a place for them I would put them in a new folder in liveworks. JPEGs that get a final touch-up in GIMP also go into a new "edit" folder.

There is also a discussion to be had about whether to keep developed files or not. It should always be possible and easy to recreate any photo (if developed: from RAW plus sidecar; if edited: from GIMP's XCF), but relying on that alone makes it quite difficult to quickly access the final image, or the one I uploaded to my website because I also want to publish it somewhere else, or the one I adjusted for printing. Therefore, I keep all the final images around in the "jpg" folder and name them using a flexible system of suffixes. Basically, there is no strict system as long as it is clear what the image and its intention are. It is nice to have the final image appear first in alphanumerical sorting order. Here are some examples:

  • DATE_TIMESTAMP_final_no_wm.jpg: final image (this is usually the final photo, the way I like it most; no watermark, therefore not directly for publishing),
  • DATE_TIMESTAMP_final_wm.jpg: same as above but with a watermark/signature and therefore suitable for publishing,
  • DATE_TIMESTAMP_bw.jpg: a black and white version of the photo if I feel like both versions, color and monochrome, work well,
  • DATE_TIMESTAMP_Ax.jpg: image in A format for printing on A4, A3, etc,
  • DATE_TIMESTAMP_dark.jpg or _bright.jpg: different exposures from the final image if they are worth keeping, for instance if it gives the image a different atmosphere,
  • no suffix at all usually means the jpg comes straight out of the camera.

Digikam and most other photo management software can group images. When I keep all developed and processed JPEGs in one folder, I can group them under the "_final_no_wm" version so that all the different varieties won't clutter the album but I can still quickly access all versions by expanding the group.

Conclusion

It's been 1.5 years since I started doing photography with RAW files and this workflow has worked really well for me. I have hardly modified it apart from renaming a few folders. I could see myself dividing my files even more in the future, say, between outdoor trips and events, but for now the quantity of photos is perfectly manageable as described in this blog post. I might look again into digikam's import functionality and what it can do for me but it didn't convince me the first time I tried so I stuck with using the CLI to create sub directories and copying the files from the camera's SD card to their respective folders.
