Backup, anyone?

24 September, 2014

FRIENDLY REMINDER: Have you backed up your data today?

If you've never seen this sentence, then write it down, and put it somewhere visible.

Why, you ask? Because having multiple copies of your data is important if you plan on keeping it for the long term. You know, a hard drive will not tell you: "Hey! I'm gonna die in two days around 2 am, please copy me somewhere else." There are so many ways to lose data... And you'll experience some of them, trust me!

Anyway, back to the topic! In this post, I'm gonna show you a simple way to back up your data. All you need is the following:

An external storage medium (USB key, hard drive, tapes, ...)
An archiver (cpio, tar, ar, ...)
A compressor (gzip, bzip2, xz, ...)
Some shell glue

Preparation

First, you need to figure out what you want to back up: configs? multimedia? code? For the purpose of this article, let's say I want to back up all my images, located in /data/img. Let's figure out the size of this directory:

── du -sh /data/img
5.5G    /data/img/

This could fit on my USB key. Let's mount and prepare it. In the meantime, we will create a user dedicated to the backup process:

# useradd -M -g users backup
# mount /dev/sdd1 /mnt
# mkdir /mnt/backup
# chown backup:users /mnt/backup
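
With the key mounted, the "does it fit?" question can also be answered by the machine rather than by eye. Here is a minimal sketch, assuming POSIX du, df and awk are available. The name fits is made up for the example, it is not an existing tool:

```shell
# fits SRC DST: succeed if SRC (in KiB) is smaller than the free space on DST.
# "fits" is a hypothetical helper name.
fits() {
    # size of the source tree, in KiB
    need=$(du -sk "$1" | awk '{print $1}')

    # "Available" column of the filesystem holding DST (POSIX output format)
    free=$(df -Pk "$2" | awk 'NR == 2 {print $4}')

    echo "need: ${need}K, available: ${free}K"
    test "$need" -lt "$free"
}

# example: fits /data/img /mnt/backup && echo "it fits"
```

The compressed archive will usually be smaller than what du reports, so this check errs on the safe side.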

Now the drive is ready to accept backups. Let's see how to create them.

Backing up

What's a backup, again?

In information technology, a backup, or the process of backing up, refers to the copying and archiving of computer data so it may be used to restore the original after a data loss event. The verb form is to back up in two words, whereas the noun is backup.

RECOVER, that's the only word that matters. A backup is useless if you can't recover data from it. PERIOD.

In my case, I chose cpio, because I find it simple to recover data from a cpio archive. We'll see later how to do so. If you find it easier to do with tar, feel free to adapt the following to your liking.

So what's the plan? First, we'll create an archive containing all the files we want. Then we'll compress said archive to save some space, and finally manage those backups to keep multiple copies.

Archiving

For this task, I chose cpio, which takes filenames on stdin, and writes an archive to stdout. The fact that it outputs to stdout gives the ability to compress the archive while it's being created. A good thing about this is that it needs very little memory! Indeed, a pipe only holds a small, fixed amount of data at a time: the writer blocks until the reader has processed what's in there, and so on... The exact buffer size depends on your system. In bash, ulimit -a reports it as "pipe size", in 512-byte blocks. Anyway:

── find /data/img -type f | cpio -o | gzip -c > /mnt/backup/images.cpio.gz

And the archive is created and compressed! Pretty easy, isn't it? Let's see how to manage them now.
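
Before moving on: a backup you can't read back is worthless, so a quick sanity check may be worth adding right after the archive is written. The sketch below is one possible way to do it and is not part of the original setup. check_archive is a made-up name, and the comparison assumes no file name contains a newline:

```shell
# check_archive ARCHIVE SRC: verify the gzip stream, then compare the number
# of entries in the archive with the number of files under SRC.
# "check_archive" is a hypothetical helper name.
check_archive() {
    # gzip -t reads the whole file and fails on a truncated or corrupt stream
    gzip -t "$1" || return 1

    # cpio -t lists the archive content without extracting anything
    stored=$(gzip -cd "$1" | cpio -it 2>/dev/null | wc -l)
    ondisk=$(find "$2" -type f | wc -l)

    echo "archive: $((stored)) files, source: $((ondisk)) files"
    test $((stored)) -eq $((ondisk))
}

# example: check_archive /mnt/backup/images.cpio.gz /data/img
```

If files were added or removed while the backup was running, the two counts will differ, so treat a mismatch as a hint to look closer rather than as proof of corruption.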

Managing

Be creative for this part! You can either use $(date +%Y-%m-%d) as a name for the backup, write a crawler to rename backups based on their timestamp, or maybe use a rotating script, like the one written by ypnose.
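
As an illustration of the first option, here is a small sketch that builds a dated file name. The stamped function is an invention for this example, and it expects a bare file name, not a full path:

```shell
# stamped NAME: insert today's date before the extension(s), so that
# images.cpio.gz becomes images-YYYY-MM-DD.cpio.gz.
# "stamped" is a hypothetical helper name.
stamped() {
    base=${1%%.*}       # everything before the first dot
    ext=${1#"$base"}    # the rest, dot included (empty if there is no dot)
    echo "${base}-$(date +%Y-%m-%d)${ext}"
}

# example:
# find /data/img -type f | cpio -o | gzip -c > "/mnt/backup/$(stamped images.cpio.gz)"
```

Dated names sort chronologically with a plain ls, but nothing deletes the old ones for you, which is where a rotating script comes in handy.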

I modified the script to allow an automatic rotation of files, in case the file number limit is reached. Here it is:

#!/bin/sh
#
# z3bra - (c) wtfpl 2014
# Back up a file, and rotate backups: file.0.BAK - file.1.BAK, ...
#
# Based on an original idea from Ypnose. Thanks mate!
# <http://ywstd.fr/blog/2014/bakup-snippet.html>

EXT=${EXT:-BAK} # extension used for backup
LIM=${LIM:-9}   # highest version number to keep
PAD=${PAD:-0}   # number to start with

usage() {
    cat <<EOF
usage: `basename $0` [-hrv] <file>
        -h  : print this help
        -r  : perform a rotation if \$LIM is reached
        -v  : verbose mode
EOF
}

# report action performed in verbose mode
log() {
    # do not log anything if not in $VERBOSE mode
    test -z "$VERBOSE" && return

    echo "[$(date +%Y-%m-%d)] - $*"
}

# rotate backups to leave moar room
rotate() {
    # do not rotate if the rotate flag wasn't provided
    test -z "$ROTATE" && return

    # delete the oldest backup
    rm "${FILE}.${PAD}.${EXT}"

    # move every file down one place
    for N1 in `seq $PAD $LIM`; do
        N2=$(( N1 + ROTATE ))

        # don't go any further
        test -f "${FILE}.${N2}.${EXT}" || return

        # move file down $ROTATE place
        log "${FILE}.${N2}.${EXT} -> ${FILE}.${N1}.${EXT}"
        mv "${FILE}.${N2}.${EXT}" "${FILE}.${N1}.${EXT}"
    done
}

# actually archive files
archive() {
    # test the presence of each version, and create the first one that doesn't exist
    for N in `seq $PAD $LIM`; do
        if test ! -f "${FILE}.${N}.${EXT}"; then

            # copy the file under its new name
            log "Created: ${FILE}.${N}.${EXT}"
            cp "${FILE}" "${FILE}.${N}.${EXT}"

            exit 0
        fi
    done
}

while getopts "hrv" opt; do
    case $opt in
        h) usage; exit 0 ;;
        r) ROTATE=1 ;;
        v) VERBOSE=1 ;;
        *) usage; exit 1 ;;
    esac
done

shift $((OPTIND - 1))

test $# -lt 1 && usage && exit 1

FILE=$1

# in case the limit is reached, remove the oldest backup
test -f "${FILE}.${LIM}.${EXT}" && rotate

# if rotation wasn't performed, we'll not archive anything
test -f "${FILE}.${LIM}.${EXT}" || archive

echo "Limit of $LIM .$EXT files reached, run with -r to force rotation"
exit 1

Now, to "archive" a file, all you need to do is:

── cd /mnt/backup
── backup.sh -r images.cpio.gz

And it will create the following tree:

── ls /mnt/backup
images.cpio.gz        images.cpio.gz.3.BAK images.cpio.gz.7.BAK
images.cpio.gz.0.BAK  images.cpio.gz.4.BAK images.cpio.gz.8.BAK
images.cpio.gz.1.BAK  images.cpio.gz.5.BAK images.cpio.gz.9.BAK
images.cpio.gz.2.BAK  images.cpio.gz.6.BAK

Aaaaaand we're done! Wrap it all in a crontab, and the backup process will start:

# start a backup at 2 am, every day
0 2 * * * find /data/img -type f | cpio -o | gzip > /mnt/backup/images.cpio.gz

# rotate backups, limiting their number to 7 (a whole week)
0 3 * * * cd /mnt/backup && LIM=6 backup.sh -r images.cpio.gz
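
One thing to keep in mind with two separate entries: the rotation copies whatever the archive contains at 3 am, even if the 2 am job is still running or went wrong. A possible way around it is to write to a temporary file, and only rename it once it's complete. A rough sketch, where make_backup and the .tmp suffix are assumptions, not something the setup above uses:

```shell
# make_backup SRC DEST: archive SRC into DEST, replacing DEST only on success.
# "make_backup" and the ".tmp" suffix are hypothetical, pick your own.
make_backup() {
    tmp="$2.tmp"

    # refuse to archive a directory that isn't there
    test -d "$1" || return 1

    # plain sh only reports the status of the last command of a pipe (gzip
    # here), hence the extra integrity check with gzip -t
    if find "$1" -type f | cpio -o 2>/dev/null | gzip -c > "$tmp" \
        && gzip -t "$tmp"; then
        # a rename within the same filesystem never exposes a partial file
        mv "$tmp" "$2"
    else
        rm -f "$tmp"
        return 1
    fi
}

# example: make_backup /data/img /mnt/backup/images.cpio.gz
```

The rotation could then be chained after it with && in a single crontab entry, so that it only runs when a fresh archive is actually there.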

Should be enough for now. But here comes the most important part...

Restoring

This is the most important one, but not the trickiest, don't worry. It's Friday, and your friends are arriving in a few minutes to see the photos from your last trip. Before they arrive, you decide to clean up the directory, and notice a .filedb-47874947392 file created by your camera in said directory. Let's remove it:

── cd /data/img/2014/trip_to_sahara/
── ls -a .filedb-*
.filedb-47874947392
── rm .filedb- *
rm: can't remove '.filedb-': No such file or directory
── ls -la .
total 0
drwxr-xr-x    1 z3bra    users          402 Sep 24 00:41 .
drwxr-xr-x    1 z3bra    users          402 Sep 24 00:41 ..
-rw-r--r--    1 z3bra    users            0 Sep 24 00:58 .filedb-47874947392

Oh god.. Why..? This shitty space between the '-' and the '*' in your rm command is going to fuck your presentation up! Luckily, you made a backup this morning at 2 am... Let's restore your whole directory from it:

── mount /dev/sdd1 /mnt
── cd /mnt/backup
── ls -la
total 0
drwxr-xr-x    1 z3bra    users      402 Sep 10 00:41 .
drwxr-xr-x    1 z3bra    users      402 Sep 10 00:41 ..
-rw-r--r--    1 z3bra    users        0 Sep 19 02:01 images.cpio.gz
-rw-r--r--    1 z3bra    users        0 Sep 15 03:00 images.cpio.gz.0.BAK
-rw-r--r--    1 z3bra    users        0 Sep 16 03:00 images.cpio.gz.1.BAK
-rw-r--r--    1 z3bra    users        0 Sep 17 03:00 images.cpio.gz.2.BAK
-rw-r--r--    1 z3bra    users        0 Sep 18 03:00 images.cpio.gz.3.BAK
-rw-r--r--    1 z3bra    users        0 Sep 19 03:00 images.cpio.gz.4.BAK
-rw-r--r--    1 z3bra    users        0 Sep 13 03:00 images.cpio.gz.5.BAK
-rw-r--r--    1 z3bra    users        0 Sep 14 03:00 images.cpio.gz.6.BAK

Today is Friday, 19 September. As you can see from the timestamps, backups number 5 and 6 are from last week. This morning's backup is number 4, and the latest is the one without any number.

cpio allows extracting files from an archive using the following syntax:

── cpio -i -d < archive.cpio

-i asks for an extraction, while -d tells cpio to recreate the directory tree if it does not exist. Check the Wikipedia article for more explanations on how it works.
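
Extraction patterns have to match the names exactly as they are stored in the archive, so it can help to look at those names first. A small sketch, assuming the archive is gzipped as in this article. list_backup is a made-up helper name:

```shell
# list_backup ARCHIVE [TEXT]: print the names stored in a gzipped cpio
# archive, optionally keeping only those containing TEXT.
# "list_backup" is a hypothetical helper name.
list_backup() {
    # cpio -t lists names without extracting; an empty TEXT matches all lines
    gzip -cd "$1" | cpio -it 2>/dev/null | grep -F -- "${2:-}"
}

# example: list_backup /mnt/backup/images.cpio.gz trip_to_sahara
```

Whatever prefix shows up in that listing (with or without a leading slash) is the one to use in the patterns.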

So, to restore our lost directory you'd proceed like this:

# archive was created from absolute paths, and cpio restores files from the
# current directory, so let's move to root, to restore files directly
── cd /

# you can pass globbing patterns to cpio, so that it only restores what you
# want (quote them, so that the shell doesn't expand them first). Don't forget
# to decompress the archive first
── gzip -cd /mnt/backup/images.cpio.gz | cpio -ivd 'data/img/2014/trip_to_sahara/*'
data/img/2014/trip_to_sahara/IMG-0001.JPG
data/img/2014/trip_to_sahara/IMG-0002.JPG
data/img/2014/trip_to_sahara/IMG-0003.JPG
data/img/2014/trip_to_sahara/IMG-0004.JPG
data/img/2014/trip_to_sahara/IMG-0005.JPG
data/img/2014/trip_to_sahara/IMG-0006.JPG
data/img/2014/trip_to_sahara/.filedb-47874947392
23 blocks

── ls /data/img/2014/trip_to_sahara
IMG-0001.JPG IMG-0003.JPG IMG-0005.JPG
IMG-0002.JPG IMG-0004.JPG IMG-0006.JPG

# be careful this time!
── rm /data/img/2014/trip_to_sahara/.filedb-47874947392

And it's all good! Don't forget to keep your drive safe, and duplicate it if you can, just in case.

Hope it will be useful to someone, cheers!