Backup, someone?
24 September, 2014
FRIENDLY REMINDER: Have you backed up your data today?
If you've never seen this sentence before, write it down and put it somewhere visible.
Why, you ask? Because having multiple copies of your data is essential
if you plan on keeping it over the long term.
You know, a hard drive will not tell you: "Hey! I'm gonna die in two days
around 2 am, please copy me somewhere else." There are so many ways to lose
data... and you'll experience some of them, trust me!
Anyway, back to the topic! In this post, I'm gonna show you a simple way to back up your data. All you need is the following:
- An external storage device (USB key, hard drive, tapes, ...)
- An archiver (cpio, tar, ar, ...)
- A compressor (gzip, bzip2, xz, ...)
- Some shell glue
Preparation
First, you need to figure out what you want to back up: configs? multimedia?
code? For the purpose of this article, let's say I want to back up all my
images, located in /data/img. Let's figure out the size of this directory:
── du -sh /data/img
5.5G /data/img/
This could fit on my USB key. Let's mount and prepare it. In the meantime, we will create a user dedicated to the backup process:
# useradd -M -g users backup
# mount /dev/sdd1 /mnt
# mkdir /mnt/backup
# chown backup:users /mnt/backup
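Before going further, it doesn't hurt to check that the key actually has room for the archive (a quick sanity check, assuming the key is still mounted on /mnt):
# available space on the key should be well above the 5.5G measured earlier
── df -h /mnt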
Now the drive is ready to accept backups. Let's see how to create them.
Backing up
What's a backup again?
In information technology, a backup, or the process of backing up, refers to the copying and archiving of computer data so it may be used to restore the original after a data loss event. The verb form is to back up in two words, whereas the noun is backup.
RECOVER, that's the only word that matters. A backup is useless if you can't recover data from it. PERIOD.
In my case, I chose cpio, because I find it simple to recover data from a
cpio archive. We'll see later how to do so. If you find it easier with
tar, feel free to adapt the following to your liking.
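For the tar folks, the equivalent one-liner could look like this (just a sketch, assuming GNU tar, whose -T - option reads file names from stdin):
── find /data/img -type f | tar -czf /mnt/backup/images.tar.gz -T -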
So what's the plan? First, we'll create an archive containing all the files we want. Then we'll compress said archive to save some space, and finally, manage those backups to keep multiple copies.
Archiving
For this task, I chose cpio, which takes file names on stdin, and writes an
archive to stdout. Writing to stdout makes it possible to compress the
archive while it's being created. Another nice property of piping is that the
whole archive never has to sit in RAM: data flows through the pipe in small
chunks (the kernel's pipe buffer, a few kilobytes at most), each chunk waiting
to be consumed before the next one is passed along. You can check your pipe
buffer size with ulimit -a (it's usually reported in 512-byte blocks). Anyways:
── find /data/img -type f | cpio -o | gzip -c > /mnt/backup/images.cpio.gz
And the archive is created and compressed! Pretty easy, isn't it?
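Before moving on, remember the golden rule above: a backup you can't read back is worthless. Listing the archive's content is a cheap sanity check (nothing fancy, just the same pipeline in reverse):
# print the first few members of the archive, without extracting anything
── gzip -cd /mnt/backup/images.cpio.gz | cpio -t | head
If the file names scroll by as expected, we're good. Now, let's see how to manage those backups.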
Managing
Be creative for this part! You can either use $(date +%Y-%m-%d) as a name for
the backup, write a crawler to rename files based on their timestamps, or
use some rotating script, like the one written by
ypnose.
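The date-based approach is the simplest of the three; a minimal sketch, reusing the pipeline from above, would be:
# one archive per day, named after the current date
── find /data/img -type f | cpio -o | gzip -c > /mnt/backup/images-$(date +%Y-%m-%d).cpio.gz
It works, but old archives pile up forever unless you prune them by hand.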
I went with the last option, and modified Ypnose's script so that files rotate automatically when the file number limit is reached. Here it is:
#!/bin/sh
#
# z3bra - (c) wtfpl 2014
# Backup a file, and rotate backups: file.0.BAK, file.1.BAK, ...
#
# Based on an original idea from Ypnose. Thanks mate !
# <http://ywstd.fr/blog/2014/bakup-snippet.html>
EXT=${EXT:-BAK} # extension used for backup
LIM=${LIM:-9} # maximum number of versions to keep
PAD=${PAD:-0} # number to start with
usage() {
cat <<EOF
usage: `basename $0` [-hrv] <file>
-h : print this help
-r : perform a rotation if \$LIM is reached
-v : verbose mode
EOF
}
# report action performed in verbose mode
log() {
# do not log anything if not in $VERBOSE mode
test -z "$VERBOSE" && return
echo "[$(date +%Y-%m-%d)] - $*"
}
# rotate backups to leave moar room
rotate() {
# do not rotate if the rotate flag wasn't provided
test -z "$ROTATE" && return
# delete the oldest backup
rm ${FILE}.${PAD}.${EXT}
# move every file down one place
for N1 in `seq $PAD $LIM`; do
N2=$(( N1 + ROTATE ))
# don't go any further
test -f ${FILE}.${N2}.${EXT} || return
# move file down $ROTATE place
log "${FILE}.${N2}.${EXT} -> ${FILE}.${N1}.${EXT}"
mv ${FILE}.${N2}.${EXT} ${FILE}.${N1}.${EXT}
done
}
# actually archive files
archive() {
# test the presence of each version, and create one that doesn't exist
for N in `seq $PAD $LIM`; do
if test ! -f ${FILE}.${N}.${EXT}; then
# copy the file under its new name
log "Created: ${FILE}.${N}.${EXT}"
cp ${FILE} ${FILE}.${N}.${EXT}
exit 0
fi
done
}
while getopts "hrv" opt; do
case $opt in
h) usage; exit 0 ;;
r) ROTATE=1 ;;
v) VERBOSE=1 ;;
*) usage; exit 1 ;;
esac
done
shift $((OPTIND - 1))
test $# -lt 1 && usage && exit 1
FILE=$1
# in case the limit is reached, remove the oldest backup
test -f ${FILE}.${LIM}.${EXT} && rotate
# if rotation wasn't performed, we'll not archive anything
test -f ${FILE}.${LIM}.${EXT} || archive
echo "Limit of $LIM .$EXT files reached run with -r to force rotation"
exit 1
Now, to "archive" a file, all you need to do is:
── cd /mnt/backup
── backup.sh -r images.cpio.gz
And it will create the following tree:
── ls /mnt/backup
images.cpio.gz images.cpio.gz.3.BAK images.cpio.gz.7.BAK
images.cpio.gz.0.BAK images.cpio.gz.4.BAK images.cpio.gz.8.BAK
images.cpio.gz.1.BAK images.cpio.gz.5.BAK images.cpio.gz.9.BAK
images.cpio.gz.2.BAK images.cpio.gz.6.BAK
Aaaaaand we're done! Wrap it all in a crontab, and the backup process will run on its own:
# start a backup at 2 am, every day
0 2 * * * find /data/img -type f | cpio -o | gzip > /mnt/backup/images.cpio.gz
# rotate backups, limiting their number to 7 (a whole week)
0 3 * * * cd /mnt/backup && LIM=6 backup.sh -r images.cpio.gz
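One caveat with cron: it runs with a very short PATH, so backup.sh may not be found unless you call it with its full path. If you prefer a single entry, you can also wrap both steps in one small script (a possible wrapper, not part of the original setup; /usr/local/bin is just an assumption, adjust it to wherever you installed backup.sh):
#!/bin/sh
# daily-backup.sh - archive /data/img, then rotate, keeping a week of backups
find /data/img -type f | cpio -o | gzip -c > /mnt/backup/images.cpio.gz
cd /mnt/backup && LIM=6 /usr/local/bin/backup.sh -r images.cpio.gz
and call it from a single crontab line:
0 2 * * * /usr/local/bin/daily-backup.sh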
Should be enough for now. But here comes the most important part...
Restoring
This is the most important part, but not the trickiest, don't worry. It's
Friday, and your friends are arriving in a few minutes to see the photos from
your last trip. Before they arrive, you decide to clean up the directory, and
notice a .filedb-47874947392 file created by your camera in said directory.
Let's remove it:
── cd /data/img/2014/trip_to_sahara/
── ls -a .filedb-*
.filedb-47874947392
── rm -f .filedb- *
rm: can't remove '.filedb-': No such file or directory
── ls -la .
total 0
drwxr-xr-x 1 z3bra users 402 Sep 24 00:41 .
drwxr-xr-x 1 z3bra users 402 Sep 24 00:41 ..
-rw-r--r-- 1 z3bra users 0 Sep 24 00:58 .filedb-47874947392
Oh god... Why...?
That shitty space between the '-' and the '*' in your rm command is going to
fuck your presentation up!
Luckily, you made a backup this morning at 2 am... Let's restore the whole
directory from it:
── mount /dev/sdd1 /mnt
── cd /mnt/backup
── ls -la
total 0
drwxr-xr-x 1 z3bra users 402 Sep 10 00:41 .
drwxr-xr-x 1 z3bra users 402 Sep 10 00:41 ..
-rw-r--r-- 1 z3bra users 0 Sep 19 02:01 images.cpio.gz
-rw-r--r-- 1 z3bra users 0 Sep 15 03:00 images.cpio.gz.0.BAK
-rw-r--r-- 1 z3bra users 0 Sep 16 03:00 images.cpio.gz.1.BAK
-rw-r--r-- 1 z3bra users 0 Sep 17 03:00 images.cpio.gz.2.BAK
-rw-r--r-- 1 z3bra users 0 Sep 18 03:00 images.cpio.gz.3.BAK
-rw-r--r-- 1 z3bra users 0 Sep 19 03:00 images.cpio.gz.4.BAK
-rw-r--r-- 1 z3bra users 0 Sep 13 03:00 images.cpio.gz.5.BAK
-rw-r--r-- 1 z3bra users 0 Sep 14 03:00 images.cpio.gz.6.BAK
It's Friday, September 19. As you can see from the timestamps, backups number 5 and 6 are from last week. The backup from this morning is number 4, and the latest is the one without any number.
cpio allows extracting files from an archive using the following syntax:
── cpio -i -d < archive.cpio
-i asks for an extraction, while -d tells cpio to recreate the directory
tree if it does not exist. Check the Wikipedia
article for more explanations on how it works.
So, to restore our lost directory you'd proceed like this:
# the archive was created from absolute paths, and cpio restores files from the
# current directory, so let's move to the root, to restore files directly
── cd /
# you can pass globbing patterns to cpio, so that it only restores what you
# want (quote them so the shell doesn't expand them). Don't forget to decompress the archive first
── gzip -cd /mnt/backup/images.cpio.gz | cpio -ivd 'data/img/2014/trip_to_sahara/*'
data/img/2014/trip_to_sahara/IMG-0001.JPG
data/img/2014/trip_to_sahara/IMG-0002.JPG
data/img/2014/trip_to_sahara/IMG-0003.JPG
data/img/2014/trip_to_sahara/IMG-0004.JPG
data/img/2014/trip_to_sahara/IMG-0005.JPG
data/img/2014/trip_to_sahara/IMG-0006.JPG
data/img/2014/trip_to_sahara/.filedb-47874947392
23 blocks
── ls /data/img/2014/trip_to_sahara
IMG-0001.JPG IMG-0003.JPG IMG-0005.JPG
IMG-0002.JPG IMG-0004.JPG IMG-0006.JPG
# be careful this time !
── rm /data/img/2014/trip_to_sahara/.filedb-47874947392
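If you'd rather not extract on top of the live directory, another option is to restore into a scratch directory first and copy back only what you need (a variant of the same command; /tmp/restore is just an example path):
── mkdir /tmp/restore && cd /tmp/restore
── gzip -cd /mnt/backup/images.cpio.gz | cpio -ivd 'data/img/2014/trip_to_sahara/*'
── cp data/img/2014/trip_to_sahara/*.JPG /data/img/2014/trip_to_sahara/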
And it's all good! Don't forget to keep your drive safe, and duplicate it if you can, just in case.
Hope it will be useful to someone, cheers!