Monday 2 September 2013

Using Google Drive for Server Backup

Automated Backup Jobs

Recently we reviewed our server backups to make sure everything is working correctly and, as always, looked into ways of improving our solution.

Our current solution uses a bash script to create tar archives which are then compressed with bzip2. This gives us a significant reduction in file size, especially for text files such as our database dumps.

Once the backup archives are created, the bash script uses mutt to email them to a Gmail account, which gives us easy access to the backup files.

Recently we have been running into trouble with the size of our backup files, so one requirement was a different storage method. We decided upon Google Drive for the job.


[Image: clockwork cogs. Caption: Automation]


What to Backup

The first task was to review our backups, check that we are saving the correct files and directories, and make sure we are not missing anything important.

We need to back up our Subversion repositories, which contain our entire code base and its revision history, so they are an important asset to us.

Next we have our bug tracking software, a Bugzilla install, so we have to back up its database tables.

We have a few other databases that need to be backed up along with the Bugzilla database, so they can all be handled in one go.

Finally, we need to back up the web root of every website we host. The reason we back up the source tree here rather than relying on SVN alone is that images are uploaded and other files are created dynamically.


Backing Up Databases

Here we use passthru() to execute a system command that dumps the databases, then create an MD5 hash of the dump file, and finally archive and compress the files. We should be able to make significant space savings using compression.

The reason we create a checksum of the file is to help determine that the archive is the same as the day it was created. We check the archives manually a few times a year to verify the process is working correctly and the checksums match; more on this later.

To dump every database you can use the --all-databases switch of the mysqldump program; the following command will back up our databases.
mysqldump --user={SQL_USER} --password={SQL_PASS} --all-databases

Using the command above as the first parameter of the passthru() function will dump all the data we need.
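
As a rough sketch of that step (the file paths here are examples, not our exact script), the whole database backup boils down to something like the following:

// Illustrative sketch only: dump all databases, record an MD5 checksum,
// then archive and compress the results. Paths are example values.
$dumpFile = '/tmp/all_databases.sql';

// Dump every database into a single SQL file.
passthru('mysqldump --user={SQL_USER} --password={SQL_PASS} --all-databases > ' . $dumpFile);

// Store an MD5 checksum alongside the dump so the archive can be verified later.
file_put_contents($dumpFile . '.md5', md5_file($dumpFile) . "\n");

// Archive the dump and its checksum, then compress the archive with bzip2.
passthru('tar -pcf /tmp/database_backup.tar ' . $dumpFile . ' ' . $dumpFile . '.md5');
passthru('nice -n 10 bzip2 -9 /tmp/database_backup.tar');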


Backing Up Subversion

Our server has multiple Subversion repositories, so we need to back up each one and generate a file checksum using the md5_file() function. For our solution we create a DirectoryIterator object and iterate through the directories under our SVN root directory.

By testing that each item is a directory and not a dot entry, we can dump it with the svnadmin program, which we again call using passthru().
svnadmin dump /path/to/repos > dump_file_name.svn

Once we have iterated through the directories we can create a tar archive of the dump and MD5 checksum files, which we then compress.
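
A minimal sketch of that loop, with example paths, might look like this:

// Illustrative sketch only: dump each repository under the SVN root and
// record an MD5 checksum for every dump file. Paths are example values.
$svnRoot = '/path/to/svn';
$outDir  = '/tmp/svn_backup';

foreach (new DirectoryIterator($svnRoot) as $item) {
    // Skip "." and ".." and anything that is not a repository directory.
    if ($item->isDot() || !$item->isDir()) {
        continue;
    }
    $dumpFile = $outDir . '/' . $item->getFilename() . '.svn';

    // Dump the repository and store a checksum of the dump file.
    passthru('svnadmin dump ' . $item->getPathname() . ' > ' . $dumpFile);
    file_put_contents($dumpFile . '.md5', md5_file($dumpFile) . "\n");
}

// Archive the dump and checksum files, then compress the archive.
passthru('tar -pcf /tmp/svn_backup.tar ' . $outDir);
passthru('nice -n 10 bzip2 -9 /tmp/svn_backup.tar');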


Backing Up Websites

Finally, we back up our website root directories so we have a copy of any files that have been generated automatically, such as news cover images.

We decided not to back up all of the source code files under some of our web roots because we use Subversion for revision control, so we already have our tagged releases and the HEAD of the trunk.


If we need to restore, we will restore from the Subversion backup; this makes more sense to us as we can use a stable tagged release.
tar -pcf webroot_backup_file_name.tar /path/to/web/root

Using tar simplifies the creation of an archive of files; we will be compressing this tar archive to reduce its size.

Compressing Data Files

Once we have all our data in a tar archive we compress it to reduce file size and bandwidth usage. Although bzip2 is slower than gzip, it achieves a greater reduction in size.

The bzip2 program has a few switches, one of which sets the block size used, with -9 offering the best compression at the slowest speed. This is the option we use, as the time taken does not matter to us and all tasks are run with a nice level of 10.
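For example, compressing one of the tar archives at a nice level of 10 looks like this (file name as created above):
nice -n 10 bzip2 -9 webroot_backup_file_name.tar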

Using Google's google-api-php-client Library

To use Google Drive as the back-end storage for our backup solution, we downloaded the API client library from Google to speed up development.


The library allows us to interface with Google Drive. You could also use curl to upload files by sending POST requests, but we chose to use the PHP library.

Object-oriented design allows us to reuse our upload code for each type of backup (database, subversion, website, system) by having a base class and extending it for each concrete implementation.
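
As a rough outline of that structure (the class and method names here are illustrative, not our actual code), each backup type only has to provide its own archive step:

// Illustrative outline only; class and method names are example choices.
abstract class Backup
{
    protected $Service; // authenticated Google_DriveService shared by all backup types

    public function __construct(Google_DriveService $Service)
    {
        $this->Service = $Service;
    }

    // Each concrete backup type builds its own compressed archive and
    // returns the path of the file to upload.
    abstract protected function createArchive();

    public function run()
    {
        // The upload code is shared by every backup type.
        $this->upload($this->createArchive());
    }

    protected function upload($path)
    {
        // Google Drive insert call, as shown in the upload code below.
    }
}

class DatabaseBackup extends Backup
{
    protected function createArchive()
    {
        // mysqldump, checksum, tar and bzip2 as described above.
        return '/tmp/database_backup.tar.bz2';
    }
}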

If you download the library from here and extract it to a directory you have access to, you can use it to access all of Google's APIs with one set of libraries, not just Google Drive.


Upload Code

The code needed to use Google Drive as a back end for database backups is included below. All you need to modify is {FILE_TO_UPLOAD_NAME}, which should be the path of the file to upload, and the MIME type if you are uploading something other than a bzip2 file; there is a list of MIME types here. Note that obtaining the OAuth credentials (the {CLIENT_ID}, {CLIENT_SECRET} and {STORED_ACCESS_TOKEN} placeholders below) is assumed to have been handled when the backup system was first set up.
require_once('{PATH_TO_GOOGLE_LIB}/src/Google_Client.php');
require_once('{PATH_TO_GOOGLE_LIB}/src/contrib/Google_DriveService.php');
require_once('{PATH_TO_GOOGLE_LIB}/src/contrib/Google_Oauth2Service.php');

// Create the Drive service; the OAuth access token is assumed to have been
// obtained and stored previously during a one-off authorisation step.
$Client = new Google_Client();
$Client->setClientId({CLIENT_ID});
$Client->setClientSecret({CLIENT_SECRET});
$Client->setScopes(array('https://www.googleapis.com/auth/drive'));
$Client->setAccessToken({STORED_ACCESS_TOKEN});
$Service = new Google_DriveService($Client);

// Describe the file we are about to upload.
$File = new Google_DriveFile();
$File->setTitle({FILE_TO_UPLOAD_TITLE});
$File->setMimeType('application/bzip2');

try {
    // Read the archive and insert it into Google Drive.
    $data = file_get_contents({FILE_TO_UPLOAD_NAME});
    $createdFile = $Service->files->insert($File, array(
        'data' => $data,
        'mimeType' => 'application/bzip2'
        )
    );
    echo "File ID: " . $createdFile->getId() . "\n";
}
catch (Exception $e) {
    echo "There was an exception, message was: " . $e->getMessage() . "\n";
}


PHP Client Shell

We have also implemented a command shell which can be used as a client interface to the backup system, allowing all of the functions to be carried out by issuing commands such as list to display the contents of the Google Drive.

So far we have implemented the following commands, and we will add more in the future if we feel they are needed. There are also shorthand versions of the commands which are not listed below (e.g. dl for the download command). A minimal sketch of the command loop follows the list.

  • list, List contents of current Google Drive directory
  • upload {FILE_NAME}, Upload the file identified by {FILE_NAME} to the Google Drive
  • download {FILE_ID}, Download a file identified by its Google Drive file id
  • backup {TYPE}, Perform backup action of {TYPE}. {TYPE} can be database|subversion|website|system
  • remove {FILE_ID}, Remove the file identified by {FILE_ID} from the Google Drive
  • system {ACTION}, Run a system command action
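
The shell itself is little more than a read-and-dispatch loop; a minimal sketch (the handlers here are placeholders, not our actual implementation) might look like:

// Very small sketch of the command loop; the real shell dispatches each
// command to the backup and Google Drive code described above.
$stdin = fopen('php://stdin', 'r');
echo 'backup> ';
while (($line = fgets($stdin)) !== false) {
    $parts   = preg_split('/\s+/', trim($line));
    $command = array_shift($parts);

    switch ($command) {
        case 'list':
            // List the contents of the current Google Drive directory.
            break;
        case 'download':
        case 'dl': // shorthand form
            // Download the file identified by $parts[0].
            break;
        case 'quit':
            exit(0);
        default:
            echo 'Unknown command: ' . $command . "\n";
    }
    echo 'backup> ';
}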

Conclusion

Our new implementation of the backup solution is quite heavyweight for the task at hand, and everything we have done here could also be achieved with a bash script. If you take that approach you could use curl from the command line to POST the files to Google Drive.
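
As a rough illustration of that alternative, a simple media upload with curl would look something like the following; the access token placeholder and file name are examples, and you should check the Drive API documentation for the exact endpoint and parameters.
curl -X POST \
     -H "Authorization: Bearer {ACCESS_TOKEN}" \
     -H "Content-Type: application/bzip2" \
     --data-binary @database_backup.tar.bz2 \
     "https://www.googleapis.com/upload/drive/v2/files?uploadType=media"

This performs the simple (media) upload, which sends only the file content; setting a title or other metadata requires a multipart upload instead.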

In a future article we will be covering the command shell part of our backup system and looking at our system restore scripts which we use to help automate building and configuring a clean Debian system install.
