Stuff Yaron Finds Interesting

Technology, Politics, Food, Finance, etc.

Rsync, RsyncX 2.1 & OS X

I really needed to back up my system. I had evaluated a bunch of backup software when a friend (who shall remain nameless to protect the guilty) pointed out Rsync and RsyncX 2.1. Rsync is a command line open source program used to do sophisticated data backup. RsyncX is a 'friendly' front end to Rsync. Rsync by itself is about as friendly as an angry viper. RsyncX is a little less unfriendly, say about the level of a club to the head. After seven or so hours I finally managed to get everything working. Below I give step-by-step instructions on how I used RsyncX to set up both my local and remote backups. My takeaway from this experience is that I should have just bought backup software.
Rsync is a program available just about anywhere UNIX hangs out. It is used to back up data. It is outrageously powerful, which means that it has pages of options which I aspire to never understand. RsyncX 2.1, near as I can tell, is primarily intended to create a version of Rsync that can handle some of the magic in the Mac file system, specifically the resource and data forks. In addition the team did a bunch of work creating a nicer UI for Rsync. Unfortunately the work still has a long way to go. The UI provided by RsyncX consists of assistants and other interfaces primarily designed to generate Rsync scripts. Unfortunately the assistants and interfaces I used almost all suffered from multiple bugs. In addition the documentation is out of date and the interfaces and explanations provided by the program were difficult for me to follow. All things considered I wish I hadn't tried to use RsyncX and instead had just bought some backup software; there is a lot of good software available in the $20-$60 range. Unfortunately, much like the boiled frog, I didn't realize I should jump out until it was too late.

My purpose then in writing this article is to document what I did so if something ever goes wrong I have some hope of fixing it. Also, for those who really feel the need to save $20-$60, this document will provide at least some guidance on how to use RsyncX.

I used RsyncX's Rotating Backup Assistant to create my two backup scripts. This assistant, available under the Assistants menu item, creates a script that takes advantage of Rsync's incremental backup features which are fully explained here. In the instructions below I walk through each of the screens that the assistant has. At points too numerous to count RsyncX will prompt for a password. I don't point out these times below.

Local Backup Script

My first backup script is a straightforward backup of my home directory on my Mac to an external Firewire drive connected through the very nifty ComboDock from WiebeTECH. This is a 'local to local' copy. I started off by running the assistant.

Select Source

I found the first screen confusing because it wasn't immediately clear to me that it was actually asking for two different pieces of information. It wants to know if the source is a boot drive or a sub folder and then it wants to know if the source is on the local machine or needs to be contacted remotely. In this case I chose sub folder and typed in the path to my home directory (e.g. /Users/user name) and then selected "My source is on this machine" and pressed continue.

Select Destination

I typed in a directory to store my incremental backups. I then typed in a name for the incremental directories. In other words, if I type in /Volumes/Blah as the first value and Incremental as the second value then incremental backups will be stored in directories with names like /Volumes/Blah/Incremental.0, /Volumes/Blah/Incremental.1, etc. I then pressed continue.

Set Rotation Size

This controls how many incremental backups are kept. What this means is that every time a backup is run the current backup will show up in Incremental.0 and the previous backup will be put into Incremental.1 and so on until the contents of Incremental.22 are moved into Incremental.23, thus destroying the 24th backup. Keep in mind that, using the hard link features discussed in the previous article, the total space taken up by the backups probably won't be that bad. I chose 24 because it's a nice number and pressed continue.

If I find this takes up too much space then I will reduce it.
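
The rotation described above can be sketched in a few lines of shell. This is only an illustration of the idea under stated assumptions, not the script the RsyncX assistant actually generates: the function names rotate_backups and run_backup are made up for this example, and the use of rsync's --link-dest option for the hard linking is my assumption about one way to get the same effect.

```shell
#!/bin/sh
# Sketch of a rotating backup. Function names are hypothetical and this is
# NOT the script that the RsyncX assistant generates.

# rotate_backups DEST NAME COUNT
# Shifts NAME.0 -> NAME.1 -> ... and discards the oldest snapshot.
rotate_backups() {
    dest=$1
    name=$2
    count=$3
    i=$((count - 1))

    # The oldest snapshot falls off the end of the rotation.
    rm -rf "$dest/$name.$i"

    # Walk from oldest to newest so nothing gets overwritten.
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -d "$dest/$name.$prev" ]; then
            mv "$dest/$name.$prev" "$dest/$name.$i"
        fi
        i=$prev
    done
}

# run_backup SRC DEST NAME COUNT
# Rotates, then copies SRC into NAME.0. Files unchanged since the previous
# snapshot (now NAME.1) become hard links instead of fresh copies.
# On the very first run NAME.1 does not exist yet, so everything is copied.
run_backup() {
    rotate_backups "$2" "$3" "$4"
    rsync -a --delete --link-dest="$2/$3.1" "$1/" "$2/$3.0/"
}
```

With the values from this walkthrough the call would look something like `run_backup "/Users/user name" /Volumes/Blah Incremental 24`.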

Backup Frequency

Every Day seems reasonable to me, although Every Hour isn't as bad as it sounds. Rsync is very efficient and can be run without really screwing up the machine. But I selected every day and pressed continue.

All Done!

I gave a name to the script. This name will automatically have '.command' added to the end of it. I then specified where the backup script should be kept and pressed done.

RsyncX will add a pointer to the script into Cron and open the RsyncX Scheduler (which is a viewer onto Cron). Unfortunately RsyncX has a bug which causes it to put the script into Cron without adding ".command" to the end. To fix this scroll down to the script. Press 'Modify' then when the window pops up press 'Select Script' and go to the script file in the file dialog and click on it. Then press 'Add This Event'. Now the Cron file will have a new line with the correct name. Unfortunately the term 'modify' is a misnomer. Copy would probably be more accurate because once the editing is done and the new line is added the old line is still there. Scroll to the old line and select delete.

Before finishing I had to select the icon in Finder for the drive I was backing up to and then go to File->Get Info. At the bottom of that dialog was a check box whose text read "Ignore ownership on this volume". I had to make sure that the check box was NOT checked. Rsync's incremental backup algorithm will not work if the check box is checked.

That's it. Now the system will run the backup script once a day. To make sure the script is working correctly I pressed 'View Progress' in the RsyncX Scheduler, selected the schedule entry for the script and pressed 'Run Event'. The progress of the script can be viewed from the window. The first backup will take a while but subsequent backups should be relatively fast.

Remote Backup Script

I also have a script that backs up the files on goland.org, my website. This is a remote machine hosted by an ISP. Since goland.org is a remote machine I have to use Rsync's remote backup features. Rsync supports two styles of remote backup. One is using a dedicated Rsyncd process on the remote machine running the Rsync protocol. But my ISP doesn't support that. The other way is by talking to another Rsync client over SSH. Thankfully my ISP does support that. But there is a complication. Rsync cannot automatically log in to SSH. There is a "--password-file" option for Rsync to allow one to specify a password to be used for remote access but this doesn't work with SSH. The result is that when I run Rsync it will ask me for my SSH password so it can log in. This isn't a big deal except that I want to run the backup as a cron job and so won't be around to type in a password. Thankfully there is no such thing as a new problem and someone has already solved this one.

Step 1 – I opened a terminal window on my local machine and typed in "ssh-keygen -t rsa". I pressed return for the two password prompts and accepted the default destination for the key files.
Step 2 – I opened ~/.ssh/id_rsa.pub and copied the contents to the file ~/.ssh/authorized_keys on goland.org (i.e. the remote system).

What this did was generate a public/private key pair on my local machine which I then copied the public part of to the remote machine. By pressing return twice I didn't put a password on the private key. This means that anyone logged in as me on my Mac can SSH to my website without using a password. Given the requirement to first hack my Mac (which will make available much greater portions of pain than hacking my website) I think the risk is acceptable. Later on I will run a piece of software called ssh-agent that will load up my keys locally and use them to automatically log in to remote systems. The result is that I never need to type in a password to SSH to goland.org which means my script can now run without me doing anything.
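
For anyone who would rather script Step 2 than copy and paste, here is a minimal sketch of the append. The helper name append_key is made up for this example, and the chmod 600 is my own addition based on the usual advice to keep authorized_keys private; neither comes from the steps above. The function works on whatever two files it is given, so it can be tried on scratch files first.

```shell
#!/bin/sh
# Hypothetical helper; the name append_key is not part of SSH or RsyncX.

# append_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Appends the public key to the authorized_keys file unless an identical
# line is already there, so running it twice does no harm.
append_key() {
    pubkey_file=$1
    auth_file=$2

    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -F fixed string, -x whole line, -q quiet
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}

# Step 1 (local, interactive):  ssh-keygen -t rsa
# Step 2, if the remote file were reachable as a local path:
#   append_key ~/.ssh/id_rsa.pub /path/to/authorized_keys
# Over SSH a plain append could look like this ("user" is a placeholder):
#   ssh user@goland.org 'cat >> ~/.ssh/authorized_keys' < ~/.ssh/id_rsa.pub
```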

With this done I ran the Rotating Backup Assistant.

Select Source

This is the same dual part dialog I found confusing before. To find the value to fill in the first part I ran ssh-agent and then ssh'd to goland.org and typed in "pwd" on the remote machine to show me the path to my remote folder. I then selected the sub folder option in RsyncX and typed in the path. I then selected the option telling the system to use ssh (since my ISP doesn't support rsyncd) and pressed continue.

Select Source – ssh connection

I typed in my goland.org login name, typed in goland.org as the DNS address and then pressed continue.

Select Destination

I put in the path and incremental folder name for the destination of the backup files and pressed continue.

Set Rotation Size

I chose 24 and pressed continue.

Backup Frequency

I chose "No thanks, I'll schedule this script later" and pressed continue. I'll explain why below.

All Done!

I then typed in the script name and location and pressed done.

Alas, there was quite a bit more work to do before the remote backup would run correctly.

First, I had to change the ownership of the script from root to myself so I could run it: 'sudo chown "My User Name" scriptName.command'. Yes, I know, I could just make it world runnable but that strikes me as a bad idea.

Next, I had to edit the command script itself. I added the following code to the top of the script:

#
# Check that the ssh-agent is running, and if not, kick it off
#
/bin/ps -wU "${USER}" | /usr/bin/grep "[s]sh-agent" > /dev/null
if [[ $? -gt 0 ]]; then
    ssh-agent
fi

(The previous was adapted from this article on using ssh-agent in OS X.)

"ssh-agent" is a program that loads the public/private key I generated for the SSH session. That way when Rsync tries to connect through SSH it won't have to prompt for a password. The code that wraps it checks to see if ssh-agent is already running (it remains resident when it is run the first time) and if so doesn't try to run it again. Without this check many instances of ssh-agent would be created. This isn't a big deal, it doesn't take much memory, but it is the moral equivalent of a memory leak and I'd like to avoid that.
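
One detail in that check is worth spelling out: the pattern is written "[s]sh-agent" rather than "ssh-agent" so that grep doesn't match its own entry in the ps listing. The sketch below isolates that trick. The function name agent_listed is made up for this example, and it reads the process listing from standard input so it can be tried on canned text.

```shell
#!/bin/sh
# Hypothetical helper isolating the grep pattern used in the check.

# agent_listed: reads `ps` output on stdin and succeeds if an ssh-agent
# process appears in it. The regex [s]sh-agent matches the text "ssh-agent",
# but grep's own command line contains the literal text "[s]sh-agent",
# which the regex does not match, so grep never finds itself.
agent_listed() {
    grep "[s]sh-agent" > /dev/null
}

# Equivalent of the check in the script:
#   /bin/ps -wU "${USER}" | agent_listed || ssh-agent
```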

Then I needed to remove two options from the rsync command at the bottom of the script file, --showtogo and --eahfs. These options aren't supported by my ISP's rsync client. Showtogo tells Rsync to display how many files are left to be synchronized. I believe eahfs tells Rsync to synchronize an HFS+ file system, which my ISP is not running.
Then I needed to add the flag "--copy-unsafe-links" to the script. My ISP connects my home account to my HTML files using a symbolic link and Rsync won't follow a symbolic link unless this flag is included.
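
I made the option edits by hand in a text editor, but the removal could also be sketched as a small filter. The function name strip_rsyncx_options and the sample command line in the comment are invented for illustration; the rsync line in a real RsyncX script will look different.

```shell
#!/bin/sh
# Hypothetical filter; not part of RsyncX.

# strip_rsyncx_options: reads script text on stdin and writes it to stdout
# with the two RsyncX-only options removed wherever they appear.
strip_rsyncx_options() {
    sed -e 's/[[:space:]]--showtogo//g' -e 's/[[:space:]]--eahfs//g'
}

# Example with an invented command line:
#   echo 'rsync -a --showtogo --eahfs -e ssh src dest' | strip_rsyncx_options
# To rewrite a script, filter it into a new file and inspect the result
# before replacing the original:
#   strip_rsyncx_options < scriptName.command > scriptName.command.new
```
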
Following the instructions I got here I copied /etc/crontab to ~/crontab. I then deleted every line from the first one containing the word 'periodic' down to the end of the file, except for the line that ran the local backup script I had set up previously. I then edited that line to remove root (you don't specify users in per-user crontabs) and changed the script location to point at my remote backup script. The result looks like:
@daily^TAB^TAB^TAB^TAB^TAB/Users/MyHomeDirectory/ScriptName.command

The tabs are very important as they are how crontab delimits fields. I then saved the file and ran "crontab ~/crontab" which tells crontab about the file. One of cron's many nice features is that once you tell it about a setting it doesn't forget. So even if I reboot it will remember the settings I programmed into it. However, if I want to update those settings I either have to use "crontab -e" or the "crontab ~/crontab" trick I used above.
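
To avoid typing the tabs by hand, the crontab line can be generated with printf. This is a small sketch; the function name cron_line is made up, and it only covers the "minute, hour, every day" shape of entry. It uses tabs as above, although as far as I know cron implementations generally also accept spaces between fields.

```shell
#!/bin/sh
# Hypothetical helper for building a per-user crontab entry.

# cron_line MINUTE HOUR COMMAND
# Prints a tab-delimited crontab entry that runs COMMAND every day at
# HOUR:MINUTE. There is no user field because this is a per-user crontab.
cron_line() {
    printf '%s\t%s\t*\t*\t*\t%s\n' "$1" "$2" "$3"
}

# Example: append an entry for 5:00 PM and install the file.
#   cron_line 0 17 /Users/MyHomeDirectory/ScriptName.command >> ~/crontab
#   crontab ~/crontab
#   crontab -l    # shows what cron now has on file
```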

The directories created by RsyncX are all owned by root but since crontab is going to run as me I needed to change their ownership or the script would fail when it tried to move the directories. To do this I executed 'sudo chown -R "My User Name" rootdirectoryname'. This goes through the directories, including the incrementals, and changes the ownership to me.

The reason for changing ownership and running crontab locally is that the ssh work I did previously is unique to my account. The keys are kept in my account. In theory I could have run everything from root but that posed two problems. First, I don't like using root if I don't have to. I need to use root for the local backup because there are local files that couldn't be copied by anyone else. But there is no need to use local root for a remote backup. Second, for security reasons, the root account is disabled by default on OS X. There is root permission but no account you can actually log in to. By not having an active account whole classes of security holes are avoided. In order to be able to run the script as root with the ssh-agent script I would have had to enable the root account. For what it's worth I did try to change the user in RsyncX's cron editor to myself and this seemed to almost work but it caused some bizarre problems with the directories. In fact, the directories that got created were so screwed up that when I tried to 'get info' on them using the Finder they crashed the Finder! Rather than figure out the problem I decided to just use ~/crontab.

My last step was to run a test. The easiest way is to just run the command file (e.g. ./filename.command). But I decided to get a little fancy and actually put in a crontab that would run a few minutes from when I put it in. Let's say that it will soon be 5:00 PM. In that case I would edit ~/crontab and add the line at the bottom:
0^TAB17^TAB*^TAB*^TAB*^TABSame script path as from previous line
Then I saved the file and ran "crontab ~/crontab".

I then waited and sure enough at the right time I could hear my disk run and see files show up. I then went and deleted the line from ~/crontab and re-ran "crontab ~/crontab". It's probably a good idea to run the command file directly from the prompt if only to see what errors may be generated.

A final, important, note is that I found that the incremental hard link feature doesn't seem to actually work right for my remote backup. That is, rather than just hard linking to files that haven't changed the system copies all files down again. The amount of data I have up on my ISP site is relatively small so I really don't care but this is still kind of annoying. I have contributed to a thread in order to ask for help with this problem. I should also note that hard linking works just fine for my local backup.
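
One way to check whether hard linking is happening is to compare inode numbers: two directory entries that are hard links to the same file share an inode. Here is a minimal sketch, with a made-up function name, that assumes both files live on the same volume (which is the case for snapshots under one backup directory).

```shell
#!/bin/sh
# Hypothetical helper for inspecting snapshots.

# is_hard_linked FILE_A FILE_B
# Succeeds if both paths have the same inode number. This is only
# meaningful when both paths are on the same volume.
is_hard_linked() {
    inode_a=$(ls -i "$1" | awk '{print $1}')
    inode_b=$(ls -i "$2" | awk '{print $1}')
    [ -n "$inode_a" ] && [ "$inode_a" = "$inode_b" ]
}

# Example: compare an unchanged file across two snapshots.
#   is_hard_linked /Volumes/Blah/Incremental.0/file /Volumes/Blah/Incremental.1/file \
#       && echo "linked" || echo "separate copies"
```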

Conclusion

After spending nearly seven hours of my time (yes, I'm that slow) I finally managed to get everything up and running. Buying a decent backup software program for the Mac would have cost me somewhere between $20 and $60. In return for which, given my penchant for reading manuals, I probably would have spent around an hour getting everything set up exactly the way I wanted. My remote backup probably would have had to run over FTP and so wouldn't be as fast but I can live with that. All in all, if I had it to do over again I would have just bought a program. Oh well, live and learn.

8 Responses to Rsync, RsyncX 2.1 & OS X

  1. Allen Huffman says:

    Thanks for this information! I've been using a ditto script I found on macosxhints.com a long time ago, but went searching for something a bit more flexible. This looks like a winner (but I'd buy a backup package that could do what I want if I knew which one did that ;). — Allen

  2. Gert Veltink says:

    mail: veltink_AT_plymonster.com (remove nster)

    Hi,

    thanks for your description of RSyncX.

    It's not entirely true that buying software would have been a better solution, at least not for me!

    I already started using RsyncX myself even though I own a copy of Retrospect that came with my Maxtor OneTouch. At least that version of Retrospect does not allow you to make a backup over a network, which is a KO criterion for me. I don't know whether another version of Retrospect will allow network backups or whether you have to buy extensions to this software.

    Nevertheless, I decided to stay away from Retrospect completely, in part because I find the user experience “strange” and non-Mac like, which always makes me very cautious. But what really settled my decision is the user experience I got from the latest Retrospect version under Windows XP on my DELL M60. UNUSABLE! The application doesn't even paint its GUI correctly. So there I had to fall back to an earlier version that's halfway decent.

    I think RSyncX may be difficult to set up, but once you have it running it's a very nice tool.

    Thanks for your description once again!

    Regards,

    Gert

  3. Yaron says:

    Gert, I do agree that RSyncX is pretty nifty and once you do have it set up it runs like a champ. I don't worry at all about my backups because RSyncX does the job in spades, both for my local and across-the-network backups. Furthermore I'm really glad I can contribute back to the community by providing my article with step by step instructions. But, oy gevald, do you know how many hours it took me to figure out everything in that article and how many more to write it up in enough detail so that other people could skip my pain? Now that I've paid the price I'm a happy RSyncX user, but I really wish I could have skipped the pain all together.

  4. Pw says:

    I’ve tried 3-4 different ways of creating these public/private keys but I’m still getting prompted for passwords. Have looked at debugging and it seems to be offering the keys ok but still no luck.

  5. Mads says:

    Hi

    I really appreciate this outline. I too have been all over looking for a free, well-working backup solution. I have set up an incremental backup of my local Mac (users) to my NAS, which is mounted on startup, exactly as described in your beginning section before the remote part. I have 4 questions that I hope you can help with:

    1) In the RsyncX Scheduler window I have modified and added according to your description but I cannot delete the wrong script. Do you know about this and what to do?

    2) I am constantly prompted for the admin password. Can this be turned off, or stored in RsyncX somewhere?

    3) Say I do an incremental backup daily, 24 times, before the first one made is overwritten. Will I then on day 25 have a full backup of my users folder, as the first one made was, or will I only have the changes from day 24 to 25? Meaning, will I have to schedule a full copy of users every 24 days in a second script?

    4) Does RsyncX need to be started upon each reboot in order for the schedule to work or will it work even though the application is not running?

    I hope you are willing to help me out.

    Thanks
    Mads

    • Administrator says:

      It’s not so much a question of wanting to help you as being able to help you. If you notice the date on the article it’s from 2004. I eventually gave up on using Rsync not because of any flaws in Rsync but because I constantly found myself having to make little tweaks to keep everything running (that’s why I wrote the article in the first place, to remind myself what to do) that I just didn’t have time for.

      Currently I use Time Machine but that doesn’t work at all well for me since I use File Vault (which means my personal files only get backed up when I log out and I don’t get any of the restore capabilities). Your timing is actually interesting because right now I’m trying to evaluate a new setup. My current leaning is to use something like Carbon Copy Cloner to do occasional full bootable backups and use Jungledisk to do local incremental (encrypted) backups of key files.

      1) I vaguely remember this problem and equally vaguely remember that you could just disable the script, but it would always show up. It’s a bug in RsyncX. I don’t think RsyncX is even supported any more, not officially anyways.

      2) I don’t remember that happening so I don’t have anything useful to say.

      3) No, you won’t have to do a full backup. What’s actually happening is that each of the incremental directories is linked to the same file instances (when the files haven’t changed). So when the 25th backup is completed (again, I’m doing this from old memories) it will be linked to all the files that were in the system. Only then will the 1st backup folder be deleted. Any files that existed in the 1st backup folder and nowhere else (i.e. aren’t linked to any of the other backup folders) will be deleted but all the other files will be left alone. So once you get Rsync running you shouldn’t have to do anything. It will automatically maintain each and every file needed to make sure each and every backup is 100% complete. The real trick is that because it uses linking to the files, if two or more folders contain the same file (i.e. the file didn’t change over the backups) then only one physical instance of the file (with two or more file system links) exists on the drive.

      4) The backups aren’t done by RsyncX. They are done by Rsync. RsyncX is just a GUI that sits on top of Rsync. When you set up the CRON jobs that is what will cause Rsync to run. So once the CRON jobs (as described in the article) are running you shouldn’t have to do anything else.

  6. Mads says:

    Thanks for your prompt reply.

    Your answer to 3 is interesting. I use iPhoto. and this app is copying pictures from a folder called original to modified if the picture is modified. Wonder how rsync deals with different pictures with the same filename?

    I have actually been thinking to upgrade to Leopard to get Timemachine. My Synology NAS has Timemachine functionality, but it would be nice to have a test environment to work with it in order to see how it fits my needs. And ultimately have a backup before I do it.

    Just for info: My NAS is then daily backing up to Amazon S3 online storage. This was very easy to setup in the NAS. Which is a setup I am very happy about. But getting a consistent easy backup from the mac to the NAS seems a bit cumbersome.

    I have tried iBackup, Backuplist+, SyncSyncSync (or something the like) but I do not believe I am there yet. Looking forward to hear about your conclusion of the current evaluation of CCC and Jungledisk.

    • Administrator says:

      Actually I mistyped, I meant Crashplan not Jungledisk. I only want my backup bytes on drives I directly control.
