Well… I can no longer recommend CrashPlan, even with my fix below. I recently upgraded my file server and performed a computer adoption according to CrashPlan’s instructions… and CrashPlan lost all of my backups from that machine. All 16TB or so of them. It also lost my backup set definitions after the adoption.
CrashPlan support was… not particularly helpful. They claim they see the 16TB of data attached to the right computer GUID, but I can’t see the data in my client, and when I look at my account in the CrashPlan web interface, I see nothing. Support said it “might work” if I forced a backup to run… and it didn’t.
My CrashPlan account expires soon, and I won’t be renewing.
See my new post here for some scripts that will edit the config files and restart the CrashPlan service for you automatically!!!
If my fix doesn’t work for you, try changing the server that you’re connecting to, as described HERE. Thanks to reader Jorgen for his comment with this tip! This will, unfortunately, mean starting your backups over from scratch. 🙁
TL;DR: CrashPlan’s dedup algorithm is a bottleneck on faster network connections and/or slower CPUs. Changing a single setting inside an XML file effectively disables the algorithm and keeps uploads from slowing to a crawl over time.
So, I recently got a shiny new DSL line installed that has an awesome 10Mbps upload speed — 10x faster than my cable connection, and the fastest upload speed of any residential service available to me that doesn’t have ridiculously low usage caps (thanks TekSavvy for your unlimited plans!).
I decided to take advantage of this increased network capacity to expand the amount of stuff that I back up using CrashPlan+. Because my upload speeds were relatively slow before, I only backed up critical documents, leaving stuff like pictures from my SLR camera to my local backup system. With 10Mbps up… no longer! <insert BACKUP ALL THE THINGS meme image here>
Anywho… things went great. At first. I added a ton of files to my backup set, and off CrashPlan went, uploading at 9.6Mbps… then 9.4Mbps… then 9Mbps… then 8Mbps… then 7Mbps… and down and down and down. When it hit the 3Mbps mark and kept going down (despite my tweaking the compression/deduplication/network buffer/other settings), I figured an email to CrashPlan support was in order. The reply I received, while professional and clearly NOT just a template response, was basically “Well, 3Mbps is better than most of our users get; it’s a shared network; you’ve already done everything I can recommend.” That was clearly not what I was hoping for. I should also note that one of the things I noticed and mentioned to support was that the CrashPlan process was taking up 100% CPU time on one core, indicating it might be CPU-limited.
So I set about gathering real stats, and thanks to CrashPlan’s built-in logging… I had a whole bunch of data points. Using a bit of grep, awk and sed (the 3 sweetest words I know next to Perl 😉 ), and a bit of Excel charting, I came up with a graph of my upload speed over time.
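The arithmetic behind this kind of upload-speed graph is simple once the numbers are out of the logs. Here’s a minimal Python sketch of that step. It is not my actual grep/awk/sed pipeline. It assumes you’ve already extracted (elapsed seconds, cumulative bytes uploaded) pairs. CrashPlan’s log format isn’t shown here, and the sample values are made up to mirror the 9.6Mbps-then-8Mbps slide.

```python
"""Turn (elapsed_seconds, cumulative_bytes_uploaded) samples into Mbps.

Assumption: the samples were already pulled out of CrashPlan's logs;
the log format itself is not reproduced here.
"""
from typing import List, Tuple


def interval_rates_mbps(samples: List[Tuple[float, int]]) -> List[float]:
    """Return the average upload rate in Mbps between consecutive samples."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("samples must be in strictly increasing time order")
        bits = (b1 - b0) * 8                   # bytes -> bits
        rates.append(bits / dt / 1_000_000)    # decimal megabits, as ISPs quote
    return rates


if __name__ == "__main__":
    # Hypothetical samples, one reading per hour.
    samples = [(0, 0), (3600, 4_320_000_000), (7200, 7_920_000_000)]
    for rate in interval_rates_mbps(samples):
        print(f"{rate:.1f} Mbps")
```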
Hmm. Interesting. Anyone who’s studied computer science should be screaming “O(ln(n))” right now — this type of logarithmic decay screams “algorithm performance issue”. You simply do not, ever, see this pattern due to overloaded network capacity. This type of pattern, coupled with 100% CPU use, told me there was something wrong with one of CrashPlan’s features, and I strongly suspected the de-duplication functions, because I already had compression turned off (I knew most of my data wasn’t compressible) and encryption algorithms don’t decay like that.
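To make that reasoning concrete, here’s a toy model in Python. It is NOT CrashPlan’s dedup code, which I haven’t seen. It assumes each uploaded block costs a fixed amount of CPU plus a lookup term that grows with ln(n), where n is the number of blocks already indexed. It also assumes the line itself can carry 10Mbps. All the cost constants are invented for illustration.

```python
"""Toy model: CPU-bound per-block work that grows with the size of a
dedup index, versus a fixed network cap.

All constants are hypothetical; this is not CrashPlan's algorithm.
"""
import math

BLOCK_MBITS = 1.0    # hypothetical amount of data per block, in megabits
LINK_MBPS = 10.0     # upload capacity of the line
BASE_COST = 0.02     # hypothetical fixed CPU-seconds per block
LOOKUP_COST = 0.01   # hypothetical CPU-seconds per unit of ln(index size)


def effective_rate_mbps(blocks_indexed: int) -> float:
    """Upload rate is capped by whichever is slower: the CPU or the link."""
    cpu_seconds = BASE_COST + LOOKUP_COST * math.log(max(blocks_indexed, 1))
    cpu_limited_mbps = BLOCK_MBITS / cpu_seconds
    return min(LINK_MBPS, cpu_limited_mbps)


if __name__ == "__main__":
    for n in (10, 1_000, 100_000, 10_000_000):
        print(f"{n:>12,} blocks indexed: {effective_rate_mbps(n):5.1f} Mbps")
```

With these made-up constants, the rate sits at the 10Mbps cap while the CPU keeps up. It then slides steadily as the index grows. That smooth, one-way decay is the kind of signature I was describing. A congested shared network would instead cap the rate and make it bounce around.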
So… I emailed CrashPlan support again with this evidence (along with a whole bunch of CompSci geek reasoning), and was told, basically, “you’re already faster than most of our customers, if you don’t like it, find another provider”. Yikes.
Talking with a few other people I know who use CrashPlan with large (multi-terabyte) data sets… this seems to be a common problem.
So… being a geek, I figured I’d check to see if I could do anything other than spend a bunch of money to upgrade the CPU in that system.
I navigated to the CrashPlan configuration directory, /usr/local/crashplan/conf/ (this is on Linux; Windows users, you’ll have to figure out where this is yourself, sorry!), and started digging. I stumbled across this gem in the file my.service.xml:
Hmm. Interesting. “0” is often used as a special value meaning “unlimited”… so maybe CrashPlan ALWAYS dedupes EVERYTHING when going over a WAN link, but only dedupes files smaller than 1GB when going over a LAN link, presumably because they recognize there’s a trade-off between CPU capacity and network capacity. It seems they assume everyone has the ridiculously slow upload speeds typical of most residential Internet connections.
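Here’s my reading of that setting written out as a Python function. It encodes the guess above that 0 means “no limit”. It also treats the “1GB” LAN limit as 1024**3 bytes. Both are assumptions, not documented CrashPlan behaviour.

```python
ONE_GB = 1024 ** 3  # assumed byte value of the "1GB" LAN limit


def should_dedupe(file_size_bytes: int, max_file_size_bytes: int) -> bool:
    """Hypothetical reading of a per-link 'max file size to dedupe' setting.

    0 is treated as a sentinel meaning "no limit"; this is a guess,
    not documented CrashPlan behaviour.
    """
    if max_file_size_bytes == 0:
        return True  # unlimited: every file goes through dedup
    return file_size_bytes <= max_file_size_bytes


if __name__ == "__main__":
    big_file = 20 * ONE_GB                 # hypothetical large file
    print(should_dedupe(big_file, 0))       # WAN as shipped: True
    print(should_dedupe(big_file, ONE_GB))  # LAN as shipped: False
    print(should_dedupe(big_file, 1))       # WAN after the tweak: False
```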
Being the smarty that I am, I figured I’d set it to not dedupe any files larger than 1 byte when going over the WAN:
(Note: If you use backup sets, you will have more than one of these lines, one per backup set. I suggest changing them all; I have not done any testing to see what happens if you disable it only for some of the backup sets.)
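If you’d rather not hand-edit the XML, especially with several backup sets, a small Python script can do it. This is a sketch, not the scripts from my follow-up post. It assumes the WAN setting lives in elements named dataDeDupAutoMaxFileSizeForWan, so check your own my.service.xml for the exact tag before running it. It sets every matching element to 1 and saves a .bak copy first. You’ll still need to restart the CrashPlan engine afterwards.

```python
"""Set every WAN dedup size limit in CrashPlan's my.service.xml to 1 byte.

ASSUMPTION: the WAN limit is stored in elements named
dataDeDupAutoMaxFileSizeForWan (one per backup set). Confirm the tag name
in your own my.service.xml before running this.
"""
import re
import shutil
import sys
from pathlib import Path
from typing import Tuple

CONF = Path("/usr/local/crashplan/conf/my.service.xml")  # Linux location
TAG = "dataDeDupAutoMaxFileSizeForWan"  # assumed element name
NEW_VALUE = "1"  # dedupe nothing larger than 1 byte over the WAN

# Matches <TAG>digits</TAG>; only the number between the tags is replaced.
PATTERN = re.compile(rf"(<{TAG}>)\s*\d+\s*(</{TAG}>)")


def patch_config(text: str) -> Tuple[str, int]:
    """Return the patched XML text and the number of settings changed."""
    return PATTERN.subn(rf"\g<1>{NEW_VALUE}\g<2>", text)


def main(path: Path = CONF) -> int:
    if not path.is_file():
        print(f"{path} not found; pass the path to my.service.xml as an argument.")
        return 1
    original = path.read_text(encoding="utf-8")
    patched, count = patch_config(original)
    if count == 0:
        print(f"No <{TAG}> elements found in {path}; nothing changed.")
        return 1
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the untouched original alongside
    path.write_text(patched, encoding="utf-8")
    print(f"Updated {count} setting(s); original saved as {backup}.")
    print("Now restart the CrashPlan engine so it picks up the change.")
    return 0


if __name__ == "__main__":
    # Usage (as a user who can write the config, typically root):
    #   python3 patch_dedup.py [/path/to/my.service.xml]   (hypothetical file name)
    main(Path(sys.argv[1]) if len(sys.argv) > 1 else CONF)
```

I used a regex substitution rather than parsing with xml.etree.ElementTree. A parse-and-rewrite round trip with ElementTree drops comments by default and can reformat the file. The regex only touches the numbers inside the matching tags.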
I then restarted the CrashPlan engine (/etc/init.d/crashplan restart).
And… VOILA. My CPU usage dropped from 100% of one core to ~10% of one core, and my upload jumped from 2.5Mbps (and dropping) to 7.5Mbps (and holding, fluctuating between 6.9Mbps and 7.5Mbps). I’m now seeing patterns that fit the “variable network bandwidth due to shared service” theory much better, without the logarithmic decay in performance.
Other people who have noticed slowdowns on large data sets have told me this fix works for them as well, so it’s not just something weird on my system.
I updated my 2nd ticket with CrashPlan support with this information, suggesting that they expose the max-file-size-for-dedupe inside the client, rather than making people go dig through XML files.
Update: Now that it’s off-peak hours, I’m seeing upload speeds above 9Mbps again as well. Yay!
Update 2: Check out my follow-up blog post with an updated traffic graph HERE.