loading twitter...

May 2008

Arrow: first results

I’ve completed enough code with Arrow to get some results for local backups. Eventually, I want to be able to do remote backups over an SSH tunnel, similar to how rsync works, but this is a good start on trying out the idea. To run this test I wrote a simple program that backs up single files, specified by commands read from stdin, and a Python harness that drives the backup.

This first test used all Linux kernel versions in the 2.6.23 series; that is, every kernel version from 2.6.23 to 2.6.23.17. This table shows the results:


Patch Patch Size Source Size Backup Size Time Files Delta
0 0 252,065,213 283,107,750 451.08 22,530 0
1 3,218 252,065,491 283,119,871 91.83 2 12,121
2 34,402 252,066,473 283,208,456 89.34 18 88,585
3 65,218 252,072,178 283,362,188 102.47 53 153,732
4 94,870 252,073,453 283,591,942 143.06 70 229,754
5 127,389 252,077,945 283,912,484 181.44 81 320,542
6 182,957 252,082,973 284,389,157 193.47 115 476,673
7 187,496 252,084,315 284,831,665 220.82 113 442,508
8 188,895 252,084,342 285,277,506 229.20 107 445,841
9 221,604 252,087,238 285,811,410 238.36 132 533,904
10 284,301 252,091,147 286,524,145 240.94 181 712,735
11 283,078 252,091,071 287,204,874 254.57 175 680,729
12 281,040 252,090,640 287,865,620 256.07 158 660,746
13 283,099 252,090,838 288,510,953 255.74 254 645,333
14 283,830 252,090,842 289,146,161 257.06 146 635,208
15 541,078 252,147,239 290,212,238 265.04 228 1,066,077
16 541,304 252,147,258 291,108,027 267.25 217 895,789
17 555,389 252,150,368 292,006,750 277.29 218 898,723

I’m fairly happy with the way it performs, and with the savings between versions. Of course, the savings aren’t optimal, and there is a significant amount of overhead with all the hashes and error-correction codes stored along with the data, but I’m satisfied that it doesn’t inflate the space used significantly more than the plain text-based diff. Also, the actual disk space used by the chunks is significantly larger than the space above, since each block is pre-allocated, and many of the slots are empty (this could be optimized, by allocating the keys and headers ahead of time, but allocating the space for the chunks on an as-needed basis).

Next I’m going to try implementing the SSH tunneling, at least, a prototype of it. Counting the network bandwidth used in a network backup is the next test.

Uncategorized

Comments (0)

Permalink

Circuit 17

Circuit 17

syndicated

Comments (0)

Permalink

Circuit 17

Circuit 17

syndicated

Comments (0)

Permalink

Arrow: Base-64 on the file system

Arrow stores most everything as files. Blocks, which can store multiple chunks, identified by the chunk’s hash code, are stored in sequentially-numbered files, starting from zero and going to whatever (internally, this is a 64-bit unsigned integer, so it can’t be more than 264, but I don’t think that will be an issue ;-) . Version files, lists of chunks that comprise the underlying file, are identified by a random GUID.

I initially made all these file names base-64, with a minor modification: instead of ‘/’, I used ‘*’. This gave me something I can store on sensible file systems, it doesn’t waste too much space with the encoding, and it’s really simple to encode and decode. The trouble happened when I tried running arrow on Mac OS X, which by default has a case-insensitive file system [1]. Arrow failed pretty spectacularly when it tried writing more than 26 blocks into the store.

I could have rewritten the storage parts to use something like hexadecimal, but instead I just replaced the lower-case letters with other letters from the ASCII alphabet. It just so happens that if you avoid control and space characters, and characters that can’t be a part of file paths (/ and .), you have just enough characters to cover the 26 you can’t use! I know I could have used space characters, or characters between 128 and 255, but I opted to stay in ASCII, and came across an alphabet that works:

ABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'(),-:;<>?@[~]^_`{|}0123456789+*

I thought it was pretty neat that ASCII had just enough to cover what I needed.

I’ve put some of the code up on-line. I don’t have permission to publish all of it, but I’ll see if I can get it.

1. You can format HFS+ as case-sensitive, but this tends to not work well in practice. There’s still lots of legacy issues.

Uncategorized

Comments (0)

Permalink

The Fraud Hits

Today I had to call my bank and cancel my credit card. It looks like there were fraud attempts, and it seems the only response to that is to go nuclear: wipe out the account.

Long story short, I tried ordering a new power adapter for my laptop, since the one I have is starting to flake out on me — if the cord is held in the wrong position, no power goes to the laptop. I’m not a hardware engineer, but yeah, that’s not good. I was alerted that the charge for this order could not go through, and I’m at my parent’s house this weekend, so I couldn’t get any verification calls directly.

The charges were from something called “Scholastic at Home,” and you don’t have to search for very long to find out that this is a clear source of a lot of fraud, going on for at least two years now. The changes were small, and I found references to “test charges,” which are small charges that test if the charges get challenged or not.

You should pay attention to charges on your credit card. If something looks unfamiliar, do a web search for the company. You might hit on something like this, which seems to be a shotgun-style fraud attempt. Someone got a million credit card numbers, and is trying them out to see if any can be used. It’s likely profitable even if you need to deal with some percentage of people alert enough to stop it.

Credit cards seem to be an increasingly disposable thing. Security is still hard to get right, we use cards in more and more places, and people still make lots of money by exploiting them. They’re convenient, but risky to use, so you’ll need to destroy them regularly. Never use anything tied directly to your bank account. Your credit card is a virtual buffer for money you currently have. I’ve personally just regarded credit cards as only this: a buffer, which is more convenient to use than cash. I never let debt sit on my card.

Uncategorized

Comments (0)

Permalink