Arrow stores almost everything as files. Blocks, which can hold multiple chunks (each identified by its hash code), are stored in sequentially numbered files, starting from zero and counting up (internally, the number is a 64-bit unsigned integer, so there can’t be more than 2^64 of them, but I don’t think that will be an issue ;-). Version files, which list the chunks that make up the underlying file, are identified by a random GUID.
I initially made all these file names base-64, with a minor modification: instead of ‘/’, I used ‘*’. This gave me something I could store on sensible file systems, that doesn’t waste too much space with the encoding, and that is really simple to encode and decode. The trouble started when I tried running Arrow on Mac OS X, which by default has a case-insensitive file system [1]. Arrow failed pretty spectacularly when it tried writing more than 26 blocks into the store, because names that differ only in letter case refer to the same file there.
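To see why the failure would show up right after 26 blocks, here is a minimal sketch. It assumes the block number is written out as base-64 digits, one character per digit; the post doesn't spell out the exact scheme, so `block_name` and `first_collision` are hypothetical names made up for illustration, and `str.lower()` is only a rough stand-in for how a case-insensitive file system compares names.

```python
# Standard base-64 alphabet with '/' swapped for '*', as described above.
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+*"
)


def block_name(number: int) -> str:
    """Hypothetical naming scheme: the block number written in base 64."""
    if number < 0:
        raise ValueError("block numbers are unsigned")
    digits = []
    while True:
        number, remainder = divmod(number, 64)
        digits.append(B64_ALPHABET[remainder])
        if number == 0:
            break
    return "".join(reversed(digits))


def first_collision(limit: int = 64):
    """Return the first pair of block numbers whose names clash when case is ignored."""
    seen = {}
    for n in range(limit):
        key = block_name(n).lower()  # assumption: approximates case-insensitive lookup
        if key in seen:
            return seen[key], n
        seen[key] = n
    return None


if __name__ == "__main__":
    print(block_name(0), block_name(26), first_collision())
```

Under that assumption, block 0 is named `A` and block 26 is named `a`, so the 27th block lands on top of the first one.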
I could have rewritten the storage parts to use something like hexadecimal, but instead I just replaced the lower-case letters with other characters from the ASCII set. It just so happens that if you avoid control and space characters, and characters that are special in file paths (/ and .), you have just enough characters left to cover the 26 you can’t use! I know I could have used space characters, or characters between 128 and 255, but I opted to stay in printable ASCII, and came up with an alphabet that works:
ABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'(),-:;<>?@[~]^_`{|}0123456789+*
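One way to use an alphabet like this is to encode with ordinary base-64 and then swap characters through a translation table. The sketch below does that in Python; it is an illustration, not Arrow's actual code. The names `encode_name` and `decode_name` are hypothetical, and leaving the `=` padding untouched is an assumption, since the post doesn't say how padding is handled. Note that the alphabet as listed also leaves out `\` and `=`.

```python
import base64

# The standard base-64 alphabet, in index order.
STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# The case-safe alphabet from the post: each lower-case letter is replaced
# by a punctuation character, and '/' is replaced by '*'.
CASE_SAFE = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "!\"#$%&'(),-:;<>?@[~]^_`{|}"
    "0123456789+*"
)

TO_SAFE = str.maketrans(STANDARD, CASE_SAFE)
FROM_SAFE = str.maketrans(CASE_SAFE, STANDARD)


def encode_name(data: bytes) -> str:
    """Base-64 encode, then map every character into the case-safe alphabet."""
    return base64.b64encode(data).decode("ascii").translate(TO_SAFE)


def decode_name(name: str) -> bytes:
    """Map back to the standard alphabet, then base-64 decode."""
    return base64.b64decode(name.translate(FROM_SAFE))


if __name__ == "__main__":
    name = encode_name(b"hello")
    print(name, decode_name(name))
```

Because no character in `CASE_SAFE` is a lower-case letter, two different encoded names can never differ only by case. Many of these characters mean something to a shell, though, so the names need quoting on the command line.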
I thought it was pretty neat that ASCII had just enough to cover what I needed.
I’ve put some of the code up on-line. I don’t have permission to publish all of it, but I’ll see if I can get it.
1. You can format HFS+ as case-sensitive, but this tends not to work well in practice. There are still lots of legacy issues.