Birch

The class at UCSC that I just finished was great: it consisted of reading a bunch of academic papers (4 per week, for about 10 weeks), and doing a final project, and nothing else. My project was to write “a metadata search file system,” which renders a metadata search query as a file system: directories represent queries, and a file exists in a directory if it matches the query.

For the search, I used the Spotlight system included in Mac OS X. The concept is simple: a directory is a Spotlight query (that is, it encapsulates an NSMetadataQuery instance) and files in that directory are files that match that query. A directory path forms a conjunction, so if there are two queries foo and bar, the path foo/bar will contain only files that match both foo and bar. The application has a little user interface that lets you add, delete, and modify queries; you can also nest one query inside another (for later use in conjunctions) by creating a new directory with the same name as an existing query. Query/directory names are global, and using a new name simply creates a brand-new query.
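To make the path-as-conjunction idea concrete, here is a minimal sketch in Python. It is not Birch's code: it assumes an in-memory table of metadata instead of Spotlight, and the names `list_directory`, `queries`, and `files` are made up for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a metadata record is a plain dict, and a
# "query" is just a predicate over that record (Birch uses Spotlight).
Metadata = Dict[str, str]
Predicate = Callable[[Metadata], bool]


def list_directory(path: str,
                   queries: Dict[str, Predicate],
                   files: Dict[str, Metadata]) -> List[str]:
    """Return the files that match every query named along the path."""
    # Query names are global, so each path component is looked up in
    # the same table no matter how deeply it is nested.
    names = [part for part in path.split("/") if part]
    predicates = [queries[name] for name in names]
    # A file appears in the directory only if all predicates accept it.
    return sorted(
        filename
        for filename, metadata in files.items()
        if all(predicate(metadata) for predicate in predicates)
    )
```

With queries named foo and bar, listing "foo/bar" gives the intersection of the two result sets, and "bar/foo" gives the same files, since a conjunction does not depend on order.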

The file system is implemented as an NFS server, so it can run entirely in user space (which is necessary, because there is no API like FUSE for OS X yet). This works reasonably well, even though there is a bug in the OS X kernel that allows an NFS server that sends malformed replies to crash the kernel (it is a divide-by-zero bug, and I filed a bug with Apple about the issue). The server represents all files (which, of course, are matches to queries) as symbolic links, and this works really well: applications like Finder resolve the links, so as soon as you tell them where to find the file, they leave you alone, making the I/O handling much simpler (the file system does implement some real file I/O, but only for special files like .DS_Store and some other things Finder uses; it only buffers these files in memory, though).
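The symbolic link trick can be tried without an NFS server at all. The sketch below is only an illustration, not how Birch works internally: it writes real symlinks into an ordinary directory, whereas Birch hands the link target back over NFS. The function name `render_matches_as_symlinks` is made up, and name collisions between matches are ignored.

```python
import os
from typing import List


def render_matches_as_symlinks(directory: str, matches: List[str]) -> List[str]:
    """Create one symlink per matching file inside `directory`."""
    os.makedirs(directory, exist_ok=True)
    links = []
    for target in matches:
        # The link is named after the file; its target is the real path.
        link = os.path.join(directory, os.path.basename(target))
        os.symlink(target, link)
        links.append(link)
    return links
```

Any program that opens one of these links ends up reading the real file directly, so the directory that holds the links never has to serve the file's contents itself.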

Why do we want to do this? I mean, Finder already has “smart folders,” which can be used to make a Spotlight search look like a folder. The issue is that you can’t use that folder in any other applications (well, you can use it through the open/save dialogs, which do take care of some uses). Rendering the search as a POSIX file system means you can use those search results in any program you already have, without rewriting or recompiling that program.

In general I think the project was successful as an experiment, though there are a bunch of wrinkles in the system. For example, certain operations (like readdir) can take a long time to finish, which interacts badly with many programs (Finder goes completely out to lunch while it is doing a readdir). I don’t think any of these issues are inherent to this kind of system, though: with some kernel-level support, we could optimize this further, as well as plug in to the mechanism the kernel uses for the kevent and fsevent interfaces.

Birch is a Mac OS X application, and I’ve released it under the General Public License. The code is hosted at Google’s code hosting, and a demo-worthy binary is available as a Universal binary.

The name “Birch” comes from the kind of tree outside my window. It’s also a little ironic, since it names something that does away with trees after a kind of tree :-)