Monday, October 11, 2010

Ekam finds includes, runs tests

Sorry for missing the last couple weeks. I have been working on Ekam, but didn't have time to get a whole lot done and didn't feel like I had anything worth talking about. But now, an update! As always, the code can be found at:

Ekam now compiles and runs tests:

Here you can see it running some tests from protocol buffers. It automatically detects object files that register tests with Google Test and links them against gtest_main (if necessary). Then it runs them, highlighting passing tests in green and printing, for each failing test, an error log that shows just the failures:

By the way, compile failures look pretty nice too:

Is that... word wrap? And highlighting the word "error" so you can see it easily? *gasp* Unthinkable!

Intercepting open()

Ekam now intercepts open() and other filesystem calls made by the compiler. It works by using LD_PRELOAD (DYLD_INSERT_LIBRARIES on Mac) to inject a custom shared library into the compiler process. This library defines its own implementation of open() which makes an RPC to the Ekam process in order to map file names to their actual physical locations. Essentially, it creates a virtual filesystem, but is lighter-weight than FUSE. This serves two purposes:

  1. Ekam can detect what the dependencies of an action are, e.g. what headers are included by a C++ source file, based on what files the compiler tries to open. With this information it can construct the build graph and determine when actions need to be re-run. (See previous blog entries about how Ekam manages dependencies.)
  2. Ekam can locate dependencies just-in-time, rather than telling the compiler where to look ahead of time. For example, Ekam does not need to set up the C++ compiler's include path to cover all the directories of everything the code might include. Instead, the include path contains only a single "magic" directory. When the compiler tries to open a file in that directory, Ekam searches for that file among all public headers it knows about across the source tree.
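
To make the mechanism concrete, here is a minimal sketch of an interposed open() in C. It is not Ekam's actual code. It assumes the usual dlsym(RTLD_NEXT, ...) technique for reaching the real open(), and it replaces the RPC to the Ekam process with two stand-ins: logging the requested path to stderr (purpose 1) and rewriting paths under a magic directory to a source root taken from an environment variable (purpose 2). The names MAGIC_DIR, SKETCH_SOURCE_ROOT, and resolve_path are hypothetical.

```c
#define _GNU_SOURCE     /* needed for RTLD_NEXT on glibc */
#undef _FORTIFY_SOURCE  /* keep glibc's fortified inline open() out of the way */
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical names for this sketch; Ekam's real ones may differ. */
#define MAGIC_DIR "/ekam-magic/"
#define ROOT_ENV "SKETCH_SOURCE_ROOT"

typedef int (*open_fn)(const char *, int, ...);

/* Stand-in for the RPC to the build system. Paths under MAGIC_DIR are
 * rewritten to live under the directory named by ROOT_ENV (or "." if it
 * is unset). Returns 1 if the path was rewritten, 0 if it was copied
 * unchanged, -1 if the result does not fit in `out`. */
static int resolve_path(const char *path, char *out, size_t out_size) {
  size_t magic_len = strlen(MAGIC_DIR);
  int n;
  if (strncmp(path, MAGIC_DIR, magic_len) == 0) {
    const char *root = getenv(ROOT_ENV);
    n = snprintf(out, out_size, "%s/%s", root ? root : ".", path + magic_len);
    return (n >= 0 && (size_t)n < out_size) ? 1 : -1;
  }
  n = snprintf(out, out_size, "%s", path);
  return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* When this library is preloaded, the dynamic linker resolves the
 * compiler's calls to open() here instead of to libc. */
int open(const char *path, int flags, ...) {
  static open_fn real_open = NULL;
  char resolved[4096];
  mode_t mode = 0;

  if (real_open == NULL) {
    /* Look up the next definition of open() (normally libc's). The cast
     * is the POSIX-documented way to store dlsym()'s result. */
    *(void **)(&real_open) = dlsym(RTLD_NEXT, "open");
    assert(real_open != NULL);
  }
  if (flags & O_CREAT) {
    /* The third argument is only present when creating a file. */
    va_list ap;
    va_start(ap, flags);
    mode = (mode_t)va_arg(ap, int);
    va_end(ap);
  }

  /* Purpose 1: dependency detection. A real shim would report this to
   * the build system; the sketch just logs it. */
  fprintf(stderr, "[shim] open(%s)\n", path);

  /* Purpose 2: just-in-time lookup of the file's physical location. */
  if (resolve_path(path, resolved, sizeof resolved) < 0) {
    errno = ENAMETOOLONG;
    return -1;
  }
  return real_open(resolved, flags, mode);
}
```

Built as a shared library (for example with cc -shared -fPIC, adding -ldl on older glibc) and named in LD_PRELOAD, a wrapper like this sees every call the compiler makes through the open symbol. Calls made through other entry points need wrappers of their own.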

The downside of this approach is that every OS has quirks that must be worked around. FreeBSD seems to be the least quirky: the only weird thing I found is that I had to define both open() and _open() to catch everything (and similarly for every other system call I wanted to intercept). OSX and Linux, meanwhile, have some ridiculous hacks going on in which certain system calls are remapped to different names at compile time, so the injected library has to be sure to intercept the remapped name instead of the original. I am still running into some trouble on Linux, where some calls to open() seem to bypass my code while others hit it just fine. I need to spend more time on it, but I do not have a Linux machine at home. (Maybe someone else would like to volunteer?)
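
One way to keep the alias problem manageable is to generate every wrapper from a single macro, so that adding another symbol name is a one-line change. The sketch below is an illustration, not Ekam's code. The macro DEFINE_OPEN_WRAPPER and the two bookkeeping variables are hypothetical. _open is the FreeBSD alias mentioned above; open64 is the name glibc redirects open() to when a program is compiled with _FILE_OFFSET_BITS=64, and picking it as an example of a remapped name is my assumption, not a diagnosis of the Linux problem.

```c
#define _GNU_SOURCE     /* RTLD_NEXT and the open64() declaration on glibc */
#undef _FORTIFY_SOURCE  /* keep glibc's fortified inline wrappers out of the way */
#include <assert.h>
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stddef.h>
#include <sys/types.h>

typedef int (*open_fn)(const char *, int, ...);

/* Hypothetical bookkeeping: how many calls the shim has seen, and the
 * symbol name through which the most recent one arrived. */
static int g_intercepted = 0;
static const char *g_last_symbol = NULL;

/* Define `name` as a wrapper that records the call and then forwards it
 * to the next definition of the same symbol in library search order. */
#define DEFINE_OPEN_WRAPPER(name)                      \
  int name(const char *path, int flags, ...) {         \
    static open_fn real = NULL;                        \
    mode_t mode = 0;                                   \
    if (real == NULL) {                                \
      *(void **)(&real) = dlsym(RTLD_NEXT, #name);     \
      assert(real != NULL);                            \
    }                                                  \
    if (flags & O_CREAT) {                             \
      va_list ap;                                      \
      va_start(ap, flags);                             \
      mode = (mode_t)va_arg(ap, int);                  \
      va_end(ap);                                      \
    }                                                  \
    g_intercepted++;                                   \
    g_last_symbol = #name;                             \
    return real(path, flags, mode);                    \
  }

/* The name everyone expects. */
DEFINE_OPEN_WRAPPER(open)

#if defined(__FreeBSD__)
/* FreeBSD: some callers reach the system call through _open(). */
DEFINE_OPEN_WRAPPER(_open)
#endif

#if defined(__GLIBC__) && !defined(__USE_FILE_OFFSET64)
/* glibc: callers built with _FILE_OFFSET_BITS=64 reference open64. The
 * second condition avoids a duplicate symbol if this file is itself
 * compiled that way, since open above would already be emitted as open64. */
DEFINE_OPEN_WRAPPER(open64)
#endif
```

The same pattern extends to the other intercepted calls, but each needs its own macro, because the argument list is part of the wrapper's signature.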

Try it out

If you are running FreeBSD or OSX, you can try using Ekam to build protobufs or your own code (currently broken on Linux as described above). I've updated the quick start guide with instructions.


  1. Rather than intercepting system calls, you may want to look into integrating with llvm/clang, which have a more modular approach and probably would work better for you. It is likely llvm/clang will kill off gcc over the next couple years anyway, so you probably won't lose anything. ;)

  2. Actually, I was originally planning to do exactly that. I'm a big fan of Clang. However, it occurred to me that intercepting system calls would be a more general solution. It will work not just with Clang and GCC, but also javac, protoc, and other tools. In fact, right now I'm even using it when running tests, in order to find the test data files.