Monday, August 2, 2010

Kake: A build system with no build files

UPDATE: Renamed to "Ekam" because "Kake" apparently looks like a misspelling of a vulgar German word. Link below updated.

I finally got some real coding done this weekend.

http://code.google.com/p/ekam/

Kake is a build system which automatically figures out what to build and how to build it purely based on the source code. No separate "makefile" is needed.

Kake works by exploration. For example, when it encounters a file ending in ".cpp", it tries to compile the file. If there are missing includes, Kake continues to explore until it finds headers matching them. When Kake builds an object file and discovers that it contains a "main" symbol, it tries to link it as an executable, searching for other object files to satisfy all symbol references therein.
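The linking step described above can be sketched as a small worklist search. This is only an illustration, not the actual Kake code: the `ObjectFile` struct and the `linkClosure` function are hypothetical names, and the symbol tables are assumed to have already been extracted from each object file.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of one compiled object file: the symbols it defines
// and the symbols it references but does not define.
struct ObjectFile {
  std::string name;
  std::set<std::string> defined;
  std::set<std::string> undefined;
};

// Starting from the object file named `rootName` (the one defining "main"),
// collect the names of all object files needed to satisfy its symbol
// references, transitively. Symbols that no object file defines are added
// to `*unresolved`.
std::set<std::string> linkClosure(const std::vector<ObjectFile>& objects,
                                  const std::string& rootName,
                                  std::set<std::string>* unresolved) {
  assert(unresolved != nullptr);

  // Index the object files by name and by the symbols they define.
  std::map<std::string, const ObjectFile*> byName;
  std::map<std::string, const ObjectFile*> definers;
  for (const ObjectFile& obj : objects) {
    byName[obj.name] = &obj;
    for (const std::string& sym : obj.defined) {
      definers.emplace(sym, &obj);  // the first definer of a symbol wins
    }
  }

  std::set<std::string> needed;
  std::vector<const ObjectFile*> worklist;
  auto root = byName.find(rootName);
  if (root == byName.end()) return needed;
  needed.insert(rootName);
  worklist.push_back(root->second);

  // Keep exploring until every reachable reference has been examined.
  while (!worklist.empty()) {
    const ObjectFile* current = worklist.back();
    worklist.pop_back();
    for (const std::string& sym : current->undefined) {
      auto def = definers.find(sym);
      if (def == definers.end()) {
        unresolved->insert(sym);
      } else if (needed.insert(def->second->name).second) {
        worklist.push_back(def->second);  // newly required object file
      }
    }
  }
  return needed;
}
```

An object file that nothing reachable from "main" refers to never enters the result, so it is simply left out of the link.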

You might ask, "But wouldn't that be really slow for a big codebase?" Not necessarily. Results of previous exploration can be cached. These caches may be submitted to version control along with the code. Only the parts of the code which you modify must be explored again. Meanwhile, you don't need to waste any time messing around with makefiles. So, overall, you ought to save time.

Current status

Currently, Kake is barely self-hosting: it knows how to compile C++ files, and it knows how to look for a "main" function and link a binary from it. There is a hack in the code right now to make it ignore any symbols that aren't in the "kake2" namespace since Kake does not yet know anything about libraries (not even the C/C++ runtime libraries).

That said, when Kake first built itself, it noticed that one of the source files was completely unused, and so did not bother to include it. Kake is already smarter than me.

Kake currently requires FreeBSD, because I used kqueue for events and libmd to calculate SHA-256 hashes. I thought libmd was standard but apparently not. That's easy enough to fix when I get a chance, after which Kake should run on OS X (which has kqueue), but it will need some work to run on Linux or Windows (no kqueue).

There is no documentation, but you probably don't want to actually try using the existing code anyway. I'll post more when it is more usable.

Future plans

First and foremost, I need to finish the C++ support, including support for libraries. I also need to make Kake not rebuild everything every time it is run -- it should remember what it did last time. Kake should also automatically run any tests that it finds and report the results nicely.

Eventually I'd like Kake to run continuously in the background, watching as you make changes to your code, and automatically rebuilding stuff as needed. When you actually run the "kake" command, it will usually be able to give you an immediate report of all known problems, since it has already done the work. If you just saved a change to a widely-used header, you might have to wait.

Then I'd like to integrate Kake into Eclipse, so that C++ development can feel more like Java (which Eclipse builds continuously).

I'd like to support other languages (especially Java) in addition to C++. I hope to write a plugin system which makes it easy to extend Kake with rules for building other languages.

Kake should eventually support generating makefiles based on its exploration, so that you may ship those makefiles with your release packages for people who don't already have Kake.
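As a rough sketch of what makefile generation might look like, assume exploration yields a list of build steps, each with a target, its inputs, and the command that produces it. The `Rule` struct, the `toMakefile` function, and the example commands are all hypothetical; nothing here reflects an existing Kake feature.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of one build step discovered during exploration.
struct Rule {
  std::string target;
  std::vector<std::string> inputs;
  std::string command;
};

// Render the discovered rules as makefile text. Rules are emitted in the
// order given, so the caller should list the final binary first: make treats
// the first rule as the default goal.
std::string toMakefile(const std::vector<Rule>& rules) {
  std::ostringstream out;
  for (const Rule& rule : rules) {
    out << rule.target << ":";
    for (const std::string& input : rule.inputs) {
      out << " " << input;
    }
    // Recipe lines in a makefile must begin with a tab character.
    out << "\n\t" << rule.command << "\n";
  }
  return out.str();
}
```

The generated file records only what exploration found, so it would need to be regenerated whenever the source tree changes.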

Kake will, of course, support code generators, including complex cases where the code generator is itself built from sources in the same tree. Protocol Buffers are an excellent test case.

To scale to large codebases, I'd like to develop a system where many Kake users can share some central database which keeps track of build entities in submitted code, so that Kake need not actually explore the whole code base just to resolve dependencies for the part you are working on.

7 comments:

  1. Do you have a plan for avoiding an explosion of build artifacts when there are enough build variants that you can't afford to build them all (in terms of CPU usage and/or disk space)?

    At some point, isn't your tool going to need human guidance to choose between 40 languages x 50 countries x 6 host OSs x 4 DEBUG levels x 3 different (binary-compatible!) versions of a third-party library, etc., etc.?

    I find that the difficult part about most build systems is that there are a ridiculous number of ways that you *could* fit together the puzzle pieces to compile, but only a small number of those are interesting at any given point...and the painful part is figuring out how to say "no" to zillions of permutations and "yes" to the single configuration that I need now.

    How do you plan to stop kake from being a resource-hungry monster?

  2. Cool idea, very interesting :D

  3. @mk: I think any particular build would only use *one* configuration. The default would probably be to compile for the host OS, language, and country of the build machine, "-O2 -g", and the newest version of each library that is available on the system. The person running the build would be able to override individual settings as needed.

  4. When you say "The person running the build would be able to override individual settings as needed"...does that mean that instead of:
    make release

    I'll have to know to type:
    ekam -O2 -g -DDEBUG=0 --and=\"so\ on\"

    Or if the ekam rule for "release" is in a file, what differentiates that file format from make or ant or rake files?

  5. Makefiles today contain lots of information about the specific files which are to be built. That's what I'm trying to eliminate. You'd still want build configurations to be defined somewhere.

    Note that Java has had this sort of thing for a long time (e.g. Eclipse automatically builds everything as you work). Ekam will provide this for C++, and for multi-language projects.

  6. This is a nice idea. I imagine this could be really useful for converting projects away from messy legacy build systems like autotools.

    I know that in the past I have tried to do manually what Ekam does automatically: I've gone around the loop of trying to add files by hand until a program compiles and links, and it gets tedious very quickly.

    Can Ekam discover compile-time options, such as for files that require "-D_GNU_SOURCE" or "`pkg-config --cflags gtk+-2.0`" or even "-O1"? (I think there are parts of GNU libc that will fail to link if you compile without optimisation (!), because it requires dead code removal to remove some symbol references.)

  7. @Mark: Auto-detecting the need for something like -D_GNU_SOURCE is probably infeasible. But Ekam should be able to auto-detect most of the things that pkg-config normally produces (e.g. -I, -L, and -l flags). As for -O1, anything which *requires* that to compile is broken and Ekam is probably not going to be able to help it. :) Though it is interesting to think that when compiling with optimization, Ekam will automatically drop dependencies that are only called from dead code.

    For -D_GNU_SOURCE, I think the right solution is to define this at the top of any source file that requires it. But Ekam should also have a way to specify these options explicitly for the cases where there is no reasonable way to detect them.
