An article titled Falsehoods programmers believe about build systems recently made the rounds on the Internet (additional comments on Reddit). As I've written about before, I think build systems could be better, so this list of "incorrect" assumptions that build systems make is pretty interesting. One thing that this list neglects is that these assumptions are not mistakes for some companies, projects, or languages. I think this is why there are so many build systems: it's easy to write one for a small number of projects because you can make a bunch of simplifying assumptions. Then, as the system is used for more projects, it inevitably needs more complexity. This is why I like Ninja: I think its minimal focus on only describing rules to transform input files into output files is large enough to be useful, but small enough to still be simple.
However, even Ninja makes assumptions that are violated by some languages, such as Java. Here are a few ways in which the way Java is "compiled" is unusual and tends to break build tools like Ninja.
Output directories: The recommended approach is to put .java files in a directory hierarchy that matches the package structure. However, this is not required. Thus, package/subpackage/A.java could end up creating otherpackage/A.class.
Multiple, unpredictable output files: A single .java file can produce multiple .class files. Worse yet, you can't figure out the names without parsing the file. While nested types will produce files named name$*.class, which is relatively predictable, Java allows you to include non-nested, package-private classes in the same file. As an example:
public class Wtf {
    public static final int V = 1;
}
class PackagePrivate {
    public static final int U = 2;
}
Compiling Wtf.java produces both Wtf.class and PackagePrivate.class. This causes a problem for incremental builds since the build system may be unaware of these "extra" classes. For example, if PackagePrivate is removed from Wtf.java, the build system needs to know to delete PackagePrivate.class. Otherwise, classes that depend on the PackagePrivate class will continue to compile instead of generating a compiler error.
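To make the cleanup problem concrete, here is a minimal sketch of the bookkeeping a build tool would need. It assumes the tool records which class files each source file produced in every build (for example, by listing the output directory after running javac). The StaleClassFinder class and its staleOutputs method are hypothetical names invented for this illustration, not part of any real build system.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: finds class files left over from an earlier build. */
final class StaleClassFinder {
    private StaleClassFinder() {}

    /**
     * Both maps go from a source file to the class files it produced.
     * "previous" describes the last build, "current" describes this one.
     * How these maps are recorded is an assumption of this sketch.
     */
    static Set<String> staleOutputs(Map<String, Set<String>> previous,
                                    Map<String, Set<String>> current) {
        // Start with everything the last build produced...
        Set<String> stale = new TreeSet<>();
        for (Set<String> outputs : previous.values()) {
            stale.addAll(outputs);
        }
        // ...and keep only what no source file produces any more.
        for (Set<String> outputs : current.values()) {
            stale.removeAll(outputs);
        }
        return stale;
    }
}
```

With the Wtf.java example, a previous build that produced Wtf.class and PackagePrivate.class followed by a build that produces only Wtf.class leaves PackagePrivate.class in the result, which is the file the build system has to delete.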
Implicit compilation of dependencies: When javac is run on a single .java file, it also implicitly compiles the classes that file depends on (technically, the transitive closure of its dependencies) if it can find their sources. This means build tools need to do something complicated to avoid unnecessary recompilation, and it complicates parallel builds. This behavior is more or less required by the language: javac must at least parse all dependencies, since Java permits circular dependencies between classes. It can also cause problems when creating jar files, since the build tool may be unaware that classes from other packages should be included as well.
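As a small illustration of the circular dependencies mentioned above, here are two classes that refer to each other. Employee and Department are hypothetical names made up for this example. Neither class can be compiled without the compiler at least reading the other, so there is no order in which a build tool could compile them one at a time from scratch.

```java
/** Hypothetical example: Employee refers to Department... */
final class Employee {
    Department department;
}

/** ...and Department refers back to Employee. */
final class Department {
    Employee manager;
}
```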
No easy way to get dependencies from the compiler: In order to do incremental builds correctly, the build system needs to know about all dependencies, so it can recompile everything that depends on a given class when that class is updated. In the case of Java, this is easy to miss: because classes are linked at run time, a build with stale class files keeps working until someone changes an API, at which point you get a runtime error. Sadly, javac knows this information, but doesn't have any way to output it.
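If the dependency information were available, the recompilation step itself would be simple. The sketch below assumes the build tool has somehow obtained a map from each class to the classes it uses (for instance, by inspecting class files itself, since javac will not report it). RebuildPlanner and classesToRecompile are hypothetical names for this illustration, and the approach is deliberately conservative: it marks every transitive dependent as dirty.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: decides what to recompile after a class changes. */
final class RebuildPlanner {
    private RebuildPlanner() {}

    /**
     * dependsOn maps a class name to the classes it uses directly.
     * Returns the changed class plus everything that depends on it,
     * directly or transitively. Cycles are fine: a class is queued once.
     */
    static Set<String> classesToRecompile(Map<String, Set<String>> dependsOn,
                                          String changed) {
        Set<String> dirty = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        dirty.add(changed);
        pending.add(changed);
        while (!pending.isEmpty()) {
            String current = pending.remove();
            for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
                // Set.add returns false if the class was already marked dirty.
                if (entry.getValue().contains(current) && dirty.add(entry.getKey())) {
                    pending.add(entry.getKey());
                }
            }
        }
        return dirty;
    }
}
```

A real tool would probably keep a reverse index instead of scanning the whole map on every step, and could stop propagating when a class's public API did not change. Both refinements are left out to keep the sketch short.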
These are the reasons that many Java build systems have broken incremental builds. For example, it was basically impossible to do correct incremental builds with Ant, at least the last time I checked. Thankfully, Java compiles are usually fast enough that full rebuilds are acceptable.