Builds And Transitive Dependencies

Posted February 9, 2007

I’ve come to think that transitive dependencies are mostly evil. Really. They can be useful sometimes, if you keep them on a tight leash. But chances are you’re going to shoot yourself in the foot with them sooner or later. First let me illustrate with several use cases and examples that I’ve experienced firsthand.

First example: you have a simple web project that uses several libraries, and 3 or 4 of them happen to depend on Xerces, as almost the whole world depends on Xerces for some reason. Now you build a WAR file and your build system is nice enough to put all your dependencies, including transitive ones, in its WEB-INF/lib directory. Nice enough? Think again. You’re going to end up with 3 or 4 different versions of Xerces in there. Which one is going to be selected at runtime? Well, do you want to bet on whether your app server follows alphabetical order or not?
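
To make that concrete, here is a minimal sketch in Python of how a build tool could at least spot the problem before packaging. This is not Raven or Maven code; the graph, the artifact names and the version numbers are all made up for illustration. It walks the transitive closure and reports every library that shows up in more than one version.

```python
from collections import defaultdict

# Hypothetical dependency graph: "name:version" -> direct dependencies.
WAR_GRAPH = {
    "webapp:1.0": ["lib-a:1.2", "lib-b:3.0", "lib-c:0.9"],
    "lib-a:1.2": ["xerces:2.6.2"],
    "lib-b:3.0": ["xerces:2.8.1"],
    "lib-c:0.9": ["xerces:2.6.2"],
}


def transitive_closure(graph, root):
    """Return every artifact reachable from root, excluding root itself."""
    seen = set()
    stack = list(graph.get(root, []))
    while stack:
        artifact = stack.pop()
        if artifact in seen:
            continue
        seen.add(artifact)
        stack.extend(graph.get(artifact, []))
    return seen


def version_conflicts(graph, root):
    """Map each library name to its versions when more than one is pulled in."""
    versions = defaultdict(set)
    for artifact in transitive_closure(graph, root):
        name, version = artifact.rsplit(":", 1)
        versions[name].add(version)
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}


print(version_conflicts(WAR_GRAPH, "webapp:1.0"))
# prints {'xerces': ['2.6.2', '2.8.1']}
```

Detecting the conflict is the easy part. Deciding which version should actually land in WEB-INF/lib is still a call somebody has to make.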

Second example: you’re using XDoclet. Or the Spring Framework. Or any other framework made of several modules, some necessary, some optional. All those modules sort of depend on each other, so chances are that by pulling one, you’re going to pull everything. And people usually don’t think much about how others are going to download their project outside of the regular distribution; they just declare everything they can think of as a dependency. You end up with a lot of garbage that you will never use, and you beat the record for the biggest software distribution ever built.

Another example. You depend on project A. Project A depends on B. And B depends on C. So far so good. Now it turns out that the repositories containing all these dependencies are mostly a mess. So somebody just comes up and removes the version of C that B depends on. Your build is broken.

A last one. This one would actually be pretty funny if it wasn’t so pathetic. On Apache Ode we have a JBI wrapper to allow deployment in a JBI container. We’re using a Maven plugin from ServiceMix to build Service Assemblies. Now this plugin happens to depend on the whole ServiceMix engine, because it also includes tasks to auto-deploy and run the server directly. So we end up pulling the whole ServiceMix project just for a plugin. Now here is the best part: ServiceMix uses Ode. It’s one of its dependencies. So when we build our stuff for the first time, we end up downloading all of ServiceMix plus all OUR stuff that we’re currently building. How crazy is that?
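
That loop is easy to detect mechanically. Here is a small Python sketch of a check a build tool could run, assuming a simplified graph where the node names ("ode", "servicemix-plugin", "servicemix-engine", "some-xml-lib") are placeholders I made up, not real artifact coordinates. It does a depth-first walk and returns the first path that leads back to the project being built.

```python
# Hypothetical, simplified graph of the situation described above.
ODE_GRAPH = {
    "ode": ["servicemix-plugin"],
    "servicemix-plugin": ["servicemix-engine"],
    "servicemix-engine": ["ode", "some-xml-lib"],
}


def find_cycle(graph, start):
    """Return a dependency path from start back to start, or None."""
    path = [start]
    visited = set()

    def visit(node):
        for dep in graph.get(node, []):
            if dep == start:
                # Found a way back to the project we are building.
                return path + [start]
            if dep in visited:
                continue
            visited.add(dep)
            path.append(dep)
            found = visit(dep)
            if found:
                return found
            path.pop()
        return None

    return visit(start)


print(find_cycle(ODE_GRAPH, "ode"))
# prints ['ode', 'servicemix-plugin', 'servicemix-engine', 'ode']
```

A tool that ran this before downloading anything could at least warn that you are about to fetch an old copy of the very project you are building.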

Conclusion

Given all this mess, what’s a build system to do? I think the transitive dependency problem has no solution. There are some techniques that can be used to keep some control, but deep down the idea is really flawed, because the dependencies that are right for you can’t be guessed, just like the code you’re writing can’t all be generated.

However I think we’re still going to add it into Raven. Yep, you heard me well. And there are 2 reasons why:

1. It can save you a lot of time at the early stage of a project or for prototyping. It’s really nice to get a setup up and running quickly.
2. I already know that it’s going to be the most asked-for feature. Implementing every single stupid feature that people ask for is a bad idea. However, this feature is only partially stupid and it can also make sense (see above).

So to give you weapons against chaos, pain and despair (exaggeration is my friend), I’d like to keep transitivity under control and allow you to opt out at any time. To do that, transitivity would be toggleable: you’d be able to turn it off and then specify everything explicitly. When you choose to do so, as we’re all a bunch of lazy asses, Raven would let you generate a dependency declaration with all the transitiveness you need. You’d then clean it up a bit to fit your needs, adding rationality to an insane accumulation of libraries, and when it’s all pretty, include it in your build.
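
Here is a rough sketch of what I have in mind, written in Python rather than in Raven's own syntax, which doesn't exist yet for this feature. Everything in it is an assumption made for illustration: the `resolve` and `explicit_declaration` names, the `dependency '...'` output format and the sample graph. The point is only the workflow: resolve with transitivity on once, dump the result as a flat list, edit it by hand, then build with transitivity off.

```python
# Hypothetical repository metadata: "name:version" -> direct dependencies.
REPO = {
    "spring-web:2.0": ["spring-core:2.0", "commons-logging:1.1"],
    "spring-core:2.0": ["commons-logging:1.1"],
}


def resolve(graph, direct, transitive=True):
    """Return the sorted list of artifacts a build would use."""
    if not transitive:
        # Transitivity switched off: only what was declared explicitly.
        return sorted(direct)
    resolved = set()
    stack = list(direct)
    while stack:
        artifact = stack.pop()
        if artifact not in resolved:
            resolved.add(artifact)
            stack.extend(graph.get(artifact, []))
    return sorted(resolved)


def explicit_declaration(graph, direct):
    """Render the full transitive set as a flat list meant to be hand-edited."""
    return "\n".join(f"dependency {a!r}" for a in resolve(graph, direct))


print(resolve(REPO, ["spring-web:2.0"], transitive=False))
print(explicit_declaration(REPO, ["spring-web:2.0"]))
```

The generated list is a starting point, not the final answer. Pruning it is where the human judgment comes in.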

So any other strong opinion on transitive dependencies? Ideas?

Pictures by Kevin Day and Dadooron.

12 comments so far

Adam Bouhenguel on February 9, 2007

If you’re building for Java SE, why not offer transitive dependencies through custom classloaders for each target? You can make good on each target’s needs with respect to the specific versions of the libraries they require by packaging them in an uberjar or something similar.

So when I depend on A@1.0 and B@1.0 , B may very well carry a version of A@1.1, but I don’t ever have to worry about the side effects of such a dependency. For cases where you’d be duplicating a common dependency, a simple dependency graph can help the build system reshuffle the custom classloader configurations. That of course assumes that everyone agrees on the same A@1.0.

mriou on February 9, 2007

Adam, my worry is not so much for the build system itself, except when dependencies can’t be found at all. It’s more for the runtime, and the expectation that a build environment will be as close as possible to the runtime. A system of isolated classloaders will work well for the build, but mostly people won’t be able to get anything running out of what it produces.

Tom Ayerst on February 10, 2007

I suspect that most of the time there will not be a conflict, and automatic transitive dependency management is just too useful not to have.

What I would like (and I’ll start looking at how to do it) is to have transitive dependencies but flag up conflicts for manual resolution, then capture the choice and use it in future.

Later it could offer heuristics for “best” choice etc.

mriou on February 10, 2007

Tom, I agree transitive dependency is just too useful, but conflicts happen much more often than you would think. Turns out there’s not much originality in the Java world and everybody uses the same libraries (commons-*, XML parsing). And then once you’re packaging all those mixed together, surprises happen, and it takes time, when you’re not focused on it, to realize it’s because you have several copies of the same thing.

But I also agree that if we think about it carefully, there are solutions to these problems. We can try to be more clever than what other tools (read Maven) offer.

You’re more than welcome to look at the problem. I think I’ll also look into how to improve our Gem repository, like including dependencies in generated gems and building “grouped” gems, like the rails one for example (that links to ActiveRecord, ActionPack, …). Because at least RubyGems asks you whether you want to install a dependency or not.

Saulius Sinkunas on March 20, 2007

BTW how many times do you have to check dependencies? It was always dangerous for me to depend on online remote repositories. That’s why we keep all dependent jars locally. This transitive dependency is useful only at startup or when you want to add some big fat library. Almost 95% of the time you will reuse your existing dependent jars. And you don’t want to worry that at the last stage (before production) somebody changes some jar or dependency.

mriou on March 20, 2007

The dependencies should always be downloaded only once on a given computer. The issue is when you wipe out your dependency directory, change your computer or somebody else tries to build. You could also want to build 3 years later and find that nothing’s distributed anymore.

So depending on your level of paranoia, you can either start with external dependency handling and then check the dependencies into svn or cvs, or you can stick with an external repository. That’s more or less your choice.

Tobias Roeser on April 29, 2007

Transitive dependencies are an essential requirement dictated by the nature of software development. Every build system that claims to be a real one has to support transitive dependencies.

The question is how far this support goes and how cleverly these dependencies are handled. Working with Maven, I can say that dependency handling cannot get any worse. Coming from the Linux distribution Gentoo, I’m used to a package management system (portage) which is more clever than all the tools I’ve found in the whole Java and scripting language world in terms of (transitive) dependency management.

Please read my complete comment here: https://lepetitfou.dyndns.org/home/node/33

mriou on April 29, 2007

So you wouldn’t even describe Ant as “a real build system”?

Tobias Roeser on April 29, 2007

Ingenious question!

The term build system has to be defined before discussing what a real build system is. But I think you can agree that a simple make replacement like Ant is not enough. Or at least that for most mid-size projects with many packages/modules, more is needed.

That’s the reason why we want some meta support from our build system, like dependency resolution or any other task that needs some cleverness to help the developer. Ant is just a DSL for shell scripting.

Tobias Roeser on April 29, 2007

Ok, maybe I mixed up the terms build system and package management system. Then you are right, a build system does not need to handle transitive dependencies (at least not to resolve foreign packages), but then it depends itself on another tool (or a human) to provide all these.

This is the reason why we want more than make, Ant, Rake, (fill in your tool)… We want the build system to include all these package management capabilities, and on that score at least Ant fails. Even Maven sometimes fails, as you’ve described above. I’ve seen it resolving different packages for the same codebase on a different computer, depending on the content of the local repository. What a load of crap.

pyclocioche on February 24, 2008

To me it is necessary to find

hilarious on January 23, 2010

this is a funny one:
“transitive dependency is just too useful…”

Transitive dependency is obviously bad to an experienced developer. Most developers have very little experience and think that any ‘cool sounding word’ must be ‘the next cool thing’. Well, it’s not.

How could Maven possibly know in which order to pass the dependencies to the compiler?

How does a developer remove a library from their module, without understanding all of the upstream dependencies?

How do you package a module, that possibly only uses 2 or 3 libraries, but whose downstream dependencies may include 40? or 50?

Of course there’s no way to know (consider classes loaded by reflection, no static analysis could possibly determine all downstream dependencies).

Transitive dependencies:
1. introduce uncertainty into the build process
2. make removing a library from a module (or ‘project’) require recompiling all upstream dependencies
3. make it impossible to easily see what libraries a module (or ‘project’) actually uses
4. save a little bit of XML typing

It looks like reason (4) is the logic behind the Maven team. But why did they choose XML in the first place? Because they thought it is ‘the next cool thing’.
