As the tone of this post will hopefully convey, I'm a big fan of open source software (OSS) in general. I use it daily, and help support OSS with reproducible bug reports, usability feedback, and patches. Any code I write and share online I release as open source software too. This also applies to my day job doing bioinformatics - I should probably disclose at this point that just over a year ago I was elected to the board of directors of the Open Bioinformatics Foundation (OBF). I believe that the openness in open source software is vital for scientific software - a binary blob is a black box, with only the author's description to tell you what they think it does with your data (and in some cases they don't even tell you that).
Commercial vs Academic
One of the practical issues with an "academic only"/"non-commercial" licence is: where does that line lie? Research done at a university is probably OK... but what if the grant is part funded by an industrial partner, as in the BBSRC Industrial Partnership Awards (IPA), or a CASE studentship (a scheme offered by most of the RCUK funding agencies)? Every licence like this seems to be different, meaning wasted time reviewing their terms - something you don't have to worry about with a mainstream OSS licence.
How about government funded research institutes? I know from first-hand experience that their academic status is often viewed as borderline. In one case Apple said yes, and granted access to their Education Store with its lower prices (very worthwhile for their computers). On the other hand, The MathWorks said no, meaning they charged their extortionate full prices for MATLAB (later they introduced a third category for such borderline cases). That wasn't a problem for me personally as I'm quite happy using Python + NumPy + matplotlib instead, all OSS and free.
Then there are research organisations which do a mixture of 'pure' academic work and analysis as a service. For example, many sequencing centres offer to ship you just raw data, raw data plus a paid analysis service, or a collaborative approach.
In my experience, every free for "academic only"/"non-commercial" usage licence is unique, and frankly I don't want to waste time reading each one - but you need to. For instance, a licence may require each individual user to agree - which is a logistical nightmare if you want to wrap the software as a service (for example in Galaxy). There are similar concerns if there are usage restrictions based on location - multiple sites or campuses might not be covered.
The initial version of the GATK v2 licence was a particularly jaw-dropping example: it required that you only disclose the results of data analysis to other academic non-commercial users! That was quickly addressed following user outcry (see this thread and the Twitter comments from July 2012).
Mixed licenses for a large codebase are another headache - if you're only able or willing to accept one of the licences on offer, sorting out which parts of the code this gives you access to (and if that is enough for your needs) can be another barrier to usage.
No redistribution or packaging
One of the practical benefits of OSS is that the licences allow anyone (even companies) to modify and redistribute the code. In particular, Linux distributions and similar efforts on Windows (e.g. Cygwin) and Mac OS X (e.g. MacPorts) can provide repositories of packaged software with meta-data describing inter-dependencies, which can be downloaded and installed at the click of a button or a single command at the terminal. This is enormously useful. Similarly, OSS licences allow the creation and sharing of virtual machines as a way of sharing preconfigured systems.
On a smaller scale, webserver or GUI front ends written to provide a user friendly interface to command line tools also benefit from being able to bundle the underlying tools (both for ease of installation and to avoid changes with different versions of a dependency).
None of that is possible with "academic only"/"non-commercial" licenses.
Why use a free for academic use only licence?
The only reasons I can understand to do this are about money and control. Overzealous university intellectual property agents might think there is money to be made selling software to commercial users - which could be defensible in some poorly funded cases I suppose. In the case of control, having a novel tool unique to your group can give you a leg up over your academic rivals - rationalised on the grounds that the method itself has been published so your rivals can reimplement it if they want to. I find that ethically distasteful.
Why are the Broad doing this? Why are they making things worse?!
Initially GATK v1 was MIT licensed (one of the simplest and most liberal OSS licenses).
According to the Broad, they were getting requests for support and therefore wanted to offer that as a commercial service. That's fine - but that doesn't explain why they didn't continue releasing all of GATK under an open source licence while selling optional support (a proven commercial strategy, as used by Red Hat). I've yet to hear a clear answer from the Broad on this.
In July 2012 for GATK v2, a mixed model was adopted - the functionality of GATK v1 remained open source under the name GATK-Lite, but new functionality would be released without source code, available to commercial users for a fee or free of charge to academic non-commercial users. People complained, especially about not being able to even see the source.
I didn't like the idea at the time, but the announced hybrid approach for GATK v2 didn't seem too bad, provided they stuck to the outlined plan in which the core "GATK Lite" remained open source and closed source functionality in the full GATK would migrate to it in time (sadly this did not happen). Quoting from the July 2012 GATK v2 announcement:
GATK-Lite isn't a dead-end branch of GATK1. All GATK-Lite infrastructure will be fully supported -- to the same degree as GATK1 -- by the GSA team, as we will rely on these tools day-in and day-out. GATK-Lite is evolve in lock-step with the full GATK, GATK-Lite and GATK(-full) will carry the same release numbers, and will be pushed out by the GSA group simultaneously. As we add new file formats to the GATK (BCF2, for example) these changes will go into the core of GATK, and be available through both GATK and GATK-Lite.
And their FAQ claimed:
Will you ever make the new GATK 2.0 tools open source?
Yes, over time we plan to migrate closed source tools into the open source branch of the GATK.
This month (Jan 2013), the Broad Institute announced new licence terms for GATK v2.4 - they still offer a free-as-in-beer option for academic use only (but this time including source code - the only good news), or the option to buy a commercial licence with support via their partner company. The bad news was the open source GATK-Lite has been dropped, with only the core programming framework remaining open source under the MIT license.
Note that some of the previously open source analysis tools ("walkers") which were released with GATK-Lite are no longer open source - causing considerable inconvenience to groups already using them (and breaking the expectations laid with the release of GATK v2 and GATK-Lite).
To my mind, the description of GATK-Lite given in July 2012, and how it was described in Jan 2013 are very different:
Second, we did a poor job of communicating the purpose of Lite and how it differed from the Full version. Even though Lite was always intended as an interim solution, some organizations opted to adopt it instead of the Full version and seem to view it as a viable long-term solution for genetic analysis.
I doubt anyone outside the Broad Institute was surprised that lots of people and groups adopted GATK-Lite. It was described as the open source core of GATK v2, with new functionality to be added over time. But now GATK-Lite is being dropped.
GATK licensing in a nutshell
GATK v1 was 100% open source; as of July 2012 only GATK-Lite was open source; as of Jan 2013 even less is open source (only the core GATK framework).
The core framework does remain open source (MIT licensed), but that is all. Some of the previously open source analysis tools ("walkers") which were released under GATK-Lite are no longer open source - causing considerable inconvenience to groups already using them (and breaking the expectations laid with the release of GATK v2 and GATK-Lite).
The GATK v2 suite on top of the framework is not open source. If you're eligible, you can use the free-as-in-beer non-commercial academic license - which lets you see the source code. Since this is not free-as-in-speech, this is a look but don't touch arrangement. If it is even possible to edit the source code and recompile it, you won't be able to share your changes. If you want to reuse their implementation ideas in your own software, you can't. Clearly this is antithetical to the ideals of using open source software in science - something that I had thought was now mainstream in bioinformatics.
I'm upset because I care.
P.S. See also Mick Watson's post about why the GATK re-licensing matters, and this growing GATK Twitter archive on Storify.