2013-09-20

Using Travis-CI for testing Galaxy Tools

Travis CI is one of the best things to happen to GitHub in some time - it adds automated testing capabilities to your source code repository as changes are committed, and even on pull requests to help ensure new work doesn't break existing functionality.

We've been using this for Biopython for over a year, but this month I've started using TravisCI for testing my add-ons for the Galaxy Project as well. My Galaxy tools (see also Cock et al. 2013) were already being tested every night once uploaded to the Galaxy Tool Shed, and I always stage releases via the Galaxy Test Tool Shed before posting them on the main Galaxy Tool Shed. However this fixed nightly schedule isn't very flexible for debugging failures.

I've currently got TravisCI working for my two Galaxy tool repositories on GitHub:

Galaxy BLAST tools
Galaxy sequence analysis tools

Both configurations follow the same basic approach, which I have tried to explain in this post and in the .travis.yml comments, and the tests run as soon as I push changes to GitHub.

Using TravisCI for Galaxy Tool development

The core idea of a TravisCI setup is that you define a special file, .travis.yml, in the root of your GitHub repository, which explains how to set up and run your tools' tests. TravisCI monitors your repository via the GitHub API, and automatically triggers the tests, which it runs on a farm of virtual machine images.

TravisCI has ready-made virtual machine images for the major programming languages, such as Python, and supports multiple versions of each (e.g. Python 2.6, 2.7, 3.3, etc.), which is great for testing across versions. However, this isn't really what we need for Galaxy Tools, which can be written in any language and need a Galaxy instance to run inside. As long as it includes a system-level installation of Python 2.6 or 2.7, any TravisCI image can run Galaxy - so the best image to pick comes down to the dependencies of the tools you wish to test. For now I just pick the TravisCI Java image, because some of the tools I want to test require Java, which isn't included on the TravisCI Python or Perl images - whereas the Java image does include system-level Perl and Python.
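Putting the pieces together, the .travis.yml ends up with roughly this shape - treat it as a sketch rather than my exact file, with placeholder commands standing in for the real setup steps described in the sections below:

```yaml
# Sketch of a .travis.yml for testing Galaxy tools (placeholder commands)
language: java   # the Java image also includes system-level Python and Perl

install:
 # Fetch Galaxy, install the tool dependencies, and copy the tools
 # plus a minimal tool_conf.xml into place (see the sections below)
 - echo "fetch Galaxy and install tool dependencies here"

script:
 # Run only the dedicated section of tool tests
 - ./run_functional_tests.sh -sid Continuous-Integration-Travis
```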


Installing Galaxy

First we must fetch a copy of Galaxy, which would normally be done like this:

$ hg clone https://bitbucket.org/galaxy/galaxy-dist

However, an hg clone is slow and we don't need the full history. Instead we can use wget to grab the latest version:

$ wget https://bitbucket.org/galaxy/galaxy-dist/get/stable.tar.bz2

This is faster, but in my tests using git clone was even faster, and a wget from GitHub faster still. The only catch is there isn't an official mirror of Galaxy on GitHub, so I've been piggy-backing on John Chilton's GitHub mirror of Galaxy. Like me, John seems happier working with git than with hg.
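For example, the fetch-and-unpack step in .travis.yml could look like this (the exact mirror repository, branch name, and unpacked folder name are assumptions - use whichever mirror you trust):

```yaml
before_install:
 # GitHub serves tarballs of any branch, so no mercurial/git history is fetched
 - wget https://github.com/jmchilton/galaxy-central/archive/master.tar.gz
 - tar -zxf master.tar.gz
 - cd galaxy-central-master
```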

Once downloaded, Galaxy needs to create some configuration files (by copying the default sample files) and set up the database (SQLite by default). The run.sh script seems to be the only place this happens - but we don't want to start the Galaxy server itself. This setup needs to happen before we can run the tests, and the most elegant solution I have found so far is:

$ ./run.sh --stop-daemon || true

This sets up the configuration, then checks whether Galaxy is running in order to stop it - which fails because Galaxy wasn't already running. That would return a non-zero error code and abort the TravisCI run, so we hide the error with the true command. This is a hack, but it works for now.

Configuring/Installing Galaxy Tools

I decided to follow the old-school manual setup route pre-dating the Galaxy Tool Shed, which works by moving the tool files under Galaxy's tools folder, and adding the tool XML to the tool_conf.xml listing. Actually, I replace the default listing (which includes hundreds of tools I don't want to test) with a minimal XML file listing just my tools.
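As an illustration, the minimal replacement tool_conf.xml might look something like this, using the dedicated section identifier from the test command later in the post - the tool file path shown is just an example, not necessarily one of my real wrappers:

```xml
<?xml version="1.0"?>
<toolbox>
  <section name="CI testing" id="Continuous-Integration-Travis">
    <!-- file paths are relative to Galaxy's tools folder; this entry is illustrative -->
    <tool file="ncbi_blast_plus/ncbi_blastn_wrapper.xml" />
  </section>
</toolbox>
```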

Configuring/Installing Tool Dependencies

This is actually the first thing my .travis.yml script does. Installing the tool dependencies can require downloading and compiling them, adding them to the $PATH, or setting the appropriate environment variables. Right now this duplicates the effort put into the tool_dependencies.xml files used to automate this via the Galaxy Tool Shed.
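As a sketch, installing a binary dependency by hand boils down to unpacking it somewhere under the build area and extending $PATH. The tool name and layout below are made up for illustration - a real .travis.yml would wget and untar (or compile) the actual dependency instead of writing a stub:

```shell
#!/bin/sh
# Put dependencies under a deps/bin folder inside the build area
DEPS="$PWD/deps"
mkdir -p "$DEPS/bin"

# A real setup would wget/untar/compile the tool here;
# simulated with a stub executable for illustration
printf '#!/bin/sh\necho "mytool 1.0"\n' > "$DEPS/bin/mytool"
chmod +x "$DEPS/bin/mytool"

# Make the tool visible to Galaxy's tool wrappers
export PATH="$DEPS/bin:$PATH"
mytool
```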

Configuring/Installing Data-Types

Again, I followed the old-school manual route pre-dating the Galaxy Tool Shed, which works by adding the definitions to the core datatypes. I do this by providing a customised datatypes_conf.xml (from which I have removed most of the file formats to speed things up), and, where Python code is needed, moving it into the Galaxy lib folder. This is fiddly, but so far only a handful of BLAST datatypes need it.
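For illustration, a cut-down datatypes_conf.xml entry might look like this - the extension and class path follow the general pattern for a BLAST datatype, but treat the exact names as assumptions; the referenced Python class is what gets moved into Galaxy's lib folder:

```xml
<?xml version="1.0"?>
<datatypes>
  <registration>
    <!-- extension and class path are illustrative examples -->
    <datatype extension="blastxml" type="galaxy.datatypes.blast:BlastXml"
              mimetype="application/xml" display_in_upload="true" />
  </registration>
</datatypes>
```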

Running the tests

Finally we use the Galaxy script to run the tests, which could be as simple as:

$ ./run_functional_tests.sh

However, that would run the tests for every tool listed. In the tool_conf.xml file I've created a dedicated section for my tools, so instead I request that only this section be tested:

$ ./run_functional_tests.sh -sid Continuous-Integration-Travis

I seem to be missing a dependency needed by default to produce an HTML report file - so what I ended up running was a little less clean:

$ python ./scripts/functional_tests.py -v `python tool_list.py Continuous-Integration-Travis`

This mimics what the wrapper shell script does via the section identifier argument.
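I haven't shown tool_list.py above, but a helper with that job could be sketched as follows: parse tool_conf.xml, find the named section, and read each tool's own XML file for its id attribute to build the -id arguments. This is a reconstruction of the idea, not my actual script:

```python
import os
from xml.etree import ElementTree


def tool_id_args(tool_conf, section_id, tools_dir="tools"):
    """Return ['-id', tool_id, ...] for every tool in the given section.

    Each <tool file="..."/> entry in tool_conf.xml points at a tool XML
    file (relative to the tools folder), whose root id attribute is the
    identifier the functional test script expects.
    """
    args = []
    root = ElementTree.parse(tool_conf).getroot()
    for section in root.findall("section"):
        if section.get("id") != section_id:
            continue
        for tool in section.findall("tool"):
            tool_xml = os.path.join(tools_dir, tool.get("file"))
            tool_id = ElementTree.parse(tool_xml).getroot().get("id")
            args.extend(["-id", tool_id])
    return args


if __name__ == "__main__":
    import sys

    print(" ".join(tool_id_args("tool_conf.xml", sys.argv[1])))
```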

Conclusions

It is early days, but having spent the time and effort to get this working (debugging the dependency installation is a bit tricky), I expect it to pay dividends down the line, complementing the nightly tests via the Galaxy Tool Shed - and of course testing locally on my development setup.

It would be interesting to explore whether setting up a local Tool Shed (running within the TravisCI instance) would be a better way to handle tool and dependency installation. This would have the added benefit of testing that my tool_dependencies.xml files work.

There is an open issue for speeding up the Galaxy functional test framework which would be helpful in general, but is also a potential issue on TravisCI as the worker nodes are subject to time limits.
