Reproducible in silico genomics training at the CRG in Barcelona

Reproducibility of in silico analysis pipelines has become one of biology's most pressing issues. The exponential growth of biological datasets, increasingly complex data analysis methods and the lack of community standards all present major challenges. These obstacles are compounded by the difficulty of installing, deploying and maintaining bioinformatics pipelines across the diverse range of computational platforms and configurations on which they are expected to run (workstations, clusters, HPC systems, clouds, etc.).

The training unit at the Centre for Genomic Regulation (CRG), in collaboration with FOSTER Plus, organised the "Nextflow: reproducible in silico genomics" workshop in Barcelona on 14 and 15 September 2017. Nextflow is open-source software that enables the reproducibility of complex computational data analysis workflows. It addresses the problem of scientific reproducibility by making code easy to reuse and deploy across very different platforms. The team behind Nextflow has created a powerful tool that integrates with other popular technologies and industry standards such as Git, GitHub and Docker. "Nextflow enables researchers to easily use software containers technology, wrapping up all the software of an analysis and ensures the results can be replicated by anyone, anywhere," explains CRG group leader Cedric Notredame.

The training was organised across two days and structured in two main sessions. It combined talks, demos, a tutorial/workshop for beginners and two hackathon sessions for more advanced users. The first session featured selected talks on the problem of reproducibility in bioinformatics pipelines. Speakers from leading institutions and organisations, such as the Pasteur Institute, King's College London, Synthetic Genomics, Roche Sequencing and Amazon, among others, presented their use cases and best practices, and described how they have applied Nextflow to enable reuse, collaboration and transparency in their computational genomics analyses. The second session included an introductory course on the Nextflow programming environment for novice users and a parallel hackathon where expert users could collaborate on selected projects.

Nextflow project lead Paolo Di Tommaso (CRG) describes how "getting together in the same room helped foster new collaborations and strengthen existing ties with users and developers". During the hackathon, coordinated by Evan Floden (CRG), several contribution proposals emerged, and five diverse ideas were ultimately chosen for communal development, ranging from new pipelines to new features in Nextflow itself. "The hackathon format allows for productive, constructive work to occur in an open and informal environment," Di Tommaso noted.

(credits to Anna Sole)

Check out the short video from the event.
All material, including course content, assignments and the hackathon projects, is available at the following GitHub repository.
For more information on the event, check out the Nextflow blog: Nextflow Hackathon 2017