HPC-Pipes

Content

A 2-day course on best practices for setting up, running, and sharing reproducible bioinformatics pipelines and workflows.

The course provides guidance on automating data analysis with common workflow languages such as Snakemake or Nextflow. We then turn to pipeline reproducibility and the options available for ensuring it, including why Docker and other container technologies have become so popular and how package and environment managers like Conda can control the software environment within a workflow. Participants will learn how to share their data analyses and software with the research community. We will also cover strategies for managing the research data a pipeline produces, addressing the challenges posed by large data volumes and exploring computational approaches that aid data organization, documentation, processing, analysis, storage, sharing, and preservation. Finally, participants will learn how to manage and optimize their pipeline projects on HPC platforms so that compute resources are used efficiently.

Learning Outcomes

A student who has met the objectives of the course will:

- Understand the general process of building a robust pipeline (regardless of data type) using:
  - workflow languages
  - environment/package managers
  - optimized HPC resources
  - FAIRly managed data and tools
- Be able to design their own custom pipelines with tools appropriate for their individual analysis needs

Requirements

The workshop is for researchers, professors, and PhD students at SUND who seek to acquire skills in effectively managing data and analyses in bioinformatics.

Knowledge of R/Python and bash is required, as well as a basic understanding of an omics analysis pipeline.

We strongly recommend taking this course after completing HPC-Launch, a single-day course that covers theoretical concepts for HPC and RDM in health data science.

Expected Frequency

1-2 times a year

The course is organized by the Health Data Science Sandbox.

HeaDS