GCC BOSC 2018 has ended
The 2018 Galaxy Community Conference (GCC2018) and Bioinformatics Open Source Conference 2018 (BOSC2018) are meeting together in Portland, Oregon, United States, June 25-30, 2018.  There will be two days of training, a two+ day meeting, and four days of intense collaboration.  The meeting features joint & parallel sessions, shared keynotes, poster & demo sessions, birds-of-a-feather, and social events.  GCCBOSC is organized by Oregon Health & Science University and will be at Reed College.


A. Training
Sunday, June 24
 

3:00pm

Conference desk open
Check in for your lodging and pick up your conference badge and materials.

The registration desk is on the first floor of the Performing Arts Building.

Need to check in after 11pm?  We'll provide a way.

Sunday June 24, 2018 3:00pm - 11:00pm
PAB Atrium Performing Arts Building, Reed College
 
Monday, June 25
 

8:00am

Conference desk open
Pick up your conference package and check in to on-campus lodging.  The registration desk is on the first floor of the Performing Arts Building.

Monday June 25, 2018 8:00am - 10:00pm
PAB Atrium Performing Arts Building, Reed College

8:00am

Training Day 1 - Galaxy

Monday June 25, 2018 8:00am - 10:00pm
Performing Arts Building 3017 SE Woodstock Blvd, Portland, OR 97202, USA

9:00am

Galaxy 101 - A gentle introduction to using Galaxy
→ Tutorial

Key:  BB | XB |  -  |  -  | GG |  -

This workshop will focus on introducing the Galaxy user interface and how it can be used to analyze large datasets. We will cover the basic features of Galaxy, including where to find tools, how to import and use your data, and an introduction to workflows. This session is recommended for anyone who has not used, or only rarely uses Galaxy.
First of 3 sessions in the Introduction to Using Galaxy series.
Prerequisites:
  • Little or no experience using Galaxy.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.


Speakers

Bérénice Batut

Post-doc, University of Freiburg
Bérénice is a post-doc in the Freiburg Galaxy Team (Backofen lab). She analyzes biological data (metagenomics, HTS data, evolution) and develops tools to facilitate data analysis, mainly for Galaxy. Bérénice is also passionate about training. She regularly gives workshops on data...

Joachim Wolff

Albert-Ludwigs-University Freiburg



Monday June 25, 2018 9:00am - 11:30am
PAB 104 Performing Arts Building, Reed Campus

9:00am

Intro to Galaxy Administration 1
→ Training Material

Key:  -  |  -  | IP |  -  |  -  | CL

These 3 workshops introduce Galaxy administration, including
  • installing and configuring your own server
  • customizing your set of reference genomes
  • installing tools from the Galaxy Tool Shed
  • defining custom tools with Planemo
  • users and groups
  • storage management and quotas
  • and more 

Prerequisites:
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do, then you will struggle in these workshops.

Computational infrastructure for this workshop is generously provided by Jetstream, NSF's user-friendly cloud computing environment  for researchers.

Speakers

Enis Afgan

Research scientist, Galaxy Project, Johns Hopkins University

Marius van den Beek

Institut Curie, Paris

John Chilton

Galaxy Project, Penn State University

Nate Coraor

Galaxy Project, Penn State University

Carrie Ganote

Indiana University Bloomington

Simon Gladman

Melbourne Bioinformatics, The University of Melbourne

Nuwan Goonasekera

University of Melbourne

Martin Čech

Dev, Galaxy Project, Penn State University


Monday June 25, 2018 9:00am - 11:30am
PAB 320 Performing Arts Building, Reed Campus

11:30am

Lunch
Monday lunch will be in the Reed Commons Cafe.  You'll pay for lunch with the swipe card issued to you when you check in for the conference.

Monday June 25, 2018 11:30am - 12:30pm
Reed Commons Cafe Reed College Commons, Reed Campus

12:30pm

Intro to Galaxy Administration 2
→ Training Material

Key:  -  |  -  | IP |  -  |  -  | CL

These 3 workshops introduce Galaxy administration, including
  • installing and configuring your own server
  • customizing your set of reference genomes
  • installing tools from the Galaxy Tool Shed
  • defining custom tools with Planemo
  • users and groups
  • storage management and quotas
  • and more 

Prerequisites:

  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do, then you will struggle in these workshops.

Computational infrastructure for this workshop is generously provided by Jetstream, NSF's user-friendly cloud computing environment  for researchers.

Speakers

Enis Afgan

Research scientist, Galaxy Project, Johns Hopkins University

Marius van den Beek

Institut Curie, Paris

John Chilton

Galaxy Project, Penn State University

Nate Coraor

Galaxy Project, Penn State University

Carrie Ganote

Indiana University Bloomington

Simon Gladman

Melbourne Bioinformatics, The University of Melbourne

Nuwan Goonasekera

University of Melbourne

Martin Čech

Dev, Galaxy Project, Penn State University


Monday June 25, 2018 12:30pm - 3:00pm
PAB 320 Performing Arts Building, Reed Campus

12:30pm

RNA-Seq Analysis with Galaxy
Key:  BB | XB |  -  |  -  | GG |  -

This workshop will introduce the concepts behind transcriptomics with NGS data and how to analyze this data in Galaxy. Specifically, this workshop will focus on de novo transcriptome reconstruction of RNA-seq data with the following goals:
  • comprehensive identification of all transcripts across an experiment
  • appropriately annotating classes of transcripts
  • generating abundance estimates across a transcriptome
  • significance testing of differentially expressed transcripts
  • visualisation of reads and transcript structures
Second of 3 sessions in the Introduction to Using Galaxy series.
Prerequisites:
  • a general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • a wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Speakers

Anton Nekrutenko

Galaxy Project, Penn State University


Monday June 25, 2018 12:30pm - 3:00pm
PAB 104 Performing Arts Building, Reed Campus

3:00pm

Break
Breaks on Monday will be in the Reed Commons.

Monday June 25, 2018 3:00pm - 3:30pm
Reed Commons Cafe Reed College Commons, Reed Campus

3:30pm

Galaxy For Proteogenomics!
Workshop Slides
Workshop Tutorial
Workshop Server (will go away at some point)

Key:  BB | XB |  -  |  -  | GG |  -

Large-scale ‘omics’ data generation is driven by high throughput genome and transcriptome sequencing, and proteome characterization using mass spectrometry. As a result, many researchers are turning to generating integrative analysis of these ‘multi-omics’ datasets given the great potential to provide novel biological insights. These multi-omics applications are particularly challenging for data analysis, as they require the use of multiple, domain-specific software programs on scalable infrastructure capable of handling the computing and storage needs of this large-scale data.

Galaxy has been shown to solve the problems of software integration and scalability for multi-omics analysis, and additionally offer benefits for sharing complete workflows in a user-friendly environment accessible by wet-bench scientists. In this workshop, we will introduce the use of the Galaxy platform for multi-omics data analysis applications. As a representative example, we will focus on the maturing field of proteogenomic applications. Proteogenomics most commonly integrates RNA-Seq data, for generating customized protein sequence databases, with mass spectrometry-based proteomics data, which are matched to these databases to identify novel protein sequence variants. (Cancer Res. (2017); 77(21):e43-e46. doi: 10.1158/0008-5472.CAN-17-0331.)

Workshop Content:
The course will include a basic introduction to proteomics and will include a hands-on session that will cover use of analytical workflows to generate proteomic databases from RNASeq data. The use and understanding of modules for analytical workflows for proteogenomics analysis will be discussed.

Workshop Schedule:

  • Introduction to multi-omic studies
  • RNASeq Data Processing: Generating protein sequence search databases using the Galaxy platform
  • Hands-on session for proteomics data analysis using Galaxy
  • Identification of novel proteoforms and visualization

Workshop Goals:

  • Introduce the Galaxy framework as a solution for data analysis across ‘omics’ domains
  • Demonstrate use of Galaxy for a proteogenomic analysis (RNA-seq and proteomic integrative analysis)
  • Lay the foundation for attendees to implement Galaxy at their own facility or institution to meet ‘omics’ data analysis and informatics needs

Third of 3 sessions in the Introduction to Using Galaxy series.

Prerequisites:

  • a general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • a wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Computational infrastructure for this workshop is generously provided by Jetstream, NSF's user-friendly cloud computing environment  for researchers.

Speakers

Tim Griffin

Professor, University of Minnesota

Pratik Jagtap

Research Assistant Professor, University of Minnesota

JJ Johnson

Senior Software Developer, Minnesota Supercomputing Institute, University of Minnesota
Galaxy for genomics and proteomics

Praveen Kumar

Graduate Student, University of Minnesota

Subina Mehta

Researcher, University of Minnesota




Monday June 25, 2018 3:30pm - 6:00pm
PAB 104 Performing Arts Building, Reed Campus

3:30pm

Intro to Galaxy Administration 3
→ Training Material

Key:  -  |  -  | IP |  -  |  -  | CL

These 3 workshops introduce Galaxy administration, including
  • installing and configuring your own server
  • customizing your set of reference genomes
  • installing tools from the Galaxy Tool Shed
  • defining custom tools with Planemo
  • users and groups
  • storage management and quotas
  • and more 

Prerequisites:

  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do, then you will struggle in these workshops.

Computational infrastructure for this workshop is generously provided by Jetstream, NSF's user-friendly cloud computing environment  for researchers.

Speakers

Enis Afgan

Research scientist, Galaxy Project, Johns Hopkins University

Marius van den Beek

Institut Curie, Paris

John Chilton

Galaxy Project, Penn State University

Nate Coraor

Galaxy Project, Penn State University

Carrie Ganote

Indiana University Bloomington

Simon Gladman

Melbourne Bioinformatics, The University of Melbourne

Nuwan Goonasekera

University of Melbourne

Martin Čech

Dev, Galaxy Project, Penn State University


Monday June 25, 2018 3:30pm - 6:00pm
PAB 320 Performing Arts Building, Reed Campus

6:00pm

Dinner on your own
Dinner is on your own on Monday night.  The Woodstock neighborhood is nearby and has many choices.  The Reed Commons may also be open for dinner (watch this space).

Monday June 25, 2018 6:00pm - 7:30pm
TBA

7:30pm

Bioinformatics Training and Education with the Galaxy Training Network
Slides

Key:  -  | XB | IP |  -  | GG |  -

Galaxy with its flexibility, reproducibility, and scalability is an ideal environment for teaching and training diverse scientific topics.
The Galaxy Training Network is a community initiative dedicated to high-quality Galaxy-based training around the world. One of its objectives is to support trainers with complete training material and recommendations about bioinformatics training. Templates and best training practices were defined to help trainers create new material, unify the different material, and make training materials more accessible and easy for users to learn and for teachers to teach (https://training.galaxyproject.org).
This workshop will introduce participants to the infrastructure of the GTN training materials and describe how to generate training materials following best practices. Participants will be introduced to the generation of Galaxy Interactive Tours and Docker Galaxy flavors dedicated to teaching and training sessions. The workshop will also cover best practices for running Galaxy-based workshops - how to plan a training session based on number of attendees, time constraints, resource availability, etc.

Speakers

Bérénice Batut

Post-doc, University of Freiburg
Bérénice is a post-doc in the Freiburg Galaxy Team (Backofen lab). She analyzes biological data (metagenomics, HTS data, evolution) and develops tools to facilitate data analysis, mainly for Galaxy. Bérénice is also passionate about training. She regularly gives workshops on data...


Monday June 25, 2018 7:30pm - 10:00pm
PAB 320 Performing Arts Building, Reed Campus
 
Tuesday, June 26
 

8:00am

Training Day 2 - All topics
Tuesday offers 5 parallel tracks and presents a wide range of topics throughout the day.  Topics were nominated and then voted on by the Galaxy and OBF communities.  


Tuesday June 26, 2018 8:00am - 6:00pm
Reed College Campus

8:00am

Conference desk open
Check in for today or tomorrow and pick up your conference registration materials.

Tuesday June 26, 2018 8:00am - 10:00pm
PAB Atrium Performing Arts Building, Reed College

9:00am

Community built analyses that run everywhere with bcbio
Slides

Key:  -  | XB | IP |  -  |  -  | CL

bcbio provides community built analyses for germline and somatic variant calling, RNA-seq, single cell, smallRNA and ChIP-seq. This workshop will focus on using bcbio to run analysis pipelines in heterogeneous environments: local machines, HPC, cloud providers and commercial services (and also hopefully Galaxy). Attendees will learn how to practically run their analyses on their platform of choice, while discussing how the community can contribute to building, sharing and maintaining workflows across multiple platforms. Recent presentations of bcbio show some of the topics we plan to cover.

Prerequisites

Speakers

Brad Chapman

Bioinformatics Core, Harvard Chan School
Brad is a senior research scientist in the Bioinformatics Core at Harvard T.H. Chan School of Public Health. He develops open source community built infrastructure for analyzing biological data, focusing on multi platform interoperability and automated validation against heterogeneous...

Radhika S. Khetani

Harvard T.H. Chan School of Public Health

Lorena Pantano Rubino

Research Scientist, Harvard Chan School
I work with small RNAs, mainly detected through sequencing technology. I like visualization and multi-omic integration methods. :)



Tuesday June 26, 2018 9:00am - 11:30am
PAB 332 Performing Arts Building, Reed Campus

9:00am

Conda and Containers
→ Tool Dependencies and Conda: Slides, PDF
→ Tool Dependencies and Containers: Slides, PDF

Key:  -  |  -  | IP | TD |  -  | CL

This workshop is aimed at people with some desire to develop dependencies for tools (either Galaxy or Common Workflow Language (CWL) tools).
We believe the best practice for declaring dependencies for either Galaxy or the CWL is using Conda and Bioconda. Conda is a cross-platform package manager with minimal requirements, which makes it ideal for HPC. It can also build isolated environments, ideal for platforms like Galaxy or CWL implementations. The Bioconda project is a set of Conda recipes for bioinformatics. The BioContainers project builds best practice containers automatically for all Bioconda recipes, so building a Bioconda recipe for a package allows the same binaries to be used by both Galaxy (inside or outside a container) and by any conformant CWL implementation.
We will go through the process of creating, testing, and publishing a Bioconda package and we will work through an example of connecting these packages to a real world tool. Participants will be able to work through the examples using either Galaxy or CWL tools.
Prerequisites
  • Some knowledge of tool development - either CWL or Galaxy.
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do, then you will struggle in these workshops.

Speakers

John Chilton

Galaxy Project, Penn State University

Björn Grüning

University of Freiburg

Martin Čech

Dev, Galaxy Project, Penn State University



Tuesday June 26, 2018 9:00am - 11:30am
PAB 320 Performing Arts Building, Reed Campus

9:00am

Data Carpentry Genomics Workshop: Data Organization and Automation with Shell I
Key:  BB | XB |  -  |  -  | GG | CL

This is the first session of a 3 session series.

The goal of this one-day, three-part tutorial is to teach participants the fundamental data management and analysis skills needed to conduct genomics research, including best practices for organization of bioinformatics projects and data as well as use of command line utilities for managing large volumes of genomic data. This tutorial is derived from the Data Carpentry Genomics Workshop, focused on the organization and intro to command line lessons. The tutorial uses active learning and hands-on practice to teach participants the skills and perspectives needed to work effectively with genomic data. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain.
Lessons:
The tutorial assumes no prior experience with the tools covered. However, learners are expected to have some familiarity with biological concepts, including nucleotide abbreviations. Participants should bring their laptops and plan to participate actively.
Prerequisites



Speakers

Tracy K. Teal

Executive Director, The Carpentries
Dr. Teal is the Executive Director of The Carpentries and a co-founder of Data Carpentry. After completing her PhD in Computation and Neural Systems at California Institute of Technology, her work as an NSF Postdoctoral Fellow focused on bioinformatics approaches to metagenomics and microbial community analysis. Then as an Assistant Professor of Microbiology and Molecular...


Tuesday June 26, 2018 9:00am - 11:30am
PAB 131 Performing Arts Building, Reed Campus

9:00am

Handling integrated biological data using Python (or R) and InterMine
→ Tutorial and slides

Key:  BB | XB |  -  |  -  | GG | CL

This tutorial will guide you through loading and analyzing integrated biological data (generally genomic or proteomic) in InterMine via an API in Python or R. Topics covered will include automatically generating code to perform queries, customising the code to meet your needs, and automated analysis of sets, e.g. gene sets, including enrichment statistics. Skills gained can be re-used in any of the dozens of InterMines available, covering a broad range of organisms and dedicated purposes, from model organisms to plants, drug targets, and mitochondrial DNA.
Prerequisites
  • Basic Python or R skills are advantageous but not required.
  • WiFi-enabled laptop with Python or R Studio.

Speakers

Daniela Butano

Research Software Engineer, InterMine, University of Cambridge

Leyla Ruzicka

ZFIN, University of Oregon

Yo Yehudi

Software Developer, InterMine (University of Cambridge)
Integrated genomic data (InterMine)


Tuesday June 26, 2018 9:00am - 11:30am
PAB 104 Performing Arts Building, Reed Campus

9:00am

WDL: the Workflow Description Language
Training Materials

Key:  -  | XB | IP | TD |  -  | CL

The advent of open, portable workflow languages is an exciting development which allows for the definition of a workflow to be decoupled from the execution. One can create workflows which can run unmodified on local compute, HPC clusters, or the cloud. As these languages are not tied to a specific execution environment, the descriptions can easily be shared, discovered and even composed together to form more complex workflows. The Workflow Description Language (WDL) is an open, community driven standard that is designed from the ground up as a human-readable and -writable way to express portable tasks and workflows.
In this session we’ll walk through the lifecycle of writing, sharing and discovering portable workflows in WDL. We’ll introduce the WDL syntax. You’ll learn how to write and run a workflow locally. We’ll demonstrate how to use EPAM’s Pipeline Builder tool to visualize WDL workflows. We’ll look at Dockstore, an open platform where one can publish, share and discover workflows. Finally we’ll put everything together and see how we can compose workflows we find on Dockstore together with our own additions to create new, more powerful workflows.
Prerequisites:
  • Mac or Linux computer
  • Java 8 installed and runnable from the command line
  • A workshop bundle downloaded from TBD

Speakers

Geraldine Van der Auwera

Associate Director of Outreach and Communications, Broad Institute
I run outreach and communication efforts to the global scientific research community on behalf of the Data Sciences Platform (DSP) at the Broad Institute. My team's efforts include technical support, documentation, social media presence and user education events for users of software...

Jeff Gentry

Senior Principal Software Engineer, Broad Institute
Jeff leads the development of the Cromwell workflow engine at the Broad Institute and is in the leadership groups for both the Workflow Description Language (WDL) and the Common Workflow Language (CWL). He has over 15 years of experience developing software for the bioinformatics...

Chris Llanwarne

Broad Institute



Tuesday June 26, 2018 9:00am - 11:30am
Vollum Lounge Vollum College Center, Reed Campus

11:30am

Lunch
Tuesday lunch will be in the Reed Commons Cafe.  You pay for lunch with the swipe card issued to you when you check in for the conference.

Tuesday June 26, 2018 11:30am - 12:30pm
Reed Commons Cafe Reed College Commons, Reed Campus

12:30pm

Data Carpentry Genomics Workshop: Data Organization and Automation with Shell II
Key:  BB | XB |  -  |  -  | GG | CL

This is the second session of a 3 session series.

The goal of this one-day, three-part tutorial is to teach participants the fundamental data management and analysis skills needed to conduct genomics research, including best practices for organization of bioinformatics projects and data as well as use of command line utilities for managing large volumes of genomic data. This tutorial is derived from the Data Carpentry Genomics Workshop, focused on the organization and intro to command line lessons. The tutorial uses active learning and hands-on practice to teach participants the skills and perspectives needed to work effectively with genomic data. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain.
Lessons:
The tutorial assumes no prior experience with the tools covered. However, learners are expected to have some familiarity with biological concepts, including nucleotide abbreviations. Participants should bring their laptops and plan to participate actively.
Prerequisites

Speakers

Tracy K. Teal

Executive Director, The Carpentries
Dr. Teal is the Executive Director of The Carpentries and a co-founder of Data Carpentry. After completing her PhD in Computation and Neural Systems at California Institute of Technology, her work as an NSF Postdoctoral Fellow focused on bioinformatics approaches to metagenomics and microbial community analysis. Then as an Assistant Professor of Microbiology and Molecular...


Tuesday June 26, 2018 12:30pm - 3:00pm
PAB 131 Performing Arts Building, Reed Campus

12:30pm

Galaxy Interactive Environments
Key:  -  |  -  | IP |  -  |  -  | CL

In this session you will get an in-depth introduction to Interactive Environments (IEs), an easy and powerful way to integrate arbitrary interactive web services into Galaxy. You will learn how to set up and secure IEs (Jupyter, RStudio, etc.) in a production Galaxy instance. Moreover, we will create an IE on the fly to get you started in creating your own Interactive Environments. We will demonstrate the IPython Galaxy Project and the general concept of IEs.
Prerequisites:
  • Basic understanding of Galaxy from a developer point of view.
  • General knowledge about Docker
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Speakers

Björn Grüning

University of Freiburg


Tuesday June 26, 2018 12:30pm - 3:00pm
PAB 332 Performing Arts Building, Reed Campus

12:30pm

Introduction to Common Workflow Language
Slides

Key:  -  | XB | IP | TD |  -  | CL

Common Workflow Language (http://commonwl.org) is a standard for writing portable scientific workflows that can execute on a variety of compute environments and workflow systems.
  • What is CWL, history
  • Status of CWL implementations (Arvados, Toil, Rabix, Galaxy, Cromwell, cwltool, ...)
  • Wrapping bioinformatics tools in CWL
  • Connecting tools together into workflows
  • Best practices for writing portable workflows
  • Hands on "bring your own pipeline" session
  • Q&A on advanced topics based on audience
Prerequisites
  • Unix command line experience
  • Experience with Docker

Speakers

Michael R. Crusoe

Project Lead & Co-founder, Common Workflow Language project
Michael R. Crusoe is one of the co-founders of the CWL project and is the CWL Project Lead. His facilitation, technical contributions, and training on behalf of the project draw from his time as the former lead developer of C. Titus Brown's k-h-mer project, his previous career as...

Kenzo-Hugo Hillion

Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI); Institut Pasteur de Paris

Hervé Ménager

Research Engineer, Institut Pasteur
Bioinformatics and Biostatistics Hub of the C3BI, Institut Pasteur



Tuesday June 26, 2018 12:30pm - 3:00pm
PAB 320 Performing Arts Building, Reed Campus

12:30pm

Practical use of the Galaxy API command line tools
→ Slides
→ All Training Material

Key:  -  | XB | IP | TD |  -  | CL

How to use the Galaxy API to automate workflows.
Galaxy has an always-growing API that allows for external programs to upload and download data, manage histories and datasets, run tools and workflows, and even perform admin tasks. This session will cover various approaches to access the API, in particular using the BioBlend Python library.
Prerequisites
  • Unix command line
  • Basic understanding of Galaxy from a developer point of view.
  • Python programming.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Speakers

Dannon Baker

Galaxy Project, Johns Hopkins University
Talk to me about Galaxy Development, especially the UI!

Marius van den Beek

Institut Curie, Paris

Nicola Soranzo

Earlham Institute



Tuesday June 26, 2018 12:30pm - 3:00pm
Vollum Lounge Vollum College Center, Reed Campus

12:30pm

Setting up for success: Everything you need to know when planning for an RNA-seq analysis I
→ Training Materials

Key:  BB | XB |  -  |  -  |  -  |  -

Note: This is the first session of a two-session workshop.


This two-part workshop is geared towards researchers who are thinking about conducting an RNA-seq experiment and are interested in knowing more about what is involved. The planning process requires taking a step back to evaluate various factors and ultimately assess the feasibility of the experiment, including thinking about potential pitfalls and how to avoid them. The workshop will go into detail about the different strategies for working with RNA-seq data depending on the biological question being addressed. Specific topics include:
  • Best practice guidelines for experimental design (Biological replicates, Paired-end vs Single-end, Sequencing depth).
  • Data storage and computational requirements.
  • Overview of commonly used workflows for differential gene expression, de-novo assembly, isoform quantification and other uses of RNA sequencing.
The focus of this workshop is to outline current standards and required resources for the analysis of RNA sequencing data. This workshop will not provide an exhaustive list of software tools or pipelines available; rather it aims to provide a fruitful discussion on how best to prepare for performing RNA-seq data analysis from the lab to manuscript preparation.
Prerequisites:
  • None

Speakers

Radhika S. Khetani

Harvard T.H. Chan School of Public Health

Meeta Mistry

Bioinformatics Analyst and Trainer, Harvard T.H. Chan School of Public Health
I'm a wet-lab bench scientist turned bioinformatician who loves to play with data. When I am not consulting in bioinformatics analysis, I am teaching it.

Mary Piper

Harvard T.H. Chan School of Public Health


Tuesday June 26, 2018 12:30pm - 3:00pm
PAB 104 Performing Arts Building, Reed Campus

3:00pm

Break
Breaks on Tuesday will be close to the training venues.

Tuesday June 26, 2018 3:00pm - 3:30pm
PAB Atrium Performing Arts Building, Reed College

3:30pm

Command line workflow management systems: Snakemake and Nextflow
Snakemake Tutorial

Key:  -  | XB | IP | TD |  -  | CL

This session introduces command line and text based workflow management systems, using Snakemake and Nextflow as examples. We will start with an overview of concepts and challenges, then spend an hour on each of these platforms.
Snakemake
We will show how to define a workflow in the Snakemake workflow language, and how to execute it using the Snakemake command line interface. In particular, we will show how Snakemake enables reproducible science by allowing
  • automation of every step of a data analysis from raw data to final figures
  • scalability of the workflow to any major computing architecture (compute server, cluster, grid, cloud) without having to modify the workflow definition
  • portability of the workflow by integration with the Conda package manager and Singularity containers.
Nextflow
This session will introduce the Nextflow framework, its basic concepts, and how it enables the definition and deployment of large-scale distributed computational pipelines in a portable and reproducible manner across clouds and clusters. In particular, the session will cover:
  • installation and introduction to the dataflow processing model
  • workflow parallelisation and scalability
  • portable workflow containerisation with Docker, Singularity and Shifter
  • cloud deployment strategies
Prerequisites
  • Linux command line experience

Speakers

Johannes Köster

Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen; Department of Medical Oncology, Harvard Medical School; https://koesterlab.github.io

Paolo Di Tommaso

Research Software Engineer, Center for Genomic Regulation (CRG)


Tuesday June 26, 2018 3:30pm - 6:00pm
Vollum Lounge Vollum College Center, Reed Campus

3:30pm

Data Carpentry Genomics Workshop: Data Organization and Automation with Shell III
Key:  BB | XB |  -  |  -  | GG | CL

This is the third session of a 3 session series.

The goal of this one-day, three-part tutorial is to teach participants the fundamental data management and analysis skills needed to conduct genomics research, including best practices for organizing bioinformatics projects and data and the use of command line utilities for managing large volumes of genomic data. This tutorial is derived from the Data Carpentry Genomics Workshop and focuses on its organization and introduction to the command line lessons. The tutorial uses active learning and hands-on practice to teach participants the skills and perspectives needed to work effectively with genomic data. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain.
Lessons:
The tutorial assumes no prior experience with the tools covered. However, learners are expected to have some familiarity with biological concepts, including nucleotide abbreviations. Participants should bring their laptops and plan to participate actively.
Prerequisites

Speakers

Tracy K. Teal

Executive Director, The Carpentries
Dr. Teal is the Executive Director of The Carpentries and a co-founder of Data Carpentry. After completing her PhD in Computation and Neural Systems at California Institute of Technology, her work as an NSF Postdoctoral Fellow focused on bioinformatics approaches to metagenomics and microbial community analysis. Then as an Assistant Professor of Microbiology and Molecular...


Tuesday June 26, 2018 3:30pm - 6:00pm
PAB 131 Performing Arts Building, Reed Campus

3:30pm

Galaxy Architecture
Slides (PDF)

Key:  -  |  -  | IP |  -  |  -  |  -

Want to know the big picture about what is going on inside Galaxy? This workshop will give participants a practical introduction to the Galaxy code base with a focus on changing those parts of Galaxy most often modified by local deployers and new contributors.
The workshop will include the following specific content:
  • A description of the various file and top-level directories in the Galaxy code base.
  • An overview of important Python modules - including models, tools, jobs, workflows, visualisations, and API controllers.
  • An overview of important Python objects and concepts in the Galaxy codebase - including the Galaxy transaction object ("trans"), the application object ("app"), and the configuration object ("config").
  • An overview of various plugin extension points.
  • An overview of important JavaScript modules that power the front-end.
  • An overview of important JavaScript concepts used by Galaxy - in particular Backbone MVC, Webpack, ES6, and Vue.
  • An overview of the client build system used to generate compressed JavaScript, cascading stylesheets, and other static web assets.
  • A demonstration of a complete start-to-finish modification of Galaxy - including forking the project on GitHub, modifying files, running the tests, checking style guidelines, committing the change, pushing it back to your GitHub fork, and opening a pull request.
  • A brief description of other projects in the Galaxy ecosystem (CloudMan, the Tool Shed, Ephemeris, bioblend, docker-galaxy-stable, Pulsar, and Planemo).
Prerequisites:
  • Your interest.

Speakers

John Chilton

Galaxy Project, Penn State University

Nate Coraor

Galaxy Project, Penn State University



Tuesday June 26, 2018 3:30pm - 6:00pm
PAB 320 Performing Arts Building, Reed Campus

3:30pm

GATK4: What's new and how to run it
→ Prep
Tutorial

Key:  BB | XB |  -  |  -  | GG | CL

GATK4 is the brand new version of the Genome Analysis Toolkit (GATK), an open-source genomics software package focused on variant discovery. This workshop will highlight the key changes and updates in the new 4.0 version which was released in January 2018 (see https://software.broadinstitute.org/gatk/gatk4).

The goal of the workshop is to equip participants with the essential know-how to get started with GATK4, whether they have previously used GATK or not. We will walk participants through hands-on exercises aimed at developing familiarity with the GATK4 command line. We will also guide participants through several different ways of running GATK4 tools and pipelines on publicly available platforms, including but not limited to Galaxy. We hope this will help participants understand their options and choose the platform that best suits their needs and abilities.

The workshop will cover the following topics:

1. Introduction to GATK and the Best Practices:
  • Purpose, variant calling basics and standard data types

2. What's new in GATK4:
  • New syntax/invocations, performance improvements and tips & tricks for using GATK effectively
  • Expanded scope of analysis:
    • Scaling germline variant discovery with GenomicsDB
    • Calling somatic short variants with the new and improved Mutect2
    • Calling somatic copy number variants with GATK CNV

3. Options for running GATK
  • Running tools individually on a laptop (with Docker)
  • Running Spark-capable tools individually on a Spark cluster (via Google Dataproc)
  • Running pipelines on a laptop using Cromwell
  • Running pipelines on Google Cloud using Cromwell + Pipelines API
  • Running pipelines on the FireCloud analysis portal through the web GUI
  • Running pipelines on FireCloud through the API + Python bindings
  • Running tools and pipelines on Galaxy

Free credits will be provided for running on Google Cloud. For more information on FireCloud, see https://software.broadinstitute.org/firecloud.

Prerequisites:
  • Basic familiarity with terms and concepts of genetics and genomics, including high-level understanding of high-throughput sequencing technologies and file formats used in genomic analysis.
  • Basic familiarity with the command line environment and usage of command line tools.
  • NO familiarity is expected with Spark or cloud computing concepts.


Speakers

Geraldine Van der Auwera

Associate Director of Outreach and Communications, Broad Institute
I run outreach and communication efforts to the global scientific research community on behalf of the Data Sciences Platform (DSP) at the Broad Institute. My team's efforts include technical support, documentation, social media presence and user education events for users of software...

Kate Noblett

Broad Institute



Tuesday June 26, 2018 3:30pm - 6:00pm
PAB 332 Performing Arts Building, Reed Campus

3:30pm

Setting up for success: Everything you need to know when planning for an RNA-seq analysis II
Key:  BB | XB |  -  |  -  |  -  |  -

Note: This is the second session of a two-session workshop.


This two-part workshop is geared towards researchers who are thinking about conducting an RNA-seq experiment and are interested in knowing more about what is involved. The planning process requires taking a step back to evaluate various factors and ultimately assess the feasibility of the experiment, including thinking about potential pitfalls and how to avoid them. The workshop will go into detail about the different strategies for working with RNA-seq data depending on the biological question being addressed. Specific topics include:
  • Best practice guidelines for experimental design (Biological replicates, Paired-end vs Single-end, Sequencing depth).
  • Data storage and computational requirements.
  • Overview of commonly used workflows for differential gene expression, de-novo assembly, isoform quantification and other uses of RNA sequencing.
The focus of this workshop is to outline current standards and required resources for the analysis of RNA sequencing data. This workshop will not provide an exhaustive list of software tools or pipelines available; rather it aims to provide a fruitful discussion on how best to prepare for performing RNA-seq data analysis from the lab to manuscript preparation.
Prerequisites:
  • None

Speakers

Radhika S. Khetani

Harvard T.H. Chan School of Public Health

Meeta Mistry

Bioinformatics Analyst and Trainer, Harvard T.H. Chan School of Public Health
I'm a wet-lab bench scientist turned bioinformatician who loves to play with data. When I am not consulting in bioinformatics analysis, I am teaching it.

Mary Piper

Harvard T.H. Chan School of Public Health


Tuesday June 26, 2018 3:30pm - 6:00pm
PAB 104 Performing Arts Building, Reed Campus