Course description
Programming skills and software tools for building automated bioinformatics pipelines and computational biology analyses. Emphasis on UNIX tools and R libraries for distilling raw sequencing data into interpretable results. This course is aimed at students familiar with UNIX and with some programming experience in Python, R, or C/C++.
Instructional staff
Please click on the links above for email addresses.
Meeting times and locations
Classes:
Monday and Wednesday, 9:00-10:20 am, Foege S110 (http://www.washington.edu/home/maps/southcentral.html?gnom).
Class Slack:
We will use Slack during and outside of class to communicate, share code snippets, and ask and answer questions. The class Slack is here:
You will receive an invitation to join prior to the first class.
Office hours:
- No official office hours. Post questions on Slack as needed.
Prerequisites
- Substantial background in molecular and cellular biology, genetics, biochemistry, or related disciplines.
- Familiarity with UNIX.
- Some programming experience in Python, R, or C/C++.
- Students are encouraged to have taken GENOME559 and/or GENOME560.
Course requirements
- The course involves hands-on programming during class time. We will use the GS compute cluster, so make sure you can log into it from your computer remotely.
- All programming projects are due by the start of class on the date listed.
- You are welcome to talk to classmates about principles for solving problems, but please do not share code or program together. In many ways, writing your own code is where you will learn the most for this class.
Examinations
There will be no examinations.
Course grade
Grades will come 50% from the programming projects and 50% from class participation.
Course materials
We will read from several online resources and tutorials. I strongly encourage you to read all of the material in the following:
- Comprehensive single-cell transcriptional profiling of a multicellular organism (Packer et al.)
- Git Basics
- Pro Git
- BASH basics
- Essential UNIX
- Sed and Awk
- Sed and Awk, pocket ref
- STAR Manual
- SAM format
- samtools
- BED format
- bedtools
- R Markdown: the definitive guide
- R for Data Science
- ggplot2: elegant graphics for data analysis
- Monocle: an analysis toolkit for single-cell RNA-seq
- Garnett: Automated cell type classification
- R packages
Specific, selected readings for the course will be listed in the course schedule below.
Helpful software
- Visual Studio Code - An outstanding code editor and integrated development environment
- RStudio - An integrated development environment for R
Class schedule
Date | Topic | Reading | Assignments
---|---|---|---
3/25 | Course overview, student setup, and version control (html, pdf) | Git Basics |
3/27 | Intro to bioinformatics pipelines, automation (html) | Essential UNIX; BASH basics (sections 1-7) |
4/1 | Tools for working with tables (html) | Sed and Awk |
4/3 | NGS read alignment (html) | SAM format; bedtools |
4/8 | No class, Cole at NHGRI Training Meeting | | Project 1 due
4/10 | Bespoke tools for exploratory analysis (html) | Monocle documentation; Garnett documentation |
4/15 | Electronic lab notebooks with Markdown (html) | R for Data Science (Chapter 27); R Markdown (Chapter 3) |
4/17 | Making figures (html) | R for Data Science (Chapter 13) |
4/22 | Tools for working with tables, part II (html); Relational databases (html) | R for Data Science (Chapters 10, 12, and 5); R for Data Science (Chapter 13) |
4/24 | R packages (html) | R packages (Wickham) | Project 2 due