Course description
Programming skills and software tools for building automated bioinformatics pipelines and computational biology analyses. Emphasis on UNIX tools and R libraries for distilling raw sequencing data into interpretable results. This course is aimed at students familiar with UNIX and with some programming experience in Python, R, or C/C++.
Instructional staff
Please click on the links above for email addresses.
Meeting times and locations
Classes:
Monday and Wednesday, 9:00-10:20 am, Foege S110 (http://www.washington.edu/home/maps/southcentral.html?gnom).
Class Slack:
We will use Slack in and outside of class to communicate, share code snippets, and ask and answer questions. The class Slack is here:
You will receive an invitation to join prior to the first class.
Office hours:
- No official office hours. Post questions on Slack as needed.
Prerequisites
- Substantial background in molecular and cellular biology, genetics, biochemistry, or related disciplines.
- Familiarity with UNIX.
- Some programming experience in Python, R, or C/C++.
- Students are encouraged to have taken GENOME559 and/or GENOME560.
Course requirements
- The course involves hands-on programming during class time. We will use the GS compute cluster, so make sure you can log in to it remotely from your computer.
- All programming projects are due by the start of class on the date listed.
Setting up your computer
- We will make extensive use of GitHub and GitHub Copilot in the class. If you are a student, you can get free Copilot access. Please make sure you have a GitHub account and Copilot access before the first class.
- Email me your GitHub ID prior to the first class.
- Install Visual Studio Code.
- Configure Visual Studio Code to work with GitHub Copilot.
- We will use both R and Python at various points in the course.
- You are responsible for maintaining your R and Python environments so that you can do the in-class exercises.
- This guide may be helpful for setting up R for use with Visual Studio Code.
- This guide may be helpful for using Python with Visual Studio Code.
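One way to confirm the setup before the first class is a quick check that the relevant command-line tools are on your PATH. The following is a minimal sketch, not an official course script: the tool names in `REQUIRED_TOOLS` are assumptions about a typical installation (for example, `python3` may be named `python` on Windows, and the `code` launcher is only present if Visual Studio Code added it to your PATH), and the helper functions are hypothetical.

```python
import shutil

# Assumed tool names for a typical setup; adjust to match your own system.
REQUIRED_TOOLS = ["git", "python3", "Rscript", "code"]


def find_missing_tools(tools):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def format_report(tools):
    """Build a one-line-per-tool report marking each tool as found or MISSING."""
    missing = set(find_missing_tools(tools))
    lines = [f"{tool}: {'MISSING' if tool in missing else 'found'}" for tool in tools]
    return "\n".join(lines)


# Print the status of each tool in the assumed list.
print(format_report(REQUIRED_TOOLS))
```

Any tool reported as MISSING is either not installed or not on your PATH. This check does not verify GitHub Copilot access or your ability to log in to the compute cluster.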
Examinations
There will be no examinations.
Course grade
Grades are based 50% on the programming projects and 50% on class participation.
Course materials
We will read from several online resources and tutorials. I strongly encourage you to read all of the material in the following:
- Comprehensive single-cell transcriptional profiling of a multicellular organism (Packer et al.)
- Git Basics
- Pro Git
- BASH basics
- Essential UNIX
- Sed and Awk
- Sed and Awk, pocket ref
- STAR Manual
- SAM format
- samtools
- BED format
- bedtools
- R Markdown: the definitive guide
- R for Data Science
- ggplot2: elegant graphics for data analysis
- Monocle: an analysis toolkit for single-cell RNA-seq
- Garnett: Automated cell type classification
- R packages
Specific, selected readings for the course will be listed in the course schedule below.
Helpful software
- Visual Studio Code - An outstanding code editor and integrated development environment
- RStudio - An integrated development environment for R
Class schedule
Date | Topic | Reading |
---|---|---|
3/31 | Course overview, student setup, and version control pdf | Git Basics |
4/2 | Intro to bioinformatics pipelines, automation pdf | Cao et al.; Packer et al. |
4/9 | Read alignment pdf | SAM format; bedtools; STAR; STARsolo |
4/14 | Workflow automation pdf | Essential UNIX; BASH basics (sections 1-7); Snakemake |
4/21 | Exploratory data analysis pdf | R for Data Science (Chapter 13 especially) |
4/23 | Electronic lab notebooks with R Markdown pdf | R for Data Science (Chapter 27); R Markdown (Chapter 3) |