From Comaiwiki

[[image: Slide2.jpg.JPG‎|400px|thumb|left|A Sample Screenshot]]
+
[[image: barcoded_adapter.png‎|400px|thumb|The Y adapter for Illumina sequencing with a custom barcode (violet).]]
  
 
__NOTOC__  
 
 +
  
 
=Instructional Videos For Next Generation Sequence Analysis=
 
 
Around 2007, the advent of next generation sequencing opened a new approach to biology.  These sequencing techniques, such as [http://www.454.com/ 454] and [http://www.illumina.com/ Illumina], are able to produce vast amounts of sequence for a relatively low cost. The sheer volume of DNA data produced by this method can be intimidating.
 
  
However, this does not need to be the case. A little bit of knowledge of the relevant computer programs and techniques can lead us a long way. This page links to a number of our tutorial videos, some ready, some under production, on simple methods for next-generation sequence analysis.  
+
However, this does not need to be the case. A little bit of knowledge of the relevant computer programs and techniques can carry us a long way. This page links to a number of our tutorial videos on simple methods for next-generation sequence analysis. If you are interested in practical tools for your sequencing program, visit our [http://comailab.genomecenter.ucdavis.edu/index.php/Data_methods methods page].
  
These videos are aimed primarily at biologists who previously lacked the bioinformatics knowledge to analyze and manipulate these large data sets.   
+
These videos are aimed primarily at biologists who previously lacked the bioinformatics knowledge to analyze and manipulate these large data sets.  Direct funding for these videos was provided by the UC Davis Genome Center. The research experience on which these videos are based is funded by the National Science Foundation Plant Genome grants DBI-0733857 (Functional Genomics of Polyploids) and DBI-0822383 ("TILLING by Sequencing"), and National Institutes of Health R01 GM076103-01A1 (Dosage dependent regulation in hybridization) to LC.
  
 
==Using the Terminal (4 Videos)==
 
 
''NOTE FOR PC USERS:  All Macs are Unix-based machines and the Terminal is simply a command line prompt that allows the user to work in Unix. There is no default equivalent found within Windows.  However, it is possible to install a Unix environment on a Windows machine.  In this case, these videos will still be of assistance, but some minor details (such as the display of outputs) will differ from what you will see here.  For instructions on installing a Unix environment in Windows see [http://www.pcreview.co.uk/articles/Windows/Run_Linux_in_Windows/ this article.]''
 
  
*'''[http://www.youtube.com/user/Arturgreensward?feature=mhum#p/a/u/0/zRZT4nQP3sE Video 1 - Using the Terminal: "grep" and "less"]'''
+
*'''[https://www.youtube.com/watch?v=zRZT4nQP3sE Video 1 - Using the Terminal: "grep" and "less"]'''
  
 
This video will introduce the Terminal program and discuss two basic commands.  The first is "less" which is used to display data, the second is "grep" which is used to search or count within the data.
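As a rough illustration of what "grep" does, the following Python sketch mimics searching and counting lines. It is not taken from the video: the function names and the three tab-separated sample lines are hypothetical, and for real data sets the Terminal commands themselves remain the simpler choice.

```python
import re
from typing import Iterable, List


def grep(lines: Iterable[str], pattern: str) -> List[str]:
    """Return the lines that contain a match for `pattern` (like `grep PATTERN`)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def grep_count(lines: Iterable[str], pattern: str) -> int:
    """Count the matching lines (like `grep -c PATTERN`), not individual matches."""
    return len(grep(lines, pattern))


# Hypothetical miniature data set; real sequencing files are far larger.
sample_lines = [
    "read_1\tGENE_A\tATGCCGTA",
    "read_2\tGENE_B\tTTGACCAA",
    "read_3\tGENE_A\tATGCCGTT",
]

if __name__ == "__main__":
    for line in grep(sample_lines, "GENE_A"):
        print(line)
    print(grep_count(sample_lines, "GENE_A"))
```

Like "grep -c", the count reports how many lines match, so a line containing the pattern twice is still counted once.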
 
  
*'''[http://www.youtube.com/user/Arturgreensward?feature=mhum#p/a/u/1/E3EQvPzVLVQ Video 2 - Using the Terminal: A Sample Experiment and Data Set]'''
+
*'''[https://www.youtube.com/watch?v=E3EQvPzVLVQ Video 2 - Using the Terminal: A Sample Experiment and Data Set]'''
  
 
This video briefly discusses the biological background of the experimental data set used in the next two videos.
 
  
*'''[http://www.youtube.com/user/Arturgreensward?feature=mhum#p/u/4/EM5P2AzHlwQ Video 3 - Using the Terminal: Viewing and Parsing Data]'''
+
*'''[https://www.youtube.com/watch?v=EM5P2AzHlwQ Video 3 - Using the Terminal: Viewing and Parsing Data]'''
  
 
This video takes a look at the data from an actual experiment where an Illumina sequencing run was aligned to a cDNA database.  Here the user will learn how to navigate through the output and how to parse the data file using the "grep" command.
 
  
*'''[http://www.youtube.com/user/Arturgreensward?feature=mhum#p/u/3/Ua2MCDy45dg Video 4 - Using the Terminal: A Sample Data Analysis]'''
+
*'''[https://www.youtube.com/watch?v=Ua2MCDy45dg Video 4 - Using the Terminal: A Sample Data Analysis]'''
  
 
This video shows how to perform a basic data analysis on the same dataset as above.  Here we undertake an "in-silico Northern" analysis and look at gene expression differences for a target gene as well as a housekeeping gene.
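One way to picture an "in-silico Northern" is as counting the reads that aligned to each gene and then comparing the target gene against a housekeeping gene. The sketch below assumes a tab-separated alignment file with the gene name in the second column. That layout, the gene names, and the function names are assumptions made for illustration and may not match the file used in the video.

```python
from collections import Counter
from typing import Iterable


def count_reads_per_gene(alignment_lines: Iterable[str], gene_column: int = 1) -> Counter:
    """Count aligned reads per gene; the gene name is assumed to sit in `gene_column`."""
    counts: Counter = Counter()
    for line in alignment_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > gene_column:
            counts[fields[gene_column]] += 1
    return counts


def relative_expression(counts: Counter, target: str, housekeeping: str) -> float:
    """Express the target gene's read count relative to the housekeeping gene."""
    if counts[housekeeping] == 0:
        raise ValueError("no reads aligned to the housekeeping gene")
    return counts[target] / counts[housekeeping]


# Hypothetical alignments for two samples (read name, gene name).
sample_a = ["r1\tTARGET", "r2\tTARGET", "r3\tTARGET", "r4\tTARGET",
            "r5\tHOUSEKEEPING", "r6\tHOUSEKEEPING"]
sample_b = ["r1\tTARGET", "r2\tHOUSEKEEPING", "r3\tHOUSEKEEPING"]

if __name__ == "__main__":
    for name, sample in (("A", sample_a), ("B", sample_b)):
        counts = count_reads_per_gene(sample)
        print(name, relative_expression(counts, "TARGET", "HOUSEKEEPING"))
```

Dividing by the housekeeping gene is a simple way to keep a difference in sequencing depth between two samples from being mistaken for a difference in expression.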
 
  
 +
==Analyzing Public Datasets with Free Tools (7 Videos)==
  
==Locating and Analyzing Public Datasets with Free Tools (0 Videos)==
+
In this new series, we'll learn how to access and analyze public datasets resulting from next-generation sequencing techniques such as Illumina and 454.  We use [http://g2.bx.psu.edu/ Galaxy] for many of these analyses.
  
''This series of videos is currently under development.''
+
*'''[http://www.youtube.com/watch?v=3aVNAIIJ8sg Video 1 - Analyzing Public Datasets: Introduction to Galaxy]'''
  
The goal of this video series is for a researcher to be able to perform useful and complex bioinformatic analyses of next-generation sequencing data without ever leaving the desk, purchasing any software, or hiring a bioinformatician.
+
This video provides a brief introduction to the Galaxy website.
  
We will start by locating publicly available datasets generated by next-generation sequencing techniques.  We will learn how to perform an alignment of a sequencing dataset to a cDNA or genomic database, and then how to perform a complete analysis of the resulting alignment.
+
*'''[http://www.youtube.com/watch?v=dTRZjXuQnYU Video 2 - Analyzing Public Datasets: Finding the Data]'''
 +
 
 +
This video shows how to find a sample dataset, upload it to Galaxy, and process it for alignment.
 +
 
 +
*'''[http://www.youtube.com/watch?v=HdxRzDV9Tew Video 3 - Analyzing Public Datasets: Performing an Alignment]'''
 +
 
 +
This video shows how to upload a cDNA database to Galaxy and how to perform a BWA alignment with our sample rice sequence.
 +
 
 +
*'''[http://www.youtube.com/watch?v=h2LZlLubUOg Video 4 - Analyzing Public Datasets: Viewing the Alignment]'''
 +
 
 +
In this video we briefly examine our BWA alignment in Galaxy and convert the file to a format required by the genome browser "IGV".
 +
 
 +
*'''[http://www.youtube.com/watch?v=5kkPnCV06dE Video 5 - Analyzing Public Datasets: Introduction to IGV (cDNA)]'''
 +
 
 +
In this video we use the files created in video #4 as an introduction to using the Integrative Genomics Viewer (IGV).  We examine read coverage and visualize a SNP.
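Read coverage is the number of aligned reads that overlap each position of the reference. IGV computes and draws this for us, but the idea can be sketched in a few lines of Python. The input format here (0-based start position plus read length) and the function name are hypothetical simplifications, not the format IGV reads.

```python
from typing import Iterable, List, Tuple


def coverage(reference_length: int, alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Return per-base read depth for alignments given as (start, length) pairs.

    Start positions are 0-based; any part of a read that extends past the
    ends of the reference is ignored.
    """
    depth = [0] * reference_length
    for start, length in alignments:
        for position in range(max(start, 0), min(start + length, reference_length)):
            depth[position] += 1
    return depth


# Three hypothetical 4-base reads aligned to an 8-base reference.
reads = [(0, 4), (2, 4), (3, 4)]

if __name__ == "__main__":
    print(coverage(8, reads))  # [1, 1, 2, 3, 2, 2, 1, 0]
```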
 +
 
 +
*'''[http://www.youtube.com/watch?v=MbW_f4eZNKM Video 6 - Analyzing Public Datasets: From Start to Finish... A TopHat Alignment]'''
 +
 
 +
Here we go through an entire workflow on Galaxy, starting with two RNA-seq datasets.  These datasets are aligned to genomic DNA using TopHat and then further processed and analyzed using Cufflinks/Cuffcompare/Cuffdiff, resulting in a comparison of transcript expression between the two datasets.
 +
 
 +
*'''[http://www.youtube.com/watch?v=YeoHJFHnCrw Video 7 - Analyzing Public Datasets: An Introduction to IGV (genomic)]'''
 +
 
 +
This video uses IGV to examine the RNA-seq alignments to genomic DNA that were performed in video 6.  We also take a look at alternative splicing.
 +
 
 +
==Illumina Multiplexing==
 +
 
 +
(NOTE:  All the videos in this series assume familiarity with the principles of Illumina sequencing technology)
 +
 
 +
*'''[http://www.youtube.com/watch?v=hgSoJiOoSQQ Illumina Multiplexing 1 - A Simple Overview]'''
 +
 
 +
This video provides a brief and simple introduction to multiplexing ("barcoding") for Illumina sequencing.
 +
 
 +
*'''[http://www.youtube.com/watch?v=W5EftJL5XpQ Illumina Multiplexing 2 - Barcode Design]'''
 +
 
 +
Here we go through the principles and pitfalls of barcode design and link to some sample barcodes and our barcode generator.
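One widely used design check is that every pair of barcodes should differ at several positions, so that a single sequencing error cannot turn one barcode into another. The sketch below computes that minimum pairwise difference (Hamming distance). The criteria actually covered in the video and used by our barcode generator are not reproduced here; the function names and example barcodes are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: Sequence[str]) -> int:
    """Return the smallest Hamming distance between any two barcodes in the set."""
    if len(barcodes) < 2:
        raise ValueError("need at least two barcodes to compare")
    return min(hamming_distance(a, b) for a, b in combinations(barcodes, 2))


# Hypothetical 5-base barcodes.
example_barcodes = ["ACGTA", "TGCAT", "GATCG", "CTAGC"]

if __name__ == "__main__":
    print(min_pairwise_distance(example_barcodes))
```

A set whose minimum distance is 1 is risky, because one miscalled base is enough to assign a read to the wrong sample.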
 +
 
 +
*'''[http://www.youtube.com/watch?v=kOG1MnPi-K4 Illumina Multiplexing 3 - Barcode Generator]'''
 +
 
 +
This video is a quick tutorial on how to download and use our barcode generator.
 +
 
 +
*'''[http://www.youtube.com/watch?v=kOG1MnPi-K4 Illumina Multiplexing 4 - Editing the Barcode Generator]'''
 +
 
 +
This video demonstrates how to use TextEdit to edit the barcode generator if needed.
 +
 
 +
==Microbial Genome Assembly==
 +
 
 +
In this series of videos we demonstrate the use of a powerful microbial genome assembly pipeline and discuss some basic analyses.
 +
 
 +
Here we show how to download, install, and run the a5 Assembly Pipeline.
 +
 
 +
*'''[http://www.youtube.com/watch?v=ePGUIj9Qbvc Introduction to the a5 Assembly Pipeline]'''
 +
 
 +
==Python Scripts: Overamplification==
 +
 
 +
In this trilogy of videos we introduce the potential problem of overamplification in next-gen sequencing data.  We then have a detailed walkthrough of a simple Python script that detects overamplification in a FASTQ file, followed by a demonstration of the script in action.
 +
 
 +
(In Development)
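Until the videos are available, here is a minimal sketch of the general idea, assuming that overamplification shows up as an unusually large share of reads with identical sequences. This is not the script from the videos. The function names are hypothetical, and the code assumes plain four-line FASTQ records (header, sequence, "+", qualities) with no wrapped sequences.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fastq_sequences(handle: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (second line) of each four-line FASTQ record."""
    for index, line in enumerate(handle):
        if index % 4 == 1:
            yield line.strip()


def duplication_summary(handle: Iterable[str]) -> Tuple[int, int, float]:
    """Return (total reads, distinct sequences, fraction of reads that are repeats)."""
    counts = Counter(read_fastq_sequences(handle))
    total = sum(counts.values())
    distinct = len(counts)
    repeat_fraction = (total - distinct) / total if total else 0.0
    return total, distinct, repeat_fraction


# Hypothetical four-read FASTQ file in which three reads are identical.
example_fastq = (
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGT\n+\nIIII\n"
    "@r4\nTTTT\n+\nIIII\n"
)

if __name__ == "__main__":
    print(duplication_summary(io.StringIO(example_fastq)))
```

To try it on a real file, pass an open file handle instead of the in-memory example. Note that the counter keeps every distinct sequence in memory, which may matter for very large runs.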
  
 
==Future Projects==
 
  
Multiplexing for Dummies
 
 
Basic Biological Concepts for Bioinformaticians
 
 +
 +
Using Python scripts in the Terminal (we'll provide the scripts!)
 +
 +
==Funding==
 +
This page and the connected tools are funded by:
 +
*NSF Plant Genome grant DBI-0733857 (Functional Genomics of Polyploids)
 +
*NSF Plant Genome grant DBI-0924025. Heterosis Challenge Grant (HCG):The Regulatory Disruption Hypothesis for Heterosis
 +
*NSF Plant Genome grant DBI-0822383 ("TILLING by Sequencing")
 +
*National Institutes of Health R01 GM076103-01A1 (Dosage dependent regulation in hybridization) to LC.

Latest revision as of 10:51, 19 March 2014

The Y adapter for Illumina sequencing with a custom barcode (violet).



Instructional Videos For Next Generation Sequence Analysis

A project by Dr. David Coil, the Comai Lab, and the UCD Genome Center

Around 2007, the advent of next generation sequencing opened a new approach to biology. These sequencing techniques, such as 454 and Illumina, are able to produce vast amounts of sequence for a relatively low cost. The sheer volume of DNA data produced by this method can be intimidating.

However, this does not need to be the case. A little bit of knowledge of the relevant computer programs and techniques can carry us a long way. This page links to a number of our tutorial videos on simple methods for next-generation sequence analysis. If you are interested in practical tools for your sequencing program, visit our methods page.

These videos are aimed primarily at biologists who previously lacked the bioinformatics knowledge to analyze and manipulate these large data sets. Direct funding for these videos was provided by the UC Davis Genome Center. The research experience on which these videos are based is funded by the National Science Foundation Plant Genome grants DBI-0733857 (Functional Genomics of Polyploids) and DBI-0822383 ("TILLING by Sequencing"), and National Institutes of Health R01 GM076103-01A1 (Dosage dependent regulation in hybridization) to LC.

Using the Terminal (4 Videos)

This series of videos looks at using the Terminal program (which comes on every Macintosh computer) to view and parse large sequence datasets.

NOTE FOR PC USERS: All Macs are Unix-based machines and the Terminal is simply a command line prompt that allows the user to work in Unix. There is no default equivalent found within Windows. However, it is possible to install a Unix environment on a Windows machine. In this case, these videos will still be of assistance, but some minor details (such as the display of outputs) will differ from what you will see here. For instructions on installing a Unix environment in Windows see this article.

This video will introduce the Terminal program and discuss two basic commands. The first is "less" which is used to display data, the second is "grep" which is used to search or count within the data.

This video briefly discusses the biological background of the experimental data set used in the next two videos.

This video takes a look at the data from an actual experiment where an Illumina sequencing run was aligned to a cDNA database. Here the user will learn how to navigate through the output and how to parse the data file using the "grep" command.

This video shows how to perform a basic data analysis on the same dataset as above. Here we undertake an "in-silico Northern" analysis and look at gene expression differences for a target gene as well as a housekeeping gene.

Analyzing Public Datasets with Free Tools (7 Videos)

In this new series, we'll learn how to access and analyze public datasets resulting from next-generation sequencing techniques such as Illumina and 454. We use Galaxy for many of these analyses.

This video provides a brief introduction to the Galaxy website.

This video shows how to find a sample dataset, upload it to Galaxy, and process it for alignment.

This video shows how to upload a cDNA database to Galaxy and how to perform a BWA alignment with our sample rice sequence.

In this video we briefly examine our BWA alignment in Galaxy and convert the file to a format required by the genome browser "IGV".

In this video we use the files created in video #4 as an introduction to using the Integrative Genomics Viewer (IGV). We examine read coverage and visualize a SNP.

Here we go through an entire workflow on Galaxy, starting with two RNA-seq datasets. These datasets are aligned to genomic DNA using TopHat and then further processed and analyzed using Cufflinks/Cuffcompare/Cuffdiff, resulting in a comparison of transcript expression between the two datasets.

This video uses IGV to examine the RNA-seq alignments to genomic DNA that were performed in video 6. We also take a look at alternative splicing.

Illumina Multiplexing

(NOTE: All the videos in this series assume familiarity with the principles of Illumina sequencing technology)

This video provides a brief and simple introduction to multiplexing ("barcoding") for Illumina sequencing.

Here we go through the principles and pitfalls of barcode design and link to some sample barcodes and our barcode generator.

This video is a quick tutorial on how to download and use our barcode generator.

This video demonstrates how to use TextEdit to edit the barcode generator if needed.

Microbial Genome Assembly

In this series of videos we demonstrate the use of a powerful microbial genome assembly pipeline and discuss some basic analyses.

Here we show how to download, install, and run the a5 Assembly Pipeline.

Python Scripts: Overamplification

In this trilogy of videos we introduce the potential problem of overamplification in next-gen sequencing data. We then have a detailed walkthrough of a simple Python script that detects overamplification in a FASTQ file, followed by a demonstration of the script in action.

(In Development)

Future Projects

Basic Biological Concepts for Bioinformaticians

Using Python scripts in the Terminal (we'll provide the scripts!)

Funding

This page and the connected tools are funded by:

  • NSF Plant Genome grant DBI-0733857 (Functional Genomics of Polyploids)
  • NSF Plant Genome grant DBI-0924025. Heterosis Challenge Grant (HCG):The Regulatory Disruption Hypothesis for Heterosis
  • NSF Plant Genome grant DBI-0822383 ("TILLING by Sequencing")
  • National Institutes of Health R01 GM076103-01A1 (Dosage dependent regulation in hybridization) to LC.