From Comaiwiki

 
(13 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
__notoc__
 
__notoc__
'''This page contains the most recent version of our [http://comailab.genomecenter.ucdavis.edu/images/7/77/Barcode-tools-2.4.tgz barcoded data preparation tools], <br>the following describes the included scripts and information on proper usage.'''<br>
+
'''This page contains the most recent version of our [https://github.com/mericl/allprep barcoded data preparation tools], <br>the following describes the included scripts and information on proper usage.'''<br>
 
<h3> Barcoded Data Preparation Tools Documentation </h3>
 
<h3> Barcoded Data Preparation Tools Documentation </h3>
 
<h4>Disclaimer</h4>
 
<h4>Disclaimer</h4>
Meric Lieberman, 2011<br>
+
Meric Lieberman, 2016<br>
This work is the property of UC Davis Genome Center - Comai Lab
+
This work is the property of UC Davis Genome Center - Comai Lab <br>
 +
This is shared under a Creative Commons BY-NC-ND 4.0 license<br>
 +
https://creativecommons.org/licenses/by-nc-nd/4.0/<br><br>
  
 
Use at your own risk.<br>  
 
Use at your own risk.<br>  
Line 13: Line 15:
 
<h4>Summary</h4>
 
<h4>Summary</h4>
 
The scripts included are for use in preparation of raw illumina reads for further analysis.<br>  
 
The scripts included are for use in preparation of raw illumina reads for further analysis.<br>  
<br>Traditionally, we do all prep work with a single script called "Allprep", that does barcode  
+
<br>Traditionally, we do all prep work with a single script called "Allprep", that does barcode/index
 
check and match, <br>'N' filtering, primary and secondary adapter contamination, quality  
 
check and match, <br>'N' filtering, primary and secondary adapter contamination, quality  
 
conversion, quality trimming, <br>length trimming, and library separation.
 
conversion, quality trimming, <br>length trimming, and library separation.
  
 
However, if parts of the preparation process needs to be performed independently, we also  
 
However, if parts of the preparation process needs to be performed independently, we also  
provide smaller scripts to do the process modularly.
+
provide smaller scripts to do the process modularly.
 +
 
 +
Allprep is now also capable of performing only barcode/index splitting, without any trimming or read filtering; please see the README for run details.
  
 
<i>Please note that:</i>
 
<i>Please note that:</i>
Line 24: Line 28:
 
A) All of the scripts have a description of what they do, input parameters, and running directions at the beginning of each program.  
 
A) All of the scripts have a description of what they do, input parameters, and running directions at the beginning of each program.  
  
B) All scripts use command line parameters as input and all scripts can be run with ./"program_name" provided Python 2.4+ is installed on your system.  
+
B) All scripts use command line parameters as input and all scripts can be run with ./"program_name" provided Python 2.6+ is installed on your system.  
  
 
C) These scripts can be used with paired end or single ended data, and some can be run on data without barcodes.
 
C) These scripts can be used with paired end or single ended data, and some can be run on data without barcodes.
  
 
<h4>Installation</h4>
 
<h4>Installation</h4>
On a unix based system, download the most current package [http://comailab.genomecenter.ucdavis.edu/images/7/77/Barcode-tools-2.4.tgz barcode-tools-2.4.tgz], then use the command <i>"tar -xzvf File-barcode-tools-2.4"</i>. <br>A new folder called barcoded_data_toolbox will appear, with all of the scripts and the README.txt inside.
+
On a unix based system, get the most current package [https://github.com/mericl/allprep  from github], then use the command <i>"unzip allprep-master.zip"</i> if not cloning it. <br>A new folder called barcoded_data_toolbox will appear, with all of the scripts and the README.txt inside.
 
<br> The README has all of the information here, as well as run parameters for all scripts.
 
<br> The README has all of the information here, as well as run parameters for all scripts.
  
Line 36: Line 40:
  
 
<pre style="font-family: Verdana, Arial, sans serif;">
 
<pre style="font-family: Verdana, Arial, sans serif;">
allPrep-6.py - Single script to do all processing as detailed above.
+
allPrep-13.py - Single script to do all processing as detailed above.
 
+
barcodeSplitter.py -  This program takes a barcode file and splits the lane sequence.txt
+
files into specified library.txt (lib#.txt) files. Note that Illumina 1.5-1.6 .txt
+
format does not have Sanger qualities and needs to be converted for use in tools that
+
require Sanger .fq (fastq) format (see below).
+
 
+
illuminaToSanger.py - This program converts an illumina 1.5+ .txt file to a Sanger
+
.fq/.fastq format (phred+33 scaling)
+
 
+
trimFastqQuality.py - This program takes a Sanger .fq/.fastq file and trims by a quality
+
cutoff, throwing away reads shorter than a specified length after trimming.
+
 
+
adapterEffectRemover.py - This program takes a .fq/.fastq file and looks for adapter
+
contamination (based on the standard Illumina adapter sequences). If found,
+
adapter sequences are removed. If the resulting read is shorter
+
than specified minLength the read is rejected.
+
 
+
read_N_remover.py - This program takes a .fq/.fastq file and goes through the reads,
+
rejecting any with 'N' instances in the sequence (one N is sufficient for
+
rejecting the entire read. This is based on the observation that N within
+
reads are usually diagnostic of poor sequence quality).
+
 
+
countAllBarcodes-RAW.py - This program takes a raw reads file and goes through and counts
+
instances of every barcode seen in the file, then outputs the counts to
+
"counted-"inputFileName.
+
  
 
interleaveSwitcher.py - This script will interleave two files or uninterleave a file.  
 
interleaveSwitcher.py - This script will interleave two files or uninterleave a file.  
 
This is only applicable to paired-end reads. "Interleaved" means that the two  
 
This is only applicable to paired-end reads. "Interleaved" means that the two  
 
ends of a read are placed together in the final file. View parameters in
 
ends of a read are placed together in the final file. View parameters in
script for running directions. The i flag is to interleave, and u flag for uninterleave.
+
script for running directions.
 
</pre>
 
</pre>

Latest revision as of 15:51, 29 January 2016

This page contains the most recent version of our barcoded data preparation tools.
The following describes the included scripts and gives information on proper usage.

Barcoded Data Preparation Tools Documentation

Disclaimer

Meric Lieberman, 2016
This work is the property of UC Davis Genome Center - Comai Lab
This is shared under a Creative Commons BY-NC-ND 4.0 license
https://creativecommons.org/licenses/by-nc-nd/4.0/

Use at your own risk.
We cannot provide support.
All information obtained/inferred with these scripts is without any implied warranty of fitness for any purpose or use whatsoever.

Summary

The included scripts prepare raw Illumina reads for further analysis.

Traditionally, we do all prep work with a single script called "Allprep", which performs barcode/index checking and matching,
'N' filtering, removal of primary and secondary adapter contamination, quality conversion, quality trimming,
length trimming, and library separation.
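To make a few of the steps listed above concrete, here is a minimal sketch of 'N' filtering, quality conversion, quality trimming, and length filtering applied to a single read. It is not taken from Allprep. The function names, the default cutoff and minimum length, and the trimming strategy (stripping low-quality bases from the end of the read only) are assumptions made for illustration. The only fixed facts used are the two quality encodings: Illumina 1.5+ uses an ASCII offset of 64 and Sanger uses an offset of 33.

```python
def illumina_to_sanger(qual):
    """Convert an Illumina 1.5+ (phred+64) quality string to Sanger (phred+33)."""
    return "".join(chr(ord(c) - 64 + 33) for c in qual)


def has_n(seq):
    """Return True if the sequence contains at least one 'N' base call."""
    return "N" in seq.upper()


def trim_tail(seq, qual, cutoff=20, offset=33):
    """Strip trailing bases whose quality is below cutoff.

    Assumed strategy for illustration; the real scripts may trim differently.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < cutoff:
        end -= 1
    return seq[:end], qual[:end]


def prep_read(seq, qual, cutoff=20, min_length=35):
    """Return the cleaned (seq, qual) pair, or None if the read is rejected."""
    if has_n(seq):
        return None  # a single N rejects the whole read
    qual = illumina_to_sanger(qual)
    seq, qual = trim_tail(seq, qual, cutoff)
    if len(seq) < min_length:
        return None  # too short after trimming
    return seq, qual
```

For example, prep_read("ACGT", "hhhB", cutoff=20, min_length=3) converts the qualities to "III#", trims the low-quality last base, and keeps the read as ("ACG", "III").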

However, if parts of the preparation process need to be performed independently, we also provide smaller scripts to do the process modularly.

Allprep is now also capable of performing only barcode/index splitting, without any trimming or read filtering; please see the README for run details.
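As a rough illustration of what splitting by barcode without any trimming or filtering means, the sketch below groups reads by library and leaves each read unchanged. It is not Allprep's implementation. It assumes that the barcode sits at the start of the read sequence, that only exact matches count, and that the barcode table is a plain dictionary from barcode to library name. The real barcode file format and matching rules are described in the README.

```python
def split_by_barcode(reads, barcodes):
    """Group reads by library according to the barcode at the start of each read.

    reads:    iterable of (name, seq, qual) tuples
    barcodes: dict mapping barcode sequence -> library name (hypothetical layout)
    Returns (libraries, unmatched); reads are passed through unmodified.
    """
    libraries = {}
    unmatched = []
    for name, seq, qual in reads:
        for code, lib in barcodes.items():
            if seq.startswith(code):  # exact-match assumption
                libraries.setdefault(lib, []).append((name, seq, qual))
                break
        else:
            # no barcode in the table matched this read
            unmatched.append((name, seq, qual))
    return libraries, unmatched
```

With barcodes = {"ACGT": "lib1"}, a read starting with ACGT is placed in "lib1" and every other read is returned in the unmatched list.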

Please note that:

A) All of the scripts have a description of what they do, input parameters, and running directions at the beginning of each program.

B) All scripts take command-line parameters as input, and all can be run with ./"program_name", provided Python 2.6+ is installed on your system.

C) These scripts can be used with paired-end or single-end data, and some can be run on data without barcodes.

Installation

On a Unix-based system, get the most current package from GitHub (https://github.com/mericl/allprep), either by cloning the repository or by downloading the zip archive and running "unzip allprep-master.zip".
A new folder called barcoded_data_toolbox will appear, with all of the scripts and the README.txt inside.
The README has all of the information here, as well as run parameters for all scripts.

Contents

The package includes:

allPrep-13.py - Single script to do all processing as detailed above.

interleaveSwitcher.py - This script will interleave two files or uninterleave a file. 
	This is only applicable to paired-end reads. "Interleaved" means that the two 
	ends of a read are placed together in the final file. View parameters in
	script for running directions.
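
The interleaving described above can be sketched as follows. This is an illustration only, not the code of interleaveSwitcher.py. It assumes standard four-line FASTQ records and forward and reverse inputs that list the reads in the same order, and it works on lists of lines rather than on files. All function names are hypothetical.

```python
from itertools import islice


def read_records(lines):
    """Yield FASTQ records as lists of four lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward, reverse):
    """Place the two ends of each read pair next to each other."""
    out = []
    # zip stops at the shorter input, so unpaired trailing records are dropped
    for rec1, rec2 in zip(read_records(forward), read_records(reverse)):
        out.extend(rec1)
        out.extend(rec2)
    return out


def uninterleave(lines):
    """Split an interleaved record stream back into forward and reverse lists."""
    forward, reverse = [], []
    for index, record in enumerate(read_records(lines)):
        (forward if index % 2 == 0 else reverse).extend(record)
    return forward, reverse
```

Calling uninterleave(interleave(forward, reverse)) returns the original two lists when both inputs hold the same number of records.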