From Comaiwiki

Revision as of 13:31, 16 September 2011 by Meric

This page contains the most recent version of our barcoded data preparation tools.
The following describes the included scripts and gives information on proper usage.

Barcoded Data Preparation Tools Documentation


Meric Lieberman, 2011
This work is the property of UC Davis Genome Center - Comai Lab

Use at your own risk.
We cannot provide support.
All information obtained/inferred with these scripts is without any implied warranty of fitness for any purpose or use whatsoever.


The included scripts prepare raw Illumina reads for further analysis.

Traditionally, we do all prep work with a single script called "Allprep", which performs barcode checking and matching,
'N' filtering, removal of primary and secondary adapter contamination, quality conversion, quality trimming,
length trimming, and library separation.
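The per-read steps listed above can be sketched in a few lines of Python. This is not the Allprep code; it is a minimal illustration under stated assumptions: reads arrive as sequence/quality strings with Illumina 1.5+ (phred+64) qualities, adapters are found by exact match only, quality trimming removes low-quality bases from the 3' end, and the order of steps, the `cutoff`/`min_length` defaults, and all function names are hypothetical.

```python
"""Hypothetical sketch of Allprep-style per-read processing (not the toolbox code)."""


def phred64_to_phred33(qual):
    """Convert an Illumina 1.5+ (phred+64) quality string to Sanger (phred+33)."""
    # Both encodings store the same phred score; only the ASCII offset differs (64 - 33 = 31).
    return "".join(chr(ord(c) - 31) for c in qual)


def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of `adapter`, if present."""
    # Assumption: exact matching only; real adapter trimmers usually allow
    # mismatches and partial adapters at the 3' end.
    i = seq.find(adapter)
    if i == -1:
        return seq, qual
    return seq[:i], qual[:i]


def trim_3prime(seq, qual, cutoff):
    """Drop bases from the 3' end while their Sanger quality is below `cutoff`."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - 33 < cutoff:
        end -= 1
    return seq[:end], qual[:end]


def prep_read(seq, qual64, adapter, cutoff=20, min_length=25):
    """Return a (seq, sanger_qual) pair, or None if the read is rejected."""
    if "N" in seq:  # a single N rejects the whole read
        return None
    seq, qual64 = trim_adapter(seq, qual64, adapter)
    qual33 = phred64_to_phred33(qual64)
    seq, qual33 = trim_3prime(seq, qual33, cutoff)
    if len(seq) < min_length:  # length filter after all trimming
        return None
    return seq, qual33
```

For example, `prep_read("ACGTACGTACAGATCGG", "h" * 17, adapter="AGATCGG", min_length=5)` cuts the read at the (illustrative, hypothetical) adapter fragment and returns `("ACGTACGTAC", "IIIIIIIIII")`, since phred+64 `h` and phred+33 `I` both encode a quality of 40.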

However, if parts of the preparation process need to be performed independently, we also provide smaller scripts that carry out each step separately.

Please note that:

A) All of the scripts have a description of what they do, input parameters, and running directions at the beginning of each program.

B) All scripts take their input as command-line parameters, and each can be run as ./program_name, provided Python 2.4+ is installed on your system.

C) These scripts can be used with paired end or single ended data, and some can be run on data without barcodes.
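For paired-end data, the mates of each pair are often kept either in two files or interleaved in one. The sketch below shows what interleaving means at the record level. It is a hypothetical illustration, not the toolbox's interleave script: it assumes plain 4-line FASTQ records held as lists of lines, and the function names are invented for this example.

```python
from itertools import islice, zip_longest


def read_records(lines):
    """Yield 4-line FASTQ records (as lists of lines) from an iterable of lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(lines1, lines2):
    """Alternate records from two mate files: pair1/1, pair1/2, pair2/1, ..."""
    out = []
    for r1, r2 in zip_longest(read_records(lines1), read_records(lines2)):
        if r1 is None or r2 is None:
            raise ValueError("mate files have different numbers of records")
        out.extend(r1)
        out.extend(r2)
    return out


def uninterleave(lines):
    """Split an interleaved record stream back into two lists of mate lines."""
    mates1, mates2 = [], []
    for i, record in enumerate(read_records(lines)):
        # Even-numbered records are first mates, odd-numbered records second mates.
        (mates1 if i % 2 == 0 else mates2).extend(record)
    return mates1, mates2
```

Raising on unequal record counts, rather than silently dropping the extra reads, is a deliberate choice here: a mismatch usually means the two mate files are out of sync.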


On a Unix-based system, download the most current package, barcode-tools-2.4.tgz, then extract it with the command "tar -xzvf barcode-tools-2.4.tgz".
A new folder called barcoded_data_toolbox will appear, with all of the scripts and the README.txt inside.
The README has all of the information here, as well as run parameters for all scripts.


This .tgz includes:

- A single script (Allprep) that does all of the processing detailed above.
- A script that takes a barcode file and splits the lane sequence.txt files into the specified library files (lib#.txt). Note that the Illumina 1.5-1.6 .txt format does not use Sanger qualities, so these files must be converted before use in tools that require Sanger .fq (FASTQ) format (see below).
- A script that converts an Illumina 1.5+ .txt file to Sanger .fq/.fastq format (phred+33 scaling).
- A script that takes a Sanger .fq/.fastq file and trims it by a quality cutoff, discarding reads shorter than a specified length after trimming.
- A script that takes a .fq/.fastq file and looks for adapter contamination (based on the standard Illumina adapter sequences). If found, the adapter sequences are removed; if the resulting read is shorter than the specified minLength, the read is rejected.
- A script that takes a .fq/.fastq file and rejects any read containing an 'N' in its sequence. A single N is sufficient to reject the entire read, based on the observation that Ns within reads are usually diagnostic of poor sequence quality.
- A script that takes a raw reads file, counts the instances of every barcode seen in the file, and writes the counts to "counted-"inputFileName.
- A script that interleaves two files or uninterleaves one file. This applies only to paired-end reads: "interleaved" means that the two ends of each read pair are placed together in the final file. The i flag interleaves and the u flag uninterleaves; see the parameters in the script for full running directions.
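The barcode counting and library splitting steps can be pictured with the following sketch. It is a hypothetical illustration, not the toolbox's code: it assumes inline barcodes occupying the first bases of each read, exact barcode matching, and barcodes of equal length (so none is a prefix of another); whether the real scripts strip the barcode from the read is also an assumption here.

```python
from collections import Counter


def count_barcodes(seqs, barcode_len):
    """Count the first `barcode_len` bases of every read sequence."""
    return Counter(seq[:barcode_len] for seq in seqs)


def split_by_barcode(records, barcodes):
    """Assign (header, seq, qual) records to libraries by exact barcode prefix.

    `barcodes` maps barcode sequence -> library name (hypothetical layout).
    The matched barcode is stripped from the read; unmatched reads are
    collected under the key None.
    """
    libs = {}
    for header, seq, qual in records:
        lib = None
        for bc, name in barcodes.items():
            if seq.startswith(bc):
                lib = name
                seq, qual = seq[len(bc):], qual[len(bc):]
                break
        libs.setdefault(lib, []).append((header, seq, qual))
    return libs
```

Counting barcodes before splitting is a quick way to check that the barcode file matches what is actually in the lane: a frequent barcode that is absent from the file usually points to a mislabeled sample.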