From Comaiwiki
This page contains the most recent version of our barcoded data preparation tools.
The following describes the included scripts and how to use them properly.
Barcoded Data Preparation Tools Documentation
Disclaimer
Meric Lieberman, 2011
This work is the property of UC Davis Genome Center - Comai Lab
Use at your own risk.
We cannot provide support.
All information obtained/inferred with these scripts is without any
implied warranty of fitness for any purpose or use whatsoever.
Summary
The included scripts prepare raw Illumina reads for further analysis.
Traditionally, we do all of the preparation with a single script, allPrep (allPrep-7s.py), which performs barcode checking and matching, 'N' filtering, removal of primary and secondary adapter contamination, quality conversion, quality trimming, length trimming, and library separation.
However, if parts of the preparation process need to be performed independently, we also provide smaller scripts that carry out each step separately.
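The per-read cleanup steps listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of allPrep-7s.py: it assumes Illumina 1.5+ qualities (phred+64) as input, exact-match adapter trimming at the 3' end, and removal of low-quality bases from the 3' end. The function names, the quality cutoff of 20 and the minimum length of 25 are hypothetical, and barcode handling and library separation are left out.

```python
# Hypothetical sketch of per-read cleanup, NOT the code of allPrep-7s.py.
# Assumptions: input qualities are Illumina 1.5+ (phred+64); adapters and
# low-quality bases are trimmed from the 3' end; names and defaults are made up.


def illumina_to_sanger(qual):
    """Convert a phred+64 quality string to Sanger phred+33 (subtract 31)."""
    return "".join(chr(ord(c) - 31) for c in qual)


def trim_adapter(seq, qual, adapter, min_overlap=5):
    """Cut the read at the first full adapter match, or at an adapter prefix
    of at least min_overlap bases that runs off the 3' end of the read."""
    cut = seq.find(adapter)
    if cut == -1:
        # Try the longest partial adapter first.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                cut = len(seq) - k
                break
    if cut == -1:
        return seq, qual
    return seq[:cut], qual[:cut]


def quality_trim(seq, qual, cutoff=20):
    """Drop 3' bases until the last remaining base has Sanger quality >= cutoff."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - 33 < cutoff:
        end -= 1
    return seq[:end], qual[:end]


def prep_read(seq, qual, adapter=None, cutoff=20, min_length=25):
    """Return the cleaned (seq, sanger_qual), or None if the read is rejected."""
    if "N" in seq.upper():  # a single N rejects the whole read
        return None
    qual = illumina_to_sanger(qual)
    if adapter:
        seq, qual = trim_adapter(seq, qual, adapter)
    seq, qual = quality_trim(seq, qual, cutoff)
    if len(seq) < min_length:  # length filter after all trimming
        return None
    return seq, qual
```

For example, prep_read("ACGTACGT", "hhhhhhBB", min_length=4) converts the qualities to Sanger encoding, trims the two low-quality 3' bases and returns ("ACGTAC", "IIIIII"), while any read containing an N returns None.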
Please note that:
A) All of the scripts have a description of what they do, input parameters, and running directions at the beginning of each program.
B) All scripts take their input as command-line parameters, and all can be run as ./program_name provided Python 2.4+ is installed on your system.
C) These scripts can be used with paired-end or single-end data, and some can be run on data without barcodes.
Installation
On a Unix-based system, download the most current package, barcode-tools-2.6.tgz, then extract it with the command "tar -xzvf barcode-tools-2.6.tgz".
A new folder called barcoded_data_toolbox will appear, with all of the scripts and the README.txt inside.
The README has all of the information here, as well as run parameters for all scripts.
Contents
This .tgz includes:
allPrep-7s.py - Single script that does all of the processing detailed above.
barcodeSplitter.py - Takes a barcode file and splits the lane sequence.txt files into per-library files (lib#.txt). Note that the Illumina 1.5-1.6 .txt format does not use Sanger qualities and must be converted for tools that require Sanger .fq (FASTQ) format (see illuminaToSanger.py below).
illuminaToSanger.py - Converts an Illumina 1.5+ .txt file to Sanger .fq/.fastq format (phred+33 scaling).
trimFastqQuality.py - Takes a Sanger .fq/.fastq file, trims reads by a quality cutoff, and discards reads shorter than a specified length after trimming.
adapterEffectRemover.py - Takes a .fq/.fastq file and looks for adapter contamination (based on the standard Illumina adapter sequences). Any adapter sequence found is removed; if the resulting read is shorter than the specified minLength, the read is rejected.
read_N_remover.py - Takes a .fq/.fastq file and rejects any read containing an 'N' in its sequence. A single N is enough to reject the entire read, based on the observation that Ns within reads usually indicate poor sequence quality.
countAllBarcodes-RAW.py - Takes a raw reads file, counts the instances of every barcode seen in it, and writes the counts to a file named "counted-" followed by the input file name.
interleaveSwitcher.py - Interleaves two files or uninterleaves one file; this applies only to paired-end reads. "Interleaved" means that the two ends of a read are placed together in the final file. Use the i flag to interleave and the u flag to uninterleave; see the parameters in the script for full running directions.
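The barcode counting, library splitting and interleaving steps described above can be sketched on in-memory (name, sequence, quality) records. This is a hypothetical illustration, not the code of barcodeSplitter.py, countAllBarcodes-RAW.py or interleaveSwitcher.py: it assumes the barcode is a fixed-length prefix of each read (5 bases here), that a matched barcode is stripped from the read, and that reads with unknown barcodes are collected under "unmatched". All function names are made up for the example.

```python
from collections import Counter

# Hypothetical sketch, NOT the code of the scripts listed above.
# Assumption: the barcode is the first `bc_len` bases of every read.


def count_barcodes(seqs, bc_len=5):
    """Tally every barcode-length prefix seen in the reads."""
    return Counter(s[:bc_len] for s in seqs if len(s) >= bc_len)


def split_by_barcode(records, barcodes, bc_len=5):
    """Assign (name, seq, qual) records to libraries by exact barcode match.

    `barcodes` maps barcode -> library name. In this sketch a matched barcode
    is removed from the sequence and quality; unknown barcodes go to
    'unmatched' with the read left intact.
    """
    libs = {}
    for name, seq, qual in records:
        lib = barcodes.get(seq[:bc_len], "unmatched")
        if lib != "unmatched":
            seq, qual = seq[bc_len:], qual[bc_len:]
        libs.setdefault(lib, []).append((name, seq, qual))
    return libs


def interleave(end1, end2):
    """Alternate records from the two paired-end files: end 1, end 2, ..."""
    if len(end1) != len(end2):
        raise ValueError("paired-end files must have the same number of reads")
    out = []
    for r1, r2 in zip(end1, end2):
        out.extend([r1, r2])
    return out


def uninterleave(records):
    """Undo interleave(): even positions are end 1, odd positions are end 2."""
    return records[0::2], records[1::2]
```

With the barcode table {"ACGTT": "lib1"}, the read ACGTTAAAA is placed in lib1 as AAAA, and interleave([a1, a2], [b1, b2]) returns [a1, b1, a2, b2]. Keeping the two ends adjacent is what lets a single interleaved file carry paired-end data.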