From Comaiwiki

Revision as of 09:55, 8 July 2013 by Meric

This page contains the most recent version of our karyotyping tool, Bin-by-sam.tgz.
The following describes the included script and gives information on proper usage.

Bin-by-Sam Documentation


Meric Lieberman, , Isabelle Henry, 2013
This work is the property of UC Davis Genome Center - Comai Lab

Use at your own risk.
We cannot provide support.
All information obtained/inferred with these scripts is without any implied warranty of fitness for any purpose or use whatsoever.


This script outputs read coverage by bin across a reference sequence, using a directory of samtools aligned .sam files as input.
It can also output a measure of relative coverage compared to a control dataset. There are two types of control data: either a control file
is specified, or the mean of all files in the directory is calculated and used as the control set. In both cases, the relative coverage value
for each bin is calculated by dividing the percentage of reads mapping to that bin for the sample at hand by the mean percentage of reads
mapping to that bin for the control set. Finally, all values are multiplied by the ploidy parameter (default 2), so that values for bins present
in X copies oscillate around X.
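
The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code shipped in the archive: the function name relative_coverage, its arguments, and the choice to return None for bins with no control reads are all assumptions made for this example.

```python
def relative_coverage(sample_counts, control_counts_list, ploidy=2):
    """Per-bin relative coverage for one sample (illustrative sketch).

    sample_counts: read counts per bin for the sample at hand.
    control_counts_list: one list of per-bin read counts per control
        library (a single list when one control file is used, or one
        list per file when the mean of all files is the control).
    """
    def percentages(counts):
        total = sum(counts)
        if total == 0:
            raise ValueError("library contains no reads")
        return [100.0 * c / total for c in counts]

    sample_pct = percentages(sample_counts)
    control_pcts = [percentages(c) for c in control_counts_list]

    result = []
    for i in range(len(sample_counts)):
        # Mean percentage of reads in this bin across the control set.
        mean_control = sum(p[i] for p in control_pcts) / len(control_pcts)
        if mean_control == 0:
            # Assumption: the ratio is left undefined for empty control bins.
            result.append(None)
        else:
            result.append(ploidy * sample_pct[i] / mean_control)
    return result


# Four bins, one control library with even coverage.
print(relative_coverage([100, 50, 100, 150], [[100, 100, 100, 100]]))
# -> [2.0, 1.0, 2.0, 3.0]
```

In this toy example the second bin holds half the expected share of reads and the fourth bin one and a half times the expected share, so with the default ploidy of 2 they come out at 1 and 3 copies.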

This script also outputs a second small file containing the number of reads processed from each .sam file.

Usage: -o output-bin-file.txt -s size-of-bins [-c control .sam file] [-u] [-m number of max snps, default is 5] [-b] [-r remove file] [-p ploidy for relative percent calculation] [-C]

[...] denotes optional parameters; if an optional parameter is not given, its default is used.

For help: -h


Run in a directory with the input .sam files. If you want to use one of the files as control for the relative coverage, specify the file with the -c option.



-o, output file name
-s, bin size (bp)
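
To illustrate how the bin size relates read positions to bins, here is a minimal Python sketch that counts reads per bin from the lines of a .sam file. It is not the script from the archive. The function name count_reads_per_bin is hypothetical, and the sketch assumes that a read is assigned to a bin by its leftmost mapped position (.sam field 4, 1-based) and that bins are numbered from 0 within each reference sequence.

```python
from collections import defaultdict


def count_reads_per_bin(sam_lines, bin_size):
    """Count mapped reads per (reference name, bin index) pair.

    Assumption: a read belongs to the bin containing its leftmost
    mapped position; the actual script may assign reads differently.
    """
    counts = defaultdict(int)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        ref_name = fields[2]
        position = int(fields[3])
        if ref_name == "*" or position == 0:
            continue  # unmapped read
        # Positions 1..bin_size fall in bin 0, the next bin_size in bin 1, etc.
        bin_index = (position - 1) // bin_size
        counts[(ref_name, bin_index)] += 1
    return dict(counts)


example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t1\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t101\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_reads_per_bin(example, 100))
# -> {('chr1', 0): 2, ('chr1', 1): 1}
```

Larger bins give smoother but coarser coverage profiles, while smaller bins give finer resolution at the cost of noisier counts per bin.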


-c, use a control for relative percent coverage calculations; specify the control file name here
-u, use only reads flagged as unique (XT:A:U)
-m, maximum number of SNPs, taken from .sam field 15; default is 5
-b, inserts empty lines between reference sequences in the result table for easier JMP parsing (not recommended unless the reference sequence consists of a few major chromosomes or contigs)
-r, “remove file”, a file in .sam header format listing reference sequences to ignore; an example file, Remove-Sample.txt, is included in the archive
-p, ploidy, default is 2 (diploid); this is used as the multiplier in the relative coverage calculation
-C, coverage-only mode: outputs only the read counts for each library, with no relative coverage columns. This option cannot be used when a control library is specified
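
As a rough illustration of the -r and -u options, the following Python sketch reads reference names from a remove file and decides whether a read should be kept. It is not taken from the archive. The function names are hypothetical, and the sketch assumes that the remove file uses standard @SQ header lines with an SN: field and that the XT:A:U tag appears among the optional fields (field 12 onward) of each alignment line. The -m filter is left out because the layout of field 15 depends on the aligner output.

```python
def parse_remove_file(header_lines):
    """Collect reference names to ignore from .sam header (@SQ) lines."""
    names = set()
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def is_flagged_unique(sam_line):
    """True if the optional fields of the read carry the XT:A:U tag."""
    optional_fields = sam_line.rstrip("\n").split("\t")[11:]
    return "XT:A:U" in optional_fields


def keep_read(sam_line, ignored_refs, unique_only=False):
    """Apply the remove list and, if requested, the unique-read filter."""
    fields = sam_line.rstrip("\n").split("\t")
    if fields[2] in ignored_refs:
        return False
    if unique_only and not is_flagged_unique(sam_line):
        return False
    return True


ignored = parse_remove_file(["@SQ\tSN:chrM\tLN:16000"])
read = "r1\t0\tchr1\t10\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:0"
print(keep_read(read, ignored, unique_only=True))
# -> True
```

Filtering on the reference name before counting means that ignored sequences contribute neither bins nor reads to the percentages used in the relative coverage calculation, at least under the assumptions of this sketch.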


Output: one file with a line per bin of each reference sequence and a column of read counts for each input .sam library, as well as a relative coverage column for each input .sam library.
