 
__notoc__
 
'''This karyotyping tool can now be found on GitHub here: [https://github.com/Comai-Lab/bin-by-sam bin-by-sam]'''<br>
 
<br>'''Detailed run directions and examples [http://comailab.genomecenter.ucdavis.edu/images/3/30/README-bin-by-sam.pdf can be found here]'''<br>
 
 
<h2> Bin-by-Sam Documentation </h2>
 
<h3>Disclaimer</h3>
 
Meric Lieberman, Isabelle Henry, 2013<br>
 
This work is the property of UC Davis Genome Center - Comai Lab
 
 
Use at your own risk.<br>
 
We cannot provide support.<br>
 
All information obtained/inferred with these scripts is without any
 
implied warranty of fitness for any purpose or use whatsoever.
 
 
<h3>Summary</h3>
 
This script outputs read coverage by bin across a reference sequence, using a directory of samtools-aligned .sam files as input. It can also output a measure of relative coverage compared to a control dataset. There are two types of control data: either a control file is specified, or the mean of all files in the directory is calculated and used as the control set. In both cases, the relative percentage per bin is calculated by dividing the percentage of reads mapping to that bin for the sample at hand by the mean percentage of reads mapping to that bin for the control set. Finally, all values are multiplied by the ploidy parameter (default 2), so that values for bins present in X copies fluctuate around X.
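The relative coverage calculation described above can be illustrated with a minimal sketch. This is not the code from bin-by-sam.py; the function names <code>bin_percentages</code> and <code>relative_coverage</code> are hypothetical, and the handling of empty libraries and of bins with no control reads (returned as <code>None</code>) is an assumption made for this example.

```python
def bin_percentages(counts):
    """Convert per-bin read counts into the percentage of reads per bin."""
    total = sum(counts)
    if total == 0:
        # Assumption for this sketch: an empty library yields 0% in every bin.
        return [0.0 for _ in counts]
    return [100.0 * c / total for c in counts]


def relative_coverage(sample_counts, control_libraries, ploidy=2):
    """Ploidy-scaled ratio of sample percentage to mean control percentage.

    control_libraries is a list of per-bin count lists: a single entry when
    one control file is used, or every library when the mean of all files
    serves as the control set.
    """
    sample_pct = bin_percentages(sample_counts)
    control_pcts = [bin_percentages(lib) for lib in control_libraries]
    n = len(control_pcts)
    # Mean percentage per bin across the control set.
    mean_control = [sum(column) / n for column in zip(*control_pcts)]
    # Assumption: bins with no control reads have no defined ratio (None).
    return [
        ploidy * s / c if c > 0 else None
        for s, c in zip(sample_pct, mean_control)
    ]


# Four bins; the second bin has an extra copy and the fourth has lost one.
control = [100, 100, 100, 100]
sample = [100, 150, 100, 50]
print(relative_coverage(sample, [control]))  # [2.0, 3.0, 2.0, 1.0]
```

With the default ploidy of 2, a bin carrying three copies in an otherwise diploid sample comes out near 3, and a bin with a single copy comes out near 1.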
 
 
This script also outputs a second small file containing the number of reads processed from each .sam file.
 
 
<h3>Usage: [...] denotes optional parameters; if not indicated, default values are used.</h3>
 
<h4>bin-by-sam.py -o output-bin-file.txt -s size-of-bins [-c control .sam file] [-u] [-m number of max snps, default is 5] [-b] [-r] [-p ploidy for relative percent calculation] [-C] </h4>
 
 
<h4>For help: bin-by-sam.py -h</h4>
 
 
<h3>Input:</h3>
 
Run in a directory with the input .sam files. If you want to use one of the files as control for the relative coverage, specify the file with the -c option.
 
 
<h3>Parameters</h3>
 
 
<h4>Required:</h4>
 
<b>-o</b>, output file name<br>
 
<b>-s</b>, bin size (bp)<br>
 
 
<h4>Optional</h4>
 
'''-c''', use a control for relative percent coverage calculations, specify the file name here<br>
 
'''-u''', use only samtools flagged unique reads (XT:A:U)<br>
 
'''-m''', maximum number of SNPs, read from SAM field 15; default is 5<br>
 
'''-b''', inserts empty lines between reference sequences in the result table for easier JMP parsing (do not use if the reference sequence does not contain a few major chromosomes or contigs)<br>
 
'''-r''', “remove file”: a file in SAM header format listing reference sequences to ignore; ''an example file, Remove-Sample.txt, is included in the archive''<br>
 
'''-p''', ploidy, default is 2 (diploid), this is used as the multiplier in the relative coverage calculation<br>
 
'''-C''', coverage-only mode: outputs only the read counts for each library, with no relative coverage columns. This option cannot be used when a control library is specified<br>
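To make the effect of the '''-s''' and '''-u''' options concrete, here is a minimal sketch of counting reads per bin from SAM records. It is not taken from bin-by-sam.py: the function name <code>bin_reads</code>, the choice to label each bin by its 1-based start coordinate, and the choice to skip unmapped reads are all assumptions made for illustration. The '''-m''' filter is not modelled.

```python
from collections import Counter


def bin_reads(sam_lines, bin_size, unique_only=False):
    """Count mapped reads per (reference, bin start) from SAM text lines.

    Returns the per-bin counts and the number of reads that were counted.
    """
    counts = Counter()
    processed = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag, ref, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or ref == "*":
            continue  # unmapped read (assumption: skipped entirely)
        if unique_only and "XT:A:U" not in fields[11:]:
            continue  # keep only reads tagged as unique
        # SAM positions are 1-based; label each bin by its first base.
        bin_start = (pos - 1) // bin_size * bin_size + 1
        counts[(ref, bin_start)] += 1
        processed += 1
    return counts, processed


records = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t5\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:0",
    "r2\t0\tchr1\t120\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R\tNM:i:0",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tchr1\t101\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:1",
]
counts, processed = bin_reads(records, bin_size=100, unique_only=True)
print(dict(counts), processed)  # {('chr1', 1): 1, ('chr1', 101): 1} 2
```

Without <code>unique_only</code>, the read tagged XT:A:R is also counted, so the bin starting at position 101 holds two reads.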
 
 
<h3>Output</h3>
 
One file with a line per bin of each reference sequence and a column for each input .sam library, as well as the relative coverage per input .sam library.
 

Latest revision as of 15:47, 21 April 2020
