From Comaiwiki

Revision as of 19:06, 30 April 2011 by Root (Talk | contribs)


Learning Python

This page is an invitation to learn Python and apply it to bioinformatics. It is designed for people with no programming experience who are interested in learning how to program. It draws on my limited experience in this area to show that, even if you have never been exposed to programming, you can learn enough Python to write programs that analyze sequence data and produce results. In addition, programming is fun, and it beats Sudoku and crossword puzzles as a constructive brain teaser and pastime. --Luca Comai

What is Python?

Python is a modern programming language that is easy to write and use. It resembles another very common language called Perl. Python code is easy to read because it uses indentation to separate blocks of code and to convey their hierarchy. Python is very powerful and is used by major private and public institutions such as Google and NASA.
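To make the point about indentation concrete, here is a minimal sketch. The function name count_bases and the example sequence are made up for illustration. Each extra level of indentation marks a block that belongs to the line above it.

```python
def count_bases(sequence):
    """Count how many times each base occurs in a DNA sequence."""
    counts = {}
    for base in sequence:          # one level in: body of the function
        if base in counts:         # two levels in: body of the "for" loop
            counts[base] += 1      # three levels in: body of the "if"
        else:
            counts[base] = 1       # three levels in: body of the "else"
    return counts                  # back to one level: runs after the loop ends


print(count_bases("GATTACA"))
```

Moving the "return" line one level to the right would place it inside the loop, and the function would stop after the first base. The layout of the code is part of its meaning.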

How I learned

I first tried Perl, another programming language, and became frustrated by my inability to understand the syntax. Now that I know a little bit about programming, I am not sure why this was the case, because I can (kind of) understand Perl and it is really not difficult. This just goes to show that learning to program may seem difficult at first. When that happened, I dropped the effort. A few months later, Victor Missirian, a bioinformatician who works with me, showed me the Python tutorial written by the inventor of Python, Guido van Rossum. It was very simple, and so I decided to try again. I downloaded the Python package, bought a couple of books and never looked back. I started by writing a program that would count and report all the restriction fragments in a genome. Having a specific objective helped me stay focused and motivated. Since then I have written dozens of little programs to do all kinds of things, such as parsing Illumina sequencing files, grading my class, performing in silico comparative genomic hybridization and so on. Now, do not get me wrong: I am really a beginner, and there is a lot that I do not know and will most likely never learn. But this is really the good news. You do not need a degree in computer science to have fun and be productive.

How you can learn

Get Python

Apple computers come with a version of Python installed. It is useful, however, to download a Python package from the official Python website and install it. I have version 2.5.2, but you may want to install 2.6. Version 3.0 is also available, but many programs written with 2.5 or lower will not work in 3.0 without considerable editing. So, I would stick to 2.6 or 2.5.2 for the time being. The installer will place on your computer the Python program, an Integrated Development Environment (IDE) called IDLE, and plenty of documentation. Launch IDLE and start programming, or get more help on IDLE.

Start practicing

With Python open in IDLE or in the Terminal (for Apple computers), follow the tutorial that came with the installation of Python, or use the provided link. I have found two books very useful. The first is Learning Python by Mark Lutz. The second is Python Cookbook by Alex Martelli. By the way, while you are working on your skills it would be good to also learn basic Unix commands: see this tutorial. The same tutorial has a great introduction to Perl, another language used in bioinformatics and comparable in many respects to Python.

10 Tips of Python

I have found the following ten strategies to be very useful in writing and troubleshooting programs.

Examples of programs

To download a program, go to the program page using the link below, then click or right-button click the download link. Depending on what you did, one of the following two outcomes is possible.

  1. If your browser takes you to a new page, select all the text in the page and copy it into IDLE or TextWrangler. Save it with a ".py" extension. The editor will then color the text to distinguish annotation from code.
  2. If you right-button click on the download link, you should be able to download the program as a ".py" file.

Note that the program contains both "live" code and annotation. Annotation is any text that is flanked by triple apostrophes (''' ''') and any line that starts with "#". It is there to explain the program or to remind the author or user what each line of code does.
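The short example below shows how the two kinds of annotation look next to live code. It is an illustration only and is not taken from any of the programs on this page. The function reverse_complement is a made-up example.

```python
'''
This text is flanked by triple apostrophes. Python reads it as a plain
string and does nothing with it, so it serves as annotation.
'''

# This line starts with "#", so it is annotation too.

def reverse_complement(sequence):
    '''Return the reverse complement of a DNA sequence.'''
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}   # this line is live code
    # Walk the sequence backwards and swap each base for its partner.
    return "".join(pairs[base] for base in reversed(sequence))


print(reverse_complement("GATTACA"))
```

Deleting every annotation line would leave the behavior of the program unchanged. Only the lines of live code are executed.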

Bin counter

It places sequencing reads in bins according to their position on a reference genome, producing an output similar to that of a tiling array. It can be used for CGH, ChIP-seq or RNA-seq analysis.
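The idea of binning can be sketched in a few lines. This is not the downloadable program, only a minimal illustration that assumes each read has already been reduced to a single start position on one chromosome. The function name, the positions and the bin size are all made up.

```python
def count_reads_in_bins(read_positions, bin_size):
    """Return a dictionary mapping the start of each bin to its read count."""
    counts = {}
    for position in read_positions:
        # Integer division drops the remainder, so positions 0-999 fall in
        # the bin starting at 0, positions 1000-1999 in the bin at 1000, etc.
        bin_start = (position // bin_size) * bin_size
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return counts


# Hypothetical read start positions on a single chromosome.
positions = [5, 120, 130, 990, 1005, 1010, 1999]
bins = count_reads_in_bins(positions, 1000)
for start in sorted(bins):
    print("%d\t%d" % (start, bins[start]))
```

Bins that receive no reads are simply absent from the dictionary. A real analysis would probably also keep track of the chromosome name and report empty bins as zero.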

Compare sets

It takes two sets of items (e.g. genes) and calculates the intersection, or other comparisons.
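Python has a built-in set type that makes this kind of comparison very short. The sketch below is not the downloadable program. The function compare_sets and the gene names are placeholders chosen for illustration.

```python
def compare_sets(first, second):
    """Compare two collections of items and return the usual set comparisons."""
    first = set(first)      # converting to a set removes duplicate items
    second = set(second)
    return {
        "in both": first & second,          # intersection
        "only in first": first - second,    # difference
        "only in second": second - first,   # difference, the other way
        "in either": first | second,        # union
    }


# Placeholder gene names, e.g. genes found in two different experiments.
experiment_1 = ["geneA", "geneB", "geneC", "geneC"]
experiment_2 = ["geneB", "geneC", "geneD"]

result = compare_sets(experiment_1, experiment_2)
for label in sorted(result):
    print("%s: %s" % (label, sorted(result[label])))
```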

Barcode generator

It produces DNA barcodes of desired length and sequence distance.
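One simple way to build such a set is sketched below. It assumes that "sequence distance" means the Hamming distance, i.e. the number of positions at which two barcodes of equal length differ. The downloadable program may use a different measure or method. The sketch tries every possible sequence in alphabetical order and keeps a candidate only if it is far enough from all barcodes kept so far. The function names are made up for illustration.

```python
from itertools import product


def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def generate_barcodes(length, min_distance):
    """Greedily collect barcodes that differ pairwise at min_distance or more positions."""
    barcodes = []
    # product("ACGT", repeat=3) yields ("A","A","A"), ("A","A","C"), ... in order.
    for letters in product("ACGT", repeat=length):
        candidate = "".join(letters)
        if all(hamming_distance(candidate, kept) >= min_distance
               for kept in barcodes):
            barcodes.append(candidate)
    return barcodes


print(generate_barcodes(3, 3))
```

This greedy approach is easy to follow but checks all 4 ** length candidates, so it is practical only for short barcodes. It also ignores properties such as GC content or homopolymer runs, which a real design might need to take into account.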

GC Plotter

It produces high-quality plots from gene sequences.
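Judging by its name, the program presumably plots GC content along a sequence, but that is an assumption. The sketch below covers only the calculation that such a plot would need and does no plotting. It computes the fraction of G and C bases in successive windows of a sequence. The function names, window size and example sequence are made up for illustration.

```python
def gc_fraction(sequence):
    """Return the fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc = sequence.count("G") + sequence.count("C")
    return gc / float(len(sequence))


def gc_windows(sequence, window, step):
    """Return a list of (start, GC fraction) pairs for successive windows."""
    points = []
    # Stop early enough that every window is full length.
    for start in range(0, len(sequence) - window + 1, step):
        points.append((start, gc_fraction(sequence[start:start + window])))
    return points


# A made-up sequence: GC-rich at the start, AT-rich at the end.
for start, fraction in gc_windows("GGCCGGCCAATTAATT", 8, 4):
    print("%d\t%.2f" % (start, fraction))
```

The resulting pairs can be handed to any plotting tool, with the window start on the x axis and the GC fraction on the y axis.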

Funding sources

The research experience on which the Python pages are based is funded by the National Science Foundation Plant Genome grant DBI-0733857 (Functional Genomics of Polyploids) and National Institutes of Health R01 GM076103-01A1 (Dosage dependent regulation in hybridization) to LC.
