My main research interest is Computational Biology, which is loosely and tightly defined in many ways, and in some circles not defined at all. Others might call it Bioinformatics or, more specifically, Computational Genomics, but quite frankly, no one can really arrive at a single, clear definition of the boundaries of the field. Fortunately, these boundaries are being blurred, as many biologists are themselves becoming what some call Bioinformaticians, and many Computer Scientists and Mathematicians are fancying themselves full-out Biologists. I am neither, and may never decide which I want to be, but I nonetheless fearlessly pursue the field, as it holds much promise for finding answers to very interesting questions about life, its beginnings, ends, and cycles, in a much more timely fashion than sticking a pipette up your nose all day long.

Many Computational Biologists like to think that they might be able to come up with a clear notion of an actual biological phenomenon through some sort of silly measure such as a signal-to-noise ratio, but we all know that this is badly wrong. A Computational Biologist's role as a scientist is to find methods of guessing, predicting, or describing phenomena and to design tools which efficiently output the results in a format that a human can understand. A biologist may take this information and use it to actually discover something in the lab, or [s]he may become confused and leave the field altogether. Some of us try to prevent the latter from happening; others of us try to publish as many papers as we can.

I may be doing one of these two things. Let's get down to it:

Originally, the Human Genome was estimated to have over 100,000 genes, but the number is now estimated to be between 20,000 and 25,000. How did this number get so small? Were we really that far off?

Not really. There are certainly that many transcripts in the Human Genome, but not that many genes. Why is this? Alternative splicing.

The canonical idea of a "gene" is a segment of DNA which codes for a protein. This process is complicated in Eukaryotic organisms (organisms with nuclei in their cells, like us). The DNA goes through a great deal of processing:

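Namely, a gene's transcript is spliced after the gene is transcribed to mRNA: segments called introns are cut out, and the remaining segments, the exons, are joined together. There are many possible reasons why this happens, but one of them is to leave room for alternative splicing.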

Alternative splicing occurs when mRNA transcripts from the same region of the DNA end up being spliced differently by the splicing machinery, so that one gene can yield several distinct transcripts. There are many reasons for this. Look them up if you're really interested.
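To see how a modest number of genes can turn into a much larger number of transcripts, here is a toy sketch in Python. It assumes a deliberately oversimplified model that is not from any real annotation: a gene is an ordered list of exons (the pieces kept after splicing), the first and last exons are always kept, and each internal exon may be either kept or skipped. The function name `isoforms` and the exon labels are made up for illustration, and real splicing has many more modes than this.

```python
from itertools import combinations


def isoforms(exons):
    """Enumerate transcripts under a toy exon-skipping model.

    Assumption (illustrative only): the first and last exons are always
    kept, and every internal exon is independently kept or skipped.
    """
    if len(exons) < 2:
        return [tuple(exons)]
    first, internal, last = exons[0], exons[1:-1], exons[-1]
    results = []
    # Go from "keep every internal exon" down to "skip them all".
    for k in range(len(internal), -1, -1):
        # combinations() preserves the original exon order.
        for kept in combinations(internal, k):
            results.append((first, *kept, last))
    return results


for transcript in isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(transcript))
```

Under this toy model a gene with n exons gives 2^(n-2) possible transcripts, so the count of transcripts can outrun the count of genes very quickly.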

So what can a Computational Biologist do with this? Typically, when predicting genes, we use an algorithm which parses a DNA sequence by finding a single path through a probabilistic machine that is hypothesized to have "generated" the sequence. Finding all paths through 3.2 billion bases of DNA is computationally intractable. Now that most of the gene "loci" (regions of interest) have been discovered, we must find a reasonable method for finding these alternative splices.

And that's what I'm doing.
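For the curious, the "single path through a probabilistic machine" idea mentioned above can be sketched in a few lines. The text does not name a specific algorithm, so this assumes the common textbook choice: a hidden Markov model decoded with the Viterbi algorithm. The two states ("exon" and "intron") and every probability below are invented for illustration and bear no relation to a real gene finder.

```python
import math

# Hypothetical two-state model; all numbers are made up for illustration.
STATES = ("exon", "intron")
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(sequence, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the single best parse."""
    # best[i][s]: log-probability of the best path ending in state s at i.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][sequence[0]])
             for s in states}]
    back = [{}]  # back[i][s]: predecessor of s on that best path
    for i in range(1, len(sequence)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[i - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[i][s] = score + math.log(emit_p[s][sequence[i]])
            back[i][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for i in range(len(sequence) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return best[-1][last], path


log_p, path = viterbi("GCGCGCATATAT", STATES, START, TRANS, EMIT)
print(path)
```

The sketch works in log space so that long sequences do not underflow. Note that it returns exactly one best labeling per sequence, which is the limitation at issue here: alternative splices are other plausible paths that a single-best-path parse never reports.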