Bioinformatics Research Opportunities

All faculty on this list are preapproved. Students are welcome to work with other faculty after obtaining approval from one of the bioinformatics faculty supervisors, Kirk Pruhs or Paula Grabowski.

Ivet Bahar

Molecular biophysics; neurobiological signaling; drug discovery. If interested, please contact Ivet Bahar.

Takis Benos

The members of the Benos group are interested in modeling biological processes using computational and statistical methods. We are particularly interested in developing graph-theoretical approaches to identify critical DNA polymorphisms in a gene network context. Students working with us will learn both the computational methodology and the biology that underlies many complex phenomena and diseases. We routinely use statistical and computational (machine learning) methods in the lab to analyze high-throughput datasets. If interested, please contact Takis Benos.

Richard Boyce

The purpose of the TRanslational Informatics Applied to Drug Safety (TRIADS) lab is to conduct research at the intersection of bioinformatics, epidemiology, and comparative effectiveness to improve medication safety for older adults. Current projects include 1) advanced methods for integrating drug-drug interaction (DDI) predictions into clinical decision support tools, and 2) exploring the use of ontologies and natural language processing (NLP) to determine the clinical relevance of DDI predictions. Student researchers would gain skills in the use of ontologies, NLP, and the Semantic Web (especially "linked data") for biomedical research. If interested, please contact Richard Boyce.

Jon Boyle

Using comparative genomics to understand host range expansion in a human pathogen. In my lab, we are interested in understanding how the human pathogen Toxoplasma gondii has evolved to infect an incredible array of intermediate hosts. This feature is unique to T. gondii among eukaryotic pathogens. Our central hypothesis is that genes that have driven host range expansion in this parasite are likely to be important for causing disease in humans. To identify these genes, we have taken a comparative genomics approach. Whole genome sequences are available from multiple T. gondii strains, as well as from a close relative of T. gondii that has a much more restricted host range. We have also sequenced another sister species of T. gondii in the lab. We are comparing these genomes in a variety of ways, and have recently become particularly interested in the role that structural genomic variation, and in particular gene expansion, has played in the evolution of these species. We are looking for undergraduates who are motivated to tackle these newly emerging datasets using whole-genome alignment, ortholog identification, and sequence coverage analyses. Students may also be able to test the hypotheses that emerge from their bioinformatic work on the parasites that we routinely grow in the laboratory. If interested, please contact Jon Boyle.

Lillian Chong

The Chong lab develops and maintains the WESTPA (Weighted Ensemble Simulation Toolkit with Parallelization and Analysis) software package for increasing the efficiency of simulating “rare” events (e.g., protein folding and protein binding). The package is optimized for use in supercomputing environments as well as typical computing clusters. Undergraduate researchers will participate in further development efforts involving the WESTPA software, gaining experience in areas of concurrent and parallel programming, data structures and algorithms, or distributed computing. Required skills for these research opportunities are: proficiency in the Python and C programming languages, systems and network programming in the Unix environment, experience with concurrent and parallel programming, and the ability to work as part of a team. If interested, please contact Lillian Chong.

Uma Chandran

High-throughput platforms, including microarrays and next-generation sequencing, for gene expression, variant analysis, and ChIP-seq; integrative analysis of multiple applications; analysis of data from large consortium projects such as The Cancer Genome Atlas (TCGA). If interested, please contact Uma Chandran.

Maria Chikina

I work closely with several experimental labs (working in cancer and immunology) on analyzing genome-scale data (RNA-seq, ChIP-seq, microarray). There are many self-contained analysis projects that could be completed in a semester and have a good chance of leading to a publication. Some knowledge of R and familiarity with Linux are helpful, but the ability to learn independently and use online resources effectively is most important. If interested, please contact Maria Chikina.

Nathan Clark

Efforts to understand genes and genomes are greatly enhanced by evolutionary analyses. In our group, we combine evolutionary inference with direct experiments to determine the relationships between genes and to reveal the genetic changes underlying adaptation between species. Current projects are: (1) We seek to identify coevolutionary signatures between genes that function in a common pathway or complex. We then exploit these signatures to infer new genetic interactions and reveal deeper relationships between entire genetic pathways. We perform genome-wide coevolution studies in yeasts, in Drosophila species, and in mammals, and in all three groups have revealed novel genetic interactions validated by experiments. (2) We create and resolve genetic incompatibilities through transgenic experiments in baker’s yeast (Saccharomyces cerevisiae). By mutating a select few amino acids or substituting an entire protein complex from one species for that of another, we follow the effects of coevolved amino acid changes via phenotypic and physical interaction assays. (3) We study the adaptive evolution of proteins involved in sexual reproduction and how their divergence contributes to reproductive incompatibilities between individuals. We identify historical cases of adaptive evolution in the butterfly spermatophore and detail how they have affected this important relationship between males and females. If interested, please contact Nathan Clark.

Markus Dittrich

The Dittrich lab at the Pittsburgh Supercomputing Center and Carnegie Mellon University conducts state-of-the-art computational research on biological systems at the cellular level. Of particular interest to us is the study of the structure and function of synapses, the connections between nerve cells in the brain and the building blocks of the nervous system. Our tools are particle-based, stochastic 3D simulation approaches such as MCell, which are developed in my lab in collaboration with scientists at the University of Pittsburgh and the Salk Institute. Interested students will be part of a vibrant research lab. Possible projects could focus either on computational biology research or on software development (such as implementing new simulation capabilities or developing novel web-based model creation, simulation, and analysis frameworks). If interested, please contact Markus Dittrich.

Jacob Durrant

Petascale, GPU, and cloud computing are transforming computer-aided drug design (CADD) into an even more powerful tool for both medical and basic-science research. The mission of the Durrant lab at the University of Pittsburgh is to develop broadly applicable, innovative CADD techniques and to use those techniques to further drug discovery targeting infectious diseases, neurological conditions, and cancer.

Vanathi Gopalakrishnan

Prof. Vanathi Gopalakrishnan directs the PRoBE laboratory for Pattern Recognition from Biomedical Evidence within the Department of Biomedical Informatics. Our group designs and develops novel machine learning algorithms using symbolic, probabilistic, and hybrid approaches to solve bioinformatics problems of clinical importance such as biomarker discovery and disease classification. Our group has extensive expertise in the development and application of novel bioinformatics algorithms for the analysis of large biomedical datasets, such as genomic and imaging data. Current research projects involve the development and application of novel variants of rule learning techniques to biomarker discovery and disease prediction, for early detection and better understanding of the mechanisms that cause neurodegenerative diseases, heart disease, and lung, breast, and esophageal cancers. We are fundamentally interested in technologies for data mining and discovery that allow incorporation of prior knowledge. Methods for incorporating prior knowledge that are being researched in our laboratory include text mining and ontology construction. If interested, please contact Vanathi Gopalakrishnan.

Paula Grabowski

Alternative pre-mRNA splicing. If interested, please contact Paula Grabowski.

Graham Hatfull

The bacteriophage population is huge, diverse, dynamic, and old. We have established a large collection of bacteriophages known to infect a single common host and which are therefore in potential genetic communication with each other. The approximately 250 completely sequenced mycobacteriophage genomes are genetically diverse and contain mosaic architectures, and dissecting the complex relationships between these genomes and the approximately 25,000 genes we have identified presents a considerable bioinformatic challenge. Newly developed tools enable the rapid comparison of these genomes and their genetic relationships, but there are many unresolved questions awaiting bioinformatic solutions. If interested, please contact Graham Hatfull.

David Koes

I develop novel computational algorithms and interactive online systems that support rapid and inexpensive drug discovery. Recent community efforts (CSAR, DUD-E) have generated a wealth of structural and experimental data that can be used to calibrate and develop new drug discovery algorithms. In particular, I am interested in developing high-performance algorithms (possibly using GPUs) for screening and scoring drug-protein interactions as well as slower, higher-fidelity algorithms that include a detailed model of entropic effects. Undergraduate researchers will further develop their machine learning, programming, and statistical analysis skills while learning about protein structure and function. If interested, please contact David Koes.

Jeffrey Lawrence

Our research is directed toward elucidating the evolution of bacterial genomes, including their size, composition, variability, and organization. In other words, why do genomes have the genes that they do? An understanding of the evolutionary process that leads to differences in genomes will shed light on how species themselves differentiate. We take computational, theoretical, and experimental approaches to understanding how genomes evolve. If interested, please contact Jeffrey Lawrence.

Miler Lee

My research addresses how gene expression programs change, leading to changes to cellular identity. During embryogenesis, widespread transcriptional and post-transcriptional regulation reprograms embryonic cells to a pluripotent identity. To deduce the molecular mechanisms that drive this transition in vertebrates, we use high-throughput sequencing to assay RNA and chromatin changes during early zebrafish development; comparative genomics to understand how embryonic gene regulation has evolved and diverged; and sequence analysis to identify DNA/RNA signals that confer regulation.

Tim Lezon

In collaboration with the Drug Discovery Institute, I am developing new techniques for analyzing experimental cell-level data. We are essentially looking to reverse-engineer signaling pathways based on images of thousands of cells in hopes that this will lead to new chemical signatures of cancer, novel therapeutic targets, and effective drug combinations. Bioinformatics students working on this project would be involved in developing a software package for general use in extracting systems biology information from high content screening (HCS) data. This software should be able to take as input generic HCS data and perform a variety of tasks, such as efficiently calculating conventional statistical measures to determine whether the data set is useful and which parameters are informative, and automatically classifying data based on its distribution. The broader goal is to develop tools that can make quantitative predictions of heterogeneous cellular responses to various stimuli. If interested, please contact Tim Lezon.

Amanda Poholek

The Poholek lab studies the biological circuits that contribute to immune cell differentiation and function. Transcription factors play a major role in this process by modifying chromatin and altering epigenetic states. Using the next-generation sequencing technologies RNA-seq, ChIP-seq, and ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing), we apply bioinformatic approaches to identify key chromatin alterations that affect cellular differentiation and function. Publicly available datasets, combined with datasets generated in house, provide an overwhelming amount of data related to the role of immune cells in health and disease. We also aim to develop novel methods to mine datasets to identify unique pathways involved in immune cell function. If interested, please contact Amanda Poholek.

Jim Pipas

We are interested in developing computational strategies that can detect infectious agents that cause cancer. We aim to identify novel viruses by searching for intersections between sequence data obtained by metagenomic sampling and sequences derived from RNA-seq experiments on human cancers. Existing approaches depend on identifying infectious agents that have sequence similarity to known species. However, a number of studies have shown that the vast majority of microorganisms have not yet been identified, classified, or characterized. This is especially true for viruses. For example, our recent survey of viruses in raw sewage suggested that over 99% of them represented unknown species. We hypothesize that the sequence signatures of such agents are present among the millions of sequence reads generated from metagenomic studies. The majority of metagenomic data is not similar to known species, and most likely contains a great number of uncharacterized viruses. These data sets are expanding rapidly as studies of the human microbiome, among others, become available. We are interested in creating dynamic search platforms that can be easily updated as the metagenomic and tumor RNA-seq data sets grow. In addition, we are creating computational tools that facilitate the assembly of complete genomes of novel cancer-associated viruses. If interested, please contact Jim Pipas.

Mark Rebeiz

Molecular evolution of development. If interested, please contact Mark Rebeiz.

Xinghua Lu