My contact info

Assistant Professor of Research

Division of Biostatistics
Department of Preventive Medicine
Keck School of Medicine
University of Southern California
Faculty Profile
2001 Soto Street
SSB202Q, Mailcode: 9234
Los Angeles, CA 90089-9234
e-mail: gary.k.chen 'at'
office: (323) 442-7921
mobile: (714) 928-2705

Software for statistical genetics

Mendel with inversions (binaries only)

Ken Lange's popular genetic analysis program Mendel 6.0, with multipoint linkage analysis adapted to support genomic inversions. This version also computes the posterior probability that an individual has an inversion when cytogenetic data is unavailable. A Fortran90 binary is available, compiled for 64-bit Linux.

Documentation is available here. Because Mendel has continued to evolve since we built this inversion-aware version, some newer file formats may not be supported, so please use the example files included in this distribution as a guide. These are the same files used for the example in our paper. For more information on the algorithm, please refer to our manuscript.

Markovian Coalescent Simulator

macs is a simulator of the coalescent process that simulates genealogies spatially across chromosomes as a Markovian process. The algorithm is similar to the SMC algorithm (McVean and Cardin, Phil Trans R Soc B 2005) in that its running time scales linearly with sample size and sequence length. However, it models the true coalescent more accurately while supporting all demographic scenarios found in the popular program MS (Hudson, Bioinformatics 2002), making it appropriate for simulating data for structured populations in genome-wide association studies. You can get more information from our paper and download the software here. Please post any issues on the GitHub website.

Reversible jump MCMC sampler for Bayesian variable selection

We have recently implemented a novel Bayesian variable selection method called pimsa that rapidly samples the posterior distribution to explore main effects and/or higher-order interactions in genetic association studies. Our method empirically weights the prior on each variable (e.g. a SNP, an environmental covariate, or an interaction) based on the data. The software can be downloaded here. The paper describes an application to higher-order interactions informed by gene-expression and Gene Ontology knowledge.

Massively parallel LASSO routines for distributed Graphics Processing Units

gpu-lasso is a distributed application that enables variable selection on large datasets where the number of variables far exceeds the number of observations. I designed it for genetic association studies, taking advantage of the compact representation of genotype data. However, it is compatible with continuous-valued data as well. The program uses OpenCL to carry out data-parallel calculations on GPU devices, and MPI to distribute the computational and memory load across multiple hosts. You can download the paper and supplement. Code can be downloaded here.
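
For readers unfamiliar with the LASSO itself, here is a minimal single-threaded Python sketch of the kind of penalized regression problem involved. It is not the gpu-lasso code and does not use OpenCL or MPI. It assumes a least-squares loss and cyclic coordinate descent, which may differ from what the program does internally, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the proximal operator of the L1 penalty)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p entries each (e.g. genotypes coded 0/1/2).
    Illustrative only: no intercept, no standardisation, fixed iteration count.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot enter the model
            # Covariance of column j with the residual that ignores variable j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


# Tiny orthogonal design: the second variable is shrunk exactly to zero.
print(lasso_coordinate_descent([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=0.5))
# prints [2.0, 0.0]
```

The inner sums over observations are independent across variables, which is the sort of data-parallel work that suits a GPU.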

Imputation of genotypes using Graphics Processing Units

Genotype imputation and haplotype phasing are extremely computationally intensive routines in genetic association studies. We have developed a program called MendelGPU that carries out imputation and phasing 1 to 2 orders of magnitude faster than leading programs such as MaCH and IMPUTE2 at similar accuracy. The program can be found here. A paper and supplement are also available.