Posts

May 8, 2020
Safely Selecting Data Frame Columns in Your Tidyverse Code

: R, tidyverse

In my previous post “Use of the .data and .env Pronouns to Disambiguate Your Tidyverse Code”, I discussed how the .data and .env pronouns should be used to write production-grade R code. The post was inspired by Lionel Henry’s talk titled “Interactivity and Programming in the tidyverse”.
Mar 1, 2020
Use of the .data and .env Pronouns to Disambiguate Your Tidyverse Code

: R, tidyverse

As someone who has been writing tidyverse code for a few years, I’ve always found it difficult to bring the concept of tidyeval into a production-level environment. The recent rstudio::conf 2020 talk by Lionel Henry titled “Interactivity and Programming in the tidyverse” shed some much-needed light on how to write safe production-level tidyverse code.
Aug 30, 2019
Accessing Wildcard Values in a Snakemake Rule

: Snakemake

If you are familiar with Snakemake, then you will likely have used wildcards before. For instance, consider the following rule in our Snakefile:
Dec 14, 2018
The Crux of Bayesian Statistics

: R, Statistics, Bayesian Statistics

If you work in a field that has data (which is a lot of fields these days), you have undoubtedly encountered the term Bayesian statistics at some point. When I first encountered it, I did what most people probably do: I googled “What is Bayesian statistics?”. After reading through some resources and getting through the idiosyncratic terms and concepts (e.g. conjugate priors, posteriors, Markov Chain Monte Carlo), I still came away not really understanding what was so important about Bayesian statistics.
Nov 4, 2017
Making Your Personal Vim Cheatsheet

: Vim

First off, major credit goes to Matt Butcher (@technosophos) and his original post on creating Vim cheatsheets. This post follows up on that concept and describes how I got it to work with my setup.
May 13, 2017
Getting Vim + Ctags Working with R

: Ctags, R, Vim

I’ve been an avid Vim user for many years and think it’s one of the best text editors out there. My enthusiasm for Vim, along with all its great plugins, is also one of the major reasons why I have never adopted RStudio for R programming and instead continue to use the Nvim-R plugin. This wonderful plugin provides the critical feature of being able to send commands from your R script to an R session. Together with the other handy features it provides, this plugin serves as a good alternative to RStudio for those who, like me, want to stick with Vim.
Mar 8, 2017
How to Do Bayesian Inference 101

: R, Statistics, Bayesian Statistics

Towards the end of the post Bayes’ Rule, I alluded briefly to how Bayes’ rule becomes extremely powerful in Bayesian inference. The link happens when we start to interpret the variables of Bayes’ rule as parameters ($\theta$) of a model and observed data ($D$):
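For reference, Bayes’ rule written in terms of model parameters $\theta$ and observed data $D$ takes the standard form below. The names attached to each term afterwards are the conventional ones and may not match the post’s own wording.

$$
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
$$

Here $P(\theta \mid D)$ is the posterior, $P(D \mid \theta)$ the likelihood, $P(\theta)$ the prior, and $P(D)$ the evidence (the marginal probability of the data).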
Nov 12, 2016
Using Bioconductor in a Conda Environment

: Conda, Bioconductor, GEOquery

I recently switched over to using the conda package management system for R. One of the benefits of conda is that package installation can be performed very easily with the command:
Jun 4, 2016
What is the Difference Between a Hemizygous and Heterozygous Deletion?

: Cancer

For the last few years, I have been using the terms hemizygous and heterozygous deletion interchangeably to describe a single allele/copy deletion in cancer genomes. But it was recently brought to my attention that the term heterozygous deletion is not always correct when describing single-copy deletions. The term heterozygous implies that the original two alleles of a genomic locus were different. But we may observe a single-allele deletion where the original two alleles were identical. In this case, it would not be a heterozygous deletion but rather a hemizygous deletion, which implies that only one copy remains but makes no claim that the two original alleles were different.
May 12, 2016
The Basics of Survival Analysis

: Statistics, Survival Analysis

Survival analysis is a set of statistical methods for analyzing variables that have both a time and an event associated with them. For example, it is used in cancer clinical research when we are interested in measuring the time it takes before a patient relapses following treatment. In this case, the event we are measuring is whether or not a patient relapses, and the associated time is when the relapse occurs.
Apr 21, 2016
Bayes' Rule

: R, Statistics, Bayesian Statistics

In a previous post on Joint, Marginal, and Conditional Probabilities, we learned about the 3 different types of probabilities. One famous probability rule that is built on these probabilities (specifically the conditional probability) is called “Bayes’ Rule”, which forms the basis of Bayesian statistics. In this post, we will learn how to derive this rule and discuss its utility.
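The derivation rests on writing the joint probability of two events in two ways using the definition of conditional probability. With generic events $A$ and $B$ (notation chosen here, not necessarily the post’s), and assuming $P(B) > 0$:

$$
P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
$$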
Mar 20, 2016
Joint, Marginal, and Conditional Probabilities

: R, Statistics

Probabilities represent the chances of an event x occurring. In the classic interpretation, a probability is measured by the number of times event x occurs divided by the total number of trials; in other words, the relative frequency of the event occurring. There are three types of probabilities:
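A minimal R sketch of this frequency interpretation, assuming a simulated fair coin as the example (the coin and the number of trials are illustrative choices, not taken from the post): toss it many times and divide the number of heads by the number of trials.

```r
# Relative-frequency estimate of a probability: toss a simulated fair coin
# many times and count how often the event "heads" occurs.
set.seed(42)
n_trials <- 10000
tosses <- sample(c("heads", "tails"), size = n_trials, replace = TRUE)

# Number of times the event occurs divided by the total number of trials
p_heads <- sum(tosses == "heads") / n_trials
print(p_heads)  # should be close to the true probability of 0.5
```

As the number of trials grows, this ratio settles closer to the underlying probability.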
Mar 17, 2016
Probability Distributions and their Mass/Density Functions

: R, Statistics

A probability distribution is a way to represent the possible values and the respective probabilities of a random variable. There are two types of probability distributions: discrete and continuous. As you might have guessed, a discrete probability distribution is used when we have a discrete random variable, and a continuous probability distribution is used when we have a continuous random variable.
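To make the mass/density distinction concrete, here is a short R sketch using base R’s `dbinom()` for a discrete distribution and `dnorm()`/`pnorm()` for a continuous one; the specific parameters are illustrative assumptions.

```r
# Discrete: the binomial probability mass function returns actual probabilities
# P(X = k) for k heads in 10 fair coin flips; together they sum to 1.
pmf <- dbinom(0:10, size = 10, prob = 0.5)
print(sum(pmf))  # total probability mass: 1

# Continuous: the normal density function returns densities, not probabilities.
# Probabilities come from the area under the curve over an interval.
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
print(total_area)           # approximately 1
print(pnorm(1) - pnorm(-1)) # P(-1 < X < 1) for a standard normal, about 0.683

# Because it is a density, a narrow normal can exceed 1 at a single point.
print(dnorm(0, mean = 0, sd = 0.1))  # about 3.99
```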
Feb 26, 2016
Do You Understand Random Variables?

: Statistics, Random Variables

A random variable can be a confusing concept because it is not like a traditional variable that you may have been exposed to before. I will be honest and say that I never really understood them until I started digging into them more. This post aims to clarify exactly what a random variable is. Here is an overview of what will be discussed in this post.
Feb 20, 2016
Installing topicmodels - "fatal error: 'gsl/gsl_rng.h'"

: R, topicmodels

I recently tried to install the topicmodels R package (v0.2-3) on my Mac that was running OS X Yosemite (v10.10.4 - 14E46) with Xcode (v6.4 - 6E35b). My R (v3.2.2) was installed using homebrew.
Jan 13, 2016
Configuring R - "cannot compile a simple Fortran program"

: R

I recently had to install an older version of R (v3.1.2) from source for a specific project. Even though I have done this a few dozen times, it never ceases to amaze me that I still run into new errors. While trying to run configure:
Jan 3, 2016
Fitting a Mixture Model Using the Expectation-Maximization Algorithm in R

: R, Mixture Models, Expectation-Maximization

In my previous post “Using Mixture Models for Clustering in R”, I covered the concept of mixture models and how one could use a Gaussian mixture model (GMM), one type of mixture model, for clustering. If you are like me, not knowing what is happening “under the hood” may bug you. What is actually happening when I run the normalmixEM function from mixtools? How does it know where to “put” the components? In this post, I will cover how you can implement your own GMM in R.
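A compact from-scratch sketch of expectation-maximization for a two-component, one-dimensional GMM in base R is below. The function name `em_gmm2`, the starting values, and the simulated data are hypothetical choices for illustration; this is not the code behind `normalmixEM`, and it assumes you supply reasonable starting values rather than choosing them automatically.

```r
# Minimal EM for a two-component, one-dimensional Gaussian mixture.
# Hypothetical helper for illustration; not the mixtools implementation.
em_gmm2 <- function(x, mu, sigma, lambda, max_iter = 500, tol = 1e-8) {
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: responsibilities, i.e. the posterior probability that each
    # point was generated by component 1 or component 2
    d1 <- lambda[1] * dnorm(x, mu[1], sigma[1])
    d2 <- lambda[2] * dnorm(x, mu[2], sigma[2])
    total <- d1 + d2
    r1 <- d1 / total
    r2 <- d2 / total

    # M-step: re-estimate mixing weights, means, and standard deviations
    # from the soft (responsibility-weighted) assignments
    n1 <- sum(r1)
    n2 <- sum(r2)
    lambda <- c(n1, n2) / length(x)
    mu <- c(sum(r1 * x) / n1, sum(r2 * x) / n2)
    sigma <- c(sqrt(sum(r1 * (x - mu[1])^2) / n1),
               sqrt(sum(r2 * (x - mu[2])^2) / n2))

    # Log-likelihood at the parameters used in this iteration's E-step;
    # stop once it no longer changes meaningfully
    loglik <- sum(log(total))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(mu = mu, sigma = sigma, lambda = lambda, loglik = loglik, iter = iter)
}

# Usage: simulate two well-separated clusters and recover their parameters
set.seed(1)
x <- c(rnorm(300, mean = 0, sd = 1), rnorm(200, mean = 5, sd = 1))
fit <- em_gmm2(x, mu = c(-1, 6), sigma = c(1, 1), lambda = c(0.5, 0.5))
print(round(fit$mu, 2))      # means near 0 and 5
print(round(fit$lambda, 2))  # weights near 0.6 and 0.4
```

Each iteration alternates an E-step (how strongly each component “owns” each point) with an M-step (updating the parameters from those soft assignments). EM never decreases the log-likelihood from one iteration to the next, which is why its change is a convenient stopping criterion.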
Dec 6, 2015
R Markdown to Jekyll: "Protecting" Your Math Equations

: Rmarkdown, MathJax, R

In some of the posts I’ve written on my blog (e.g. Using Mixture Models for Clustering in R), I first wrote them in R Markdown and used knitr to convert them into a Markdown file to be processed by Jekyll. Nicole White wrote a fantastic post on how to publish an R Markdown file to a Jekyll blog, and this is what helped me at first.

One aspect that isn’t mentioned is whether math equations rendered using MathJax will work in a Jekyll blog. As it turns out, it’s not straightforward, and the equations often don’t render. The fundamental problem is that Jekyll’s Markdown parsers first attempt to parse the equations, which often mangles them before MathJax can interpret them. The problem is further complicated by the fact that different Jekyll Markdown parsers handle equation blocks slightly differently.
Nov 17, 2015
"Configuration failed because cairo was not found"

: Cairo, R

I was recently trying to install Hadley Wickham’s new svglite R package on my Mac running OS X Yosemite:
Oct 13, 2015
Using Mixture Models for Clustering

: Mixture Models, R

If you’ve been exposed to machine learning in your work or studies, chances are you’ve heard of the term mixture model. But what exactly is a mixture model and why should you care?
Sep 19, 2015
"VariantContextComparator.java:84" when Using Picard's SortVcf

: Bioinformatics, Picard

If you’ve ever had to sort a VCF file in the same order as a reference file, then the SortVcf function from Picard tools is what you need. The function can be easily run with the following command:
Sep 17, 2015
Installing the V8 R Package - "No package 'libv8' found"

: R, V8

I recently tried to install the V8 R package on my Mac running Yosemite using install.packages("V8") only to run into this problem.
Sep 15, 2015
"semi-transparency is not supported on this device" in R

: R, Cairo

If you’ve ever tried to make an R plot with transparency:
Sep 2, 2015
How "Pseudoalignments" Work in kallisto

: Bioinformatics, kallisto

@lpachter’s group has recently introduced a new tool called kallisto to the bioinformatics community, which marks a huge advancement in how RNA-seq analysis is done. kallisto is a “lightweight algorithm” that is extremely fast at quantifying the abundance of transcripts from RNA-seq data with high accuracy. The speed of the program can be attributed to its use of “pseudoalignments”, which aim to (from Lior’s blog post):
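The core idea behind a pseudoalignment can be sketched in a few lines of R: instead of aligning a read base by base, look up which transcripts contain each of the read’s k-mers and intersect those sets. The sequences, the value of `k`, and the helper names `kmers`, `build_index`, and `pseudoalign` are hypothetical. kallisto itself builds a transcriptome de Bruijn graph and works with k-mer equivalence classes, so treat this as a toy illustration of the concept rather than its implementation.

```r
# Toy illustration of the pseudoalignment idea (hypothetical helpers, not kallisto).
# All overlapping substrings of length k in a sequence
kmers <- function(s, k) {
  starts <- seq_len(nchar(s) - k + 1)
  substring(s, starts, starts + k - 1)
}

# Map every k-mer to the set of transcript names that contain it
build_index <- function(transcripts, k) {
  index <- list()
  for (tx in names(transcripts)) {
    for (km in unique(kmers(transcripts[[tx]], k))) {
      index[[km]] <- union(index[[km]], tx)
    }
  }
  index
}

# A read is compatible with the transcripts shared by all of its indexed k-mers
pseudoalign <- function(read, index, k) {
  compatible <- NULL
  for (km in kmers(read, k)) {
    hits <- index[[km]]
    if (is.null(hits)) next  # this toy version simply skips unseen k-mers
    compatible <- if (is.null(compatible)) hits else intersect(compatible, hits)
  }
  compatible
}

# Usage with made-up transcript sequences
transcripts <- c(tx1 = "ACGTACGGTCA", tx2 = "ACGTACGTTGA", tx3 = "TTGACCGTAC")
idx <- build_index(transcripts, k = 5)
print(pseudoalign("ACGTACGG", idx, k = 5))  # "tx1"
print(pseudoalign("ACGTAC", idx, k = 5))    # "tx1" "tx2"
```

The second read is too short to distinguish tx1 from tx2, so it stays compatible with both; no base-level alignment is ever computed.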
Aug 31, 2015
Installing devtools on Mac OSX - "fatal error: 'libxml/tree.h'"

: R, devtools, xml2

I recently tried to install the devtools R package (v1.8) on my Mac that was running OS X Yosemite (v10.10.4 - 14E46) with Xcode (v6.4 - 6E35b). My R (v3.2.2) was installed using homebrew.
Aug 26, 2015
Using TrAp (Tree Approach to Clonality) for Deconvoluting Evolutionary Patterns Underlying a Tumour

: Bioinformatics, TrAp, Cancer, Evolution

I recently had a chance to try out the TrAp software from Yuval Kluger’s lab:
Aug 25, 2015
How do I Interpret a Confidence Interval?

: Statistics

A confidence interval is a range that indicates the uncertainty of a population parameter estimate. So what does this actually mean? Say someone made the following statement:
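One way to see what the “95%” refers to is a small simulation in R, assuming normally distributed data with a known true mean (the mean, standard deviation, and sample size are illustrative choices): build a t-based 95% interval for many simulated samples and count how often the interval contains the true mean.

```r
# Simulate many "studies", each producing its own 95% confidence interval,
# and check how often those intervals capture the known population mean.
set.seed(2015)
true_mean <- 10
n_studies <- 2000

covered <- replicate(n_studies, {
  x <- rnorm(30, mean = true_mean, sd = 2)     # one study's sample
  ci <- t.test(x, conf.level = 0.95)$conf.int  # that study's 95% CI for the mean
  ci[1] <= true_mean && true_mean <= ci[2]     # did it capture the true mean?
})

print(mean(covered))  # proportion of intervals containing the true mean, near 0.95
```

Any single interval either contains the true mean or it does not; the 95% describes the long-run behavior of the procedure that generates the intervals.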
Aug 19, 2015
Using svgPanZoom on a Pre-generated SVG

: R, svgPanZoom

@timelyportfolio has provided the R community with the fantastic svgPanZoom R htmlwidget, which leverages the svg-pan-zoom JavaScript library to allow panning and zooming of SVGs in HTML documents.
Aug 19, 2015
Installing the xml2 R Package - "fatal error: 'libxml/tree.h'"

: R, xml2

I recently tried to install the xml2 R package (v0.1.1) on my Mac that was running OS X Yosemite (v10.10.4 - 14E46) with Xcode (v6.4 - 6E35b). I ran the following command in my R console:
Aug 12, 2015
Working with Git Submodules

: git

Git submodules provide a nifty way to integrate a git repository within another git repository. The first time I encountered git submodules was when I was browsing Heng Li’s fermikit git repository.
Jul 26, 2015
Making Your First R Package

: R

This post is inspired by a hilarious tweet that David Robinson made on June 19th, 2015:
Jul 19, 2015
Creation of My Site/Blog!

Welcome to the first post on my newly created site/blog! I’m looking forward to blogging about my adventures in bioinformatics, cancer, and big data research. This blog was created using Jekyll, which provides a nice and easy way to get started with a static website that is blog-aware. All the content for this website is hosted on GitHub.