Posts

Safely Selecting Data Frame Columns in Your Tidyverse Code
In my previous post “Use of the .data and .env Pronouns to Disambiguate Your Tidyverse Code”, I discussed how the .data and .env pronouns should be used to write production-grade R code. The post was inspired by Lionel Henry’s talk titled “Interactivity and Programming in the tidyverse”.

Use of the .data and .env Pronouns to Disambiguate Your Tidyverse Code
As someone who has been writing tidyverse code for a few years, I’ve always found it difficult to bring the concept of tidyeval into a production-level environment. The recent rstudio::conf 2020 talk by Lionel Henry titled “Interactivity and Programming in the tidyverse” shed some much-needed light on how to write safe production-level tidyverse code.

Accessing Wildcard Values in a Snakemake Rule
If you are familiar with Snakemake, then you will likely have used wildcards before. For instance, consider the following rule in our Snakefile:

The Crux of Bayesian Statistics
: R, Statistics, Bayesian Statistics
If you are in some field that has data (which is a lot of fields these days), you will have undoubtedly encountered the term Bayesian statistics at some point. When I first encountered it, I did what most people probably do. I googled “What is Bayesian statistics?”. After reading through some resources and getting through the idiosyncratic terms/concepts (e.g. conjugate priors, posteriors, Markov Chain Monte Carlo), I still went away not really understanding what was so important about Bayesian statistics.

Making Your Personal Vim Cheatsheet
: Vim
First off, major credit goes to Matt Butcher (@technosophos) and his original post on creating Vim cheatsheets. This post follows up on that concept and describes how I got it to work with my setup.

Getting Vim + Ctags Working with R
I’ve been an avid Vim user for many years and think it’s one of the best text editors out there. My enthusiasm for Vim, along with all its great plugins, is also one of the major reasons why I have never adopted RStudio for R programming and instead continue to use the Nvim-R plugin. This wonderful plugin provides the critical feature of being able to send commands from your R script to an R session. Along with the other handy features it provides, this plugin serves as a good alternative to RStudio for those, like me, who want to stick with Vim.

How to Do Bayesian Inference 101
: R, Statistics, Bayesian Statistics
Towards the end of the post Bayes’ Rule, I alluded a bit to how Bayes’ rule becomes extremely powerful in Bayesian inference. The link happens when we start to interpret the variables of Bayes’ rule as parameters ($\theta$) of a model and observed data ($D$):
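
One common way to write that interpretation is sketched below. This is the standard textbook form of Bayes’ rule, and the notation is an assumption that may differ from what the full post uses:

$$
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
$$

Here $P(\theta)$ is the prior, $P(D \mid \theta)$ is the likelihood, $P(D)$ is the evidence, and $P(\theta \mid D)$ is the posterior.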

Using Bioconductor in a Conda Environment
: Conda, Bioconductor, GEOquery
I recently switched over to using the conda package management system for R. One of the benefits of conda is that package installation can be performed very easily with the command:

What is the Difference Between a Hemizygous and Heterozygous Deletion?
: Cancer
I have been using the terms hemizygous and heterozygous deletion interchangeably for the last few years to describe a single allele/copy deletion in cancer genomes. But it was recently brought to my attention that the term heterozygous deletion is not always correct when describing single copy deletions. The term heterozygous implies that the original two alleles of a genomic locus were different. But we may observe a single allele deletion where the original two alleles were identical. In this case, it would not be a heterozygous deletion, but rather a hemizygous deletion, which implies there is only one copy remaining but makes no claim that the two original alleles were different.

The Basics of Survival Analysis
: Statistics, Survival Analysis
Survival analysis is a set of statistical methods that deal with variables that have both a time and an event associated with them. For example, it is used in cancer clinical research if we are interested in measuring the time it takes before a patient relapses following treatment. In this case, the event we are measuring is whether a patient relapses or not, and the time associated with it is when the relapse occurs.

Bayes' Rule
: R, Statistics, Bayesian Statistics
In a previous post on Joint, Marginal, and Conditional Probabilities, we learned about the 3 different types of probabilities. One famous probability rule that is built on these probabilities (specifically the conditional probability) is called “Bayes’ Rule”, which forms the basis of Bayesian statistics. In this post, we will learn how to derive this rule and see its utility.

Joint, Marginal, and Conditional Probabilities
: R, Statistics
Probabilities represent the chances of an event x occurring. In the classic interpretation, a probability is measured by the number of times event x occurs divided by the total number of trials; in other words, the frequency of the event occurring. There are three types of probabilities:

Probability Distributions and their Mass/Density Functions
: R, Statistics
A probability distribution is a way to represent the possible values and the respective probabilities of a random variable. There are two types of probability distributions: discrete and continuous. As you might have guessed, a discrete probability distribution is used when we have a discrete random variable. A continuous probability distribution is used when we have a continuous random variable.

Do You Understand Random Variables?
: Statistics, Random Variables
A random variable can be a confusing concept because it is not like a traditional variable that you may have been exposed to before. I will be honest and say that I never really understood them until I started doing some more digging into them. This post hopes to clarify exactly what a random variable is. Here is an overview of what will be discussed in this post.

Installing topicmodels - "fatal error: 'gsl/gsl_rng.h'"
: R, topicmodels
I recently tried to install the topicmodels R package (v0.23) on my Mac that was running OS X Yosemite (v10.10.4 - 14E46) with Xcode (v6.4 - 6E35b). My R (v3.2.2) was installed using Homebrew.

Configuring R - "cannot compile a simple Fortran program"
: R
I recently had to install an older version of R (v3.1.2) from source for a specific project. Even though I have done this a few dozen times, it never ceases to amaze me that I still run into new errors. While trying to run configure:

Fitting a Mixture Model Using the Expectation-Maximization Algorithm in R
: R, Mixture Models, Expectation-Maximization
In my previous post “Using Mixture Models for Clustering in R”, I covered the concept of mixture models and how one could use a Gaussian mixture model (GMM), one type of mixture model, for clustering. If you are like me, not knowing what is happening “under the hood” may bug you. What is actually happening when I run the normalmixEM function from mixtools? How does it know where to “put” the components? In this post, I will cover how you can implement your own GMM in R.

R Markdown to Jekyll: "Protecting" Your Math Equations
In some of the posts I’ve written in my blog (e.g. Using Mixture Models for Clustering in R), I’ve first written them in R Markdown and used knitr to convert them into a markdown file to be subsequently processed by Jekyll. Nicole White made a fantastic post on how to publish an R Markdown file for a Jekyll blog and this is what helped me at first.
One aspect that isn’t mentioned is whether math equations, rendered using MathJax, will work in a Jekyll blog. As it turns out, it’s not super straightforward and the equations often don’t render. The fundamental problem is that Jekyll markdown parsers will first attempt to parse the equations, which often messes them up before MathJax can interpret them. The problem is further complicated by the fact that different Jekyll markdown parsers handle the equation blocks slightly differently.

"Configuration failed because cairo was not found"
I recently was trying to install Hadley Wickham’s new svglite R package on my Mac OS X Yosemite:

Using Mixture Models for Clustering
: Mixture Models, R
If you’ve been exposed to machine learning in your work or studies, chances are you’ve heard of the term mixture model. But what exactly is a mixture model and why should you care?

"VariantContextComparator.java:84" when Using Picard's SortVcf
If you’ve ever had to sort a VCF file in the same order as a reference file, then the SortVcf function from Picard tools is what you need. The function can be easily run with the following command:

Installing the V8 R Package - "No package 'libv8' found"
I recently tried to install the V8 R package on my Mac running Yosemite using install.packages("V8") only to run into this problem.

"semi-transparency is not supported on this device" in R
If you’ve ever tried to make an R plot with transparency:

How "Pseudoalignments" Work in kallisto
@lpachter’s group has recently introduced a new tool called kallisto to the bioinformatics community which marks a huge advancement in how RNA-seq analysis is done. kallisto is a “lightweight algorithm” that is super fast at quantifying the abundance of transcripts from RNA-seq data with high accuracy. The speed of the program can be attributed to the usage of “pseudoalignments” which aim to (from Lior’s blog post):

Installing devtools on Mac OSX - "fatal error: 'libxml/tree.h'"
I recently tried to install the devtools R package (v1.8) on my Mac that was running OS X Yosemite (v10.10.4 - 14E46) with Xcode (v6.4 - 6E35b). My R (v3.2.2) was installed using Homebrew.

Using TrAp (Tree Approach to Clonality) for Deconvoluting Evolutionary Patterns Underlying a Tumour
: Bioinformatics, TrAp, Cancer, Evolution
I recently had a chance to try out the TrAp software from Yuval Kluger’s lab:

How do I Interpret a Confidence Interval?
A confidence interval is a range that indicates the uncertainty of a population parameter estimate. So what does this actually mean? Say someone made the following statement:

Using svgPanZoom on a Pregenerated SVG
: R, svgPanZoom
@timelyportfolio has provided the R community with the fantastic svgPanZoom R htmlwidget, which leverages the svg-pan-zoom JavaScript library to allow for panning and zooming of SVG in HTML documents.

Installing the xml2 R Package - "fatal error: 'libxml/tree.h'"
I recently tried to install the xml2 R package (v0.1.1) on my Mac that was running OS X Yosemite (v10.10.4 - 14E46) with Xcode (v6.4 - 6E35b). I ran the following command in my R console:

Working with Git Submodules
: git
Git submodules provide a nifty way to integrate a git repository within a git repository. The first time I encountered git submodules was when I was browsing Heng Li’s fermikit git repository.

Making Your First R Package
: R
This post is inspired by a hilarious tweet that David Robinson made on June 19th, 2015:

Creation of My Site/Blog!
Welcome to the first post in my newly created site/blog! Looking forward to blogging about my adventures in bioinformatics, cancer, and big data research. This blog was created using Jekyll, which provides a nice and easy way to get started with a static website that is blog-aware. All the content for this website is hosted on GitHub.