Written by Sebastian
on April 25, 2021

In search of a new programming language

I have been programming in R for the past 10 years, mostly as a scientist in academia. R is extremely useful for quickly validating hypotheses and generating plots to communicate results to collaborators. As I transition into industry, I am becoming increasingly frustrated with some of R's idiosyncrasies that make it difficult to write error-free programs. Examples include recycling shorter vectors (and only warning if the length of the longer vector is not a multiple of the length of the shorter one), and returning results when selecting with an NA value. The following snippet shows both:

data.frame(1:4,1:2,1:4)[c(TRUE,NA,NA),]

Here the column 1:2 is silently recycled to fill four rows, and the NA entries in the row index return rows filled with NA instead of raising an error. Do you really think this is what I meant? It is very easy for an NA value to slip into your selection operation, and it will not appear in the code. So you have to keep both your code and your data in your head to fully comprehend your program. More of these can be found in The R Inferno.
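The two pitfalls above are easier to see in isolation on plain vectors. The following is a minimal sketch; the variable names (x, y, v, idx) are purely illustrative, and wrapping the index in which() is just one possible way to guard against NA, not the only one.

```r
# Recycling: the shorter vector is silently repeated to match the longer one.
x <- 1:4
y <- 1:2
recycled <- x + y   # no warning, because 4 is a multiple of 2

# A warning only appears when the lengths do not divide evenly, e.g.:
# x + 1:3   # warns: longer object length is not a multiple of shorter object length

# NA in a logical index: the NA position yields an NA element
# instead of being dropped or raising an error.
v <- c(10, 20, 30)
idx <- c(TRUE, NA, FALSE)
with_na <- v[idx]

# One defensive option: which() keeps only the positions that are TRUE,
# so NA is treated like FALSE.
without_na <- v[which(idx)]
```

Checking lengths explicitly before combining vectors, for example with stopifnot(length(x) == length(y)), is a similar guard against unintended recycling.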

Things I like about R are its functional nature, such as mapping functions (lapply, sapply…) over vectors instead of writing loops, and being able to work interactively in the REPL.
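As a small illustration of the mapping style mentioned above, the sketch below computes the same result once with a loop and once with sapply and lapply. The input vector read_lengths and the doubling function are hypothetical and only there to have something to map over.

```r
# Hypothetical input: three read lengths.
read_lengths <- c(100, 150, 75)

# Loop version: preallocate the result, then fill it element by element.
doubled_loop <- numeric(length(read_lengths))
for (i in seq_along(read_lengths)) {
  doubled_loop[i] <- read_lengths[i] * 2
}

# Mapping version: sapply applies the function to each element
# and simplifies the result to a vector.
doubled_map <- sapply(read_lengths, function(len) len * 2)

# lapply does the same but always returns a list.
doubled_list <- lapply(read_lengths, function(len) len * 2)
```

The mapping versions state what is computed for each element and leave the bookkeeping (indices, preallocation) to R.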

So what would be a good language to replace R with? I need it for bioinformatics and data analysis. Let’s first set some criteria and then discuss some candidates matching these criteria, with no claim to exhaustiveness.

Want:

functional, meaning I can use a syntax close to R’s vectorized operations (apply and co.)
easy to parallelize
interactive, has a REPL
libraries for statistics, bioinformatics and plotting

Optional:

type system
compiles to native code
pleasant syntax
good library ecosystem

Candidates:

OCaml/SML/F#

A family of strongly typed functional languages. OCaml is used in many industries and has a flexible compiler backend. F# lets you tap into the .NET framework. Bioinformatics libraries include BioFSharp and dotnetbio for F#/.NET, and biocaml for OCaml.

Haskell

A functional language with a focus on purity (no side effects). It compiles to native code. There is Biohaskell, but Haskell does not seem to be used much for bioinformatics. I am not sure it is practical for interactive use.

Scheme

This is certainly an odd one. I recently looked into Scheme, reading the seminal book “Structure and Interpretation of Computer Programs”. I fell in love with its simplicity and expressiveness, and also with its syntax (prefix notation and parentheses), which many find unappealing. It allows a style of coding similar to R (whose design was itself influenced by Scheme) and interactive development at the command line. Most implementations have good FFI support, which makes it possible to interface with bioinformatics tools implemented in other languages. The Racket Scheme implementation probably has the largest library ecosystem, including libraries for plotting and statistics.

Julia

Julia has a type system, is designed for fast numerics, and compiles to native code. There is a bioinformatics package collection, BioJulia. The language is still young and moves quickly, with new packages being added and changes still being made to the core language.

Rust

A memory-safe language. I am not sure it sees much use in bioinformatics, but there is an actively developed bioinformatics library, rust-bio.

To sum up, there is no way around R and Python when it comes to library ecosystems, with over 10 000 packages in R/Bioconductor or Python for bioinformatics, plotting and scientific computing. I think Julia is a language to watch in this space, but it might still need some time to mature. Scheme would be my personal favourite at this point, but more from an ideological than a practical perspective.
