Lightweight force-directed graph layout

Posted on 8th October 2018

by Rupert Overall

The code described in this post was inspired by some practical problems in drawing relatively large, highly connected networks / graphs. Specifically, the data I have been dealing with are correlation matrices derived from gene expression measurements. A whole-genome expression dataset, as generated by microarray hybridisation or next-generation sequencing, typically covers in the order of 10,000–30,000 genes, each of which will become a vertex in the graph. While there are many software solutions[…]

Graph layout on a constrained grid

Posted on 1st August 2018

by Rupert Overall

A network (or in mathematical terminology, a graph) is a collection of entities (known as nodes or vertices) connected by links (also known as edges). Many types of data can be very usefully represented by a network, which allows ease of visualisation and access to analytic tools from the field of graph theory. This post is concerned with visualisation. Specifically, the problem of how best to show the connectivity of vertices in a large graph. To better explain the problem, a little more backgr[…]
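As a small illustration of the terminology above, here is a sketch of a graph stored as an adjacency mapping built from a list of edges. The vertex names and the `build_graph` helper are made up for this example and are not taken from the post.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as {vertex: set of neighbouring vertices}."""
    adjacency = defaultdict(set)
    for a, b in edges:
        # An undirected edge links both endpoints to each other.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Hypothetical toy data: four vertices joined by four edges.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
graph = build_graph(edges)

# The degree of a vertex is the number of edges attached to it.
degree = {vertex: len(neighbours) for vertex, neighbours in graph.items()}
print(degree)
```

Vertex `C` has the most connections here, which is the kind of connectivity information a layout tries to make visible.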

Factor reordering in R

Posted on 5th March 2018

by Rupert Overall

This short post answers a question I got from a student recently and which I have encountered several times before. Confusing problem but easy answer. It is described (better and in more detail) elsewhere on the web, but I will add to the info-smog anyway… In R, a factor is a way of storing a data vector which contains categories. The categories are represented as character labels and stored as pointers to these labels. For example, consider a factor containing data about pets. There are 6 pets which f[…]
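The "labels plus pointers" idea can be sketched outside R. The Python below is only an analogy: the pet values and the `make_factor` and `relevel` helpers are hypothetical, and the codes are 0-based, whereas R's integer codes start at 1.

```python
def make_factor(values, levels=None):
    """Return (codes, levels): each code is the position of a value's label in levels."""
    if levels is None:
        # Default to alphabetically sorted unique labels.
        levels = sorted(set(values))
    index = {label: i for i, label in enumerate(levels)}
    codes = [index[value] for value in values]
    return codes, levels


def relevel(codes, levels, new_levels):
    """Reorder the levels while keeping every element's label unchanged."""
    labels = [levels[code] for code in codes]
    return make_factor(labels, new_levels)


# Hypothetical data, not the example used in the post.
pets = ["dog", "cat", "cat", "fish", "dog", "cat"]
codes, levels = make_factor(pets)
print(codes, levels)

# Reordering goes through the labels, so the data stay the same
# and only the codes and the level order change.
new_codes, new_levels = relevel(codes, levels, ["dog", "cat", "fish"])
print(new_codes, new_levels)
```

Swapping the level labels directly, without recomputing the codes, would silently relabel the data, which is one common source of confusion when reordering factors.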

Removing polymorphic probes from Affymetrix microarrays

Posted on 2nd February 2018

by Rupert Overall

Although the microarray heyday may be over, there are plenty of datasets out there that are still relevant and heavily used. The sequencing revolution has had another effect: high-quality, dense maps of single nucleotide polymorphisms (SNPs) and insertion-deletion events (indels) are now readily available. A particular problem with microarrays (if, like me, you work with genetic populations including non-reference strains) is that polymorphisms inside a microarray probe will affec[…]