The Computational Biology Research Group (CBRG) provides computing support for bioinformatics analysis at the University of Oxford. We have expertise in many aspects of bioinformatics and especially encourage collaborations that require writing custom software, bioinformatics tools and databases. An account with the CBRG gives automatic access to a large number of molecular biology computing packages.

CBRG Accounts

All members of the University of Oxford are eligible for a bioinformatics account with the CBRG. Accounts are free for researchers at WIMM and LICR.


Bioinformatics Training

The Computational Biology Research Group runs training courses in sequence analysis, ChIP-Seq and RNA-Seq analysis, and molecular biology software.


Analysis Tools

An account with the Computational Biology Research Group allows you to log on to our server and use the bioinformatics tools that we provide.


Recent Papers

Stewart I, Radtke D, Phillips B, McGowan SJ, Bannard O.

Germinal Center B Cells Replace Their Antigen Receptors in Dark Zones and Fail Light Zone Entry when Immunoglobulin Gene Mutations are Damaging.

Immunity (2018) 49(3):477-489.

Adaptive immunity involves the development of bespoke antibodies in germinal centers (GCs) through immunoglobulin somatic hypermutation (SHM) in GC dark zones (DZs) and clonal selection in light zones (LZs). Accurate selection requires that cells fully replace surface B cell receptors (BCRs) following SHM, but whether this happens before LZ entry is not clear. We found that most GC B cells degrade pre-SHM receptors before leaving the DZ, and that B cells acquiring crippling mutations during SHM rarely reached the LZ. Instead, apoptosis was triggered preferentially in late G1, a stage wherein cells with functional BCRs re-entered cell cycle or reduced surface expression of the chemokine receptor CXCR4 to enable LZ migration. Ectopic expression of the anti-apoptotic gene Bcl2 was not sufficient for cells with damaging mutations to reach the LZ, suggesting that BCR-dependent cues may actively facilitate the transition. Thus, BCR replacement and pre-screening in DZs prevents the accumulation of clones with non-functional receptors and facilitates selection in the LZ.

Arezes J, Foy N, McHugh K, Sawant A, Quinkert D, Terraube V, Brinth A, Tam M, Lavallie E, Taylor S, Armitage AE, Pasricha SR, Cunningham O, Lambert M, Draper SJ, Jasuja R, Drakesmith H.

Erythroferrone inhibits the induction of hepcidin by BMP6

Blood (2018)

Decreased hepcidin mobilizes iron, which facilitates erythropoiesis, but excess iron is pathogenic in beta-thalassemia. Erythropoietin (EPO) enhances erythroferrone (ERFE) synthesis by erythroblasts, and ERFE suppresses hepatic hepcidin production, through an unknown mechanism. The BMP/SMAD pathway in the liver is critical for control of hepcidin, and we show that EPO suppressed hepcidin and other BMP target genes in vivo in a partially ERFE-dependent manner. Furthermore, recombinant ERFE suppressed the hepatic BMP/SMAD pathway independently of changes in serum and liver iron, and in vitro, ERFE decreased SMAD 1/5/8 phosphorylation and inhibited expression of BMP target genes. ERFE specifically abrogated the induction of hepcidin by BMP5, BMP6 and BMP7, but had no or little effect on hepcidin induction by BMP2, 4, 9 or Activin B. A neutralising anti-ERFE antibody prevented the ability of ERFE to inhibit hepcidin induction by BMP5, BMP6 and BMP7. Cell-free HTRF assays showed that BMP5, BMP6 and BMP7 competed with anti-ERFE for binding to ERFE. We conclude that ERFE suppresses hepcidin by inhibiting hepatic BMP/SMAD signalling via preferentially impairing an evolutionarily closely related BMP sub-group of BMP5, BMP6 and BMP7. ERFE can act as a natural ligand trap generated by stimulated erythropoiesis in order to regulate availability of iron.

Zhou Y, Koelling N, Fenwick AL, McGowan SJ, Calpena E, Wall SA, Smithson SF, Wilkie AOM, Twigg SRF

Disruption of TWIST1 translation by 5' UTR variants in Saethre-Chotzen syndrome

Hum Mutat. (2018) 39(10):1360-1365

Saethre-Chotzen syndrome (SCS), one of the most common forms of syndromic craniosynostosis (premature fusion of the cranial sutures), results from haploinsufficiency of TWIST1, caused by deletions of the entire gene or loss-of-function variants within the coding region. To determine whether non-coding variants also contribute to SCS, we screened 14 genetically undiagnosed SCS patients using targeted capture sequencing, and identified novel single nucleotide variants (SNVs) in the 5' untranslated region (UTR) of TWIST1 in two unrelated SCS cases. We show experimentally that these variants, which create translation start sites in the TWIST1 leader sequence, reduce translation from the main open reading frame (mORF). This is the first demonstration that non-coding SNVs of TWIST1 can cause SCS, and highlights the importance of screening the 5' UTR in clinically diagnosed SCS patients without a coding mutation. Similar 5' UTR variants, particularly of haploinsufficient genes, may represent an under-ascertained cause of monogenic disease.

Pellagatti A, Armstrong RN, Steeples V, Sharma E, Repapi E, Singh S, Sanchi A, Radujkovic A, Horn P, Dolatshad H, Roy S, Broxholme J, Lockstone H, Taylor S, Giagounidis A, Vyas P, Schuh A, Hamblin A, Papaemmanuil E, Killick S, Malcovati L, Hennrich ML, Gavin AC, Ho AD, Luft T, Hellström-Lindberg E, Cazzola M, Smith CWJ, Smith S, Boultwood J.

Impact of spliceosome mutations on RNA splicing in myelodysplasia: dysregulated genes/pathways and clinical associations.

Blood (2018) 132(12):1225-1240

SF3B1, SRSF2, and U2AF1 are the most frequently mutated splicing factor genes in the myelodysplastic syndromes (MDS). We have performed a comprehensive and systematic analysis to determine the effect of these commonly mutated splicing factors on pre-mRNA splicing in the bone marrow stem/progenitor cells and in the erythroid and myeloid precursors in splicing factor mutant MDS. Using RNA-seq, we determined the aberrantly spliced genes and dysregulated pathways in CD34+ cells of 84 patients with MDS. Splicing factor mutations result in different alterations in splicing and largely affect different genes, but these converge in common dysregulated pathways and cellular processes, focused on RNA splicing, protein synthesis, and mitochondrial dysfunction, suggesting common mechanisms of action in MDS. Many of these dysregulated pathways and cellular processes can be linked to the known disease pathophysiology associated with splicing factor mutations in MDS, whereas several others have not been previously associated with MDS, such as sirtuin signaling. We identified aberrantly spliced events associated with clinical variables, and isoforms that independently predict survival in MDS and implicate dysregulation of focal adhesion and extracellular exosomes as drivers of poor survival. Aberrantly spliced genes and dysregulated pathways were identified in the MDS-affected lineages in splicing factor mutant MDS. Functional studies demonstrated that knockdown of the mitosis regulators SEPT2 and AKAP8, aberrantly spliced target genes of SF3B1 and SRSF2 mutations, respectively, led to impaired erythroid cell growth and differentiation. This study illuminates the effect of the common spliceosome mutations on the MDS phenotype and provides novel insights into disease pathophysiology.

Marie R, Pødenphant M, Koprowska K, Bærlocher L, Vulders RCM, Wilding J, Ashley N, McGowan SJ, van Strijp D, van Hemert F, Olesen T, Agersnap N, Bilenberg B, Sabatel C, Schira J, Kristensen A, Bodmer W, van der Zaag PJ, Mir KU

Sequencing of human genomes extracted from single cancer cells isolated in a valveless microfluidic device

Lab Chip (2018) 18:1891-1902

Sequencing the genomes of individual cells enables the direct determination of genetic heterogeneity amongst cells within a population. We have developed an injection-moulded valveless microfluidic device in which single cells from colorectal cancer derived cell lines (LS174T, LS180 and RKO) and fresh colorectal tumors have been individually trapped, their genomes extracted and prepared for sequencing using multiple displacement amplification (MDA). Ninety nine percent of the DNA sequences obtained mapped to a reference human genome, indicating that there was effectively no contamination of these samples from non-human sources. In addition, most of the reads are correctly paired, with a low percentage of singletons (0.17 ± 0.06%) and we obtain genome coverages approaching 90%. To achieve this high quality, our device design and process shows that amplification can be conducted in microliter volumes as long as the lysis is in sub-nanoliter volumes. Our data thus demonstrates that high quality whole genome sequencing of single cells can be achieved using a relatively simple, inexpensive and scalable device. Detection of genetic heterogeneity at the single cell level, as we have demonstrated for freshly obtained single cancer cells, could soon become available as a clinical tool to precisely match treatment with the properties of a patient's own tumor.

Reijnders MRF, Miller KA, Alvi M, Goos JAC, Lees MM, de Burca A, Henderson A, Kraus A, Mikat B, de Vries BBA, Isidor B, Kerr B, Marcelis C, Schluth-Bolard C, Deshpande C, Ruivenkamp CAL, Wieczorek D; Deciphering Developmental Disorders Study, Baralle D, Blair EM, Engels H, Lüdecke HJ, Eason J, Santen GWE, Clayton-Smith J, Chandler K, Tatton-Brown K, Payne K, Helbig K, Radtke K, Nugent KM, Cremer K, Strom TM, Bird LM, Sinnema M, Bitner-Glindzicz M, van Dooren MF, Alders M, Koopmans M, Brick L, Kozenko M, Harline ML, Klaassens M, Steinraths M, Cooper NS, Edery P, Yap P, Terhal PA, van der Spek PJ, Lakeman P, Taylor RL, Littlejohn RO, Pfundt R, Mercimek-Andrews S, Stegmann APA, Kant SG, McLean S, Joss S, Swagemakers SMA, Douzgou S, Wall SA, Küry S, Calpena E, Koelling N, McGowan SJ, Twigg SRF, Mathijssen IMJ, Nellaker C, Brunner HG, Wilkie AOM.

De Novo and Inherited Loss-of-Function Variants in TLK2: Clinical and Genotype-Phenotype Evaluation of a Distinct Neurodevelopmental Disorder.

Am J Hum Genet. (2018) 102:1195-1203

Next-generation sequencing is a powerful tool for the discovery of genes related to neurodevelopmental disorders (NDDs). Here, we report the identification of a distinct syndrome due to de novo or inherited heterozygous mutations in Tousled-like kinase 2 (TLK2) in 38 unrelated individuals and two affected mothers, using whole-exome and whole-genome sequencing technologies, matchmaker databases, and international collaborations. Affected individuals had a consistent phenotype, characterized by mild-borderline neurodevelopmental delay (86%), behavioral disorders (68%), severe gastro-intestinal problems (63%), and facial dysmorphism including blepharophimosis (82%), telecanthus (74%), prominent nasal bridge (68%), broad nasal tip (66%), thin vermilion of the upper lip (62%), and upslanting palpebral fissures (55%). Analysis of cell lines from three affected individuals showed that mutations act through a loss-of-function mechanism in at least two case subjects. Genotype-phenotype analysis and comparison of computationally modeled faces showed that phenotypes of these and other individuals with loss-of-function variants significantly overlapped with phenotypes of individuals with other variant types (missense and C-terminal truncating). This suggests that haploinsufficiency of TLK2 is the most likely underlying disease mechanism, leading to a consistent neurodevelopmental phenotype. This work illustrates the power of international data sharing, by the identification of 40 individuals from 26 different centers in 7 different countries, allowing the identification, clinical delineation, and genotype-phenotype evaluation of a distinct NDD caused by mutations in TLK2.

Aleksic T, Gray NE, Wu X, Rieunier G, Osher E, Mills J, Verrill C, Bryant RJ, Han C, Hutchinson K, Lambert A, Kumar R, Hamdy FC, Weyer-Czernilofsky U, Sanderson M, Bogenrieder T, Taylor S, Macaulay VM

Nuclear IGF-1R interacts with regulatory regions of chromatin to promote RNA polymerase II recruitment and gene expression associated with advanced tumor stage.

Cancer Res. (2018) 78:3497-3509

Internalization of ligand-activated type 1 IGF receptor (IGF-1R) is followed by recycling to the plasma membrane, degradation or nuclear translocation. Nuclear IGF-1R reportedly associates with clinical response to IGF-1R inhibitory drugs, yet its role in the nucleus is poorly characterized. Here we investigated the significance of nuclear IGF-1R in clinical cancers and cell line models. In prostate cancers, IGF-1R was predominantly membrane-localized in benign glands, while malignant epithelium contained prominent internalized (nuclear/cytoplasmic) IGF-1R, and nuclear IGF-1R associated significantly with advanced tumor stage. Using ChIP-seq to assess global chromatin occupancy, we identified IGF-1R binding sites at or near transcription start sites of genes including JUN and FAM21, most sites coinciding with occupancy by RNA polymerase II (RNAPol2) and histone marks of active enhancers/promoters. IGF-1R was inducibly recruited to chromatin, directly binding DNA and interacting with RNAPol2 to upregulate expression of JUN and FAM21, shown to mediate tumor cell survival and IGF-induced migration. IGF-1 also enriched RNAPol2 on promoters containing IGF-1R binding sites. These functions were inhibited by IGF-1/2 neutralizing antibody xentuzumab (BI 836845), or by blocking receptor internalization. We detected nuclear IGF-1R on JUN and FAM21 promoters in fresh prostate cancers that contained abundant nuclear IGF-1R, with evidence of correlation between nuclear IGF-1R content and JUN expression in malignant prostatic epithelium. Taken together, these data reveal previously unrecognized molecular mechanisms through which IGFs promote tumorigenesis, with implications for therapeutic evaluation of anti-IGF drugs.

Farmery JHR, Smith ML; NIHR BioResource - Rare Diseases, Lynch AG.

Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data.

Sci Rep. (2018) 8(1):1300

Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously unimaginable scale. To this end, a number of approaches for estimating telomere length from whole-genome sequencing data have been proposed. Here we present Telomerecat, a novel approach to the estimation of telomere length. Previous methods have been dependent on the number of telomeres present in a cell being known, which may be problematic when analysing aneuploid cancer data and non-human samples. Telomerecat is designed to be agnostic to the number of telomeres present, making it suited for the purpose of estimating telomere length in cancer studies. Telomerecat also accounts for interstitial telomeric reads and presents a novel approach to dealing with sequencing errors. We show that Telomerecat performs well at telomere length estimation when compared to leading experimental and computational methods. Furthermore, we show that it detects expected patterns in longitudinal data, repeated measurements, and cross-species comparisons. We also apply the method to cancer cell data, uncovering an interesting relationship with the underlying telomerase genotype.

Duarte S, Woll PS, Buza-Vidas N, Chin DWL, Boukarabila H, Luís TC, Stenson L, Bouriez-Jones T, Ferry H, Mead AJ, Atkinson D, Jin S, Clark SA, Wu B, Repapi E, Gray N, Taylor S, Mutvei AP, Tsoi YL, Nerlov C, Lendahl U, Jacobsen SEW.

Canonical Notch signaling is dispensable for adult steady-state and stress myelo-erythropoiesis.

Blood (2018) 131:1712-1719

While an essential role for canonical Notch signaling in generation of hematopoietic stem cells in the embryo and in thymic T cell development is well established, its role in adult bone marrow (BM) myelopoiesis remains unclear. Some studies, analyzing myeloid progenitors in adult mice with inhibited Notch signaling, implicated distinct roles of canonical Notch signaling in regulation of progenitors for the megakaryocyte, erythroid and granulocyte-macrophage cell lineages. However, these studies might also have targeted other pathways. Therefore, we specifically deleted, in adult BM, the transcription factor recombination signal-binding protein J kappa (Rbpj), on which canonical signaling through all Notch receptors converges. Notably, detailed progenitor staging established that canonical Notch signaling is fully dispensable for all investigated stages of megakaryocyte, erythroid and myeloid progenitors, in steady state unperturbed hematopoiesis, following competitive BM transplantation and in stress-induced erythropoiesis. Moreover, expression of key regulators of these hematopoietic lineages and Notch target genes was unaffected by Rbpj-deficiency in BM progenitor cells.

Karamitros D, Stoilova B, Aboukhalil Z, Hamey F, Reinisch A, Samitsch M, Quek L, Otto G, Repapi E, Doondeea J, Usukhbayar B, Calvo J, Taylor S, Goardon N, Six E, Pflumio F, Porcher C, Majeti R, Göttgens B, Vyas P.

Single-cell analysis reveals the continuum of human lympho-myeloid progenitor cells.

Nat Immunol. (2018) 19(1):85-97

The hierarchy of human hemopoietic progenitor cells that produce lymphoid and granulocytic-monocytic (myeloid) lineages is unclear. Multiple progenitor populations produce lymphoid and myeloid cells, but they remain incompletely characterized. Here we demonstrated that lympho-myeloid progenitor populations in cord blood - lymphoid-primed multi-potential progenitors (LMPPs), granulocyte-macrophage progenitors (GMPs) and multi-lymphoid progenitors (MLPs) - were functionally and transcriptionally distinct and heterogeneous at the clonal level, with progenitors of many different functional potentials present. Although most progenitors had the potential to develop into only one mature cell type ('uni-lineage potential'), bi- and rarer multi-lineage progenitors were present among LMPPs, GMPs and MLPs. Those findings, coupled with single-cell expression analyses, suggest that a continuum of progenitors execute lymphoid and myeloid differentiation, rather than only uni-lineage progenitors' being present downstream of stem cells.


Hardman CS, Chen YL, Salimi M, Jarrett R, Johnson D, Järvinen VJ, Owens RJ, Repapi E, Cousins DJ, Barlow JL, McKenzie ANJ, Ogg G.

CD1a presentation of endogenous antigens by group 2 innate lymphoid cells

Sci Immunol. (2017) 2(18) pii: eaan5918

Group 2 innate lymphoid cells (ILC2) are effectors of barrier immunity, with roles in infection, wound healing, and allergy. A proportion of ILC2 express MHCII (major histocompatibility complex II) and are capable of presenting peptide antigens to T cells and amplifying the subsequent adaptive immune response. Recent studies have highlighted the importance of CD1a-reactive T cells in allergy and infection, activated by the presentation of endogenous neolipid antigens and bacterial components. Using a human skin challenge model, we unexpectedly show that human skin-derived ILC2 can express CD1a and are capable of presenting endogenous antigens to T cells. CD1a expression is up-regulated by TSLP (thymic stromal lymphopoietin) at levels observed in the skin of patients with atopic dermatitis, and the response is dependent on PLA2G4A. Furthermore, this pathway is used to sense Staphylococcus aureus by promoting Toll-like receptor-dependent CD1a-reactive T cell responses to endogenous ligands. These findings define a previously unrecognized role for ILC2 in lipid surveillance and identify shared pathways of CD1a- and PLA2G4A-dependent ILC2 inflammation amenable to therapeutic intervention.

Zinecker H, Ouaret D, Ebner D, Gaidt MM, Taylor S, Aulicino A, Jagielowicz M, Hornung V, Simmons A

ICG-001 affects DRP1 activity and ER stress correlative with its anti-proliferative effect.

Oncotarget (2017) 8(63): 106764-106777.

Mitochondria form a highly dynamic network driven by opposing scission and fusion events. DRP1 is an essential modulator of mitochondrial fission and dynamics within mammalian cells. Its fission activity is regulated by posttranslational modifications such as activating phosphorylation at serine 616. DRP1 activity has recently been implicated as being dysregulated in numerous human disorders such as cancer and neurodegenerative diseases. Here we describe the development of a cell-based screening assay to detect DRP1 activation. We utilized this to undertake focused compound library screening and identified potent modulators that affected DRP1 activity including ICG-001, which is described as WNT/β-catenin signaling inhibitor. Our findings elucidate novel details about ICG-001's mechanism of action (MOA) in mediating anti-proliferative activity. We show ICG-001 both inhibits mitochondrial fission and activates an early endoplasmic reticulum (ER) stress response to induce cell death in susceptible colorectal cancer cell lines.

Schwessinger R, Suciu MC, McGowan SJ, Telenius J, Taylor S, Higgs DR, Hughes JR

Sasquatch: predicting the impact of regulatory SNPs on transcription factor binding from cell- and tissue-specific DNase footprints.

Genome Res. (2017) 27(10):1730-1742

In the era of genome-wide association studies (GWAS) and personalized medicine, predicting the impact of single nucleotide polymorphisms (SNPs) in regulatory elements is an important goal. Current approaches to determine the potential of regulatory SNPs depend on inadequate knowledge of cell-specific DNA binding motifs. Here, we present Sasquatch, a new computational approach that uses DNase footprint data to estimate and visualize the effects of noncoding variants on transcription factor binding. Sasquatch performs a comprehensive k-mer-based analysis of DNase footprints to determine any k-mer's potential for protein binding in a specific cell type and how this may be changed by sequence variants. Therefore, Sasquatch uses an unbiased approach, independent of known transcription factor binding sites and motifs. Sasquatch only requires a single DNase-seq data set per cell type, from any genotype, and produces consistent predictions from data generated by different experimental procedures and at different sequence depths. Here we demonstrate the effectiveness of Sasquatch using previously validated functional SNPs and benchmark its performance against existing approaches. Sasquatch is available as a versatile webtool incorporating publicly available data, including the human ENCODE collection. Thus, Sasquatch provides a powerful tool and repository for prioritizing likely regulatory SNPs in the noncoding genome.