Bioinformatics


  • Mark F Rogers, Tom R Gaunt and Colin Campbell. Prediction of driver variants in the cancer genome via machine learning methodologies. Briefings in Bioinformatics (OUP), bbaa250 (2020), https://doi.org/10.1093/bib/bbaa250.
  • This is an overview and outline of the approach used in the construction of cancer-specific disease-driver predictors, such as CScape and CScape-somatic, cited below.

  • Mark F Rogers, Tom R Gaunt and Colin Campbell. CScape-somatic: distinguishing driver and passenger point mutations in the cancer genome. Bioinformatics. Volume 36, Issue 12, pages: 3637–3644 (2020). The CScape-somatic predictor is located here.
  • Bogdan Luca, Vincent Moulton, Christopher Ellis, Dylan R Edwards, Colin Campbell, Rosalin Cooper, Jeremy Clark, Daniel Brewer and Colin Cooper. A Novel Stratification Framework for Predicting Outcome in Patients with Prostate Cancer. British Journal of Cancer (Nature). Volume 122, pages: 1467–1476 (2020).
  • Building on the Latent Process Decomposition method proposed earlier (see below), this paper proposes a method for stratifying prostate cancer into aggressive and indolent disease courses.

  • Madeleine Darbyshire, Zachary du Toit, Mark F. Rogers, Tom Gaunt, and Colin Campbell. Estimating the Frequency of Single Point Driver Mutations across Common Solid Tumours. Scientific Reports (Nature) 9, article number: 13452 (2019) (main paper and supplementary material; some additional plots are located here). Using our CScape predictor, referenced below, we argue that the average number of disease-driving coding single nucleotide variants in a cancer genome is very small, though it varies considerably by cancer type. Hypermutated tumours are excluded from the study, and this claim has also been made by other authors. To some extent, these drivers can be identified with the machine-learning-based tool CScape. The paper also discusses point mutation drivers in non-coding regions of the cancer genome, driver genes, and the influence of tumour stage on the driver count (coding single point mutations).
  • Mark Rogers, Hashem Shihab, Tom Gaunt, and Colin Campbell. CScape: a tool for predicting oncogenic single-point mutations in the cancer genome. Scientific Reports (Nature) 7, article number: 11597 (2017) (main paper and supplementary material). This paper uses integrative machine learning to build a classifier that predicts whether a single point mutation in the cancer genome is a disease driver or neutral, for mutations in both coding and non-coding regions (predictions are based on the GRCh37/hg19 human reference genome, Ensembl release 87). Our CScape predictor is located here and uses a wide variety of data sources to predict disease-driver status.
  • Mark F. Rogers, Hashem A. Shihab, Matthew Mort, David N. Cooper, Tom R. Gaunt and Colin Campbell. FATHMM-XF: accurate prediction of pathogenic point mutations via extended features. Bioinformatics (2018) 34(3) p. 511-513. Using machine learning methods, we propose a classifier for predicting whether single point mutations in the human genome are disease drivers or neutral; the method gives a confidence measure associated with each predicted class label. The FATHMM-XF server for GRCh37/hg19 is available here. This predictor uses more types of data than our earlier FATHMM-MKL predictor and also includes some methodological improvements.
  • Michael Ferlaino, Mark F. Rogers, Hashem A. Shihab, Matthew Mort, David N. Cooper, Tom R. Gaunt and Colin Campbell. An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome. BMC Bioinformatics 18:442 (2017). Using machine learning methods, we propose a classifier for predicting whether small indels in the human genome are disease drivers or neutral. The FATHMM-indel server is available here.
  • Su-Yi Loh, Thomas Jahans-Price, Michael Greenwood, Mingkwan Greenwood, See-Ziau Hoe, Agnieszka Konopacka, Colin Campbell, David Murphy, and Charles Hindmarch. Unsupervised network analysis of the plastic supraoptic nucleus transcriptome predicts Caprin-2 regulatory interactions. eNeuro 0243-17 (2017). We use a graphical lasso algorithm with microarray data to find hub nodes (genes) that play a major role in the regulation of hypertension (this investigation has led to a 1.3 million BBSRC-funded follow-through project).
  • Fatma Alim et al. Seasonal adaptations of the hypothalamo-neurohypophyseal system of the dromedary camel. PLoS ONE 14(6): e0216679 (2019). A project connected with our collaboration with David Murphy, related to hypertension.
  • Hashem Shihab, Mark Rogers, Colin Campbell and Tom Gaunt. HIPred: an integrative approach to predicting haploinsufficient genes. Bioinformatics (2017) 33 (12): 1751-1757. We use machine learning methods to present a state-of-the-art predictor for haploinsufficient genes.
  • Hashem A. Shihab, Mark F. Rogers, Michael Ferlaino, Colin Campbell and Tom R. Gaunt. GTB - an online genome tolerance browser. BMC Bioinformatics 2017, 18:20, DOI: 10.1186/s12859-016-1436-4. The Genome Tolerance Browser enables visualisation of predicted tolerance of genomic regions to mutational variation. It includes 13 genome-wide prediction algorithms and conservation scores, 12 non-synonymous prediction algorithms and four cancer-specific algorithms.
  • Tom G Richardson, Nicholas J Timpson, Colin Campbell and Tom R Gaunt. A pathway-centric approach to rare variant association analysis. European Journal of Human Genetics (www.nature.com/ejhg), (2016), 1-7, doi:10.1038/ejhg.2016.113.
  • Richardson T.G., Campbell C., Timpson N.J. and Gaunt T.R. Incorporating Non-Coding Annotations into Rare Variant Analysis. PLoS ONE 11(4) (2016): e0154181.
  • Richardson T.G., Shihab H.A., Rivas M.A., McCarthy M.I., Campbell C., Timpson N.J. and Gaunt T.R. A Protein Domain and Family Based Approach to Rare Variant Association Analysis. PLoS ONE 11(4) (2016): e0153803.
  • Richardson T.G. et al. Collapsed Methylation Quantitative Trait Loci analysis for Low Frequency and Rare variants. Human Molecular Genetics (2016) doi: 10.1093/hmg/ddw283.
  • Lulu Jiang, Charles C. T. Hindmarch, Mark Rogers, Colin Campbell, Christy Waterfall, Jane Coghill, Peter W. Mathieson and Gavin I. Welsh. RNA sequencing analysis of human podocytes reveals glucocorticoid regulated gene networks targeting non-immune pathways. Scientific Reports (Nature) 6, article number: 35671 (2016) doi:10.1038/srep35671.
  • Hannah Scott, Mark F. Rogers, Helen L. Scott, Colin Campbell, Elizabeth C. Warburton and James B. Uney. Recognition memory-induced gene expression in the perirhinal cortex: A transcriptomic analysis. Behavioural Brain Research (2017) 328:1-12.
  • Carlos Fernandez-Lozano, Jose A. Seoane, Marcos Gestal, Tom R. Gaunt, Julian Dorado, Alejandro Pazos and Colin Campbell. Texture analysis in gel electrophoresis images using an integrative kernel-based approach. Scientific Reports (Nature), 6, Article number 19256 (2016).
  • C. Rivers, H. Scott, M. Rogers, Y. Lee, G. Toye, J. Idris, J. Gaunt, C. Hales, T. Curk, C. Campbell, J. Ule, M. Norman, J. B. Uney. iCLIP identifies novel neuronal roles for SAFB1 in regulating RNA processing and neuronal function. BMC Biology 13:111 (2015)

  • Hashem Shihab, Mark Rogers, Julian Gough, Matthew Mort, David Cooper, Ian Day, Tom Gaunt and Colin Campbell. An Integrative Approach to Predicting the Functional Effects of Non-Coding and Coding Sequence Variation. Bioinformatics 31(10): 1536-1543 (2015). Supplementary information and website for the coding/non-coding predictor. This method uses integrative machine learning to predict whether single nucleotide variants in the human genome are likely to be functional in disease. The predictor outputs a confidence measure with each prediction, and it gives predictions for sequence variants in both the coding and non-coding regions of the human genome. Further information about this approach, and various extensions of the project, is given in the Available software submenu on the left.
  • Mark Rogers, Hashem Shihab, Michael Ferlaino, Tom Gaunt and Colin Campbell. Predicting the Pathogenic Impact of Sequence Variation in the Human Genome. Studies in Health Technology and Informatics (IOS Press) Vol. 235 p. 91-95 (2017), DOI 10.3233/978-1-61499-753-5-91. Book chapter summary of various current projects.

  • Mark Rogers, Hashem Shihab, Tom Gaunt, Matthew Mort, David Cooper, and Colin Campbell, Sequential Data Selection for Predicting the Pathogenic Effects of Sequence Variation, Proceedings, 2015 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2015, B394)

  • Jose Seoane, Colin Campbell, Ian Day, Juan Casas, Tom Gaunt. Canonical correlation analysis for gene-based pleiotropy discovery. PLOS Computational Biology DOI: 10.1371/journal.pcbi.1003876. Vol. 10, issue 10, e1003876 (2014).

  • M. Rogers, C. Campbell and Y. Ying. Probabilistic inference of biological networks via data integration. BioMed Research International Article ID 707453 (2014).

  • Jose Seoane, Ian Day, Tom Gaunt and Colin Campbell. A pathway-based data integration framework for prediction of disease progression. Bioinformatics (2014) 30 (6): 838-845.

  • Colin Campbell. Machine Learning Methodology in Bioinformatics. Handbook of Bio- and Neuro-informatics, ed. Irwin King and Kaizhu Huang. Springer-Verlag, 2012, pages 185-206.

  • Yiming Ying, Kaizhu Huang and Colin Campbell. Enhanced Protein Fold Recognition through a Novel Data Integration Approach. BMC Bioinformatics, 2009, 10:267.

    Download the pdf. Also available is a NIPS2009 Workshop Abstract pdf summarising the multi-kernel learning methods in this paper.

  • Yiming Ying, Colin Campbell, Theodoros Damoulas and Mark Girolami. Class Prediction from Disparate Biological Data Sources using an Iterative Multi-kernel Algorithm. Lecture Notes in Bioinformatics 5780 (2009) pp.427-438.

    Download the pdf

  • Phaedra Agius, Yiming Ying and Colin Campbell. Bayesian Unsupervised Learning with Multiple Data Types. Statistical Applications in Genetics and Molecular Biology: Volume 8, Issue 1, Article 27 (2009).

    Download the pdf

  • Yiming Ying, Peng Li and Colin Campbell. A marginalized variational Bayesian approach to the analysis of array data. BMC Proceedings, 2008, 2(Suppl 4):S7.

    Download the pdf

  • Theodoros Damoulas, Yiming Ying, Mark Girolami and Colin Campbell. Inferring Sparse Kernel Combinations and Relevance Vectors: An application to subcellular localization of proteins. Proceedings of the Seventh International Conference on Machine Learning and Applications (ICMLA'08), San Diego, California.

    Download the pdf

  • Colin S Cooper, Colin Campbell and Sameer Jhavar. Mechanisms of Disease: biomarkers and molecular targets from microarray gene expression studies in prostate cancer. Nature Reviews Urology. (2007) Volume 4, pages 677-687.

  • Peng Li, Yiming Ying and Colin Campbell. A Variational Approach to Semi-Supervised Clustering. Proceedings, ESANN2009, p. 11-16.

    Download the pdf. A fuller length report is available here.

  • Luke Carrivick, Simon Rogers, Jeremy Clark, Colin Campbell, Mark Girolami and Colin Cooper. Identification of Prognostic Signatures in Breast Cancer Microarray Data using Bayesian Techniques. Journal of the Royal Society: Interface Vol. 3 (2006) pages 367-381.

    Two new Bayesian unsupervised learning methods are applied to four microarray datasets for breast cancer. The analysis suggests a minimum of 4 or 5 subtypes of sporadic breast cancer, each with quite distinct clinical outcomes. One subtype is purely indolent. The genes GRB7 and ERBB2 (HER2) are over-expressed in only one subtype. The most aggressive subtype is the most distinct and is associated with the basaloid (basal-like) subtype of breast cancer: it is marked by a distinctive reciprocal relationship between the forkhead transcription factor genes FOXA1 and FOXC1 (for more detail, see our paper in Statistical Applications in Genetics and Molecular Biology above). The paper illustrates the important insights gained from using Bayesian methods in this context.

    Download the pdf or
    Journal pdf

  • Luke Carrivick and Colin Campbell. A Bayesian Approach to the Analysis of Microarray Datasets using Variational Inference. Technical Report TR-CI-2006 1st February, 2006.

    This Technical Report gives details of the variational Bayes approach to clustering used in subsequent papers. Note, however, that the alpha-update was not implemented in this report. The report also gives further detail on the distinctive genetic signature of the basaloid subtype of breast cancer (see above paper) and proposes a normalised ratio of FOXC1 to FOXA1 as a biomarker for this subtype. The role of microRNA within this subtype is discussed further in our paper "Bayesian Unsupervised Learning with Multiple Data Types" above.

    Download the pdf

  • Luke Carrivick. Probabilistic Models in the Biomedical Sciences. PhD thesis (2006).

    Download the pdf (4.5MB)

  • Simon Rogers, Mark Girolami, Colin Campbell and Rainer Breitling. The Latent Process Decomposition of cDNA Microarray Datasets. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005, Vol. 2, pages 143-156.

    Download the pdf

  • Zsofia Kote-Jarai, Lucy Matthews, Ana Osorio, Susan Shanley, Ian Giddings, Francois Moreews, Imogen Locke, D. Gareth Evans, Diana Eccles, Carrier Clinic Collaborators, Richard D. Williams, Mark Girolami, Colin Campbell and Ros Eeles. Accurate prediction of BRCA1 and BRCA2 heterozygous genotype using expression profiling after induced DNA damage, Clinical Cancer Research, 2006, Vol. 12(13), pages 3896-3901.

  • Sashi Kommu and Colin Campbell. The Impact of Bioinformatics in Uro-oncology, BJU International, 2006, Volume 98(2), pages 249-251 (Editorial Comment).

  • Richard D Williams, Sandra N. Hing, Braden T. Greer, Craig C. Whiteford, Jun S. Wei, Rachael Natrajan, Anna Kelsey, Simon Rogers, Colin Campbell, Kathy Pritchard-Jones and Javed Khan. Prognostic Classification of Relapsing Favourable Histology Wilms Tumour using cDNA Microarray Expression Profiling and Support Vector Machines. Genes, Chromosomes and Cancer, 2004, Volume 41, Issue 1, pages 65-79.

    Download the pdf

  • Simon Rogers, Richard D. Williams and Colin Campbell. Class Prediction with Microarray Datasets, in U. Seiffert, L.C. Jain and P. Schweizer (eds), Bioinformatics using Computational Intelligence Paradigms, Springer, 2005, pages 119-141.

    Download the pdf

  • Sandra Edwards, Colin Campbell, Penny Flohr, Janet Shipley, Ian Giddings, Robert te-Poele, Andrew Dodson, Christophe Foster, Jeremy Clark, Sameer Jhavar, Gyula Kovacs and Colin S Cooper. Expression analysis onto microarrays of randomly selected cDNA clones highlights HOXB13 as a marker of human prostate cancer. British Journal of Cancer, Vol. 92, 2005, pages 376-381.

  • Kote-Jarai Z, Williams RD, Cattini N, Copeland M, Giddings I, Wooster R, tePoele RH, Workman P, Gusterson B, Peacock J, Gui G, Campbell C, Eeles R. Gene expression profiling after radiation-induced DNA damage is strongly predictive of BRCA1 mutation carrier status. Clinical Cancer Research 10(2004) 958-63.

  • S. Rogers, M. Girolami and C. Campbell. A Latent Process Decomposition Model for Interpreting cDNA Microarray Datasets. "Currents in Computational Molecular Biology 2004", Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004), San Diego.

  • Simon Rogers. Machine learning techniques for microarray analysis. PhD thesis (2004). Download the pdf (2.4MB)

  • Y.-J. Lu, D. Williamson, R. Wang, B. Summersgill, S. Rodriguez, S. Rogers, K. Pritchard-Jones, C. Campbell, J. Shipley. Expression profiling targeting chromosomes for tumor classification and prediction of clinical behavior. Genes, Chromosomes and Cancer 2003, 38: 207-214.

    Download the pdf

  • J. Clark, S. Edwards, A. Feber, P. Flohr, M. John, I. Giddings, S. Crossland, M. R Stratton, R. Wooster, C. Campbell, C.S. Cooper. Genome-wide screening for complete genetic loss in prostate cancer by comparative hybridization onto cDNA microarrays. Oncogene (Nature Publishing Group) 2003, 22: 1247-1252.

  • J. Clark, S. Edwards, M. John, P. Flohr, T. Gordon, K. Maillard, I. Giddings, C. Brown, A. Bagherzadeh, C. Campbell, J. Shipley, R. Wooster, C. S. Cooper. Identification of amplified and expressed genes in breast cancer by comparative hybridization onto microarrays of randomly selected cDNA clones. Genes, Chromosomes and Cancer 2002, 34:104-114.

  • S. Mukherjee, P. Tamayo, S. Rogers, R. Rifkin, A. Engle, C. Campbell, T. Golub and J. Mesirov, Estimating Dataset Size Requirements for Classifying DNA Microarray Data, Journal of Computational Biology, 2003, 10: 119-142.

  • Y. Li, C. Campbell and M. Tipping. Bayesian automatic relevance determination algorithms for classifying gene expression data. Bioinformatics 2002 18: 1332-1339.

    Outlines two Bayesian ARD algorithms for classifying gene expression data. The algorithms perform feature selection and build an accurate hypothesis using relatively few features. They are evaluated on three cancer datasets (colon cancer, ovarian cancer and leukemia).

    Download the pdf

  • Support Vector Machine Classification and Validation of Cancer Tissue Samples using Microarray Expression Data. T. Furey, N. Cristianini, N. Duffy, D. Bednarski, Michel Schummer and D. Haussler Bioinformatics, 2000, 16:906-914.
    Applies SVMs to classifying gene expression data for cancer.

  • Knowledge-based Analysis of Microarray Gene Expression Data using Support Vector Machines. M. Brown, W. Grundy, D. Lin, N. Cristianini, C. Sugnet, T. Furey, M. Ares Jr., D. Haussler. Proceedings of the National Academy of Sciences 2000, 97(1) p. 262-267.
    Application of SVMs to a gene expression dataset for the budding yeast S. cerevisiae.

  • C. Campbell, An Introduction to Kernel Methods. Chapter 7 in Radial Basis Function Networks: Design and Applications. R.J. Howlett and L.C. Jain (eds), Physica Verlag, 2001.

  • The Application of Support Vector Machines to Medical Decision Support: A Case Study. K. Veropoulos, N. Cristianini and C. Campbell. In Proceedings of the ECCAI Advanced Course in Artificial Intelligence, Chania, Greece, 1999 (ACAI99) Workshop W10, p. 17-21.

  • The Automated Identification of Tubercle Bacilli in Sputum: A Preliminary Investigation. K.Veropoulos, G.Learmonth, C.Campbell, B.Knight, J.Simpson. Analytical and Quantitative Cytology and Histology, 21(4):277-281 (1999).

    Download the pdf.

  • The Automated Identification of Tubercle Bacilli using Image Processing and Neural Computing Techniques. K.Veropoulos, C.Campbell, G.Learmonth. ICANN '98: International Conference on Artificial Neural Networks, vol 2, Springer, 1998 p. 797-802.

  • Image Processing and Neural Computing used in the Diagnosis of Tuberculosis. K.Veropoulos, C.Campbell, G.Learmonth. IEE Control Division: Intelligent Methods in Healthcare and Medical Applications, Digest No:98/514, 1998 p. 8/1-8/4.