Bioinformatics
Mark F Rogers, Tom R Gaunt and Colin Campbell. Prediction of driver variants
in the cancer genome via machine learning methodologies. Briefings in Bioinformatics (OUP) (2020), bbaa250, https://doi.org/10.1093/bib/bbaa250.
This is an overview and outline of the approach used in the construction of cancer-specific disease-driver predictors, such as
CScape and
CScape-somatic, cited below.
Mark F Rogers, Tom R Gaunt and Colin Campbell.
CScape-somatic: distinguishing driver and passenger point mutations in the cancer genome. Bioinformatics. Volume 36, Issue 12,
pages: 3637–3644 (2020). The CScape-somatic predictor is located here.
Bogdan Luca, Vincent Moulton, Christopher Ellis, Dylan R Edwards, Colin Campbell, Rosalin Cooper, Jeremy Clark,
Daniel Brewer and Colin Cooper. A Novel Stratification Framework for Predicting Outcome in Patients with
Prostate Cancer. British Journal of Cancer (Nature). Volume 122, pages: 1467–1476 (2020).
Building on the Latent Process Decomposition method proposed earlier (see below), this paper proposes a method for
resolving prostate cancer into aggressive versus indolent disease courses.
Madeleine Darbyshire, Zachary du Toit, Mark F. Rogers, Tom Gaunt, and Colin Campbell.
Estimating the Frequency
of Single Point Driver Mutations across Common Solid Tumours. Scientific Reports
(Nature) 9, article number: 13452, (2019)
(main paper and
supplementary; some additional plots are
located here). Based on the use of our CScape
predictor, referenced below, we argue that the average number of disease-driving coding single nucleotide variants
in the human cancer genome is very small, though it varies considerably by cancer type.
Hypermutation is excluded from our study, and the same claim has been argued by other authors.
To a certain extent these drivers are identifiable by the machine-learning-based tool proposed
(CScape). The paper also discusses point mutation drivers in non-coding regions of the cancer
genome, driver genes and the influence of stage on the driver count (coding single point mutations).
Mark Rogers, Hashem Shihab, Tom Gaunt, and Colin Campbell.
CScape: a
tool for predicting oncogenic single-point mutations in the cancer genome.
Scientific Reports (Nature) 7, article number: 11597 (2017) (main paper and
supplementary). This method uses integrative machine learning to build a classifier for
predicting whether a single point mutation in the cancer genome is a disease-driver or neutral, for mutations in both non-coding and coding regions
(predictions are based on reference GRCh37/hg19 (ENSEMBL release 87) of the human genome). Our CScape predictor is located here and uses a wide variety of data sources to predict disease-driver status.
Mark F. Rogers, Hashem A. Shihab, Matthew Mort, David N. Cooper, Tom R. Gaunt and Colin Campbell. FATHMM-XF: accurate prediction of pathogenic point mutations via extended features. Bioinformatics (2018) 34(3) p. 511-513. Using machine learning methods we propose a classifier for predicting whether single point mutations in the human genome are disease-drivers or neutral: the method gives a confidence measure associated with each predicted class label. The FATHMM-XF server for GRCh37/hg19 is available here. This predictor uses more types of data than our earlier FATHMM-MKL predictor, together with some methodological improvements.
Michael Ferlaino, Mark F. Rogers, Hashem A. Shihab, Matthew Mort, David N. Cooper, Tom R. Gaunt and Colin Campbell. An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome. BMC Bioinformatics 18:442 (2017). Using machine learning methods we propose a classifier for predicting if small indels in the human genome are disease-drivers or neutral. The FATHMM-indel server is available here.
Su-Yi Loh, Thomas Jahans-Price, Michael Greenwood, Mingkwan Greenwood, See-Ziau Hoe, Agnieszka Konopacka, Colin Campbell, David Murphy, and Charles Hindmarch.
Unsupervised network analysis of the plastic supraoptic nucleus transcriptome predicts Caprin-2 regulatory interactions. eNeuro 0243-17 (2017).
We use a graphical lasso algorithm with microarray data to find hub nodes (genes) playing a major role in the regulation of hypertension (this investigation has led to a 1.3 million BBSRC-funded follow-through project).
Fatma Alim et al.
Seasonal adaptations of the hypothalamo-neurohypophyseal system of the dromedary camel.
PLoS ONE 14(6): e0216679 (2019). A project connected with our collaboration with David Murphy, related to hypertension.
Hashem Shihab, Mark Rogers, Colin Campbell and Tom Gaunt.
HIPred: an integrative approach to predicting haploinsufficient genes. Bioinformatics (2017) 33 (12): 1751-1757. We use machine learning methods to present a state-of-the-art predictor for haploinsufficient genes.
Hashem A. Shihab, Mark F. Rogers, Michael Ferlaino, Colin Campbell and Tom R. Gaunt.
GTB - an online genome tolerance browser.
BMC Bioinformatics 2017, 18:20, DOI: 10.1186/s12859-016-1436-4. The
Genome Tolerance Browser enables visualisation
of predicted tolerance of genomic regions to mutational variation.
It includes 13 genome-wide prediction algorithms and conservation scores,
12 non-synonymous prediction algorithms and four cancer-specific algorithms.
Tom G Richardson, Nicholas J Timpson, Colin Campbell and Tom R Gaunt.
A pathway-centric approach to rare variant association analysis. European Journal of Human Genetics (www.nature.com/ejhg), (2016), 1-7, doi:10.1038/ejhg.2016.113.
Richardson T.G., Campbell C., Timpson N.J. and Gaunt T.R.
Incorporating Non-Coding Annotations into Rare Variant Analysis. PLoS ONE 11(4) (2016): e0154181.
Richardson T.G., Shihab H.A., Rivas M.A., McCarthy M.I., Campbell C., Timpson N.J. and Gaunt T.R.
A Protein Domain and Family Based Approach
to Rare Variant Association Analysis. PLoS ONE 11(4) (2016): e0153803.
Richardson T.G. et al.
Collapsed Methylation Quantitative Trait Loci analysis for Low Frequency and Rare variants. Human Molecular Genetics
(2016) doi: 10.1093/hmg/ddw283.
Lulu Jiang, Charles C. T. Hindmarch, Mark Rogers, Colin Campbell, Christy Waterfall, Jane Coghill, Peter W. Mathieson and Gavin I. Welsh.
RNA sequencing analysis of human podocytes reveals glucocorticoid regulated gene networks targeting non-immune pathways.
Scientific Reports (Nature) 6, article number: 35671 (2016)
doi:10.1038/srep35671.
Hannah Scott, Mark F. Rogers, Helen L. Scott, Colin Campbell, Elizabeth C. Warburton and James B. Uney.
Recognition memory-induced gene expression in the perirhinal cortex: A transcriptomic analysis.
Behavioural Brain Research (2017) 328:1-12.
Carlos Fernandez-Lozano, Jose A. Seoane, Marcos Gestal, Tom R. Gaunt, Julian Dorado, Alejandro Pazos and Colin Campbell.
Texture analysis in gel electrophoresis images using an integrative kernel-based approach. Scientific Reports (Nature), 6, Article number 19256 (2016).
C. Rivers, H. Scott, M. Rogers, Y. Lee, G. Toye, J. Idris, J. Gaunt, C. Hales, T. Curk, C. Campbell, J. Ule, M. Norman, J. B. Uney.
iCLIP identifies novel neuronal roles for SAFB1 in regulating RNA processing and neuronal function. BMC Biology 13:111 (2015)
Hashem Shihab, Mark Rogers, Julian Gough, Matthew Mort, David Cooper, Ian Day, Tom Gaunt and Colin Campbell.
An Integrative Approach to Predicting the Functional Effects of Non-Coding and Coding Sequence Variation.
Bioinformatics 31(10): 1536-1543 (2015). Supplementary information and website for the coding/non-coding predictor. This method uses integrative machine learning methods to predict if single nucleotide variants in the human genome are likely functional in disease. The predictor outputs a confidence label
associated with the prediction and it gives predictions for sequence variants in both the
coding and non-coding regions of the human genome. There is further information about this approach and various extensions of this project in the Available software submenu on the left.
Mark Rogers, Hashem Shihab, Michael Ferlaino, Tom Gaunt and Colin Campbell. Predicting the Pathogenic Impact of Sequence Variation in the Human Genome.
Studies in Health Technology and Informatics
(IOS Press) Vol. 235 p. 91-95 (2017), DOI 10.3233/978-1-61499-753-5-91.
Book chapter summary of various current projects.
Mark Rogers, Hashem Shihab, Tom Gaunt, Matthew Mort, David Cooper, and Colin Campbell, Sequential Data Selection for Predicting the Pathogenic Effects of Sequence Variation, Proceedings, 2015 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2015, B394)
Jose Seoane, Colin Campbell, Ian Day, Juan Casas,
Tom Gaunt.
Canonical correlation analysis for gene-based pleiotropy discovery. PLOS Computational
Biology DOI: 10.1371/journal.pcbi.1003876. Vol. 10, issue 10, e1003876 (2014).
M. Rogers, C. Campbell and Y. Ying.
Probabilistic inference of biological networks via data integration
BioMed Research International Article ID 707453 (2014).
Jose Seoane, Ian Day, Tom Gaunt and Colin Campbell.
A pathway-based
data integration framework for prediction of disease progression. Bioinformatics (2014) 30 (6): 838-845.
Colin Campbell.
Machine Learning Methodology in Bioinformatics. Handbook of Bio- and Neuro-informatics, ed. Irwin King and Kaizhu Huang.
Springer-Verlag, 2012, pages 185-206.
Yiming Ying, Kaizhu Huang and Colin Campbell.
Enhanced Protein Fold Recognition through a Novel Data
Integration Approach. BMC Bioinformatics, 2009, 10:267.
Download the
pdf.
Also available is a NIPS2009 Workshop Abstract pdf
summarising the multi-kernel learning methods in this paper.
Yiming Ying, Colin Campbell, Theodoros Damoulas and Mark Girolami.
Class Prediction from Disparate Biological Data
Sources using an Iterative Multi-kernel
Algorithm. Lecture Notes in Bioinformatics 5780 (2009) pp.427-438.
Download the pdf
Phaedra Agius, Yiming Ying and Colin Campbell.
Bayesian Unsupervised Learning with Multiple Data Types.
Statistical Applications in Genetics and Molecular Biology:
Volume 8, Issue 1, Article 27 (2009).
Download the pdf
Yiming Ying, Peng Li and Colin Campbell. A marginalized variational Bayesian
approach to the analysis of array data. BMC Proceedings, 2008, 2(Suppl 4):S7.
Download the pdf
Theodoros Damoulas, Yiming Ying, Mark Girolami and Colin Campbell.
Inferring Sparse Kernel Combinations and Relevance Vectors:
An application to subcellular localization of proteins.
Proceedings of the Seventh International Conference on
Machine Learning and Applications (ICMLA'08), San Diego, California.
Download the pdf
Colin S Cooper, Colin Campbell and Sameer Jhavar.
Mechanisms of Disease:
biomarkers and molecular targets from microarray gene expression studies in prostate cancer.
Nature Reviews Urology. (2007) Volume 4, pages 677-687.
Peng Li, Yiming Ying and Colin Campbell.
A Variational Approach to Semi-Supervised Clustering.
Proceedings, ESANN2009, p. 11-16.
Download the pdf.
A fuller length report is available here.
Luke Carrivick, Simon Rogers, Jeremy Clark, Colin Campbell,
Mark Girolami and Colin Cooper. Identification of Prognostic
Signatures in Breast Cancer Microarray Data using Bayesian
Techniques. Journal of the Royal Society: Interface
Vol. 3 (2006) pages 367-381.
Two new Bayesian unsupervised learning methods are
applied to four microarray datasets for breast cancer. The analysis
suggests a minimum of 4 or 5 subtypes for sporadic breast cancer, each
with quite distinct clinical outcomes. One subtype is purely
indolent. The genes GRB7 and ERBB2 (HER2) are over-expressed in only one
subtype. The most aggressive subtype is the most distinct and
associated with the basaloid or basal-like subtype of breast cancer:
it is marked by a distinct reciprocity relation for the forkhead transcription
factor genes: FOXA1 and FOXC1 (for more detail see our paper in
Statistical Applications in Genetics and Molecular
Biology above). The paper illustrates the important
insights gained from using Bayesian methods in this context.
Download the pdf
or
Journal pdf
Luke Carrivick and Colin Campbell. A Bayesian Approach to the Analysis of Microarray Datasets
using Variational Inference. Technical Report TR-CI-2006 1st February, 2006.
This Technical Report gives details of the variational Bayes approach to
clustering used in subsequent papers. However, note that the alpha-update was not
implemented in this TR. This TR gives some further detail of the
distinctive genetic signature of the basaloid subtype of breast cancer (see above paper) and proposes the
use of a normalised ratio of FOXC1 over FOXA1 as a biomarker for this subtype. The role of microRNA
within this subtype is further discussed in our paper `Bayesian Unsupervised Learning with Multiple
Data Types' above.
Download the pdf
Luke Carrivick. Probabilistic Models in the
Biomedical Sciences. PhD thesis (2006).
Download the pdf
(4.5MB)
Simon Rogers, Mark Girolami, Colin Campbell and Rainer Breitling.
The Latent Process Decomposition of cDNA Microarray
Datasets. IEEE/ACM Transactions on Computational Biology
and Bioinformatics, 2005, Vol. 2, pages 143-156.
Download the pdf
Zsofia Kote-Jarai, Lucy Matthews, Ana Osorio, Susan Shanley,
Ian Giddings, Francois Moreews, Imogen Locke, D. Gareth Evans,
Diana Eccles, Carrier Clinic Collaborators, Richard D. Williams,
Mark Girolami, Colin Campbell and Ros Eeles. Accurate
prediction of BRCA1 and BRCA2 heterozygous genotype using
expression profiling after induced DNA damage,
Clinical Cancer Research, 2006, Vol. 12(13), pages 3896-3901.
Sashi Kommu and Colin Campbell. The Impact of Bioinformatics in Uro-oncology, BJU
International, 2006, Volume 98(2), pages 249-251 (Editorial Comment).
Richard D Williams, Sandra N. Hing, Braden T. Greer, Craig C.
Whiteford, Jun S. Wei, Rachael Natrajan, Anna Kelsey, Simon
Rogers, Colin Campbell, Kathy Pritchard-Jones and Javed Khan.
Prognostic Classification of Relapsing Favourable
Histology Wilms Tumour using cDNA Microarray Expression Profiling
and Support Vector Machines. Genes, Chromosomes and
Cancer, 2004, Volume 41, Issue 1, pages 65-79.
Download the pdf
Simon Rogers, Richard D. Williams and Colin Campbell. Class
Prediction with Microarray Datasets, in U. Seiffert,
L.C. Jain and P. Schweizer (eds), Bioinformatics using
Computational Intelligence Paradigms, Springer, 2005, pages
119-141.
Download the pdf
Sandra Edwards, Colin Campbell, Penny Flohr, Janet Shipley,
Ian Giddings, Robert te-Poele, Andrew Dodson, Christopher Foster,
Jeremy Clark, Sameer Jhavar, Gyula Kovacs and Colin S Cooper.
Expression analysis onto microarrays of randomly selected cDNA
clones highlights HOXB13 as a marker of human prostate cancer.
British Journal of Cancer, Vol. 92, 2005, pages
376-381.
Kote-Jarai Z, Williams RD, Cattini N, Copeland M, Giddings I,
Wooster R, tePoele RH, Workman P, Gusterson B, Peacock J, Gui G,
Campbell C, Eeles R.
Gene expression profiling after radiation-induced DNA damage is
strongly predictive of BRCA1 mutation carrier status.
Clinical Cancer Research 10(2004) 958-63.
S. Rogers, M. Girolami and C. Campbell. A Latent Process
Decomposition Model for Interpreting cDNA Microarray
Datasets. "Currents in Computational Molecular Biology 2004",
Eighth Annual International Conference on Research in
Computational Molecular Biology (RECOMB 2004), San Diego.
Simon Rogers.
Machine learning techniques for microarray
analysis. PhD thesis (2004). Download the
pdf
(2.4MB)
Y.-J. Lu, D. Williamson, R. Wang, B. Summersgill, S.
Rodriguez, S. Rogers, K. Pritchard-Jones, C. Campbell, J. Shipley.
Expression profiling targeting chromosomes for tumor classification
and prediction of clinical behavior. Genes,
Chromosomes and Cancer 2003, 38: 207-214.
Download the pdf
J. Clark, S. Edwards, A. Feber, P. Flohr, M. John, I. Giddings,
S. Crossland, M. R Stratton, R. Wooster, C.
Campbell, C.S. Cooper. Genome-wide screening for complete genetic loss in prostate cancer by comparative hybridization onto cDNA microarrays. Oncogene (Nature Publishing Group) 2003, 22: 1247-1252.
J. Clark, S. Edwards, M. John, P. Flohr, T. Gordon, K. Maillard,
I. Giddings, C. Brown, A. Bagherzadeh, C.
Campbell, J. Shipley, R. Wooster, C. S. Cooper.
Identification of amplified and expressed genes in breast cancer by
comparative hybridization onto microarrays of randomly selected cDNA
clones. Genes, Chromosomes and Cancer 2002,
34:104-114.
S. Mukherjee, P. Tamayo, S. Rogers, R. Rifkin, A. Engle,
C. Campbell, T. Golub and J. Mesirov,
Estimating Dataset Size Requirements for Classifying DNA Microarray
Data, Journal of Computational Biology, 2003, 10:
119-142.
Y. Li, C. Campbell and M. Tipping.
Bayesian automatic relevance determination algorithms for
classifying gene expression data. Bioinformatics 2002 18:
1332-1339.
Outlines two Bayesian ARD algorithms for classifying gene
expression data. The algorithms perform feature selection and
build an accurate hypothesis using relatively few
features. They are evaluated on three cancer datasets (colon cancer,
ovarian cancer and leukemia).
Download the pdf
Support Vector Machine Classification and Validation
of Cancer Tissue Samples using Microarray Expression Data. T. Furey,
N. Cristianini, N. Duffy, D. Bednarski, Michel Schummer and
D. Haussler. Bioinformatics, 2000, 16:906-914.
Applies SVMs to classifying gene expression data for cancer.
Knowledge-based Analysis of Microarray Gene Expression
Data using Support Vector Machines. M. Brown, W. Grundy, D.
Lin, N. Cristianini, C. Sugnet, T. Furey, M. Ares Jr., D. Haussler.
Proceedings of the National Academy of Sciences 2000,
97(1) p. 262-267.
Application of SVMs to a gene expression
dataset for the budding yeast S. cerevisiae.
C. Campbell, An Introduction to Kernel Methods. Chapter 7 in
Radial Basis Function Networks: Design and Applications.
R.J. Howlett and L.C. Jain (eds), Physica Verlag, 2001.
The Application of Support Vector Machines
to Medical Decision Support: A Case Study. K. Veropoulos,
N. Cristianini and C. Campbell. In Proceedings of the
ECCAI Advanced Course in Artificial Intelligence, Chania,
Greece, 1999 (ACAI99) Workshop W10, p. 17-21.
The Automated Identification of Tubercle
Bacilli in Sputum: A Preliminary Investigation.
K. Veropoulos, G. Learmonth, C. Campbell, B. Knight, J. Simpson.
Analytical and Quantitative Cytology and
Histology, 21(4):277-281 (1999).
Download the
pdf.
The Automated Identification of Tubercle
Bacilli using Image Processing and Neural Computing
Techniques. K. Veropoulos, C. Campbell, G. Learmonth.
ICANN '98: International Conference on Artificial
Neural Networks, vol 2, Springer, 1998 p. 797-802.
Image Processing and Neural Computing used
in the Diagnosis of Tuberculosis. K. Veropoulos, C. Campbell,
G. Learmonth. IEE Control Division: Intelligent Methods
in Healthcare and Medical Applications, Digest No:98/514,
1998 p. 8/1-8/4.