Bioinformatics, with a case study of p53 binding sites in the human genome
Speaker: Daniel Barker (School of Biology)
Biology is increasingly dominated by computation, not merely as a servant (data storage and time-saving analyses), but fundamentally, at the conceptual as well as the practical level. Many discoveries are only possible due to appropriate computational-statistical analyses. Bioinformatics unapologetically revels in these opportunities for insight, building on over a century of work in biometry and genetics and over two centuries of evolutionary thought – but with the benefits of modern statistics and computational power. As a case study of bioinformatics research, I present our combined-evidence approach to predicting a class of functional regions within the human genome that have important implications for cancer (p53 binding sites). In a model fitted by Firth logistic regression, a combination of DNA sequence data with non-sequence (chromatin modification) data allows better predictions than sequence data alone. Our combined-evidence model, using publicly available data as input, also gives predictions of greater biological relevance than a laboratory method. Laboratory methods detect biochemical activity, but this is not always important from a functional or fitness point of view. Detecting the subset of biochemical activity that has biological consequences requires a more subtle approach, perhaps along the lines of our combined-evidence model. The relative merits of computational ‘predictions’ and laboratory ‘discoveries’ are discussed. A script implementing our combined-evidence model is available via: http://eggg.st-andrews.ac.uk/flrtfm.
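For readers unfamiliar with the fitting method named above, here is a minimal sketch of Firth (bias-reduced, penalised-likelihood) logistic regression in plain Python. It is not the script distributed at the URL above and does not reproduce the combined-evidence model. The function names (`firth_logistic`, `_invert`, `_sigmoid`), the simple Fisher-scoring loop and the toy data are all assumptions chosen for illustration. The sketch maximises the log-likelihood plus half the log-determinant of the Fisher information, which amounts to adding a leverage-based correction to the usual score.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _invert(a):
    # Invert a small square matrix by Gauss-Jordan elimination with partial pivoting.
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("information matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def firth_logistic(X, y, max_iter=200, tol=1e-10):
    """Hypothetical helper: Firth-penalised logistic fit by Fisher scoring.

    X is a list of rows (include a leading 1.0 for the intercept);
    y is a list of 0/1 outcomes. Returns the coefficient list.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [_sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X^T W X and its inverse.
        info = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n))
                 for b in range(k)] for a in range(k)]
        inv = _invert(info)
        # Leverages: diagonal of W^(1/2) X (X^T W X)^(-1) X^T W^(1/2).
        h = [w[i] * sum(X[i][a] * inv[a][b] * X[i][b]
                        for a in range(k) for b in range(k))
             for i in range(n)]
        # Firth-modified score: X^T (y - p + h * (1/2 - p)).
        score = [sum(X[i][a] * (y[i] - p[i] + h[i] * (0.5 - p[i]))
                     for i in range(n)) for a in range(k)]
        step = [sum(inv[a][b] * score[b] for b in range(k)) for a in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Toy, made-up data: an intercept column plus one binary predictor that
# perfectly separates the two outcome classes.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [0, 0, 1, 1]
beta = firth_logistic(X, y)
print(beta)
```

On perfectly separated data such as this toy set, ordinary maximum-likelihood logistic regression has no finite solution, whereas the Firth penalty keeps the coefficients finite. A production analysis would normally rely on an established implementation rather than this sketch, which omits step-halving, standard errors and penalised likelihood-ratio tests.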