Association analysis of genomic sequences

Alshammari, A. O. (2022) Association analysis of genomic sequences. PhD thesis, University of Reading

Text - Thesis (1MB). Restricted to Repository staff only until 25 May 2024.
Text - Thesis Deposit Form (2MB). Restricted to Repository staff only.


To link to this item DOI: 10.48683/1926.00113855

Abstract/Summary

Studying genetic variation can improve understanding of cancer aetiology and offer scientists new perspectives on tumour cell growth. Somatic mutations play a significant role in the development of cancer, so substantial effort has been expended on identifying them. In this research, we develop a novel method, based on the score test, for detecting the impact of somatic mutations using matched tumour and normal sequences taken from the same individual, and we apply the generalised higher criticism (GHC) test as a correction. The proposed score test is evaluated in simulations and compared with the exact binomial test. Results from a wide range of simulations show that our method controls type I error and is more effective than the exact binomial test.

We also propose a second approach to the association analysis of somatic mutations, one that accounts for uncertainty in mutation discovery. Because standard association methods do not take possible calling errors for somatic mutations into account, they are of limited use for investigating the functional consequences of such mutations. A recent somatic mutation association test with measurement errors (SAME) addresses this issue via the likelihood ratio test and has shown that accounting for uncertainty in somatic mutation calling improves the power to detect an association. In the spirit of SAME, the score test procedure proposed in this thesis models the true somatic mutation status as an unobservable variable and uses read depth to inform the mutation calls. The score test is computationally efficient because only optimisation under the null model is required for each genetic variant, which also reduces the risk of non-convergence of the optimisation routines. These computational advantages are particularly beneficial in genome-wide settings. The score test is evaluated using simulations. Extensive evaluations and comparisons with the SAME procedure and with a generalised linear model (GLM) that does not consider mutation-calling errors show that our proposed approach preserves type I error and is more efficient than both the SAME and GLM methods.
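The thesis's own score test (with GHC correction or a latent mutation variable) is not reproduced here. As a generic illustration of why a score test needs only the null model, the sketch below compares a one-sample binomial score test of H0: p = p0 with the two-sided exact binomial test. The setting (x variant reads out of n, null proportion p0) and all function names are hypothetical and chosen for illustration only.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def score_test_binomial(x: int, n: int, p0: float) -> tuple[float, float]:
    """Score test of H0: p = p0 for x successes in n trials (illustrative).

    Both the score U and the Fisher information I are evaluated at p0,
    so the model under the alternative is never fitted.
    """
    if not 0 < p0 < 1:
        raise ValueError("p0 must lie strictly between 0 and 1")
    score = (x - n * p0) / (p0 * (1 - p0))  # U(p0) = d/dp log L(p) at p0
    info = n / (p0 * (1 - p0))              # expected information I(p0)
    stat = score**2 / info                  # approx. chi-square(1) under H0
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
    return stat, p_value


def exact_binomial_test(x: int, n: int, p0: float) -> float:
    """Two-sided exact binomial test.

    Sums P(X = k) over all outcomes k that are no more likely than the
    observed x under H0 (small relative tolerance for rounding).
    """
    p_obs = binom_pmf(x, n, p0)
    total = sum(
        binom_pmf(k, n, p0)
        for k in range(n + 1)
        if binom_pmf(k, n, p0) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical data: 12 variant reads out of 100 against a null rate of 5%.
    stat, p_score = score_test_binomial(12, 100, 0.05)
    p_exact = exact_binomial_test(12, 100, 0.05)
    print(f"score statistic={stat:.3f}, score p={p_score:.4g}, exact p={p_exact:.4g}")
```

The score p-value relies on an asymptotic chi-square approximation, so with few reads the exact test provides a more reliable reference. This kind of simulation-based comparison is what the abstract describes, although the thesis's models are richer than this one-parameter case.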

Item Type: Thesis (PhD)
Thesis Supervisor: Baksh, F.
Thesis/Report Department: School of Mathematical, Physical and Computational Sciences
Identification Number/DOI: https://doi.org/10.48683/1926.00113855
Divisions: Science > School of Mathematical, Physical and Computational Sciences > Department of Mathematics and Statistics
ID Code: 113855
