Error, reproducibility and sensitivity: a pipeline for data processing of Agilent oligonucleotide expression arrays

Chain, Benjamin; Bowen, Helen; Hammond, John; Posch, Wilfried; Rasaiyaah, Jane; Tsang, Jhen; Noursadeghi, Mahdad

Full text not archived in this repository.

Chain, B., Bowen, H., Hammond, J. ORCID: https://orcid.org/0000-0002-6241-3551, Posch, W., Rasaiyaah, J., Tsang, J. and Noursadeghi, M. (2010) Error, reproducibility and sensitivity: a pipeline for data processing of Agilent oligonucleotide expression arrays. BMC Bioinformatics, 11. 344. ISSN 1471-2105 doi: 10.1186/1471-2105-11-344

Abstract/Summary

Background: Expression microarrays are increasingly used to obtain large-scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and to analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples.

Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while inter-array variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators.

Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and inter-array variability, and of a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
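
The transformation, normalisation and inter-array SD estimate described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R function: the abstract does not name the normalisation method, so median centring is used here purely as an assumed stand-in, and the function names, the intensity floor and the input values are all hypothetical.

```python
import math
import statistics


def log2_transform(signals, floor=1.0):
    """Log2-transform raw intensities; values below `floor` are clamped (assumed guard)."""
    return [math.log2(max(s, floor)) for s in signals]


def median_centre(arrays):
    """Shift each log2 array so that all arrays share the same median.

    Median centring is an assumption; the abstract does not state which
    normalisation the published pipeline applies.
    """
    medians = [statistics.median(a) for a in arrays]
    target = statistics.mean(medians)
    return [[v - m + target for v in a] for a, m in zip(arrays, medians)]


def interarray_sd(arrays):
    """Per-probe sample SD across replicate arrays (rows = arrays, columns = probes)."""
    return [statistics.stdev(probe) for probe in zip(*arrays)]


# Hypothetical raw intensities: three replicate arrays, four probes each.
raw = [
    [120.0, 900.0, 15000.0, 60.0],
    [150.0, 1100.0, 13000.0, 45.0],
    [100.0, 800.0, 17000.0, 70.0],
]

normalised = median_centre([log2_transform(a) for a in raw])
sds = interarray_sd(normalised)

# Express each probe's SD as a percentage of its mean log2 signal,
# mirroring the "percent of mean" figures quoted in the abstract.
probe_means = [statistics.mean(p) for p in zip(*normalised)]
relative_sd = [100.0 * sd / m for sd, m in zip(sds, probe_means)]
```

With real data, each row of `raw` would hold one array's extracted feature intensities, and the per-probe SDs could be summarised across probes to give a single inter-array figure comparable to the one reported above.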

Item Type	Article
URI	https://centaur.reading.ac.uk/id/eprint/33849
Identification Number/DOI	10.1186/1471-2105-11-344
Refereed	Yes
Divisions	Life Sciences > School of Biological Sciences > Department of Bio-Engineering; No Reading authors. Back catalogue items; Interdisciplinary centres and themes > Centre for Food Security; Life Sciences > School of Agriculture, Policy and Development > Department of Crop Science
Uncontrolled Keywords	microarray; Agilent; spotted gene expression; transcriptomics analysis; normalisation; oligonucleotide data
Publisher	BioMed Central

Deposit Details

Date Deposited:	07 Oct 2013 08:08
Last Modified:	07 Feb 2026 02:36