Crowdsourcing in health evidence synthesis: the distribution of small parts of the problem

Noel-Storr, A. (2022) Crowdsourcing in health evidence synthesis: the distribution of small parts of the problem. PhD thesis, University of Reading




DOI: 10.48683/1926.00112412


Scientific output doubles every nine years. This rising torrent of information has placed the evidence synthesis process under increasing strain, contributing to lengthy production times and impeding the translation of health research into practice and policy. The process of evidence synthesis is extremely resource intensive, often taking small research teams years to complete. Updating reviews as new evidence becomes available has also proved challenging, with many remaining static publications that report outdated or even inaccurate information. A critical stage in the evidence synthesis process is the identification of evidence for inclusion. The advent of bibliographic databases such as PubMed and Embase marked a step-change in information retrieval practices. However, a myriad of problems, including poor reporting of primary research, inconsistent indexing, and a lack of standardised record formatting, combine to produce a significant specificity problem in information retrieval for health evidence syntheses. In short, the process is inefficient and wasteful. Using crowdsourcing for the study identification stages of review production may help to remove this bottleneck. Crowdsourcing is the engagement of a large group of people, usually via the internet, in a problem-solving or idea-generating activity. It can take a range of forms depending on the nature of the problem and the required output. One such crowd model is the crowdsourcing of human computation, or micro, tasks. This involves the manual classification of large data sets that have been broken down into smaller (micro) units and distributed via an open call to willing contributors. The importance of being systematic, and the very rule-driven processes involved in producing robust health evidence, lend themselves well to breaking larger tasks down into a micro format and distributing them to anyone with an interest in health and an internet connection.
This applied research aimed to develop, evaluate, and deploy a hybridised model of contribution using crowdsourcing and machine learning within the context of health evidence production. My specific objectives were to investigate the conditions under which each modality (crowd or machine) performed optimally, with a focus on outcome measures related to data quality, efficiency, engagement and capacity. The first three papers (Chapters 2, 3 and 4) form a collection that focuses on the identification of reports of randomised trials. Paper 1 looks at the development and evaluation of crowdsourcing this task; Paper 2, at developing and evaluating machine learning capability; and Paper 3, at the performance of a hybrid workflow that uses both components. Papers 4 and 5 are feasibility studies looking at crowd performance when tasked with a different, potentially more challenging, question and dataset. Systematic reviews are becoming increasingly complex, and evidence based on randomised trials is often not applicable or appropriate. Papers 6, 7 and 8 are set within a COVID-19 context. Paper 7 evaluates a crowd tasked with identifying studies across a range of review question types and under tight time constraints; Paper 8, adopting a methodology similar to that developed in Paper 2, describes the development and evaluation of a machine learning classifier designed to identify COVID-19 related primary research. Taken together, this body of work has furthered our understanding of the role crowdsourcing and machine learning can play in the production of health evidence. Specifically, it has contributed new knowledge on the types of tasks suitable for crowdsourcing, as well as methods for aggregating crowd contributions to achieve high-quality data output.
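The aggregation of independent crowd contributions described above can be illustrated with a minimal sketch. The function, threshold value, and escalation rule below are illustrative assumptions, not the specific algorithm developed in the thesis: each citation is classified by several contributors, and a label is accepted only when a sufficient share of contributors agree, with disagreements escalated to an expert resolver.

```python
from collections import Counter

def aggregate_votes(votes, threshold=0.7):
    """Aggregate crowd classifications for one record.

    votes: labels (e.g. "RCT" / "not RCT") from independent contributors.
    Returns the majority label if its share of votes meets the threshold,
    otherwise None, signalling escalation to an expert resolver.
    Threshold and escalation rule are hypothetical, for illustration only.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= threshold else None

# Four contributors screen one citation: 3/4 agreement (0.75) clears 0.7
print(aggregate_votes(["RCT", "RCT", "RCT", "not RCT"]))  # RCT
# A 1/2 split (0.5) falls below the threshold and is escalated
print(aggregate_votes(["RCT", "not RCT"]))  # None
```

The design choice of a high agreement threshold trades throughput for sensitivity: ambiguous records are not auto-labelled but routed to fewer, more expert eyes.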
In practical terms, crowdsourcing is now implemented in Cochrane review production processes both within the current information retrieval paradigm, assessing sets of search results retrieved for individual reviews, and in helping to produce and maintain highly curated repositories of studies as part of Cochrane's Evidence Pipeline. This collection can be leveraged by researchers, academics and practitioners to enable the successful application of such a model across multiple domain areas grappling with information overload.

Item Type: Thesis (PhD)
Thesis Supervisor: Li, W. (V.)
Thesis/Report Department: Henley Business School
Identification Number/DOI: 10.48683/1926.00112412
Divisions: Henley Business School
ID Code: 112412

