
Evaluating firearm examiner testimony using large language models: a comparison of standard and knowledge-enhanced AI systems

Pompedda, F., Santtila, P., Di Maso, E., Nyman, T. J. ORCID: https://orcid.org/0000-0002-6409-2528, Sun, Y. and Zappalà, A. (2025) Evaluating firearm examiner testimony using large language models: a comparison of standard and knowledge-enhanced AI systems. Journal of Psychology and AI. ISSN 2997-4100 (In Press)

Text - Accepted Version (654kB)
· Restricted to Repository staff only
· The copyright of this document has not yet been checked; this may affect its availability.

It is advisable to refer to the publisher's version if you intend to cite this work. See Guidance on citing.

Abstract/Summary

Objective: This study evaluated the decision-making of Large Language Models (LLMs) in interpreting firearm examiner testimony by comparing a standard LLM to one enhanced with forensic science knowledge. Method: Following the experimental paradigm of Garrett et al. (2020), we assessed whether LLMs mirrored human decision patterns and whether specialized knowledge led to more critical evaluations of forensic claims. We employed a 2 × 2 × 7 between-subjects design with three independent variables: LLM configuration (standard vs. knowledge-enhanced), cross-examination presence (yes vs. no), and conclusion language (seven variations). Each condition was run 200 times, yielding a total of 5,600 measures of binary verdicts, guilt probability ratings, and credibility assessments. Results: LLMs showed low conviction rates (9.4%) across conditions, with rates varying logically according to how the firearm expert's conclusion was formulated. Cross-examination produced lower guilt assessments and scientific credibility ratings. Importantly, knowledge-enhanced LLMs gave significantly more conservative evaluations of firearm evidence across all match conditions than standard LLMs did. Conclusions: LLMs, particularly when enhanced with domain-specific knowledge, showed advantages in evaluating complex scientific evidence over the human jurors in Garrett et al. (2020), suggesting potential applications for AI systems in supporting legal decision-making.
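As a sanity check on the design described in the abstract, the cell and trial counts follow directly from the three factors and the repetition count. A minimal sketch (variable names are illustrative, not from the paper):

```python
# Factorial design reported in the abstract:
# 2 LLM configurations x 2 cross-examination levels x 7 conclusion wordings,
# with 200 repetitions per cell.
llm_configs = 2          # standard vs. knowledge-enhanced
cross_examination = 2    # present vs. absent
conclusion_wordings = 7  # seven conclusion-language variations
repetitions = 200        # runs per cell

cells = llm_configs * cross_examination * conclusion_wordings
total_measures = cells * repetitions

print(cells)           # 28 experimental conditions
print(total_measures)  # 5600 measures, matching the abstract
```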

Item Type: Article
Refereed: Yes
Divisions: Life Sciences > School of Psychology and Clinical Language Sciences > Department of Psychology
ID Code: 122786
Uncontrolled Keywords: Firearm Examination Evidence, Large Language Models (LLMs), Knowledge-Enhanced Artificial Intelligence (AI) Systems, Legal Decision-Making, Expert Testimony Evaluation
Publisher: Taylor & Francis

