Pompedda, F., Santtila, P., Di Maso, E., Nyman, T. J. (ORCID: https://orcid.org/0000-0002-6409-2528), Sun, Y. and Zappalà, A. (2025) Evaluating firearm examiner testimony using large language models: a comparison of standard and knowledge-enhanced AI systems. Journal of Psychology and AI, 1, 2503343. ISSN 2997-4100
Text (Open Access) - Published Version: available under a Creative Commons Attribution Non-Commercial licence. Please see our End User Agreement before downloading. (1MB)
Text - Accepted Version: restricted to Repository staff only. The copyright of this document has not yet been checked, which may affect its availability. (654kB)
It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.
To link to this item DOI: 10.1080/29974100.2025.2503343
Abstract/Summary
Objective: This study evaluated the decision-making of Large Language Models (LLMs) in interpreting firearm examiner testimony by comparing a standard LLM to one enhanced with forensic science knowledge.

Method: Following the experimental paradigm of Garrett et al. (2020), we assessed whether LLMs mirrored human decision patterns and whether specialized knowledge led to more critical evaluations of forensic claims. We employed a 2 × 2 × 7 between-subjects design with three independent variables: LLM configuration (standard vs. knowledge-enhanced), cross-examination presence (yes vs. no), and conclusion language (seven variations). Each model condition performed 200 repetitions per scenario, yielding a total of 5,600 measures of binary verdicts, guilt probability ratings, and credibility assessments.

Results: LLMs showed low conviction rates (9.4%) across conditions, with logical variations as a function of how the firearm expert's conclusion was formulated. Cross-examination produced lower guilt assessments and scientific credibility ratings. Importantly, knowledge-enhanced LLMs gave significantly more conservative evaluations of firearm evidence across all match conditions than standard LLMs did.

Conclusions: LLMs, particularly when enhanced with domain-specific knowledge, showed advantages over the human jurors in Garrett et al. (2020) in evaluating complex scientific evidence, suggesting potential applications for AI systems in supporting legal decision-making.
| Item Type: | Article |
|---|---|
| Refereed: | Yes |
| Divisions: | Life Sciences > School of Psychology and Clinical Language Sciences > Department of Psychology |
| ID Code: | 122786 |
| Uncontrolled Keywords: | Firearm Examination Evidence, Large Language Models (LLMs), Knowledge-Enhanced Artificial Intelligence (AI) Systems, Legal Decision-Making, Expert Testimony Evaluation |
| Publisher: | Taylor & Francis |