Anthony Hughes

PhD Student, Computer Science
ajhughes3 at sheffield.ac.uk

Hello, I'm Anthony! 👋

I'm a PhD student at the University of Sheffield, supported by a UKRI scholarship and supervised by Ning Ma and Nikos Aletras. I'm currently interning at the Vector Institute with Gautam Kamath and Jacob Imola, where I work on unlearning in ERMs, LLMs, and related topics.

Research: I work on the security and privacy of machine learning, focused on the relationship between a model and its training data. That spans what models leak and memorise (privacy), what adversaries can inject into them (backdoors and data poisoning), and what can be provably removed after training (unlearning).

Recent work: My EMNLP 2025 paper examines information leakage in abstractive summarisation; my EACL 2026 paper uses activation patching and circuit discovery to understand and mitigate PII leakage in language models; and most recently I've been studying membership inference attacks against safety classifiers. As a SPAR AI safety fellow I worked with Andrew Draganov on backdoor detection in LLMs, covering both black- and white-box techniques. Separately, I worked on building better activation probes and on whether LLMs can evade them.

Recent Publications

Detecting Whether an LLM Has Been Poisoned
A Hughes, N Xing, A Kim, C Francel, A Draganov
ICML Mechanistic Interpretability Workshop 2026
Boundary-targeted Membership Inference Attacks on Safety Classifiers
A Hughes, A Goldberg, P Jha, A Perer, N Aletras, N Mireshghallah
Under Review @ NeurIPS 2026
PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing
A Hughes, V Duddu, N Asokan, N Aletras, N Ma
EACL 2026 - Findings
How Private are Language Models in Abstractive Summarisation?
A Hughes, N Aletras, N Ma
EMNLP 2025 - Main Proceedings

Selected Publications in Health/Medicine

Large Language Models to Improve Understanding of Radiology Reports
S Alabed et al. including A Hughes
The Lancet Digital Health
Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence
A Hughes, X Song
LREC-COLING 2024
Understanding Inflicted Injuries in Young Children: Toward an Ontology-based Approach
F Maikore et al. including A Hughes
EKAW 2024

Research Experience

Visiting Graduate Researcher
Cryptography, Security and Privacy Group (CrySP), University of Waterloo
July 2025 - September 2025

Collaborators/Mentors: Vasisht Duddu, N. Asokan. Research on mechanistic approaches to understanding PII leakage in language models.

Selected Awards

UKRI PhD Scholarship
University of Sheffield, 2023-2027
NIHR/BRC Sheffield Funding
April 2025 - September 2025
Best Poster
Insigneo Showcase, July 2025