Anthony Hughes

Hello, I'm Anthony! 👋

I'm a Ph.D. student at the University of Sheffield, supported by a UKRI scholarship and supervised by Ning Ma and Nikos Aletras. I'm currently interning at the Vector Institute with Gautam Kamath and Jacob Imola, working on unlearning in ERMs, LLMs, and related topics.

Research: I work on the security and privacy of machine learning, focused on the relationship between a model and its training data. That spans what models leak and memorise (privacy), what adversaries can inject into them (backdoors and data poisoning), and what can be provably removed after training (unlearning).

Recent work: My EMNLP 2025 paper examines information leakage in abstractive summarisation; my EACL 2026 paper uses activation patching and circuit discovery to understand and mitigate PII leakage in language models; and most recently I've been studying membership inference attacks against safety classifiers. As a SPAR AI safety fellow I worked with Andrew Draganov on backdoor detection in LLMs, covering both black- and white-box techniques. Separately, I worked on building better activation probes and on whether LLMs can evade them.

Recent Publications

Detecting Whether an LLM Has Been Poisoned

A Hughes, N Xing, A Kim, C Francel, A Draganov

ICML Mechanistic Interpretability Workshop 2026

Poster available.

Boundary-targeted Membership Inference Attacks on Safety Classifiers

A Hughes, A Goldberg, P Jha, A Perer, N Aletras, N Mireshghallah

Under Review @ NeurIPS 2026

https://arxiv.org/abs/2605.22373

PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing

A Hughes, V Duddu, N Asokan, N Aletras, N Ma

EACL 2026 - Findings

https://arxiv.org/abs/2510.07452v1

How Private are Language Models in Abstractive Summarisation?

A Hughes, N Aletras, N Ma

EMNLP 2025 - Main Proceedings

https://arxiv.org/abs/2412.12040

Selected Publications in Health/Medicine

Large Language Models to Improve Understanding of Radiology Reports

S Alabed et al. including A Hughes

The Lancet Digital Health

https://authors.elsevier.com/sd/article/S2589-7500(25)00142-6

Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence

A Hughes, X Song

LREC-COLING 2024

https://aclanthology.org/2024.lrec-main.753/

Understanding Inflicted Injuries in Young Children: Toward an Ontology-based Approach

F Maikore et al. including A Hughes

EKAW 2024

https://link.springer.com/chapter/10.1007/978-3-031-77792-9_16

Research Experience

Visiting Graduate Researcher

Cryptography, Security and Privacy Group (CrySP), University of Waterloo

July 2025 - September 2025

Collaborators/Mentors: Vasisht Duddu, N. Asokan. Research on mechanistic approaches to understanding PII leakage in language models.

Industry Experience

Lead Software Engineer / NLP Engineer

Data Language

January 2014 - Current

Led development of NLP-based products, including an automated text classification SaaS, data visualisation tools, and the integration of language models into data platforms. Built production systems serving FTSE clients.

Software Engineer

Ontoba

June 2013 - June 2014

Worked on services that use linked data to break down data silos.

Software Engineer (Internship)

Press Association

July 2011 - September 2012

Worked on a new digital platform centred around semantic web technologies.

Education

PhD Computer Science

University of Sheffield, 2024 - Current

PgDip Speech and Language Technologies

University of Sheffield, 2023 - 2024

MSc Computational Linguistics (Distinction)

University of Wolverhampton, 2021 - 2023

BSc Computer Science (1st Class, Hons)

Nottingham Trent University, 2009 - 2013

Selected Awards

UKRI PhD Scholarship

University of Sheffield, 2023 - 2027

NIHR/BRC Sheffield Funding

April 2025 - September 2025

Best Poster

Insigneo Showcase, July 2025