Publications
Publications by category, in reverse chronological order.
2026
- Improving Semantic Uncertainty Quantification in Language Model Question-Answering via Token-Level Temperature Scaling. In The 29th International Conference on Artificial Intelligence and Statistics, 2026
Calibration and discrimination are both crucial aspects of semantic confidence in language models. Existing methods, particularly those using fixed-temperature heuristics, produce systematically miscalibrated and poorly discriminative semantic confidence distributions. We propose optimizing a single scalar temperature parameter as an effective solution. Our comprehensive testing shows this approach consistently improves semantic calibration, discrimination, and downstream entropy, yielding superior results compared to both heuristic baselines and more sophisticated token-level recalibration techniques when applied to question-answering tasks.
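As a rough illustration of what fitting a single scalar temperature looks like in practice, the sketch below minimises next-token negative log-likelihood on held-out logits and then recomputes token probabilities; the tensor shapes, optimiser settings, and function names are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of scalar temperature scaling on token-level logits.
# The data and optimisation setup are illustrative stand-ins.
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor,
                    steps: int = 200, lr: float = 0.01) -> float:
    """Fit a single scalar temperature T by minimising NLL on held-out tokens.

    logits: (N, vocab) raw token logits from the language model
    labels: (N,) gold next-token ids
    """
    log_t = torch.zeros(1, requires_grad=True)  # optimise log T so T stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return float(log_t.exp())

# Usage with random stand-in data (replace with real held-out logits/labels):
logits = torch.randn(1024, 32000)
labels = torch.randint(0, 32000, (1024,))
T = fit_temperature(logits, labels)
calibrated_probs = torch.softmax(logits / T, dim=-1)
print(f"fitted temperature: {T:.3f}")
```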
@inproceedings{lamb2026improving,
  title = {Improving Semantic Uncertainty Quantification in Language Model Question-Answering via Token-Level Temperature Scaling},
  author = {Lamb, Tom A and Ivanova, Desi R and Torr, Philip H S and Rudner, Tim G J},
  booktitle = {The 29th International Conference on Artificial Intelligence and Statistics},
  year = {2026},
}
2025
- Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Ambiguous Prompts and Unanswerable Questions. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
Language models often produce confident but inaccurate responses. We develop a test-time detection method examining information flow across model layers to identify hallucinations when models encounter ambiguous or insufficient inputs. Our key finding shows that hallucination manifests as usable information deficiencies in inter-layer transmissions. Rather than analyzing only final outputs, we track cross-layer dynamics to assess model reliability. The approach integrates with existing models without requiring retraining or architectural changes.
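The paper's exact information measure is not spelled out in this abstract; the hedged sketch below only shows how per-layer hidden states can be pulled from an off-the-shelf model and summarised with a simple layer-to-layer statistic, as a stand-in for the cross-layer dynamics being tracked.

```python
# Sketch of extracting per-layer hidden states to inspect cross-layer dynamics.
# The statistic used here (cosine similarity between consecutive layers at the
# final token) is an illustrative stand-in, not the paper's measure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

prompt = "Who was the first person to walk on Saturn?"  # unanswerable premise
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# hidden_states: tuple of (num_layers + 1) tensors, each (1, seq_len, hidden)
last_tok = [h[0, -1] for h in out.hidden_states]
for i in range(1, len(last_tok)):
    sim = torch.nn.functional.cosine_similarity(last_tok[i - 1], last_tok[i], dim=0)
    print(f"layer {i - 1:2d} -> {i:2d}: cosine similarity {sim.item():.3f}")
```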
@inproceedings{kim2025detecting,
  title = {Detecting {LLM} Hallucination Through Layer-wise Information Deficiency: Analysis of Ambiguous Prompts and Unanswerable Questions},
  author = {Kim, Hazel and Lamb, Tom A and Bibi, Adel and Torr, Philip and Gal, Yarin},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  year = {2025},
}
- Can Large Language Models Express Uncertainty Like Human? Linwei Tao, Yi-Fan Yeh, Bo Kai, Minjing Dong, Tao Huang, and 4 more authors. arXiv preprint arXiv:2509.24202, 2025
Large language models increasingly appear in high-stakes contexts where overconfident outputs risk misleading users. We revisit linguistic confidence – having models convey doubt through hedging expressions like “probably” or “might” – as an alternative to traditional confidence estimation methods. We release the first diverse, large-scale dataset of hedging expressions with human-annotated confidence scores, develop a lightweight tool converting hedges into confidence scores, conduct the first systematic study of linguistic confidence across modern LLMs and QA benchmarks, and introduce a fine-tuning framework to enhance reliability. Our findings reveal most LLMs struggle with reliable linguistic confidence, though targeted prompting can achieve competitive calibration and discriminability.
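As a toy illustration of converting hedging language into numeric confidence, the sketch below maps a hand-written phrase list to scores; the phrases and values are invented for illustration and do not reproduce the paper's human-annotated dataset or released tool.

```python
# Toy illustration of mapping hedging expressions to numeric confidence.
# The phrase list and scores below are made-up examples.
import re

HEDGE_SCORES = {
    "certainly": 0.95, "definitely": 0.95, "probably": 0.7,
    "likely": 0.7, "might": 0.4, "possibly": 0.35, "unlikely": 0.2,
}

def linguistic_confidence(answer: str, default: float = 0.8) -> float:
    """Return a crude confidence score based on hedges found in the answer."""
    found = [s for w, s in HEDGE_SCORES.items()
             if re.search(rf"\b{w}\b", answer, flags=re.IGNORECASE)]
    return min(found) if found else default  # most cautious hedge wins

print(linguistic_confidence("The capital of Australia is probably Canberra."))  # 0.7
print(linguistic_confidence("It might possibly be Sydney."))                    # 0.35
```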
@article{tao2025linguistic,
  title = {Can Large Language Models Express Uncertainty Like Human?},
  author = {Tao, Linwei and Yeh, Yi-Fan and Kai, Bo and Dong, Minjing and Huang, Tao and Lamb, Tom A and Yu, Jialin and Torr, Philip H S and Xu, Chang},
  journal = {arXiv preprint arXiv:2509.24202},
  year = {2025},
}
- Towards Label-Free Biological Reasoning Synthetic Dataset Creation via Uncertainty Filtering. In NeurIPS Workshop on Efficient Reasoning, 2025
We propose using a model’s internal confidence metrics as an alternative to expensive ground-truth labels when curating synthetic reasoning datasets. Rather than requiring manual annotation, we employ uncertainty-based filtering, which uses a model’s own confidence – quantified through established uncertainty metrics like self-consistency and predictive perplexity. When applied to biological perturbation prediction tasks, our approach demonstrates that filtered synthetic data yields superior performance compared to unfiltered alternatives, approaches ground-truth training effectiveness, and outperforms existing large reasoning model baselines.
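A minimal sketch of this kind of uncertainty-based filtering, assuming self-consistency over sampled answers and predictive perplexity as the two confidence signals; the thresholds and combination rule are illustrative, not the paper's exact recipe.

```python
# Sketch of uncertainty-based filtering for synthetic reasoning traces.
# Thresholds and the keep/drop rule are illustrative assumptions.
import math
from collections import Counter

def self_consistency(answers: list[str]) -> float:
    """Fraction of sampled answers agreeing with the majority answer."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)

def perplexity(token_logprobs: list[float]) -> float:
    """Predictive perplexity from per-token log-probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def keep_example(answers, token_logprobs, sc_min=0.7, ppl_max=5.0) -> bool:
    """Keep a synthetic example only if the model is confident in it."""
    return self_consistency(answers) >= sc_min and perplexity(token_logprobs) <= ppl_max

# Example: 5 sampled answers plus the chosen trace's token log-probs.
print(keep_example(["up-regulated"] * 4 + ["down-regulated"],
                   [-0.2, -0.4, -0.1, -0.3]))  # True
```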
@inproceedings{stoisser2025labelfree,
  title = {Towards Label-Free Biological Reasoning Synthetic Dataset Creation via Uncertainty Filtering},
  author = {Stoisser, Josefa Lia and Phillips, Lawrence and Misra, Aditya and Lamb, Tom A and Torr, Philip and Martell, Marc Boubnovski and Fauqueur, Julien and M{\"a}rtens, Kaspar},
  booktitle = {NeurIPS Workshop on Efficient Reasoning},
  year = {2025},
}
- Focus On This, Not That! Steering LLMs with Adaptive Feature Specification. In Forty-second International Conference on Machine Learning, 2025
Despite the success of Instruction Tuning (IT) in training large language models (LLMs), such models often leverage spurious or biased features learnt from their training data and can become misaligned, leading to undesired behaviours. While existing techniques can steer model behaviour at inference-time, they are often post-hoc and do not embed steering as an intrinsic model feature. In this work, we introduce Focus Instruction Tuning (FIT), which trains LLMs to condition their responses by focusing on specific features whilst ignoring others, leading to different behaviours based on what features are specified. Across diverse benchmarks, we demonstrate that FIT: (i) successfully steers behaviour at inference time; (ii) increases robustness by amplifying core task signals and down-weighting spurious cues; (iii) mitigates social bias by suppressing demographic attributes; and (iv) generalises under distribution shifts and to previously unseen focus features. FIT therefore offers a lightweight, intrinsic mechanism for building more robust, fair, and easily controllable LLMs.
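The sketch below shows one plausible way a focus specification could be attached to a task prompt; the template wording is an assumption, since the paper defines its own focus-instruction formats.

```python
# Illustrative prompt template for attaching a focus specification to a task.
# The wording of the template is a made-up example, not FIT's released format.
def format_focus_prompt(task_prompt: str, focus: list[str], ignore: list[str]) -> str:
    spec = []
    if focus:
        spec.append("Focus on: " + ", ".join(focus) + ".")
    if ignore:
        spec.append("Ignore: " + ", ".join(ignore) + ".")
    return task_prompt + "\n" + " ".join(spec)

print(format_focus_prompt(
    "Classify the sentiment of this review.",
    focus=["the reviewer's stated opinion"],
    ignore=["the product category", "the reviewer's demographics"],
))
```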
@inproceedings{lambfocus,
  title = {Focus On This, Not That! Steering {LLMs} with Adaptive Feature Specification},
  author = {Lamb, Tom A and Davies, Adam and Paren, Alasdair and Torr, Philip and Pinto, Francesco},
  booktitle = {Forty-second International Conference on Machine Learning},
  year = {2025},
}
- Semantic-Level Confidence Calibration of Language Models via Temperature Scaling. In ICLR Workshop: Quantify Uncertainty and Hallucination in Foundation Models: The Next Frontier in Reliable AI, 2025
Calibration of language models is typically studied at the token level, with scalar temperature scaling serving as the primary approach for recalibrating models. Recent multi-sampling techniques allow us to elicit semantic uncertainty measures from language models. However, these techniques focus on summary statistics of the resulting semantic confidence distributions rather than on how well-calibrated these distributions are, a crucial factor in ensuring that the resulting semantic likelihoods are both meaningful and reliable. In this paper, we investigate whether and how temperature scaling, which directly influences generative diversity and token-level calibration, affects semantic calibration. We address these questions by investigating semantic-level calibration in both pre-trained and fine-tuned models. In particular, we introduce a framework for assessing semantic confidence that incorporates both existing and novel confidence measures, comparing them to a single-generation confidence measure. Furthermore, we investigate various temperature scaling methods and their effect on semantic calibration. Our experiments span both open-book and closed-book question answering datasets. Our empirical findings demonstrate that scalar temperature scaling, when appropriately applied, provides a simple, widely applicable, and effective method for improving semantic calibration in language models.
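For intuition, the sketch below computes one simple semantic confidence measure (the probability mass of the answer cluster containing a reference answer) alongside a standard expected calibration error routine; exact-match clustering is a simplification of the entailment-based clustering typically used, and none of the names below come from the paper's code.

```python
# Sketch of a semantic confidence measure plus expected calibration error (ECE).
# Exact-match clustering stands in for semantic (entailment-based) clustering.
from collections import Counter

def semantic_confidence(samples: list[str], reference: str) -> float:
    """Probability mass of the sampled-answer cluster matching the reference."""
    clusters = Counter(s.strip().lower() for s in samples)
    return clusters.get(reference.strip().lower(), 0) / len(samples)

def expected_calibration_error(confs: list[float], correct: list[bool], bins: int = 10) -> float:
    """Standard binned ECE over a set of (confidence, correctness) pairs."""
    ece, n = 0.0, len(confs)
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        idx = [i for i, c in enumerate(confs) if lo <= c < hi or (b == bins - 1 and c == 1.0)]
        if idx:
            acc = sum(correct[i] for i in idx) / len(idx)
            avg_conf = sum(confs[i] for i in idx) / len(idx)
            ece += len(idx) / n * abs(acc - avg_conf)
    return ece

print(semantic_confidence(["Paris", "paris", "Lyon", "Paris"], "Paris"))  # 0.75
print(expected_calibration_error([0.9, 0.6, 0.75], [True, False, True]))
```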
@inproceedings{lamb2025semantic,
  title = {Semantic-Level Confidence Calibration of Language Models via Temperature Scaling},
  author = {Lamb, Tom A and Ivanova, Desi R and Torr, Philip and Rudner, Tim GJ},
  booktitle = {ICLR Workshop: Quantify Uncertainty and Hallucination in Foundation Models: The Next Frontier in Reliable AI},
  year = {2025},
}
2024
- Universal In-Context Approximation by Prompting Fully Recurrent Models. Aleksandar Petrov, Tom Lamb, Alasdair Paren, Philip Torr, and Adel Bibi. Advances in Neural Information Processing Systems, 2024
Zero-shot and in-context learning enable solving tasks without model fine-tuning, making them essential for developing generative model solutions. Therefore, it is crucial to understand whether a pretrained model can be prompted to approximate any function, i.e., whether it is a universal in-context approximator. While it was recently shown that transformer models do possess this property, these results rely on their attention mechanism. Hence, these findings do not apply to fully recurrent architectures like RNNs, LSTMs, and the increasingly popular SSMs. We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve as universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. LSRL may be of independent interest for further studies of fully recurrent models, such as constructing interpretability benchmarks.
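For reference, the sketch below implements the plain linear RNN update from the architecture class discussed above (state h_t = A h_{t-1} + B x_t, output y_t = C h_t); it does not reproduce LSRL or the universality construction itself.

```python
# Minimal linear RNN of the kind covered by the paper's architecture class.
# This is only a reference implementation of the recurrence, nothing more.
import numpy as np

def linear_rnn(xs: np.ndarray, A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:                      # xs: (T, d_in)
        h = A @ h + B @ x             # linear state update, no nonlinearity
        ys.append(C @ h)              # linear readout
    return np.stack(ys)

rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 8, 3, 5, 2
ys = linear_rnn(rng.normal(size=(T, d_in)),
                rng.normal(size=(d_h, d_h)) * 0.3,   # scaled for stability
                rng.normal(size=(d_h, d_in)),
                rng.normal(size=(d_out, d_h)))
print(ys.shape)  # (8, 2)
```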
@article{petrov2024universal,
  title = {Universal In-Context Approximation by Prompting Fully Recurrent Models},
  author = {Petrov, Aleksandar and Lamb, Tom and Paren, Alasdair and Torr, Philip and Bibi, Adel},
  journal = {Advances in Neural Information Processing Systems},
  volume = {37},
  pages = {72061--72093},
  year = {2024},
}
- Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models. In The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024
Despite the importance of shape perception in human vision, early neural image classifiers relied less on shape information for object recognition than other (often spurious) features. While recent research suggests that current large Vision-Language Models (VLMs) exhibit more reliance on shape, we find them to still be seriously limited in this regard. To quantify such limitations, we introduce IllusionBench, a dataset that challenges current cutting-edge VLMs to decipher shape information when the shape is represented by an arrangement of visual elements in a scene. Our extensive evaluations reveal that, while these shapes are easily detectable by human annotators, current VLMs struggle to recognize them, indicating important avenues for future work in developing more robust visual perception systems.
@inproceedings{hemmathidden,
  title = {Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models},
  author = {Hemmat, Arshia and Davies, Adam and Lamb, Tom A and Yuan, Jianhao and Torr, Philip and Khakzar, Ashkan and Pinto, Francesco},
  booktitle = {The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year = {2024},
}
2023
- Faithful Knowledge Distillation. Tom A Lamb, Rudy Brunel, Krishnamurthy DJ Dvijotham, M Pawan Kumar, Philip H S Torr, and 1 more author. arXiv preprint arXiv:2306.04431, 2023
Knowledge distillation (KD) has received much attention due to its success in compressing networks to allow for their deployment in resource-constrained systems. While the problem of adversarial robustness has been studied before in the KD setting, previous works overlook what we term the relative calibration of the student network with respect to its teacher in terms of soft confidences. In particular, we focus on two crucial questions with regard to a teacher-student pair: (i) do the teacher and student disagree at points close to correctly classified dataset examples, and (ii) is the distilled student as confident as the teacher around dataset examples? These are critical questions when considering the deployment of a smaller student network trained from a robust teacher within a safety-critical setting. To address these questions, we introduce a faithful imitation framework to discuss the relative calibration of confidences and provide empirical and certified methods to evaluate the relative calibration of a student w.r.t. its teacher. Further, to verifiably align the relative calibration incentives of the student to those of its teacher, we introduce faithful distillation. Our experiments on the MNIST, Fashion-MNIST and CIFAR-10 datasets demonstrate the need for such an analysis and the advantages of the increased verifiability of faithful distillation over alternative adversarial distillation methods.
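As a rough illustration of the empirical side of such an analysis, the sketch below estimates teacher-student agreement and the confidence gap under small input perturbations around an example; the perturbation scheme and summary statistics are illustrative assumptions, not the paper's certified procedure.

```python
# Sketch of an empirical relative-calibration check around a dataset example:
# do teacher and student agree, and how large is the confidence gap, under
# small random perturbations of the input? Illustrative only.
import torch

def relative_calibration_check(teacher, student, x, eps=0.05, n_samples=32):
    """Returns (agreement rate, mean teacher-student confidence gap)."""
    agree, gaps = [], []
    for _ in range(n_samples):
        x_pert = x + eps * torch.empty_like(x).uniform_(-1, 1)
        with torch.no_grad():
            pt = torch.softmax(teacher(x_pert), dim=-1)
            ps = torch.softmax(student(x_pert), dim=-1)
        agree.append(int(pt.argmax() == ps.argmax()))
        gaps.append((pt.max() - ps.max()).item())
    return sum(agree) / n_samples, sum(gaps) / n_samples

# Toy usage with stand-in linear "networks" on a flattened MNIST-sized input.
teacher = torch.nn.Linear(784, 10)
student = torch.nn.Linear(784, 10)
print(relative_calibration_check(teacher, student, torch.randn(784)))
```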
@article{lamb2023faithful,
  title = {Faithful Knowledge Distillation},
  author = {Lamb, Tom A and Brunel, Rudy and Dvijotham, Krishnamurthy DJ and Kumar, M Pawan and Torr, Philip H S and Eiras, Francisco},
  journal = {arXiv preprint arXiv:2306.04431},
  year = {2023},
}